US20190028043A1 - Machine learning device, servo motor control device, servo motor control system, and machine learning method - Google Patents
- Publication number
- US20190028043A1 (application US16/021,447)
- Authority
- US
- United States
- Prior art keywords
- servo motor
- motor control
- value
- machine learning
- control device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P29/00—Arrangements for regulating or controlling electric motors, appropriate for both AC and DC motors
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P6/00—Arrangements for controlling synchronous motors or other dynamo-electric motors using electronic commutation dependent on the rotor position; Electronic commutators therefor
- H02P6/08—Arrangements for controlling the speed or torque of a single motor
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B1/00—Comparing elements, i.e. elements for effecting comparison directly or indirectly between a desired value and existing or anticipated values
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- G06F15/18—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P21/00—Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
- H02P21/06—Rotor flux based control involving the use of rotor position or rotor speed sensors
- H02P21/08—Indirect field-oriented control; Rotor flux feed-forward control
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P23/00—Arrangements or methods for the control of AC motors characterised by a control method other than vector control
- H02P23/0004—Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
- H02P23/0031—Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control implementing a off line learning phase to determine and store useful data for on-line control
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P23/00—Arrangements or methods for the control of AC motors characterised by a control method other than vector control
- H02P23/0077—Characterised by the use of a particular software algorithm
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P23/00—Arrangements or methods for the control of AC motors characterised by a control method other than vector control
- H02P23/24—Controlling the direction, e.g. clockwise or counterclockwise
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02P—CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
- H02P6/00—Arrangements for controlling synchronous motors or other dynamo-electric motors using electronic commutation dependent on the rotor position; Electronic commutators therefor
- H02P6/14—Electronic commutators
- H02P6/16—Circuit arrangements for detecting position
- H02P6/17—Circuit arrangements for detecting position and for generating speed information
Definitions
- the present invention relates to: a machine learning device that performs learning related to compensation coefficients in compensation of nonlinear friction, with respect to a servo motor control device that performs compensation to the nonlinear friction; the servo motor control device and a servo motor control system including the machine learning device; and a machine learning method.
- a motor control device disclosed in Patent Document 1, a servo control device disclosed in Patent Document 2, and a motor control device disclosed in Patent Document 3 are known.
- the motor control device disclosed in Patent Document 1 has: a velocity feedforward control unit that generates a velocity feedforward command for reducing position error on the basis of a position command; and a torque feedforward control unit that generates a torque feedforward command for reducing the position error on the basis of the position command.
- the servo control device disclosed in Patent Document 2 has a feedforward compensator that generates a feedforward command on the basis of a position command.
- the servo control device disclosed in Patent Document 2 has a friction compensation device for compensating control error due to influence of friction in a machine tool.
- the motor control device disclosed in Patent Document 3 has a compensation calculation unit for compensating stick motions on the basis of a torque command and a friction torque estimated by a disturbance observer, and compensating lost motions on the basis of a velocity command.
- An object of the present invention is to provide: a machine learning device that performs compensation of non-linear friction to improve responsiveness of a servo system at the time of inversion motion of a servo motor; a servo motor control device; a servo motor control system; and a machine learning method.
- a machine learning device is configured to perform machine learning with respect to a servo motor control device (for example, a servo motor control device 100 described later) including a non-linear friction compensation unit (for example, a non-linear friction compensator 111 described later) configured to create a compensation value with respect to non-linear friction on the basis of a position command
- the machine learning device includes:
- a state information acquisition unit (for example, a state information acquisition unit 201 described later) configured to acquire state information from the servo motor control device by causing the servo motor control device to execute a predetermined program, the state information including a servo state including at least position error, and combination of compensation coefficients of the non-linear friction compensation unit; an action information output unit (for example, an action information output unit 203 described later) configured to output action information including adjustment information of the combination of the compensation coefficients included in the state information, to the servo motor control device; a reward output unit (for example, a reward output unit 2021 described later) configured to output a reward value in reinforcement learning, based on the position error included in the state information; and a value function updating unit (for example, a value function updating unit 2022 described later) configured to update an action value function on the basis of the reward value output by the reward output unit, the state information, and the action information.
- the reward output unit may output the reward value on the basis of an absolute value of the position error.
- the servo motor control device may further comprise a velocity feedforward calculation unit (for example, a velocity feedforward calculation unit 110 described later) configured to create a velocity feedforward value on the basis of the position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- the machine learning device may further include an optimizing action information output unit (for example, an optimizing action information output unit 205 described later) configured to generate and output combination of compensation coefficients of the non-linear friction compensation unit, on the basis of a value function updated by the value function updating unit.
- a servo motor control system is a servo motor control system including: the machine learning device (for example, a machine learning device 200 described later) according to any of (1) to (4) described above; and a servo motor control device (for example, a servo motor control device 100 described later) that comprises a non-linear friction compensation unit configured to create a compensation value with respect to non-linear friction.
- the servo motor control device may further comprise a velocity feedforward calculation unit (for example, a velocity feedforward calculation unit 110 described later) configured to create a velocity feedforward value on the basis of a position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- a servo motor control device is a servo motor control device including the machine learning device according to any of (1) to (4) described above, and a non-linear friction compensation unit configured to create a compensation value with respect to non-linear friction.
- the servo motor control device may further include a velocity feedforward calculation unit configured to create a velocity feedforward value on the basis of a position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- a machine learning method is a machine learning method of a machine learning device (for example, a machine learning device 200 described later) configured to perform machine learning with respect to a servo motor control device (for example, a servo motor control device 100 described later), including a non-linear friction compensation unit (for example, a non-linear friction compensator 111 described later) configured to create a compensation value with respect to non-linear friction on the basis of a position command, the machine learning method including:
- acquiring state information from the servo motor control device by causing the servo motor control device to execute a predetermined program, the state information including a servo state including at least position error, and a combination of compensation coefficients of the non-linear friction compensation unit; outputting action information including adjustment information of the combination of compensation coefficients included in the state information, to the servo motor control device; and updating an action value function on the basis of a reward value in reinforcement learning that is based on the position error included in the state information, the state information, and the action information.
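The reward step of the method can be sketched as follows. The text states only that the reward is based on the position error, optionally on its absolute value. The scoring by a sum of absolute errors and the reward values +1, 0, and -1 are illustrative assumptions, and the function names are hypothetical.

```python
def error_score(position_errors):
    """Sum of absolute position errors over one run of the program (assumed metric)."""
    return sum(abs(e) for e in position_errors)


def reward(previous_errors, new_errors):
    """Compare the error score before and after adjusting the coefficients.

    Returns +1.0 if the error shrank, -1.0 if it grew, 0.0 if unchanged.
    These reward values are assumptions chosen for illustration.
    """
    previous_score = error_score(previous_errors)
    new_score = error_score(new_errors)
    if new_score < previous_score:
        return 1.0
    if new_score > previous_score:
        return -1.0
    return 0.0
```

Using the absolute value keeps positive and negative errors from cancelling each other in the score.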
- responsiveness of a servo system at the time of inversion motion of a servo motor can be improved by compensating for non-linear friction.
- FIG. 1 is a block diagram showing a servo motor control system of a first embodiment of the present invention.
- FIG. 2 is a block diagram showing a set of a servo motor control device of the servo motor control system and a machine learning device, and a control target, of the first embodiment of the present invention.
- FIG. 3 is a characteristic diagram showing a relationship between a non-linear friction compensation value f(ω) and a motor speed ω.
- FIG. 4 is a block diagram showing an example of a control target.
- FIG. 5 is a diagram for explaining motion of the servo motor when the geometry is a circle.
- FIG. 6 is a diagram for explaining motion of the servo motor when the geometry is a square.
- FIG. 7 is a diagram showing a state where a table included in the control target moves sinusoidally in an X axis direction or a Y axis direction.
- FIG. 8 is a diagram showing a state where the table included in the control target moves in triangular-wave shape, in the X axis direction or the Y axis direction.
- FIG. 9 is a diagram for explaining motion of the servo motor when the geometry is a star shape.
- FIG. 10 is a block diagram showing a machine learning device of the first embodiment.
- FIG. 11 is a flowchart explaining motion of the machine learning device.
- FIG. 12 is a flowchart explaining motion of an optimizing action information output unit of the machine learning device.
- FIG. 13 is a diagram showing a movement path of the table before parameter adjustment of a non-linear friction compensator by machine learning.
- FIG. 14 is a diagram showing a movement path of the table after the parameter adjustment of the non-linear friction compensator by the machine learning.
- FIG. 1 is a block diagram showing a servo motor control system of a first embodiment of the present invention.
- a servo motor control system 10 includes n servo motor control devices 100-1 to 100-n, n machine learning devices 200-1 to 200-n, and a network 400.
- n is an arbitrary natural number.
- the servo motor control device 100-1 and the machine learning device 200-1 form a one-to-one set and are communicatively connected.
- the servo motor control devices 100-2 to 100-n and the machine learning devices 200-2 to 200-n are connected in the same manner as the servo motor control device 100-1 and the machine learning device 200-1.
- the n sets of the servo motor control devices 100-1 to 100-n and the machine learning devices 200-1 to 200-n are connected via the network 400.
- alternatively, the servo motor control device and the machine learning device in each of the sets may be directly connected via a connection interface.
- the n sets of the servo motor control devices 100-1 to 100-n and the machine learning devices 200-1 to 200-n may be, for example, installed in the same factory, or may be installed in different factories.
- the network 400 is, for example, a local area network (LAN) constructed in a factory, the Internet, a public telephone network, a direct connection via a connection interface, or a combination thereof.
- the particular communication method used in the network 400, whether a wired connection or a wireless connection is used, and the like are not particularly limited.
- FIG. 2 is a block diagram showing a set of a servo motor control device of the servo motor control system and a machine learning device, and a control target, of the first embodiment of the present invention.
- the servo motor control device 100 and the machine learning device 200 in FIG. 2 correspond to, for example, the servo motor control device 100-1 and the machine learning device 200-1 shown in FIG. 1.
- a control target 300 is, for example, a machine tool, a robot, an industrial machine, or the like including the servo motor.
- the servo motor control device 100 may be provided as a part of a machine tool, a robot, an industrial machine, or the like.
- the servo motor control device 100 includes a position command creation unit 101, a subtractor 102, a position control unit 103, an adder 104, a subtractor 105, a velocity control unit 106, an adder 107, an integrator 108, a position feedforward calculation unit 109, a velocity feedforward calculation unit 110, and a non-linear friction compensator 111.
- the position command creation unit 101 creates a position command value for operating the servo motor included in the control target 300, in accordance with a program input from a host control device, an external input device, or the like (not shown), and outputs the created position command value to the subtractor 102 and the position feedforward calculation unit 109.
- the subtractor 102 determines the difference between the position command value and the detected position obtained by position feedback, outputs the difference to the position control unit 103 as position error, and transmits the difference to the machine learning device 200.
- the position command creation unit 101 creates the position command value on the basis of a program that operates the servo motor included in the control target 300.
- the control target 300 is, for example, a machine tool including the servo motor.
- when the machine tool moves a table on which a workpiece is mounted in an X axis direction and a Y axis direction to machine the workpiece, the servo motor control device shown in FIG. 2 is provided for each of the X axis direction and the Y axis direction. When the machine tool moves the table in directions of three or more axes, the servo motor control device shown in FIG. 2 is provided for each of the axis directions.
- the position command creation unit 101 creates the position command value by changing the pulse frequency, and thereby the speed of the servo motor, so that the geometry specified by the program is obtained.
- the position control unit 103 outputs to the adder 104, as a velocity command value, a value obtained by multiplying the position error by a preset position gain Kp.
- the position feedforward calculation unit 109 outputs to the adder 104, the velocity feedforward calculation unit 110, and the non-linear friction compensator 111 a value obtained by differentiating the position command value and multiplying the result by a feedforward coefficient.
- the adder 104 adds the velocity command value and the output value of the position feedforward calculation unit 109, and outputs the sum to the subtractor 105 as a feedforward controlled velocity command value.
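The position loop described above (subtractor 102, position control unit 103, position feedforward calculation unit 109, adder 104) can be sketched for one sampling period as follows. This is a minimal discrete-time sketch: the backward-difference approximation of the derivative, the sampling period `dt`, and all function and parameter names are assumptions, not taken from the text.

```python
def velocity_command(pos_cmd, prev_pos_cmd, pos_feedback, dt, kp, ff_coeff):
    """Return (feedforward controlled velocity command, position error)."""
    # Subtractor 102: position error = command - detected position.
    position_error = pos_cmd - pos_feedback
    # Position control unit 103: proportional gain Kp applied to the error.
    p_term = kp * position_error
    # Position feedforward unit 109: differentiate the command (backward
    # difference, an assumption) and scale by the feedforward coefficient.
    ff_term = ff_coeff * (pos_cmd - prev_pos_cmd) / dt
    # Adder 104: sum of the two terms.
    return p_term + ff_term, position_error
```

The position error is returned alongside the command because the text says the subtractor 102 also transmits it to the machine learning device 200.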
- the subtractor 105 determines the difference between the output of the adder 104 and the velocity detection value obtained by velocity feedback, and outputs the difference to the velocity control unit 106 as velocity error.
- the velocity control unit 106 adds a value obtained by multiplying the velocity error by a preset integral gain K1v and integrating the product to a value obtained by multiplying the velocity error by a preset proportional gain K2v, and outputs the sum to the adder 107 as a torque command value.
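The velocity loop amounts to a proportional-integral (PI) controller. A minimal discrete-time sketch follows, assuming rectangular integration with a fixed sampling period `dt`; the class and attribute names are hypothetical.

```python
class VelocityPI:
    """PI velocity controller with integral gain K1v and proportional gain K2v."""

    def __init__(self, k1v, k2v, dt):
        self.k1v = k1v
        self.k2v = k2v
        self.dt = dt
        self.integral = 0.0  # running integral of the velocity error

    def update(self, velocity_cmd, velocity_feedback):
        # Subtractor 105: velocity error.
        error = velocity_cmd - velocity_feedback
        # Rectangular integration of the error (an assumed discretization).
        self.integral += error * self.dt
        # Torque command = integral term + proportional term.
        return self.k1v * self.integral + self.k2v * error
```

Because K1v is constant, multiplying before or after integration gives the same result.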
- the velocity feedforward calculation unit 110 performs velocity feedforward calculation processing represented by a transfer function Gf(S) given by, for example, formula 1, on the basis of the output value of the position feedforward calculation unit 109, and outputs the calculation result to the adder 107 as a first torque compensation value.
- Coefficients a_i, b_j of the velocity feedforward calculation unit 110 are constants that are set in advance, where 0 ≤ i ≤ m for a_i and 0 ≤ j ≤ n for b_j.
- the dimensions m, n are natural numbers which are set in advance.
- the non-linear friction compensator 111 determines a non-linear friction compensation value for compensating non-linear friction generated in the control target 300 on the basis of the output value of the position feedforward calculation unit 109, and outputs the non-linear friction compensation value to the adder 107 as a second torque compensation value.
- when the control target 300 is a machine tool including the servo motor, the non-linear friction is generated mainly by parts of the machine tool other than the servo motor, such as a ball screw. However, the non-linear friction is also generated in the servo motor.
- a non-linear friction compensation value f(ω) is given by, for example, formula 2, and can be determined by using a motor speed ω.
- FIG. 3 is a characteristic diagram showing a relationship between a non-linear friction compensation value f(ω) and a motor speed ω.
- optimal values of the combination of compensation coefficients c, d in formula 2 are determined by using the machine learning device 200.
- the adder 107 adds the torque command value, the output value of the velocity feedforward calculation unit 110, and the output value of the non-linear friction compensator 111, and outputs the sum to the servo motor of the control target 300 as a feedforward controlled torque command value.
- the control target 300 outputs a velocity detection value, which is input to the subtractor 105 as the velocity feedback.
- the velocity detection value is integrated by the integrator 108 to become a position detection value.
- the position detection value is input to the subtractor 102 as position feedback.
- the servo motor control device 100 is configured as described above.
- FIG. 4 is a block diagram showing part of a machine tool including the servo motor, as an example of the control target 300.
- the servo motor control device 100 moves a table 304 via a coupling mechanism 303 by means of a servo motor 302.
- the machine tool rotates a spindle, to which a tool is attached, while moving the table 304, to machine a workpiece mounted on the table 304.
- the coupling mechanism 303 has a coupling 3031 coupled to the servo motor 302, and a ball screw 3033 fixed to the coupling 3031.
- a nut 3032 is screwed onto the ball screw 3033.
- the nut 3032 screwed onto the ball screw 3033 is moved in the axis direction of the ball screw 3033 by rotational drive of the servo motor 302.
- the non-linear friction is generated in the coupling mechanism 303 including the coupling 3031 and the ball screw 3033, in the nut 3032, and the like. However, the non-linear friction is also generated in the servo motor 302.
- a rotation angle position of the servo motor 302 is detected by a rotary encoder 301, which is a position detection unit associated with the servo motor 302.
- the detected signal is utilized as the velocity feedback.
- the detected signal is integrated by the integrator 108 to be utilized as the position feedback.
- the machine tool may include, at an end portion of the ball screw 3033, a linear scale 305 that detects a moving distance of the ball screw 3033.
- an output of the linear scale 305 can be used as the position feedback.
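The relationship between the motor rotation detected by the rotary encoder 301 and the linear travel of the nut 3032 can be illustrated as follows. The encoder resolution and the screw lead are hypothetical parameters, since the text gives no values, and a direct coupling without gear reduction is assumed.

```python
def nut_travel_mm(encoder_counts, counts_per_rev, screw_lead_mm):
    """Linear travel of the nut for a given number of encoder counts.

    One motor revolution advances the nut by one screw lead
    (assumes direct coupling; parameter values are illustrative).
    """
    revolutions = encoder_counts / counts_per_rev
    return revolutions * screw_lead_mm
```

For example, with a 4096-count encoder and a 10 mm lead, 2048 counts correspond to half a revolution.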
- the machine learning device 200 executes a preset evaluation program (hereinafter also referred to as an “evaluation program”) to learn the compensation coefficients of the non-linear friction compensator 111.
- a geometry specified by the evaluation program may be, for example, a circle, a square, or a star-shape when the inversion motion of the servo motor is evaluated.
- FIG. 5 is a diagram for explaining motion of the servo motor when the geometry is a circle.
- FIG. 6 is a diagram for explaining motion of the servo motor when the geometry is a square. In FIG. 5 and FIG. 6, the table moves so that the workpiece is machined in a clockwise direction.
- the servo motor control device 100 controls the servo motor 302 so that the table included in the control target 300 moves in a sine-wave shape or a triangular-wave shape in at least one of the X axis direction and the Y axis direction.
- the evaluation program controls the frequency of the pulse output from the position command creation unit 101 of the servo motor control device 100. By controlling this frequency, the feed rate of the table in the X axis direction or the Y axis direction is controlled.
- when the frequency of the pulse output from the position command creation unit 101 becomes high, the rotation speed of the motor increases, and the feed rate increases.
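The proportionality between pulse frequency and feed rate can be sketched as follows, assuming each command pulse moves the table by a fixed distance. The travel per pulse is a hypothetical parameter that does not appear in the text.

```python
def feed_rate_mm_per_min(pulse_frequency_hz, travel_per_pulse_mm):
    """Feed rate implied by a command pulse frequency.

    Assumes every pulse advances the table by travel_per_pulse_mm.
    """
    return pulse_frequency_hz * travel_per_pulse_mm * 60.0
```

Doubling the pulse frequency doubles the feed rate under this assumption.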
- the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a sine-wave shape in the X axis direction as shown in FIG. 7, and moves in a cosine-wave shape in the Y axis direction.
- at the position A1, the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction.
- the servo motor that moves the table in the X axis direction rotates at the same speed before and after the position A1, and the table moves at the same speed in the X axis direction before and after the position A1.
- the position A1, at which the table inverts in the Y axis direction, corresponds to an inversion position of the positive direction shown in FIG. 7.
- the table that moves at a constant speed in the X axis direction moves along a waveform (cosine wave) whose phase is delayed or advanced by 90 degrees relative to the sine waveform.
- the position A1, at which the table moves at a constant speed in the X axis direction, corresponds to an intermediate position between an inversion position of the positive direction and an inversion position of the negative direction, shown in FIG. 7.
- the servo motor control device 100 controls each servo motor so that the motion of the servo motor that moves the table in the X axis direction and the motion of the servo motor that moves the table in the Y axis direction are the reverse of those at the position A1. That is, at the position A2, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction.
- the servo motor that moves the table in the Y axis direction rotates at the same speed before and after the position A2, and the table moves at the same speed in the Y axis direction before and after the position A2.
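The circular motion described above can be checked numerically with a sine command for X and a cosine command for Y. The radius, the angular rate, and the choice of time zero at the Y inversion point are illustrative assumptions.

```python
import math


def circle_point(radius, angular_rate, t):
    """Position and velocity of the table on a clockwise circle.

    X follows a sine wave and Y a cosine wave; t = 0 is taken as the
    point where Y is at its positive extreme (an assumed convention).
    """
    theta = angular_rate * t
    x = radius * math.sin(theta)
    y = radius * math.cos(theta)
    vx = radius * angular_rate * math.cos(theta)   # dx/dt
    vy = -radius * angular_rate * math.sin(theta)  # dy/dt
    return x, y, vx, vy
```

At t = 0 the Y velocity passes through zero and changes sign, so the Y axis motor inverts, while the X velocity is at its maximum and X is midway between its own inversion positions.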
- the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a triangular-wave shape in the X axis direction as shown in FIG. 8, and moves in the Y axis direction in a triangular-wave shape whose phase is delayed or advanced by 90 degrees relative to the triangular wave shown in FIG. 8.
- the rotation speed of the servo motor is constant from a positive inversion position to a negative inversion position, and from the negative inversion position to the positive inversion position.
- at the position B1, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction.
- the servo motor that moves the table in the Y axis direction rotates at a constant speed, and the table moves in the Y axis direction at the constant speed.
- the position B1, at which the table inverts in the X axis direction, corresponds to an inversion position of the negative direction shown in FIG. 8.
- the table that moves in the Y axis direction at a constant speed moves so that the phase of the triangular wave shown in FIG.
- the position B1, at which the table moves at a constant speed in the Y axis direction, corresponds to an intermediate position between the inversion position of the positive direction and the inversion position of the negative direction, shown in FIG. 8.
- the servo motor control device 100 controls each servo motor so that the motion of the servo motor that moves the table in the X axis direction and the motion of the servo motor that moves the table in the Y axis direction are the reverse of those at the position B1.
- the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction.
- the servo motor that moves the table in the X axis direction rotates at a constant speed, and the table moves in the X axis direction at a constant speed.
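The triangular-wave motion with a 90-degree phase difference between the axes can be sketched as follows. The period, the amplitude, and the choice to advance (rather than delay) the Y wave are illustrative assumptions.

```python
def triangle_wave(t, period, amplitude):
    """Triangular wave starting at 0 and rising to +amplitude at period/4."""
    phase = (t / period) % 1.0
    if phase < 0.25:
        return 4.0 * amplitude * phase
    if phase < 0.75:
        return amplitude * (2.0 - 4.0 * phase)
    return amplitude * (4.0 * phase - 4.0)


def table_xy(t, period, amplitude):
    """X follows the triangular wave; Y is the same wave advanced by 90 degrees."""
    x = triangle_wave(t, period, amplitude)
    y = triangle_wave(t + period / 4.0, period, amplitude)
    return x, y
```

When X reaches its negative inversion position at three quarters of a period, Y is at the intermediate position between its two inversion positions.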
- when the evaluation program is executed, the position command creation unit 101 of the servo motor control device 100 sequentially outputs the position command value so that the geometry is a circle or a square.
- the position command creation unit 101 changes the feed rate for each geometry (circle or square), so that the machine learning device 200 can learn the influence of a plurality of feed rates.
- the position command creation unit 101 may change the feed rate partway through a geometry, for example, when the table passes a corner while moving along a square geometry.
- the machine learning device 200 can learn a pattern in which the frequency is high, or a pattern in which the frequency gradually increases.
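One way to picture the evaluation program is as a schedule of runs, one per combination of geometry and feed rate. Running every geometry at every feed rate is an assumed reading of the text, and the function name is hypothetical.

```python
from itertools import product


def evaluation_schedule(geometries, feed_rates):
    """List every (geometry, feed rate) run, geometry by geometry."""
    return list(product(geometries, feed_rates))
```

The position errors recorded during each run would then be passed to the machine learning device.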
- the geometry specified by the evaluation program may be a geometry, for example, a star shape as shown in FIG. 9, in which the rotation directions of both of the two servo motors that move the table in the X axis direction and the Y axis direction invert.
- the machine learning device 200 may learn the combination of the compensation coefficients c, d of the non-linear friction compensator 111 for such a geometry.
- FIG. 9 is a diagram for explaining motion of the servo motor when the geometry is a star shape.
- the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a triangular-wave shape in the X axis direction and the Y axis direction, in a projection portion of four “ ⁇ ” shapes of the star shape.
- the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction.
- the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction. Accordingly, the inversion motion of the servo motor of when both the rotation directions of the two servo motors that move the table in the X axis direction and the Y axis direction invert is evaluated.
- the machine learning device 200 learns combination of the compensation coefficients c, d of the non-linear friction compensator 111 for reducing position error of when the control target 300 is driven on the basis of the evaluation program.
- An agent (corresponding to the machine learning device 200 in the present embodiment) observes an environmental state and selects an action. The environment then changes on the basis of the action. The agent receives some reward according to the environmental change and learns to select (decide on) better actions. While supervised learning presents a complete correct answer, the reward in reinforcement learning is often a fragmentary value based on a change in part of the environment. Thus, the agent learns to select actions so that the total future reward is maximized.
- the machine learning device 200 learns a suitable action in consideration of the mutual effect of the action with the environment, that is, an action for maximizing the reward to be obtained in the future. This represents that, in the present embodiment, the machine learning device 200 gains an action that affects the future, for example, selecting action information for reducing position error.
- the Q-learning is a method of learning a value function Q (S, A) of selecting an action A, under an environmental state S.
- An object of the Q-learning is to select the action A having the highest value function Q (S, A) as a suitable action, from among actions A that can be taken, in a state S.
- the agent selects various actions A under a state S, and with respect to the action A at that time, selects a better action on the basis of the given reward, to learn the correct value Q (S, A).
- E[ ] represents an expected value
- t represents time
- γ represents a parameter called a discount rate described later
- r_t is a reward at the time t
- Σ represents the total by the time t.
- the expected value in this formula is an expected value in a case where the state is changed according to the suitable action.
- the suitable action is not clear in a process of the Q-learning.
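The formula that the symbols above describe did not survive in this text. A reconstruction consistent with those definitions (expected value E[ ], time t, discount rate γ, reward r_t, and the total Σ) is the expected discounted return below; the summation limits are an assumption.

```latex
Q(S, A) \;\approx\; E\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right]
```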
- An updating formula of such value function Q (S, A) can be represented by, for example, the following formula 3 (shown as formula 3 below).
- S_t represents an environmental state at the time t
- A_t represents an action at the time t.
- the state is changed to S_{t+1} by the action A_t.
- r_{t+1} represents the reward obtained by the state change.
- The term with max is obtained by multiplying by γ the Q value of when the action A having the highest Q value known at that time is selected under the state S_{t+1}.
- γ is a parameter satisfying 0<γ≤1, and is called a discount rate.
- α is a learning coefficient, and is in a range of 0<α≤1.
- the formula 3 described above represents a method of updating the value function Q(S_t, A_t) of the action A_t in the state S_t, on the basis of the reward r_{t+1} sent back as a result of the action A_t.
- This updating formula indicates that Q(S_t, A_t) is increased when the value max_A Q(S_{t+1}, A) of the best action in the next state S_{t+1} reached by the action A_t is larger than the value Q(S_t, A_t) of the action A_t in the state S_t, and that Q(S_t, A_t) is decreased when it is smaller. That is, the value of an action in a state is brought closer to the value of the best action in the next state reached by that action. The difference between them changes depending on the discount rate γ and the reward r_{t+1}. Basically, however, the mechanism is such that the value of the best action in a state is propagated to the value of the action in the state one before it.
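For reference, the standard Q-learning update that matches the description of formula 3 (learning coefficient α, discount rate γ, reward r_{t+1}, and the max term over the next state) can be written as follows. This is a reconstruction from the surrounding definitions, not a copy of the original formula image.

```latex
Q(S_{t}, A_{t}) \;\leftarrow\; Q(S_{t}, A_{t})
  + \alpha \left( r_{t+1} + \gamma \max_{A} Q(S_{t+1}, A) - Q(S_{t}, A_{t}) \right)
```

When the bracketed term is positive, Q(S_t, A_t) grows; when it is negative, Q(S_t, A_t) shrinks, which is the behavior described in the text above.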
- the Q-learning may utilize a known technique called a Deep Q-Network (DQN).
- Specifically, the value function Q may be configured by using an appropriate neural network, and the parameters of the neural network may be adjusted so that the network approximates the value function Q, to calculate the value of the value function Q (S, A).
- The time required for the Q-learning to settle can thereby be shortened.
- the DQN is described in detail, for example, in Non-Patent Document below.
- the Q-learning described above is performed by the machine learning device 200 .
- the machine learning device 200 learns the value function Q of selecting adjustment of the compensation coefficients c, d of the non-linear friction compensator 111 related to a state S, as the action A by setting values of the compensation coefficients c, d of the non-linear friction compensator 111 in the servo motor control device 100 , and a servo state such as a command and feedback to be the state S.
- the servo state includes position error information of the servo motor control device 100 acquired by executing the evaluation program.
- the machine learning device 200 observes the state information S including a servo state such as a command and feedback, including the position error information of the servo motor control device 100 acquired by executing the evaluation program on the basis of the compensation coefficients c, d of the non-linear friction compensator 111 , to determine the action A.
- the machine learning device 200 gives a reward every time the action A is performed.
- the machine learning device 200 for example, searches an optimal action A so that the total reward in the future is the maximum, through trial and error.
- the machine learning device 200 can select the optimal action A (that is, the optimal compensation coefficients c, d of the non-linear friction compensator 111 ) with respect to the state S including the servo state such as a command and feedback, including the position error information of the servo motor control device 100 obtained by executing the evaluation program on the basis of the compensation coefficients c, d of the non-linear friction compensator 111 .
- the machine learning device 200 selects the action A with which the value of the value function Q learned by the machine learning device 200 is the maximum, from among the actions A applied to the compensation coefficients c, d of the non-linear friction compensator 111 related to a state S, to select the action A with which the position error obtained by executing the evaluation program is the minimum (that is, the combination of the compensation coefficients c, d of the non-linear friction compensator 111 ).
- FIG. 10 is a block diagram showing the machine learning device 200 of the first embodiment of the present invention.
- the machine learning device 200 includes the state information acquisition unit 201 , a learning unit 202 , the action information output unit 203 , a value function storage unit 204 , and an optimizing action information output unit 205 .
- the state information acquisition unit 201 acquires a state S including the servo state such as the command and the feedback, including the position error information of the servo motor control device 100 acquired by executing the evaluation program, from the servo motor control device 100 on the basis of the compensation coefficients c, d of the non-linear friction compensator 111 in the servo control device 100 .
- This state information S corresponds to the environmental state S in the Q-learning.
- the state information acquisition unit 201 outputs the acquired state information S to the learning unit 202 .
- a user creates in advance the compensation coefficients c, d of the non-linear friction compensator 111 at the time when the Q-learning starts for the first time. In the present embodiment, the initial set values of the compensation coefficients c, d of the non-linear friction compensator 111 created by the user are adjusted to more optimal values by the reinforcement learning.
- the learning unit 202 is a unit that learns the value function Q (S, A) of when an action A is selected under an environmental state S. Particularly, the learning unit 202 includes the reward output unit 2021 , the value function updating unit 2022 , and the action information generation unit 2023 .
- the reward output unit 2021 is a unit that calculates the reward of when the action A is selected under a state S.
- a set of position errors (position error set) that is a state variable in the state S is represented by PD(S)
- a position error set that is a state variable related to state information S′ that has changed from the state S due to action information A is represented by PD(S′).
- a value of the position error in the state S is a value calculated on the basis of an evaluation function f (PD(S)) that is set in advance.
- As the evaluation function f, for example, a function that calculates an integrated value by weighting the absolute value of the position error with time (∫t|e|dt, where e denotes the position error) can be applied.
- The evaluation function is not limited thereto. It is sufficient that the evaluation function is a function of appropriately evaluating the position error value in the state S on the basis of the position error set PD(S).
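A time-weighted integral of the absolute position error can be approximated numerically from sampled data. The following is a minimal sketch, assuming the position error set PD(S) is available as a list of samples taken at a fixed interval `dt`; the function name and the rectangle-rule discretization are illustrative assumptions, not part of the original description.

```python
from typing import Sequence


def time_weighted_abs_error(errors: Sequence[float], dt: float) -> float:
    """Approximate the integral of t * |e(t)| dt with a rectangle rule.

    errors[k] is assumed to be the position error sampled at time t = k * dt.
    """
    return sum((k * dt) * abs(e) * dt for k, e in enumerate(errors))


if __name__ == "__main__":
    # Hypothetical samples of the position error over three control cycles.
    print(time_weighted_abs_error([1.0, -2.0, 3.0], dt=0.5))
```

Because later samples receive a larger weight, this kind of evaluation value penalizes errors that persist long after the start of the motion more heavily than errors that die out quickly.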
- the reward output unit 2021 sets a reward value to be a negative value.
- the reward output unit 2021 sets the reward value to be a positive value.
- the reward output unit 2021 sets the reward value to be zero, for example.
- the negative value of when the value f(PD(S′)) of the position error of the state S′ after performing of the action A, is larger than the value f(PD(S)) of the position error in the prior state S, may be larger according to a ratio. That is, the negative value may be larger according to the degree of increasing of the value of the position error.
- the positive value of when the value f(PD(S′)) of the position error of the state S′ of after performing of the action A is smaller than the value f(PD(S)) of the position error in the prior state S may be larger according to a ratio. That is, the positive value may be larger according to the degree of decreasing of the value of the position error.
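The reward rule described above (negative when the evaluated position error grows, positive when it shrinks, zero when it is unchanged, with a magnitude that may scale with the ratio of change) can be sketched as follows. The function name, the `weight` parameter, and the use of the relative change as the "ratio" are assumptions made for illustration; the evaluation values are assumed to be non-negative.

```python
def reward(f_prev: float, f_next: float, weight: float = 1.0, eps: float = 1e-12) -> float:
    """Return a reward from the evaluation values before and after an action.

    f_prev is f(PD(S)) and f_next is f(PD(S')). The magnitude is scaled by the
    relative change in the evaluation value (an assumed choice of ratio).
    """
    if f_next == f_prev:
        return 0.0  # position error unchanged
    relative_change = (f_prev - f_next) / max(f_prev, eps)
    # Positive when the error decreased, negative when it increased.
    return weight * relative_change


if __name__ == "__main__":
    print(reward(2.0, 1.0))  # error halved
    print(reward(2.0, 3.0))  # error grew
    print(reward(2.0, 2.0))  # no change
```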
- the value function updating unit 2022 performs Q-learning on the basis of the state S, the action A, the state S′ of when the action A is applied to the state S, and the reward value calculated as described above, to update a value function Q that the value function storage unit 204 stores. Updating of the value function Q may be performed by online learning, batch learning, or mini-batch learning.
- the online learning is a learning method of applying an action A to the current state S to update the value function Q immediately every time when the state S makes a transition to a new state S′.
- the batch learning is a learning method of applying an action A to the current state S to repeat the transition of the state S to the new state S′ to collect learning data and perform updating of the value function Q by using all the collected learning data.
- the mini-batch learning is an intermediate learning method between the online learning and the batch learning, and is a learning method of performing updating of the value function Q every time when certain pieces of learning data are accumulated.
- the action information generation unit 2023 selects the action A in a process of the Q-learning, with respect to the current state S.
- the action information generation unit 2023 generates the action information A in order to cause operation (corresponding to the action A in the Q-learning) of correcting each of the compensation coefficients c, d of the non-linear friction compensator 111 of the servo motor control device 100 in the process of the Q-learning to be performed, to output the generated action information A to the action information output unit 203 .
- the action information generation unit 2023 causes incremental adding or subtracting of each of the compensation coefficients c, d of the non-linear friction compensator 111 included in the action A with respect to each of the compensation coefficients of the non-linear friction compensator 111 included in the state S.
- the action information generation unit 2023 may take, as the next action A′, a measure of selecting the action A′ such that the value of the position error becomes smaller, such as incremental adding or subtracting as similar to the previous action, with respect to each of the compensation coefficients c, d of the non-linear friction compensator 111 .
- the action information generation unit 2023 may take, as the next action A′, for example, a measure of selecting the action A′ such that the position error is smaller than the previous value, such as incremental subtracting or adding on the contrary to the previous action, with respect to each of the compensation coefficients c, d of the non-linear friction compensator 111 .
- the action information generation unit 2023 may take a measure of selecting the action A′ by a known method such as the greedy method of selecting the action A′ having the highest value function Q (S, A) from among values of the action A currently estimated, or the ε-greedy method of randomly selecting the action A′ with a small probability ε, and other than that, selecting the action A′ having the highest value function Q (S, A).
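The two selection measures named above can be sketched in a few lines. This is a minimal illustration, assuming the value function is held in a dictionary keyed by (state, action) pairs with unseen pairs valued at 0; the function name and data layout are hypothetical.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple


def epsilon_greedy(
    q: Dict[Tuple[Hashable, Hashable], float],
    state: Hashable,
    actions: Sequence[Hashable],
    epsilon: float,
    rng: random.Random,
) -> Hashable:
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))  # exploration
    # Exploitation: the greedy method is the special case epsilon = 0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    table = {("s", "increase_c"): 1.0, ("s", "decrease_c"): 2.0}
    print(epsilon_greedy(table, "s", ["increase_c", "decrease_c"], 0.1, random.Random(0)))
```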
- the action information output unit 203 is a unit that transmits the action information A output from the learning unit 202 to the servo motor control device 100 .
- the servo motor control device 100 slightly corrects the current state S, that is, each of the compensation coefficients c, d of the non-linear friction compensator 111 that are currently set on the basis of the action information, to make a transition to the next state S′ (that is, the corrected compensation coefficients of the non-linear friction compensator 111 ).
- the value function storage unit 204 is a storage device that stores the value function Q.
- the value function Q may be stored in a table (hereinafter, referred to as an action value table) for example, for every state S and every action A.
- the value function Q stored in the value function storage unit 204 is updated by the value function updating unit 2022 .
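An action value table and its update can be sketched as below. This assumes the standard tabular Q-learning update with learning coefficient α and discount rate γ; the dictionary layout, the function name, and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one Q-learning update to the table entry for (state, action)."""
    # Value of the best action currently known in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    table: QTable = defaultdict(float)  # unseen (state, action) pairs start at 0
    table[("s1", "a")] = 1.0
    print(q_update(table, "s0", "a", 1.0, "s1", ["a", "b"], alpha=0.5, gamma=0.5))
```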
- the value function Q stored in the value function storage unit 204 may be shared with the other machine learning devices 200 . When the value function Q is shared among a plurality of machine learning devices 200 , distributed reinforcement learning can be performed by the machine learning devices 200 . Thus, efficiency of the reinforcement learning can be improved.
- the optimizing action information output unit 205 creates the action information A (hereinafter, referred to as “optimizing action information”) for causing the non-linear friction compensator 111 to perform operation with which the value function Q (S, A) is the maximum, on the basis of the value function Q updated by performing the Q-learning by the value function updating unit 2022 . More particularly, the optimizing action information output unit 205 acquires the value function Q stored in the value function storage unit 204 . This value function Q is updated by performing the Q-learning by the value function updating unit 2022 as described above.
- the optimizing action information output unit 205 creates the action information on the basis of the value function Q to output the created action information to the servo motor control device 100 (the non-linear friction compensator 111 ).
- This optimizing action information includes information of correcting each of the compensation coefficients c, d of the non-linear friction compensator 111 , as similar to the action information output in the process of the Q-learning by the action information output unit 203 .
- each of the compensation coefficients c, d of the non-linear friction compensator 111 are corrected on the basis of this action information. Accordingly, the servo motor control device 100 can operate to reduce the value of the position error. As described above, by utilizing the machine learning device 200 according to the present invention, the parameter adjustment of the non-linear friction compensator 111 of the servo motor control device 100 is simplified.
- Each of the servo motor control device 100 and the machine learning device 200 includes an operation processing device such as a central processing unit (CPU).
- Each of the servo motor control device 100 and the machine learning device 200 also includes a sub storage device such as a hard disk drive (HDD) that stores various control programs such as application software and an operating system (OS), and a main storage device such as a random access memory (RAM) for storing data temporarily required for execution of the programs by the operation processing device.
- the operation processing device performs operation processing based on these application software and OS.
- various hardware included in each device is controlled.
- the function blocks of the present embodiment are realized. That is, the present embodiment can be realized by cooperation of the hardware and the software.
- the machine learning device 200 performs a large amount of operation associated with the machine learning.
- a personal computer is mounted with graphics processing units (GPUs), and the GPUs are utilized for the operation processing associated with the machine learning by a technique called general-purpose computing on graphics processing units (GPGPU).
- the machine learning device 200 can perform high speed processing by utilizing the GPU.
- a plurality of such computers mounted with the GPU may be used to construct a computer cluster, so that the machine learning device 200 performs parallel processing by the plurality of computers included in the computer cluster.
- In step S 11 , the state information acquisition unit 201 acquires the state information S from the servo motor control device 100 .
- the acquired state information is output to the value function updating unit 2022 and the action information generation unit 2023 .
- this state information S is information corresponding to a state in the Q-learning, and includes each of the compensation coefficients c, d of the non-linear friction compensator 111 at the time of step S 11 . In this way, a set PD(S) of the position error corresponding to the circular geometry when the compensation coefficients are the initial values is obtained from the non-linear friction compensator 111 .
- the coefficients c, d of the non-linear friction compensator 111 in the initial state S 0 are, initially set by a user.
- the value PD(S 0 ) of the position error in the state S 0 of when the Q-learning starts for the first time, is obtained from the subtractor 102 by operating the servo motor control device 100 by the evaluation program.
- the position command creation unit 101 sequentially outputs the position command in a circle geometry specified by the evaluation program.
- the position command value corresponding to the geometry that is a circle is output from the position command creation unit 101 , and the subtractor 102 outputs a difference between the position command value, and a detection position output from the integrator 108 , to the machine learning device 200 , as the position error PD(S 0 ).
- In step S 12 , the action information generation unit 2023 generates new action information A, and outputs the generated new action information A to the servo motor control device 100 via the action information output unit 203 .
- the action information generation unit 2023 outputs the new action information A on the basis of the measure described above.
- the servo motor control device 100 that has received the action information A drives the machining tool including the servo motor, by the state S′ in which each of the compensation coefficients c, d of the non-linear friction compensator 111 related to the current state S are corrected on the basis of the received action information.
- this action information corresponds to the action A in the Q-learning.
- In step S 13 , the state information acquisition unit 201 acquires the position error PD(S′) in the new state S′ from the subtractor 102 , and the compensation coefficients c, d from the non-linear friction compensator 111 .
- the state information acquisition unit 201 acquires the set PD(S′) of the position error corresponding to the geometry that is a circle, of when compensation coefficients are the compensation coefficients c, d in the state S′, from the non-linear friction compensator 111 .
- the acquired state information is output to the reward output unit 2021 .
- In step S 14 , the reward output unit 2021 determines the size relationship between the value f(PD(S′)) of the position error in the state S′ and the value f(PD(S)) of the position error in the state S.
- the reward output unit 2021 sets the reward to be a negative value in step S 15 .
- the reward output unit 2021 sets the reward to be a positive value, in step S 16 .
- the reward output unit 2021 sets the reward to be zero in step S 17 .
- the reward output unit 2021 may perform weighting with respect to the negative value and the positive value of the reward.
- When any of step S 15 , step S 16 , and step S 17 ends, the value function updating unit 2022 updates the value function Q stored in the value function storage unit 204 on the basis of the reward value calculated in any of these steps, in step S 18 . Then, processing returns to step S 11 again, and the processing described above is repeated. Thereby, the value function Q settles to a suitable value.
- the processing described above may end with a condition of being repeated for a predetermined number of times, or being repeated for predetermined time.
- Although online updating is exemplified for step S 18 , batch updating or mini-batch updating may be performed instead of the online updating.
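The flow of steps S 11 to S 18 can be sketched as a small online Q-learning loop. This is only an illustration under strong assumptions: `measure_error` is a hypothetical stand-in for executing the evaluation program and computing f(PD(S)), the state is reduced to the coefficient pair (c, d), the step size, reward magnitudes, and hyperparameters are arbitrary, and ε-greedy selection is used as one of the measures mentioned in the description.

```python
import random
from collections import defaultdict

# Incremental adjustments (delta_c, delta_d); the step size of 1 is an assumption.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def measure_error(c, d):
    """Hypothetical stand-in for running the evaluation program and computing f(PD(S))."""
    return (c - 3) ** 2 + (d - 1) ** 2


def learn(c0, d0, steps=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)              # action value table Q(S, A)
    state = (c0, d0)                    # S11: acquire the state S
    f_prev = measure_error(*state)
    best_state, best_error = state, f_prev
    for _ in range(steps):              # end condition: fixed number of repetitions
        # S12: generate action information A (epsilon-greedy selection)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # S13: apply the action and acquire the new state S'
        next_state = (state[0] + action[0], state[1] + action[1])
        f_next = measure_error(*next_state)
        # S14 to S17: compare f(PD(S')) with f(PD(S)) and set the reward
        if f_next > f_prev:
            reward = -1.0
        elif f_next < f_prev:
            reward = 1.0
        else:
            reward = 0.0
        # S18: online update of the value function Q
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        if f_next < best_error:
            best_state, best_error = next_state, f_next
        state, f_prev = next_state, f_next
    return q, best_state, best_error


if __name__ == "__main__":
    _, coeffs, err = learn(0, 0)
    print(coeffs, err)
```

For batch or mini-batch updating, the transitions (S, A, reward, S′) would be collected in a list and the update in S 18 applied over the collected data instead of after every transition.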
- the present embodiment exhibits an effect capable of obtaining a suitable value function for adjustment of the compensation coefficients c, d of the non-linear friction compensator 111 by utilizing the machine learning device 200 , to simplify optimization of the compensation coefficients c, d of the feedforward.
- operation at the time of generation of optimizing action information by the optimizing action information output unit 205 will be described with reference to a flowchart of FIG. 12 .
- the optimizing action information output unit 205 obtains the value function Q stored in the value function storage unit 204 .
- the value function Q has been updated by performing of the Q-learning by the value function updating unit 2022 as described above.
- In step S 22 , the optimizing action information output unit 205 generates the optimizing action information on the basis of the value function Q, and outputs the generated optimizing action information to the non-linear friction compensator 111 of the servo motor control device 100 .
- the optimizing action information is generated on the basis of the value function Q determined by learning by the machine learning device 200 , and adjustment of the compensation coefficient c, d of the non-linear friction compensator 111 currently set by the servo motor control device 100 can be simplified, on the basis of the optimizing action information, and the value of the position error can be reduced.
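Generating the optimizing action information from a learned value function amounts to picking the action with the largest Q value for the current state and applying it to the coefficients. The sketch below assumes a dictionary-based action value table and actions expressed as increments (delta_c, delta_d); both function names and the data layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Coeffs = Tuple[float, float]   # (c, d) of the non-linear friction compensator
Action = Tuple[float, float]   # assumed form: increments (delta_c, delta_d)


def optimizing_action(
    q: Dict[Tuple[Coeffs, Action], float],
    state: Coeffs,
    actions: Sequence[Action],
) -> Action:
    """Return the action whose learned value Q(S, A) is the maximum for this state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def apply_action(coeffs: Coeffs, action: Action) -> Coeffs:
    """Correct the compensation coefficients c, d by the selected increments."""
    return (coeffs[0] + action[0], coeffs[1] + action[1])


if __name__ == "__main__":
    state = (1.0, 2.0)
    actions = [(0.5, 0.0), (-0.5, 0.0)]
    table = {(state, (0.5, 0.0)): 0.3, (state, (-0.5, 0.0)): 0.7}
    best = optimizing_action(table, state, actions)
    print(best, apply_action(state, best))
```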
- FIG. 13 is a diagram showing a movement path of the table before parameter adjustment of a non-linear friction compensator by the machine learning.
- FIG. 14 is a diagram showing a movement path of the table after the parameter adjustment of the non-linear friction compensator by the machine learning. Both FIG. 13 and FIG. 14 show a movement path of when the table included in the control target moves in a sine-wave shape in the X axis direction or the Y axis direction as shown in FIG. 5 . As shown in FIG.
- the inversion delay due to the non-linear friction occurs when the rotation direction of the servo motor attempts to invert. That is, the inversion delay due to the non-linear friction occurs with respect to the movement path of the table of the triangular wave shape, that is set in the position command creation unit 101 , and the table does not instantly invert and move.
- the inversion delay due to the non-linear friction is removed, and the table moves in the movement path of the triangular-wave shape.
- the servo motor control unit of the servo motor control device described above, and each of components included in the machine learning device may be realized by hardware, software or combination thereof.
- the servo motor control method performed by cooperation of each of the components included in the servo motor control device described above, also may be realized by hardware, software, or combination thereof. Being realized by software means being realized by reading and executing a program by a computer.
- the program may be stored by using various types of non-transitory computer readable media, and supplied to the computer.
- the non-transitory computer readable media include various types of tangible storage media. Examples of the non-transitory computer readable media include a magnetic recording medium (for example, a hard disk drive), a magneto-optical recording medium (for example, a magneto-optical disk), a CD-ROM (read only memory), a CD-R, a CD-R/W, and a semiconductor memory (for example, a mask ROM, a programmable ROM (PROM), an erasable PROM (EPROM), a flash ROM, and a random access memory (RAM)).
- the servo motor control device 100 is described as including the velocity feedforward calculation unit 110 , and having a configuration in which the non-linear friction compensator 111 is connected in parallel to the velocity feedforward calculation unit 110 , but not limited to this.
- the velocity feedforward calculation unit 110 may be an option, and the servo motor control device 100 may not include the velocity feedforward calculation unit 110 .
- the machine learning device 200 is a device that is different from the servo motor control device 100 . Part or all of the function of the machine learning device 200 may be realized by the servo motor control device 100 . That is, the servo motor control device 100 may include the machine learning device 200 .
- the machine learning device 200 and the servo motor control device 100 are communicatively connected as a set of one-to-one.
- one machine learning device 200 is communicatively connected with a plurality of servo motor control devices 100 via the network 400 to perform machine learning of each of the servo motor control devices 100 .
- respective functions of the machine learning device 200 may be realized by a distributed processing system in which the functions are distributed in a plurality of servers, as appropriate.
- the functions of the machine learning device 200 may be realized by utilizing a virtual server function, or the like, in a cloud.
- the servo motor control system may be configured to share learning results in the machine learning devices 200 - 1 to 200 - n . Thereby, more optimal model can be constructed.
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Automation & Control Theory (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Feedback Control In General (AREA)
- Numerical Control (AREA)
- Control Of Position Or Direction (AREA)
- Control Of Electric Motors In General (AREA)
Abstract
Description
- This application is based on and claims the benefit of priority from Japanese Patent Application No. 2017-138949, filed on 18 Jul. 2017, the content of which is incorporated herein by reference.
- The present invention relates to: a machine learning device that performs learning related to compensation coefficients in compensation of nonlinear friction, with respect to a servo motor control device that performs compensation to the nonlinear friction; the servo motor control device and a servo motor control system including the machine learning device; and a machine learning method.
- As conventional servo motor control devices, for example, a motor control device disclosed in Patent Document 1, a servo control device disclosed in Patent Document 2, and a motor control device disclosed in Patent Document 3 are known.
- The motor control device disclosed in Patent Document 1 has: a velocity feedforward control unit that generates a velocity feedforward command for reducing position error on the basis of a position command; and a torque feedforward control unit that generates a torque feedforward command for reducing the position error on the basis of the position command.
- The servo control device disclosed in Patent Document 2 has a feedforward compensator that generates a feedforward command on the basis of a position command. The servo control device disclosed in Patent Document 2 has a friction compensation device for compensating control error due to influence of friction in a machine tool.
- The motor control device disclosed in Patent Document 3 has a compensation calculation unit for compensating stick motions on the basis of a friction torque estimated by disturbance observer and a torque command, and compensating lost motions on the basis of a velocity command.
- Patent Document 1: Japanese Unexamined Patent Application, Publication No. 2016-101017
- Patent Document 2: Japanese Unexamined Patent Application, Publication No. 2015-018496
- Patent Document 3: Japanese Unexamined Patent Application, Publication No. 2004-280565
- As a factor of control error at the time of inversion motion of a servo motor in a servo motor control device, response delay of a servo system, elastic deformation of a machine system, and influence of friction are considered. The control error is greatly influenced by, particularly, non-linear friction among frictions. Compensating non-linear friction is important in servo performance improvement. An object of the present invention is to provide: a machine learning device that performs compensation of non-linear friction to improve responsiveness of a servo system at the time of inversion motion of a servo motor; a servo motor control device; a servo motor control system; and a machine learning method.
- (1) A machine learning device (for example, a machine learning device 200 described later) according to the present invention is configured to perform machine learning with respect to a servo motor control device (for example, a servo motor control device 100 described later) including a non-linear friction compensation unit (for example, a non-linear friction compensator 111 described later) configured to create a compensation value with respect to non-linear friction on the basis of a position command, and the machine learning device includes:
- a state information acquisition unit (for example, a state information acquisition unit 201 described later) configured to acquire state information from the servo motor control device by causing the servo motor control device to execute a predetermined program, the state information including a servo state including at least position error, and a combination of compensation coefficients of the non-linear friction compensation unit;
- an action information output unit (for example, an action information output unit 203 described later) configured to output action information including adjustment information of the combination of the compensation coefficients included in the state information, to the servo motor control device;
- a reward output unit (for example, a reward output unit 2021 described later) configured to output a reward value in reinforcement learning, based on the position error included in the state information; and
- a value function updating unit (for example, a value function updating unit 2022 described later) configured to update an action value function on the basis of the reward value output by the reward output unit, the state information, and the action information.
- (2) In the machine learning device according to (1) described above, the reward output unit may output the reward value on the basis of an absolute value of the position error.
- (3) In the machine learning device according to (1) or (2) described above, the servo motor control device may further comprise a velocity feedforward calculation unit (for example, a velocity feedforward calculation unit 110 described later) configured to create a velocity feedforward value on the basis of the position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- (4) In the machine learning device according to any of (1) to (3) described above, the machine learning device may further include an optimizing action information output unit (for example, an optimizing action information output unit 205 described later) configured to generate and output a combination of compensation coefficients of the non-linear friction compensation unit, on the basis of a value function updated by the value function updating unit.
- (5) A servo motor control system according to the present invention is a servo motor control system including: the machine learning device (for example, a machine learning device 200 described later) according to any of (1) to (4) described above; and a servo motor control device (for example, a servo motor control device 100 described later) that comprises a non-linear friction compensation unit configured to create a compensation value with respect to non-linear friction.
- (6) In the servo motor control system according to (5) described above, the servo motor control device may further comprise a velocity feedforward calculation unit (for example, a velocity feedforward calculation unit 110 described later) configured to create a velocity feedforward value on the basis of a position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- (7) A servo motor control device according to the present invention is a servo motor control device including the machine learning device according to any of (1) to (4) described above, and a non-linear friction compensation unit configured to create a compensation value with respect to non-linear friction.
- (8) In the servo motor control device according to (7) described above, the servo motor control device may further include a velocity feedforward calculation unit configured to create a velocity feedforward value on the basis of a position command, and the non-linear friction compensation unit may be connected in parallel to the velocity feedforward calculation unit.
- (9) A machine learning method according to the present invention is a machine learning method of a machine learning device (for example, a machine learning device 200 described later) configured to perform machine learning with respect to a servo motor control device (for example, a servo motor control device 100 described later) including a non-linear friction compensation unit (for example, a non-linear friction compensator 111 described later) configured to create a compensation value with respect to non-linear friction on the basis of a position command, the machine learning method including:
- acquiring state information from the servo motor control device by causing the servo motor control device to execute a predetermined program, the state information including a servo state including at least position error, and a combination of compensation coefficients of the non-linear friction compensation unit;
- outputting action information including adjustment information of the combination of compensation coefficients included in the state information, to the servo motor control device; and
- updating an action value function on the basis of a reward value in reinforcement learning that is based on the position error included in the state information, the state information, and the action information.
- According to the present invention, responsiveness of a servo system at the time of inversion motion of a servo motor can be improved by compensating for non-linear friction.
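The quantities named in (1), (2), and (9) can be sketched in code as follows. This is only an illustration: the `ServoState` record, the `evaluate`, `reward`, and `apply_action` helpers, and the +1/0/-1 reward values are assumptions made for the example and are not specified in the summary above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ServoState:
    """State information: position errors from one program run plus the coefficients used."""
    position_errors: Tuple[float, ...]
    coefficients: Tuple[float, float]  # (c, d) of the non-linear friction compensator


def evaluate(state: ServoState) -> float:
    """Hypothetical evaluation value: sum of the absolute values of the position error."""
    return sum(abs(e) for e in state.position_errors)


def reward(before: ServoState, after: ServoState) -> float:
    """Hypothetical reward: positive if the absolute error shrank, negative if it grew."""
    v_before, v_after = evaluate(before), evaluate(after)
    if v_after < v_before:
        return 1.0
    if v_after > v_before:
        return -1.0
    return 0.0


def apply_action(coefficients: Tuple[float, float],
                 action: Tuple[float, float]) -> Tuple[float, float]:
    """Action information: an incremental adjustment (dc, dd) applied to (c, d)."""
    c, d = coefficients
    dc, dd = action
    return (c + dc, d + dd)
```

In this sketch the state carries both the measured servo state and the coefficients that produced it, so that a value function can later be indexed by the pair (state, adjustment).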
- FIG. 1 is a block diagram showing a servo motor control system of a first embodiment of the present invention.
- FIG. 2 is a block diagram showing a set of a servo motor control device of the servo motor control system and a machine learning device, and a control target, of the first embodiment of the present invention.
- FIG. 3 is a characteristic diagram showing a relationship between a non-linear friction compensation value f(ω) and a motor speed ω.
- FIG. 4 is a block diagram showing an example of a control target.
- FIG. 5 is a diagram for explaining motion of a servo motor when a geometry is a circle.
- FIG. 6 is a diagram for explaining motion of the servo motor when the geometry is a square.
- FIG. 7 is a diagram showing a state where a table included in the control target moves sinusoidally in an X axis direction or a Y axis direction.
- FIG. 8 is a diagram showing a state where the table included in the control target moves in a triangular-wave shape in the X axis direction or the Y axis direction.
- FIG. 9 is a diagram for explaining motion of the servo motor when the geometry is a star shape.
- FIG. 10 is a block diagram showing a machine learning device of the first embodiment.
- FIG. 11 is a flowchart explaining operation of the machine learning device.
- FIG. 12 is a flowchart explaining operation of an optimizing action information output unit of the machine learning device.
- FIG. 13 is a diagram showing a movement path of the table before parameter adjustment of a non-linear friction compensator by machine learning.
- FIG. 14 is a diagram showing a movement path of the table after the parameter adjustment of the non-linear friction compensator by the machine learning.
- Embodiments of the present invention will be described in detail below with reference to the drawings.
- FIG. 1 is a block diagram showing a servo motor control system of a first embodiment of the present invention. As shown in FIG. 1, a servo motor control system 10 includes n servo motor control devices 100-1 to 100-n, n machine learning devices 200-1 to 200-n, and a network 400. Note that n is an arbitrary natural number.
- The servo motor control device 100-1 and the machine learning device 200-1 form a one-to-one set and are communicatively connected. The servo motor control devices 100-2 to 100-n and the machine learning devices 200-2 to 200-n are connected in the same way as the servo motor control device 100-1 and the machine learning device 200-1. In FIG. 1, the n sets of the servo motor control devices 100-1 to 100-n and the machine learning devices 200-1 to 200-n are connected via the network 400. However, the servo motor control device and the machine learning device in each of the n sets may instead be directly connected via a connection interface. The n sets of the servo motor control devices 100-1 to 100-n and the machine learning devices 200-1 to 200-n may be, for example, installed in the same factory, or may be installed in different factories.
- The network 400 is, for example, a local area network (LAN) constructed in a factory, the Internet, a public telephone network, a direct connection via a connection interface, or a combination thereof. The particular communication method used in the network 400, whether a wired or wireless connection is used, and the like are not particularly limited.
- FIG. 2 is a block diagram showing a set of a servo motor control device of the servo motor control system and a machine learning device, and a control target, of the first embodiment of the present invention. The servo motor control device 100 and the machine learning device 200 in FIG. 2 correspond to, for example, the servo motor control device 100-1 and the machine learning device 200-1 shown in FIG. 1. A control target 300 is, for example, a machine tool, a robot, an industrial machine, or the like including the servo motor. The servo motor control device 100 may be provided as a part of a machine tool, a robot, an industrial machine, or the like.
- First, the servo
motor control device 100 will be described. As shown in FIG. 2, the servo motor control device 100 includes a position command creation unit 101, a subtractor 102, a position control unit 103, an adder 104, a subtractor 105, a velocity control unit 106, an adder 107, an integrator 108, a position feedforward calculation unit 109, a velocity feedforward calculation unit 110, and a non-linear friction compensator 111.
- The position command creation unit 101 creates a position command value for operating the servo motor included in the control target 300, in accordance with a program input from a host control device, an external input device, or the like (not shown), and outputs the created position command value to the subtractor 102 and the position feedforward calculation unit 109. The subtractor 102 determines the difference between the position command value and a detected position obtained by position feedback, outputs the difference to the position control unit 103 as position error, and transmits the difference to the machine learning device 200. The position command creation unit 101 creates the position command value on the basis of a program that operates the servo motor included in the control target 300. The control target 300 is, for example, a machine tool including the servo motor. When the machine tool moves a table on which a workpiece is mounted in an X axis direction and a Y axis direction to machine the workpiece, the servo motor control device shown in FIG. 2 is provided for each of the X axis direction and the Y axis direction. When the machine tool moves the table in directions of three or more axes, the servo motor control device shown in FIG. 2 is provided for each of the axis directions. The position command creation unit 101 creates the position command value by changing a pulse frequency, and thereby the speed of the servo motor, so that the geometry specified by the program is obtained.
- For example, the position control unit 103 outputs to the adder 104, as a velocity command value, a value obtained by multiplying the position error by a preset position gain Kp. The position feedforward calculation unit 109 outputs to the adder 104, the velocity feedforward calculation unit 110, and the non-linear friction compensator 111 a value obtained by differentiating the position command value and multiplying the result by a feedforward coefficient.
- The adder 104 adds the velocity command value and the output value of the position feedforward calculation unit 109, and outputs the obtained value to the subtractor 105 as a feedforward controlled velocity command value. The subtractor 105 determines the difference between the output of the adder 104 and the velocity detection value obtained by velocity feedback, and outputs the difference to the velocity control unit 106 as velocity error.
- For example, the velocity control unit 106 adds a value obtained by multiplying the velocity error by a preset integral gain K1v and integrating the result to a value obtained by multiplying the velocity error by a preset proportional gain K2v, and outputs the obtained value to the adder 107 as a torque command value.
- The velocity feedforward calculation unit 110 performs velocity feedforward calculation processing represented by a transfer function Gf(s) given by, for example, formula 1, on the basis of the output value of the position feedforward calculation unit 109, and outputs the calculation result to the adder 107 as a first torque compensation value. The coefficients ai, bj of the velocity feedforward calculation unit 110 are constants that are set in advance for 0≤i≤m and 0≤j≤n, respectively. The dimensions m, n are natural numbers that are set in advance.
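A transfer function consistent with the coefficients described above (ai for 0≤i≤m, bj for 0≤j≤n) would be a ratio of polynomials in s. The assignment of the bj to the numerator and the ai to the denominator is an assumption made here for illustration:

```latex
G_f(s) = \frac{b_0 + b_1 s + b_2 s^2 + \cdots + b_n s^n}
              {a_0 + a_1 s + a_2 s^2 + \cdots + a_m s^m}
```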
- The non-linear friction compensator 111 determines a non-linear friction compensation value for compensating for non-linear friction generated in the control target 300 on the basis of the output value of the position feedforward calculation unit 109, and outputs the non-linear friction compensation value to the adder 107 as a second torque compensation value. When the control target 300 is a machine tool including the servo motor, the non-linear friction is generated mainly by parts of the machine tool other than the servo motor, for example, a ball screw. However, non-linear friction is also generated in the servo motor. A non-linear friction compensation value f(ω) is given by, for example, formula 2, and can be determined by using a motor speed ω. FIG. 3 is a characteristic diagram showing a relationship between the non-linear friction compensation value f(ω) and the motor speed ω.
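As an illustration of a compensation value determined from the motor speed ω with two coefficients c and d, the sketch below assumes the form f(ω) = c·ω + d·tanh(ω). This specific expression is an assumption for the example (a linear term plus a smooth, saturating term that changes sign with the direction of rotation), and the function name is likewise hypothetical.

```python
import math


def friction_compensation(omega: float, c: float, d: float) -> float:
    """Assumed form: linear term c*omega plus saturating term d*tanh(omega)."""
    return c * omega + d * math.tanh(omega)


if __name__ == "__main__":
    # Around omega = 0 (inversion of the rotation direction) the tanh term
    # switches sign quickly, which is where the compensation changes most.
    for omega in (-2.0, -0.1, 0.0, 0.1, 2.0):
        print(omega, friction_compensation(omega, c=0.1, d=1.0))
```

Under this assumed form, d sets the size of the step-like change at the inversion and c sets the slope away from it, which is why the pair (c, d) is a natural target for tuning.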
- As described later, optimal values of the combination of the compensation coefficients c, d in formula 2 are determined by using the machine learning device 200.
- The adder 107 adds the torque command value, the output value of the velocity feedforward calculation unit 110, and the output value of the non-linear friction compensator 111, and outputs the obtained value to the servo motor of the control target 300 as a feedforward controlled torque command value.
- The control target 300 outputs a velocity detection value, and the velocity detection value is input to the subtractor 105 as the velocity feedback. The velocity detection value is integrated by the integrator 108 to be a position detection value. The position detection value is input to the subtractor 102 as the position feedback. The servo motor control device 100 is configured as described above.
- Next, the control target 300 that is controlled by the servo motor control device 100 will be described. FIG. 4 is a block diagram showing part of a machine tool including the servo motor, as an example of the control target 300. The servo motor control device 100 uses a servo motor 302 to move a table 304 via a coupling mechanism 303. The machine tool rotates a spindle to which a tool is attached while moving the table 304, to machine a workpiece mounted on the table 304. The coupling mechanism 303 has a coupling 3031 coupled to the servo motor 302, and a ball screw 3033 fixed to the coupling 3031. A nut 3032 is screwed onto the ball screw 3033. The nut 3032 screwed onto the ball screw 3033 is moved in an axis direction of the ball screw 3033 by rotation drive of the servo motor 302. The non-linear friction is generated in the coupling mechanism 303 including the coupling 3031, the ball screw 3033, the nut 3032, and the like. However, non-linear friction is also generated in the servo motor 302.
- A rotation angle position of the servo motor 302 is detected by a rotary encoder 301 that is a position detection unit associated with the servo motor 302. The detected signal is utilized as the velocity feedback. The detected signal is integrated by the integrator 108 to be utilized as the position feedback. The machine tool may include, in an end portion of the ball screw 3033, a linear scale 305 that detects a moving distance of the ball screw 3033. An output of the linear scale 305 can be used as the position feedback.
- <Machine Learning Device 200>
- The
machine learning device 200 executes a preset evaluation program (hereinafter, also referred to as an “evaluation program”) to learn the compensation coefficients of the non-linear friction compensator 111. When the machine tool performs machining by moving the table in the X axis direction and the Y axis direction with the servo motors, the geometry specified by the evaluation program for evaluating the inversion motion of the servo motor may be, for example, a circle, a square, or a star shape. FIG. 5 is a diagram for explaining motion of the servo motor when the geometry is a circle. FIG. 6 is a diagram for explaining motion of the servo motor when the geometry is a square. In FIG. 5 and FIG. 6, the table moves so that the workpiece is machined in a clockwise direction.
- When the inversion motion of the servo motor is evaluated, for example, as shown in FIG. 7 or FIG. 8, the servo motor control device 100 controls the servo motor 302 so that the table included in the control target 300 moves in a sine-wave shape or a triangular-wave shape in at least one of the X axis direction and the Y axis direction. The evaluation program controls the frequency of the pulse output from the position command creation unit 101 of the servo motor control device 100. By controlling this frequency, the feed rate of the table in the X axis direction or the Y axis direction is controlled. When the frequency of the pulse output from the position command creation unit 101 becomes high, the rotation speed of the motor increases, and the feed rate increases. On the other hand, when the frequency of the pulse becomes low, the rotation speed of the motor decreases, and the feed rate decreases. When the rotation direction of the servo motor 302 inverts, the movement direction of the table inverts in the X axis direction or the Y axis direction.
- When the geometry is a circle shown in
FIG. 5, the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a sine-wave shape in the X axis direction as shown in FIG. 7, and moves in a cosine-wave shape in the Y axis direction. At a position A1 shown in FIG. 5, the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction. On the other hand, at the position A1, the servo motor that moves the table in the X axis direction rotates at the same speed as immediately before and after the position A1, and the table moves in the X axis direction at the same speed as immediately before and after the position A1. The position A1 at which the table inverts in the Y axis direction corresponds to an inversion position of the positive direction shown in FIG. 7. On the other hand, in the X axis direction, in which the table moves at a constant speed, the table follows a waveform whose phase is delayed or advanced by 90 degrees relative to that of the Y axis direction (a cosine wave relative to a sine wave). The position A1 at which the table moves at a constant speed in the X axis direction corresponds to an intermediate position between the inversion position of the positive direction and the inversion position of the negative direction shown in FIG. 7. At a position A2 shown in FIG. 5, the servo motor control device 100 controls each servo motor so that the motion of the servo motor that moves the table in the X axis direction and the motion of the servo motor that moves the table in the Y axis direction are the opposite of those at the position A1. That is, at the position A2, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction. On the other hand, at the position A2, the servo motor that moves the table in the Y axis direction rotates at the same speed as immediately before and after the position A2, and the table moves in the Y axis direction at the same speed as immediately before and after the position A2.
- When the geometry is a square shown in
FIG. 6, the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a triangular-wave shape in the X axis direction as shown in FIG. 8, and moves in the Y axis direction in a triangular-wave shape whose phase is delayed or advanced by 90 degrees relative to the triangular wave shown in FIG. 8. When the table moves in the triangular-wave shape, the rotation speed of the servo motor is constant from a positive inversion position to a negative inversion position, and from the negative inversion position to the positive inversion position. At a position B1 shown in FIG. 6, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction. On the other hand, at the position B1, the servo motor that moves the table in the Y axis direction rotates at a constant speed, and the table moves in the Y axis direction at the constant speed. The position B1 at which the table inverts in the X axis direction corresponds to an inversion position of the negative direction shown in FIG. 8. On the other hand, in the Y axis direction, in which the table moves at a constant speed, the table follows a triangular wave whose phase is delayed or advanced by 90 degrees relative to the triangular wave shown in FIG. 8. The position B1 at which the table moves at a constant speed in the Y axis direction corresponds to an intermediate position between the inversion position of the positive direction and the inversion position of the negative direction shown in FIG. 8. At a position B2 shown in FIG. 6, the servo motor control device 100 controls each servo motor so that the motion of the servo motor that moves the table in the X axis direction and the motion of the servo motor that moves the table in the Y axis direction are the opposite of those at the position B1. That is, at the position B2, the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction. On the other hand, at the position B2, the servo motor that moves the table in the X axis direction rotates at a constant speed, and the table moves in the X axis direction at a constant speed.
- When the evaluation program is executed, the position
command creation unit 101 of the servo motor control device 100 outputs the position command value so that the geometry is a circle and a square, sequentially. The position command creation unit 101 changes the feed rate for each geometry, that is, for the circle and for the square, so that the machine learning device 200 can learn the influence of a plurality of feed rates. The position command creation unit 101 may also change the feed rate partway through a geometry, for example, when the table passes a corner while moving along a square geometry. Thereby, when the table moves in a sine-wave shape or a triangular-wave shape in the X axis direction or the Y axis direction, the machine learning device 200 can learn a pattern in which the frequency is made high or is gradually increased.
- When the geometry is a circle or a square, when the rotation direction of one of the servo motors that move the table in the X axis direction and the Y axis direction inverts, the other servo motor rotates at a constant speed. However, the geometry specified by the evaluation program may be a geometry with which the rotation directions of both of the two servo motors that move the table in the X axis direction and the Y axis direction invert, for example, a star shape as shown in FIG. 9. The machine learning device 200 may learn the combination of the compensation coefficients c, d of the non-linear friction compensator 111 with such a geometry.
- FIG. 9 is a diagram for explaining motion of the servo motor when the geometry is a star shape. When the geometry is a star shape as shown in FIG. 9, the servo motor control device 100 controls the servo motors of the X axis direction and the Y axis direction so that the table moves in a triangular-wave shape in the X axis direction and the Y axis direction in the four “<”-shaped projecting portions of the star shape. At a vertex of one of the four “<”-shaped projecting portions of the star shape, for example, a position C1 shown in FIG. 9, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction.
- Similarly, at the position C1, the rotation direction of the servo motor that moves the table in the Y axis direction inverts, and the table moves so as to linearly invert in the Y axis direction. Accordingly, the inversion motion of the servo motors when the rotation directions of both of the two servo motors that move the table in the X axis direction and the Y axis direction invert is evaluated.
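The per-axis motion described above for the circular geometry (a sine wave on the X axis, a cosine wave on the Y axis, and one axis inverting while the other keeps moving) can be sketched as follows. The function names, the sample count, and the idea of detecting an inversion as a sign change of the per-sample displacement are assumptions made for this illustration.

```python
import math


def circle_commands(radius, n_samples):
    """Position commands for one clockwise revolution starting at the top of the circle."""
    xs, ys = [], []
    for k in range(n_samples + 1):
        theta = 2.0 * math.pi * k / n_samples
        xs.append(radius * math.sin(theta))  # X axis: sine wave
        ys.append(radius * math.cos(theta))  # Y axis: cosine wave (90 degree phase shift)
    return xs, ys


def inversion_indices(positions):
    """Sample indices at which the direction of motion reverses on a closed path.

    The path is treated as cyclic, so index 0 is compared with the last segment.
    """
    deltas = [b - a for a, b in zip(positions, positions[1:])]
    return [k for k in range(len(deltas)) if deltas[k - 1] * deltas[k] < 0]


if __name__ == "__main__":
    xs, ys = circle_commands(radius=10.0, n_samples=360)
    print("X axis inverts at samples", inversion_indices(xs))
    print("Y axis inverts at samples", inversion_indices(ys))
```

With this parameterization the Y axis inverts at the top and bottom of the circle while the X axis is at its fastest, and the X axis inverts at the left and right extremes, which matches the roles of the positions A1 and A2. A square would use triangular waves in place of the sine and cosine.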
- In the star-shaped geometry shown in FIG. 9, at the vertices of the four concave portions, as when the geometry is a square shown in FIG. 6, the motion is evaluated for the case where the rotation direction of the servo motor that moves the table in one of the X axis direction and the Y axis direction inverts while the servo motor that moves the table in the other direction rotates at a constant speed. For example, at the position C2 shown in FIG. 9, the rotation direction of the servo motor that moves the table in the X axis direction inverts, and the table moves so as to linearly invert in the X axis direction. However, at the position C2, the servo motor that moves the table in the Y axis direction rotates at a constant speed, and the table moves in the Y axis direction at a constant speed.
- Next, a configuration of the machine learning device 200 will be described. The machine learning device 200 learns the combination of the compensation coefficients c, d of the non-linear friction compensator 111 that reduces the position error when the control target 300 is driven on the basis of the evaluation program.
- Before each function block included in the machine learning device 200 is described, the basic mechanism of reinforcement learning will first be described. An agent (corresponding to the machine learning device 200 in the present embodiment) observes an environmental state and selects an action. The environment then changes on the basis of the action. The agent calculates a reward according to the environmental change, and learns to select (decide on) a better action. Whereas supervised learning presents a completely correct answer, the reward in reinforcement learning is often a fragmentary value based on a change in part of the environment. Thus, the agent learns to select actions so that the total reward obtained in the future is maximized.
- In this way, in reinforcement learning, by learning actions, the machine learning device 200 learns a suitable action in consideration of the interaction between the action and the environment, that is, an action that maximizes the reward to be obtained in the future. In the present embodiment, this means that the machine learning device 200 acquires an action that affects the future, for example, selecting action information for reducing position error.
- Any learning method may be used for the reinforcement learning. In the description below, a case where Q-learning is used will be described as an example. Q-learning is a method of learning a value function Q(S, A) of selecting an action A under an environmental state S. An object of the Q-learning is to select, in a state S, the action A having the highest value function Q(S, A) as a suitable action from among the actions A that can be taken.
- However, when the Q-learning is performed for the first time, the correct value of the value function Q(S, A) is not known at all for any combination of the state S and the action A. Thus, the agent selects various actions A under a state S and, on the basis of the reward given for the action A at that time, comes to select better actions, thereby learning the correct value Q(S, A).
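One common way to "select various actions" while still favoring actions that have earned good rewards is ε-greedy selection. The text above does not name a particular exploration strategy, so ε-greedy, the dictionary layout of `q_table`, and the function name are assumptions for this sketch.

```python
import random


def select_action(q_table, state, actions, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-known one.

    q_table maps (state, action) pairs to estimated values; unseen pairs count as 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit
```

A large ε early in learning makes the agent try many coefficient adjustments; lowering ε later makes it rely on the values it has learned.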
- The Q-learning aims to finally obtain $Q(S, A) = E[\sum_t \gamma^t r_t]$, in order to maximize the total reward that can be obtained in the future. Here, E[ ] represents an expected value, t represents time, γ represents a parameter called a discount rate described later, $r_t$ is the reward at the time t, and Σ represents the sum over the time t. The expected value in this formula is the expected value in the case where the state changes according to the suitable action. However, the suitable action is not clear in the process of the Q-learning. Thus, the agent performs the reinforcement learning while exploring, by taking various actions. An updating formula for such a value function Q(S, A) can be represented by, for example, the following formula 3.
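- $$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( r_{t+1} + \gamma \max_{A} Q(S_{t+1}, A) - Q(S_t, A_t) \right)$$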
- In formula 3 described above, $S_t$ represents the environmental state at the time t, and $A_t$ represents the action at the time t. The state changes to $S_{t+1}$ as a result of the action $A_t$. $r_{t+1}$ represents the reward obtained by that state change. The term with max is the Q value obtained when the action A having the highest Q value known at that time is selected under the state $S_{t+1}$, multiplied by γ. Here, γ is a parameter satisfying 0<γ≤1, and is called a discount rate. α is a learning coefficient, and is in the range 0<α≤1.
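The effect of the discount rate γ on the total reward $\sum_t \gamma^t r_t$ can be seen with a few lines of code. This is a minimal sketch; the function name and the sample reward sequence are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence r_0, r_1, ..."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_return(rewards, gamma=1.0))  # every reward counts fully
    print(discounted_return(rewards, gamma=0.5))  # later rewards count less
```

With γ = 1 all rewards are weighted equally, while a smaller γ makes rewards further in the future contribute less to the value being learned.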
- Formula 3 described above represents a method of updating the value function $Q(S_t, A_t)$ of the action $A_t$ in the state $S_t$, on the basis of the reward $r_{t+1}$ returned as a result of the action $A_t$. This updating formula indicates that $Q(S_t, A_t)$ is increased when the value $\max_A Q(S_{t+1}, A)$ of the best action in the next state $S_{t+1}$ reached by the action $A_t$ is larger than the value $Q(S_t, A_t)$ of the action $A_t$ in the state $S_t$, and that $Q(S_t, A_t)$ is decreased when it is smaller. That is, the updating formula brings the value of an action in a state closer to the value of the best action in the next state reached by that action. The difference between them changes depending on the discount rate γ and the reward $r_{t+1}$. Basically, however, the mechanism is such that the value of the best action in a state is propagated to the value of the action in the state immediately preceding it.
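The update described by formula 3 can be written as a short tabular routine. This is a minimal sketch assuming a dictionary-backed Q table keyed by (state, action) pairs; the function name, argument names, and the use of `defaultdict` are choices made for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning update to q[(state, action)] and return the new value.

    q is a mapping from (state, action) to a value, with missing entries read as 0.0.
    """
    # Value of the best action currently known in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted best next value.
    td_target = reward + gamma * best_next
    # Move the current estimate toward the target by the learning coefficient alpha.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)
    q[("s1", "a")] = 2.0
    q[("s1", "b")] = 4.0
    print(q_update(q, "s0", "a", reward=1.0, next_state="s1",
                   actions=["a", "b"], alpha=0.5, gamma=0.5))
```

In the example, the best value in the next state (4.0) is discounted and added to the reward, and the estimate for ("s0", "a") moves halfway from 0.0 toward that target.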
- In the Q-learning, there is a method of learning by creating a table of the value function Q(S, A) for every state-action pair (S, A). However, the number of states may be too large to determine the values of Q(S, A) for all state-action pairs, in which case much time is required for the Q-learning to converge.
- Thus, the Q-learning may utilize a known technique called a Deep Q-Network (DQN). Particularly, the Q-learning may configure a value function Q by using an appropriate neural network, and adjust a parameter of the neural network, to approximate the value function Q by the appropriate neural network, to calculate the value of the value function Q (S, A) By utilizing the DQN, the time required for settling the Q-learning can be shorten. The DQN is described in detail, for example, in Non-Patent Document below.
- “Human-level control through deep reinforcement learning”, Volodymyr Mnihl [online], [searched on Jan. 17, 2017], Internet <URL: http://files.davidqiu.com/research/nature14236.pdf>
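- The idea of approximating the value function Q with a neural network whose parameters are adjusted can be sketched as follows. This is a simplified illustration and not the DQN of the Non-Patent Document: it uses a tiny one-hidden-layer network written in plain Python, a single hypothetical transition, and a target held fixed during training, and it omits techniques such as experience replay. The class name `TinyQNetwork`, the layer sizes, the learning rate, and the sample values are all assumptions.

```python
import math
import random


class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        """Return the hidden activations and the list of Q values."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q_values = [sum(w * h for w, h in zip(row, hidden)) + b
                    for row, b in zip(self.w2, self.b2)]
        return hidden, q_values

    def train_step(self, state, action, target, lr=0.05):
        """One gradient step on 0.5 * (Q(state, action) - target)^2; returns the error."""
        hidden, q_values = self.forward(state)
        error = q_values[action] - target
        # Gradient at the hidden pre-activations, using the output weights before the update.
        grad_pre = [error * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                    for j in range(len(hidden))]
        for j, h in enumerate(hidden):
            self.w2[action][j] -= lr * error * h
        self.b2[action] -= lr * error
        for j, g in enumerate(grad_pre):
            for i, x in enumerate(state):
                self.w1[j][i] -= lr * g * x
            self.b1[j] -= lr * g
        return error


# Hypothetical transition: state, action 0, reward, next state.
net = TinyQNetwork(n_inputs=2, n_hidden=4, n_actions=2)
state, next_state = [0.2, -0.1], [0.15, -0.05]
reward, gamma = 1.0, 0.9
# Target r + gamma * max_A Q(S', A), computed once and held fixed for this demonstration.
target = reward + gamma * max(net.forward(next_state)[1])
errors = [net.train_step(state, action=0, target=target) for _ in range(200)]
```

The list `errors` records the gap between the network output and the target before each step, so it can be used to check that adjusting the network parameters moves Q (S, A) toward the target.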
- The Q-learning described above is performed by the machine learning device 200. Specifically, the machine learning device 200 learns the value function Q of selecting, as the action A, adjustment of the compensation coefficients c, d of the non-linear friction compensator 111 related to a state S, where the state S consists of the values of the compensation coefficients c, d of the non-linear friction compensator 111 in the servo motor control device 100 and a servo state such as a command and feedback. The servo state includes position error information of the servo motor control device 100 acquired by executing the evaluation program.
- The machine learning device 200 observes the state information S, which includes the servo state such as a command and feedback, including the position error information of the servo motor control device 100 acquired by executing the evaluation program on the basis of the compensation coefficients c, d of the non-linear friction compensator 111, to determine the action A. The machine learning device 200 gives a reward every time the action A is performed. The machine learning device 200, for example, searches for an optimal action A through trial and error so that the total future reward is maximized. Thereby, the machine learning device 200 can select the optimal action A (that is, the optimal compensation coefficients c, d of the non-linear friction compensator 111) with respect to the state S including the servo state such as a command and feedback, including the position error information of the servo motor control device 100 obtained by executing the evaluation program on the basis of the compensation coefficients c, d of the non-linear friction compensator 111.
- That is, the
machine learning device 200 selects, from among the actions A applied to the compensation coefficients c, d of the non-linear friction compensator 111 related to a state S, the action A with which the value of the value function Q learned by the machine learning device 200 is the maximum, and thereby selects the action A with which the position error obtained by executing the evaluation program is the minimum (that is, the combination of the compensation coefficients c, d of the non-linear friction compensator 111).
- FIG. 10 is a block diagram showing the machine learning device 200 of the first embodiment of the present invention. In order to perform the reinforcement learning described above, as shown in FIG. 10, the machine learning device 200 includes the state information acquisition unit 201, a learning unit 202, the action information output unit 203, a value function storage unit 204, and an optimizing action information output unit 205.
- The state information acquisition unit 201 acquires, from the servo motor control device 100, a state S including the servo state such as the command and the feedback, including the position error information of the servo motor control device 100 acquired by executing the evaluation program on the basis of the compensation coefficients c, d of the non-linear friction compensator 111 in the servo motor control device 100. This state information S corresponds to the environmental state S in the Q-learning. The state information acquisition unit 201 outputs the acquired state information S to the learning unit 202. A user creates, in advance, the compensation coefficients c, d of the non-linear friction compensator 111 at the time when the Q-learning starts for the first time. In the present embodiment, the initial set values of the compensation coefficients c, d of the non-linear friction compensator 111 created by the user are adjusted to more optimal values by the reinforcement learning.
- The learning unit 202 is a unit that learns the value function Q (S, A) of the case where an action A is selected under an environmental state S. Specifically, the learning unit 202 includes the reward output unit 2021, the value function updating unit 2022, and the action information generation unit 2023.
- The
reward output unit 2021 is a unit that calculates the reward of the case where the action A is selected under a state S. A set of position errors (position error set) that is a state variable in the state S is represented by PD(S), and a position error set that is a state variable related to state information S′ that has changed from the state S due to action information A (correction of the compensation coefficients c, d of the non-linear friction compensator 111) is represented by PD(S′). The value of the position error in the state S is a value calculated on the basis of an evaluation function f (PD(S)) that is set in advance. As the evaluation function f, for example, any of the following may be applied:
  - a function of calculating an integrated value of the absolute value of the position error, $\int |e| \, dt$,
  - a function of calculating an integrated value by weighting the absolute value of the position error with time, $\int t |e| \, dt$,
  - a function of calculating an integrated value of the 2n-th power (n is a natural number) of the absolute value of the position error, $\int e^{2n} \, dt$, or
  - a function of calculating the maximum value of the absolute value of the position error, $\max |e|$.
- Note that the evaluation function is not limited thereto. It is sufficient that the evaluation function is a function of appropriately evaluating the position error value in the state S on the basis of the position error set PD(S).
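- The four example evaluation functions can be sketched in discrete time as follows, assuming that the position error set PD(S) is available as a list of error samples taken at a fixed sampling period `dt`. The function names, the rectangle-rule approximation of the integrals, and the sample values are assumptions made for illustration.

```python
def integral_abs_error(errors, dt):
    """Approximate the integral of |e| dt by a rectangle-rule sum."""
    return sum(abs(e) for e in errors) * dt


def time_weighted_abs_error(errors, dt):
    """Approximate the integral of t|e| dt, taking t = k * dt for sample k."""
    return sum(k * dt * abs(e) for k, e in enumerate(errors)) * dt


def integral_even_power_error(errors, dt, n=1):
    """Approximate the integral of e^(2n) dt for a natural number n."""
    return sum(e ** (2 * n) for e in errors) * dt


def max_abs_error(errors):
    """Maximum of |e| over the position error set."""
    return max(abs(e) for e in errors)


# Hypothetical position error samples PD(S), taken every 1 ms.
pd_s = [0.0, 0.5, -1.0, 0.25]
dt = 0.001
values = {
    "integral_abs": integral_abs_error(pd_s, dt),
    "time_weighted": time_weighted_abs_error(pd_s, dt),
    "even_power": integral_even_power_error(pd_s, dt, n=1),
    "max_abs": max_abs_error(pd_s),
}
```

The time-weighted variant penalizes errors that persist late in the evaluation run more heavily, while the maximum picks out only the single worst deviation.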
- When the value f(PD(S′)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 after the correction, related to the state information S′ corrected by the action information A, is larger than the value f(PD(S)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 before the correction, related to the state information S before the correction by the action information A, the reward output unit 2021 sets the reward value to a negative value.
- On the other hand, when the value f(PD(S′)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 after the correction, related to the state information S′ corrected by the action information A, is smaller than the value f(PD(S)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 before the correction, related to the state information S before the correction by the action information A, the reward output unit 2021 sets the reward value to a positive value.
- When the value f(PD(S′)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 after the correction, related to the state information S′ corrected by the action information A, is equal to the value f(PD(S)) of the position error of the servo motor control device 100 that has operated by using the non-linear friction compensator 111 before the correction, related to the state information S before the correction by the action information A, the reward output unit 2021 sets the reward value to zero, for example.
- The negative value of the case where the value f(PD(S′)) of the position error in the state S′ after the action A is performed is larger than the value f(PD(S)) of the position error in the prior state S may be made larger in magnitude according to the ratio. That is, the negative value may be made larger in magnitude according to the degree of increase in the value of the position error. Conversely, the positive value of the case where the value f(PD(S′)) of the position error in the state S′ after the action A is performed is smaller than the value f(PD(S)) of the position error in the prior state S may be made larger according to the ratio. That is, the positive value may be made larger according to the degree of decrease in the value of the position error.
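- The reward rule described above might be sketched as follows. Scaling the reward by the relative change of the evaluation value is only one possible way of making the reward "larger according to a ratio"; the function name `compute_reward`, the `scale` parameter, and the handling of a zero prior value are assumptions.

```python
def compute_reward(f_prev, f_next, scale=1.0):
    """Return a reward from the evaluation values f(PD(S)) and f(PD(S'))."""
    if f_next == f_prev:
        # Position error value unchanged: zero reward.
        return 0.0
    if f_prev == 0.0:
        # Assumption: no ratio is defined when the prior value is zero,
        # so an increase from zero is given a fixed negative reward.
        return -scale
    # Relative change: positive when the error value decreased,
    # negative when it increased, larger in magnitude for larger changes.
    return scale * (f_prev - f_next) / f_prev


# Hypothetical evaluation values before and after an action.
reward_improved = compute_reward(2.0, 1.0)
reward_worsened = compute_reward(2.0, 3.0)
reward_unchanged = compute_reward(2.0, 2.0)
```

A fixed reward of +1, 0, or -1 would equally satisfy the sign rule; the ratio-based form simply rewards larger improvements more strongly.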
- The value function updating unit 2022 performs Q-learning on the basis of the state S, the action A, the state S′ obtained when the action A is applied to the state S, and the reward value calculated as described above, to update the value function Q stored in the value function storage unit 204. Updating of the value function Q may be performed by online learning, batch learning, or mini-batch learning.
- Online learning is a learning method of applying an action A to the current state S and updating the value function Q immediately every time the state S makes a transition to a new state S′. Batch learning is a learning method of collecting learning data by repeatedly applying an action A to the current state S so that the state S makes a transition to a new state S′, and updating the value function Q by using all the collected learning data. Mini-batch learning is a learning method intermediate between online learning and batch learning, in which the value function Q is updated every time a certain amount of learning data has accumulated.
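- The difference between the three updating modes can be illustrated with a single routine whose only variable is how much learning data is accumulated before the value function is updated. This is a simplified sketch: the collected data is applied to a table-based Q one sample at a time, and the names `apply_update` and `learn`, the transition tuples, and the parameter values are hypothetical.

```python
from collections import defaultdict


def apply_update(q, transition, actions, alpha=0.1, gamma=0.9):
    """Apply the Q-learning update for one (state, action, reward, next_state) sample."""
    state, action, reward, next_state = transition
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def learn(q, transitions, actions, batch_size=1):
    """Update q from a list of transitions, updating once every batch_size samples.

    batch_size=1 corresponds to online learning, batch_size=len(transitions)
    to batch learning, and values in between to mini-batch learning.
    Returns the number of times the value function was updated.
    """
    buffer, update_count = [], 0
    for transition in transitions:
        buffer.append(transition)
        if len(buffer) >= batch_size:
            for item in buffer:
                apply_update(q, item, actions)
            buffer.clear()
            update_count += 1
    if buffer:
        # Remaining data that did not fill a complete batch.
        for item in buffer:
            apply_update(q, item, actions)
        update_count += 1
    return update_count


# Hypothetical learning data.
actions = ["up", "down"]
transitions = [
    ("s0", "up", 1.0, "s1"),
    ("s1", "up", 1.0, "s2"),
    ("s2", "down", -1.0, "s1"),
    ("s1", "up", 1.0, "s2"),
    ("s2", "up", 0.0, "s2"),
]
q_online, q_mini, q_batch = defaultdict(float), defaultdict(float), defaultdict(float)
n_online = learn(q_online, transitions, actions, batch_size=1)
n_mini = learn(q_mini, transitions, actions, batch_size=2)
n_batch = learn(q_batch, transitions, actions, batch_size=len(transitions))
```

Because this sketch replays fixed data in the same order, the three modes end with the same table and differ only in when the stored value function changes; in actual learning, the timing matters because the actions selected next depend on the value function at that moment.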
- The action information generation unit 2023 selects the action A in the course of the Q-learning, with respect to the current state S. The action information generation unit 2023 generates the action information A so that an operation of correcting each of the compensation coefficients c, d of the non-linear friction compensator 111 of the servo motor control device 100 (corresponding to the action A in the Q-learning) is performed in the course of the Q-learning, and outputs the generated action information A to the action information output unit 203. More specifically, the action information generation unit 2023, for example, incrementally adds or subtracts each of the compensation coefficients c, d included in the action A to or from each of the compensation coefficients of the non-linear friction compensator 111 included in the state S.
- When each of the compensation coefficients c, d of the non-linear friction compensator 111 is increased or decreased, a transition is made to the state S′, and a positive reward (reward of a positive value) is given, the action information generation unit 2023 may adopt, as the next action A′, a policy of selecting an action A′ that makes the value of the position error still smaller, such as incrementally adding or subtracting in the same manner as the previous action, with respect to each of the compensation coefficients c, d of the non-linear friction compensator 111.
- Conversely, when a negative reward (reward of a negative value) is given, the action information generation unit 2023 may adopt, as the next action A′, for example, a policy of selecting an action A′ that makes the position error smaller than the previous value, such as incrementally subtracting or adding in the opposite manner to the previous action, with respect to each of the compensation coefficients c, d of the non-linear friction compensator 111.
- The action information generation unit 2023 may adopt a policy of selecting the action A′ by a known method such as the greedy method, which selects the action A′ having the highest value function Q (S, A) from among the currently estimated values of the actions A, or the ε-greedy method, which selects the action A′ at random with a small probability ε and otherwise selects the action A′ having the highest value function Q (S, A).
- The action
information output unit 203 is a unit that transmits the action information A output from the learning unit 202 to the servo motor control device 100. As described above, the servo motor control device 100 slightly corrects the current state S, that is, each of the currently set compensation coefficients c, d of the non-linear friction compensator 111, on the basis of the action information, to make a transition to the next state S′ (that is, the corrected compensation coefficients of the non-linear friction compensator 111).
- The value function storage unit 204 is a storage device that stores the value function Q. The value function Q may be stored, for example, in a table (hereinafter referred to as an action value table) for every state S and every action A. The value function Q stored in the value function storage unit 204 is updated by the value function updating unit 2022. The value function Q stored in the value function storage unit 204 may be shared with other machine learning devices 200. When the value function Q is shared among a plurality of machine learning devices 200, distributed reinforcement learning can be performed by the machine learning devices 200, so that the efficiency of the reinforcement learning can be improved.
- The optimizing action information output unit 205 creates the action information A (hereinafter referred to as “optimizing action information”) for causing the non-linear friction compensator 111 to perform the operation with which the value function Q (S, A) is the maximum, on the basis of the value function Q updated through the Q-learning performed by the value function updating unit 2022. More specifically, the optimizing action information output unit 205 acquires the value function Q stored in the value function storage unit 204. This value function Q has been updated through the Q-learning performed by the value function updating unit 2022 as described above.
- Then, the optimizing action information output unit 205 creates the action information on the basis of the value function Q, and outputs the created action information to the servo motor control device 100 (the non-linear friction compensator 111). This optimizing action information includes information for correcting each of the compensation coefficients c, d of the non-linear friction compensator 111, similarly to the action information output in the course of the Q-learning by the action information output unit 203.
- In the servo motor control device 100, each of the compensation coefficients c, d of the non-linear friction compensator 111 is corrected on the basis of this action information. Accordingly, the servo motor control device 100 can operate so as to reduce the value of the position error. As described above, by utilizing the machine learning device 200 according to the present invention, the parameter adjustment of the non-linear friction compensator 111 of the servo motor control device 100 is simplified.
- The function blocks included in the servo
motor control device 100 and the machine learning device 200 have been described above. In order to realize these function blocks, each of the servo motor control device 100 and the machine learning device 200 includes an operation processing device such as a central processing unit (CPU). Each of the servo motor control device 100 and the machine learning device 200 also includes a sub storage device such as a hard disk drive (HDD) that stores various control programs such as application software and an operating system (OS), and a main storage device such as a random access memory (RAM) for storing data temporarily required for execution of the programs by the operation processing device.
- In each of the servo motor control device 100 and the machine learning device 200, the operation processing device reads the application software and the OS from the sub storage device, loads the read application software and OS into the main storage device, and performs operation processing based on the application software and the OS. On the basis of the operation result, various hardware included in each device is controlled. Thereby, the function blocks of the present embodiment are realized. That is, the present embodiment can be realized by cooperation of the hardware and the software.
- The machine learning device 200 performs a large amount of operation associated with the machine learning. Thus, it is desirable that, for example, graphics processing units (GPUs) are mounted on a personal computer, and the GPUs are utilized for the operation processing associated with the machine learning by a technique called general-purpose computing on graphics processing units (GPGPU). The machine learning device 200 can perform high-speed processing by utilizing the GPUs. Further, in order to perform even higher-speed processing, a plurality of such computers equipped with GPUs may be used to construct a computer cluster, so that the machine learning device 200 performs parallel processing by the plurality of computers included in the computer cluster.
- Next, operation of the
machine learning device 200 at the time of Q-learning in the present embodiment will be described with reference to the flowchart of FIG. 11. A case where the geometry is a circle is described here. However, the geometry may be a square, and the machine learning device 200 may sequentially learn cases where the geometry is a circle, a square, and the like.
- In step S11, the state information acquisition unit 201 acquires the state information S from the servo motor control device 100. The acquired state information is output to the value function updating unit 2022 and the action information generation unit 2023. As described above, this state information S is information corresponding to a state in the Q-learning, and includes each of the compensation coefficients c, d of the non-linear friction compensator 111 at the time of step S11. In this way, the set PD(S) of the position error corresponding to the circular geometry, of the case where the compensation coefficients are the initial values, is obtained from the non-linear friction compensator 111.
- As described above, the coefficients c, d of the non-linear friction compensator 111 in the initial state $S_0$ are initially set by a user.
- The value PD($S_0$) of the position error in the state $S_0$ at the time when the Q-learning starts for the first time is obtained from the subtractor 102 by operating the servo motor control device 100 by the evaluation program. The position command creation unit 101 sequentially outputs the position command in the circular geometry specified by the evaluation program. The position command value corresponding to the circular geometry is output from the position command creation unit 101, and the subtractor 102 outputs the difference between the position command value and the detection position output from the integrator 108 to the machine learning device 200 as the position error PD($S_0$).
- In step S12, the action information generation unit 2023 generates new action information A, and outputs the generated new action information A to the servo motor control device 100 via the action information output unit 203. The action information generation unit 2023 outputs the new action information A on the basis of the action selection policy described above. The servo motor control device 100 that has received the action information A drives the machine tool including the servo motor in the state S′ in which each of the compensation coefficients c, d of the non-linear friction compensator 111 related to the current state S has been corrected on the basis of the received action information. As described above, this action information corresponds to the action A in the Q-learning.
- In step S13, the state information acquisition unit 201 acquires the position error PD(S′) in the new state S′ from the subtractor 102, and the compensation coefficients c, d from the non-linear friction compensator 111. In this way, the state information acquisition unit 201 acquires the set PD(S′) of the position error corresponding to the circular geometry, of the case where the compensation coefficients are the compensation coefficients c, d in the state S′, from the non-linear friction compensator 111. The acquired state information is output to the reward output unit 2021.
- In step S14, the reward output unit 2021 determines the magnitude relationship between the value f(PD(S′)) of the position error in the state S′ and the value f(PD(S)) of the position error in the state S. When f(PD(S′)) > f(PD(S)), the reward output unit 2021 sets the reward to a negative value in step S15. When f(PD(S′)) < f(PD(S)), the reward output unit 2021 sets the reward to a positive value in step S16. When f(PD(S′)) = f(PD(S)), the reward output unit 2021 sets the reward to zero in step S17. The reward output unit 2021 may weight the negative value and the positive value of the reward.
- When any of step S15, step S16, and step S17 ends, the value function updating unit 2022 updates, in step S18, the value function Q stored in the value function storage unit 204 on the basis of the reward value calculated in that step. Then, the processing returns to step S11, and the processing described above is repeated. Thereby, the value function Q converges to a suitable value. The processing described above may end on condition that it has been repeated a predetermined number of times or for a predetermined time. Although online updating is exemplified for step S18, batch updating or mini-batch updating may be performed instead of the online updating.
- As described above, by the operation described with reference to
FIG. 11, the present embodiment has the effect that a suitable value function for adjusting the compensation coefficients c, d of the non-linear friction compensator 111 can be obtained by utilizing the machine learning device 200, so that optimization of the compensation coefficients c, d of the feedforward is simplified.
- Next, operation at the time of generation of the optimizing action information by the optimizing action information output unit 205 will be described with reference to the flowchart of FIG. 12. First, in step S21, the optimizing action information output unit 205 obtains the value function Q stored in the value function storage unit 204. The value function Q has been updated through the Q-learning performed by the value function updating unit 2022 as described above.
- In step S22, the optimizing action information output unit 205 generates the optimizing action information on the basis of the value function Q, and outputs the generated optimizing action information to the non-linear friction compensator 111 of the servo motor control device 100.
- By the operation described with reference to FIG. 12, in the present embodiment, the optimizing action information is generated on the basis of the value function Q determined through learning by the machine learning device 200, adjustment of the compensation coefficients c, d of the non-linear friction compensator 111 currently set in the servo motor control device 100 can be simplified on the basis of the optimizing action information, and the value of the position error can be reduced.
- Effects by the machine learning device of the present embodiment will be described below with reference to
FIG. 13 and FIG. 14. FIG. 13 is a diagram showing a movement path of the table before parameter adjustment of the non-linear friction compensator by the machine learning. FIG. 14 is a diagram showing a movement path of the table after the parameter adjustment of the non-linear friction compensator by the machine learning. Both FIG. 13 and FIG. 14 show the movement path in the case where the table included in the control target moves in a sine-wave shape in the X axis direction or the Y axis direction as shown in FIG. 5. As shown in FIG. 13, before the parameters of the non-linear friction compensator 111 are adjusted by the machine learning, when the rotation direction of the servo motor is about to invert, an inversion delay occurs, as in the movement path shown by the solid line, relative to the sine-wave movement path of the table shown by the dotted line. This inversion delay occurs due to non-linear friction, and the table does not invert and move instantly. As shown in FIG. 14, after the parameters of the non-linear friction compensator 111 are adjusted by the machine learning, the inversion delay due to the non-linear friction is removed, and the table moves along the sine-wave movement path as shown by the solid line. Arrows in FIG. 13 and FIG. 14 indicate the inversion positions.
- As shown in FIG. 6, even when the table included in the control target moves in a triangular-wave shape in the X axis direction or the Y axis direction, before the parameters of the non-linear friction compensator 111 are adjusted by the machine learning, the inversion delay due to the non-linear friction occurs when the rotation direction of the servo motor is about to invert, as in the case where the table moves in the sine-wave shape. That is, the inversion delay due to the non-linear friction occurs relative to the triangular-wave movement path of the table that is set in the position command creation unit 101, and the table does not invert and move instantly. However, after the parameters of the non-linear friction compensator 111 are adjusted by the machine learning, the inversion delay due to the non-linear friction is removed, and the table moves along the triangular-wave movement path.
- The servo motor control unit of the servo motor control device described above and each of the components included in the machine learning device may be realized by hardware, software, or a combination thereof. The servo motor control method performed by cooperation of the components included in the servo motor control device described above may also be realized by hardware, software, or a combination thereof. Being realized by software means being realized by a computer reading and executing a program.
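- As one illustration of a software realization, the learning flow of FIG. 11 (steps S11 to S18) with ε-greedy action selection, followed by the generation of optimizing action information of FIG. 12 (steps S21 and S22), might be sketched as follows. This is a toy sketch under stated assumptions: the servo motor control device is replaced by a hypothetical function `evaluate` whose error value is smallest at c = 3, d = 2, the coefficients are restricted to an integer grid from 0 to 5, and all names and parameter values are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical actions: increment or decrement one of the coefficients (c, d).
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def evaluate(state):
    """Stand-in for f(PD(S)): a toy error value that is smallest at c=3, d=2."""
    c, d = state
    return (c - 3) ** 2 + (d - 2) ** 2


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else highest Q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_q_learning(initial_state, steps=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Repeat steps S11 to S18 for a fixed number of iterations and return Q."""
    rng = random.Random(seed)
    q = defaultdict(float)
    state = initial_state                                    # S11: current state S
    for _ in range(steps):
        action = select_action(q, state, epsilon, rng)       # S12: action A
        next_state = (min(5, max(0, state[0] + action[0])),  # S13: new state S'
                      min(5, max(0, state[1] + action[1])))  # (kept inside the grid)
        f_prev, f_next = evaluate(state), evaluate(next_state)  # S14: compare values
        if f_next > f_prev:
            reward = -1.0                                    # S15: negative reward
        elif f_next < f_prev:
            reward = 1.0                                     # S16: positive reward
        else:
            reward = 0.0                                     # S17: zero reward
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])  # S18
        state = next_state
    return q


def optimizing_action(q, state):
    """S21 and S22: read the learned Q and return the action with the highest value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = run_q_learning((0, 0))
best = optimizing_action(q, (0, 0))
```

In the embodiment, the evaluation value would instead be computed from the position error set obtained by running the evaluation program on the servo motor control device 100, and the loop could end after a predetermined number of repetitions or a predetermined time, as described above.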
- The program may be stored by using various types of non-transitory computer readable media, and supplied to the computer. The non-transitory computer readable media include various types of tangible storage media. Examples of the non-transitory computer readable media include a magnetic recording medium (for example, a hard disk drive), a magneto-optical recording medium (for example, a magneto-optical disk), a CD-ROM (read only memory), a CD-R, a CD-R/W, and a semiconductor memory (for example, a mask ROM, a programmable ROM (PROM), an erasable PROM (EPROM), a flash ROM, and a random access memory (RAM)).
- Although the embodiment described above is a preferable embodiment of the present invention, the scope of the present invention is not limited thereto. The present invention may be performed in an embodiment in which various modifications are performed without departing from the scope of the present invention.
- <When Velocity Feedforward Calculation Unit 110 Is Optional>
- In the embodiments described above, the servo motor control device 100 is described as including the velocity feedforward calculation unit 110 and having a configuration in which the non-linear friction compensator 111 is connected in parallel to the velocity feedforward calculation unit 110; however, the configuration is not limited to this. The velocity feedforward calculation unit 110 may be optional, and the servo motor control device 100 need not include the velocity feedforward calculation unit 110.
- <Modification in which Servo Motor Control Device Includes Machine Learning Device>
- In the embodiments described above, the machine learning device 200 is a device separate from the servo motor control device 100. However, part or all of the functions of the machine learning device 200 may be realized by the servo motor control device 100. That is, the servo motor control device 100 may include the machine learning device 200.
- <Degree of Freedom of System Configuration>
- In the embodiment described above, the machine learning device 200 and the servo motor control device 100 are communicatively connected as a one-to-one set. However, for example, one machine learning device 200 may be communicatively connected with a plurality of servo motor control devices 100 via the network 400 to perform machine learning of each of the servo motor control devices 100. In that case, the respective functions of the machine learning device 200 may be realized by a distributed processing system in which the functions are distributed among a plurality of servers as appropriate. The functions of the machine learning device 200 may also be realized by utilizing a virtual server function or the like in a cloud. When there are a plurality of machine learning devices 200-1 to 200-n corresponding respectively to a plurality of servo motor control devices 100-1 to 100-n of the same type name, the same specification, or the same series, the servo motor control system may be configured to share the learning results of the machine learning devices 200-1 to 200-n. Thereby, a more optimal model can be constructed.
- 10 Servo motor control system
- 100 Servo motor control device
- 101 Position command creation unit
- 102 Subtractor
- 103 Position control unit
- 104 Adder
- 105 Subtractor
- 106 Velocity control unit
- 107 Adder
- 108 Integrator
- 109 Position feedforward calculation unit
- 110 Velocity feedforward calculation unit
- 111 Non-linear friction compensator
- 200 Machine learning device
- 201 State information acquisition unit
- 202 Learning unit
- 203 Action information output unit
- 204 Value function storage unit
- 205 Optimizing action information output unit
- 300 Control target
- 400 Network
Claims (9)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017-138949 | 2017-07-18 | ||
JP2017138949A JP6538766B2 (en) | 2017-07-18 | 2017-07-18 | Machine learning apparatus, servo motor control apparatus, servo motor control system, and machine learning method |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190028043A1 true US20190028043A1 (en) | 2019-01-24 |
US10418921B2 US10418921B2 (en) | 2019-09-17 |
Family
ID=64951961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/021,447 Active US10418921B2 (en) | 2017-07-18 | 2018-06-28 | Machine learning device, servo motor control device, servo motor control system, and machine learning method |
Country Status (4)
Country | Link |
---|---|
US (1) | US10418921B2 (en) |
JP (1) | JP6538766B2 (en) |
CN (1) | CN109274314B (en) |
DE (1) | DE102018211148A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180231943A1 (en) * | 2015-11-30 | 2018-08-16 | Omron Corporation | Control device |
US10444733B2 (en) * | 2017-04-07 | 2019-10-15 | Fanuc Corporation | Adjustment device and adjustment method |
US10691091B2 (en) * | 2017-09-15 | 2020-06-23 | Fanuc Corporation | Controller and machine learning device |
US20200372413A1 (en) * | 2018-03-15 | 2020-11-26 | Omron Corporation | Learning device, learning method, and program therefor |
TWI715425B (en) * | 2018-10-12 | 2021-01-01 | 日商三菱電機股份有限公司 | Positioning control device and positioning method |
US11023827B2 (en) * | 2018-03-19 | 2021-06-01 | Fanuc Corporation | Machine learning device, servo control device, servo control system, and machine learning method for suppressing variation in position error using feedforward control |
CN113325804A (en) * | 2021-06-08 | 2021-08-31 | 中国科学院数学与系统科学研究院 | Q learning extended state observer design method of motion control system |
US11654556B2 (en) * | 2019-01-16 | 2023-05-23 | Fanuc Corporation | Determination apparatus for determining an operation of an industrial robot |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6982557B2 (en) * | 2018-08-31 | 2021-12-17 | 株式会社日立製作所 | Reward function generation method and computer system |
JP6978452B2 (en) * | 2019-02-12 | 2021-12-08 | ファナック株式会社 | Machine learning device, control device, and machine learning search range setting method |
JP7000371B2 (en) * | 2019-03-22 | 2022-01-19 | ファナック株式会社 | Machine learning device, control system, and machine learning method |
JP7384572B2 (en) * | 2019-05-13 | 2023-11-21 | 株式会社東芝 | Control device, control method, and motor control system |
JP7305113B2 (en) * | 2019-05-30 | 2023-07-10 | 国立大学法人長岡技術科学大学 | Motor control device, motor device and machine learning device |
JP2021002194A (en) * | 2019-06-21 | 2021-01-07 | ファナック株式会社 | Numerical control apparatus, cnc machine tool, numerical control method, and numerical control program |
JP2021040239A (en) | 2019-09-03 | 2021-03-11 | ファナック株式会社 | Machine learning device, receiving device, and machine learning method |
CN111152227A (en) * | 2020-01-19 | 2020-05-15 | 聊城鑫泰机床有限公司 | Mechanical arm control method based on guided DQN control |
CN112083687B (en) * | 2020-09-11 | 2021-06-11 | 苏州浩智工业控制技术有限公司 | Over-quadrant compensation method and device based on speed feedforward of field bus |
CN112828678B (en) * | 2021-02-09 | 2022-03-18 | 蓝思智能机器人(长沙)有限公司 | Speed compensation method and device and electronic equipment |
CN113472242B (en) * | 2021-07-05 | 2022-07-15 | 江南大学 | Anti-interference self-adaptive fuzzy sliding mode cooperative control method based on multiple intelligent agents |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3465236B2 (en) * | 2000-12-20 | 2003-11-10 | 科学技術振興事業団 | Robust reinforcement learning method |
US7437201B2 (en) * | 2003-01-14 | 2008-10-14 | Cullen Christopher P | Electric motor controller |
JP3917094B2 (en) | 2003-03-17 | 2007-05-23 | 山洋電気株式会社 | Motor control method and apparatus |
JP3943061B2 (en) * | 2003-08-22 | 2007-07-11 | 三菱電機株式会社 | Servo control device |
JP4741637B2 (en) * | 2008-06-30 | 2011-08-03 | ファナック株式会社 | Servo motor drive control device and drive control method |
CN102725956B (en) * | 2010-01-27 | 2015-07-01 | 三菱电机株式会社 | Motor control device |
JP2013003845A (en) * | 2011-06-16 | 2013-01-07 | Institute Of National Colleges Of Technology Japan | Built-in intelligence controller, control system, control program, recording medium, and control method |
JP6214948B2 (en) | 2013-07-12 | 2017-10-18 | 三菱重工業株式会社 | Friction compensation device, friction compensation method, and servo control device |
JP6020537B2 (en) | 2014-11-21 | 2016-11-02 | 株式会社安川電機 | Motor control device and motor control method |
CN105045103B (en) * | 2015-07-27 | 2018-06-29 | 台州学院 | Servo manipulator friction compensation control system and method based on the LuGre friction model |
JP6169655B2 (en) * | 2015-07-30 | 2017-07-26 | ファナック株式会社 | Machine tool, simulation device, and machine learning device |
JP6106226B2 (en) * | 2015-07-31 | 2017-03-29 | ファナック株式会社 | Machine learning device for learning gain optimization, motor control device including machine learning device, and machine learning method |
JP6193961B2 (en) * | 2015-11-30 | 2017-09-06 | ファナック株式会社 | Machine learning device and method for optimizing the smoothness of feed of a machine feed shaft, and motor control device equipped with the machine learning device |
- 2017-07-18: JP application JP2017138949A, patent JP6538766B2 (Active)
- 2018-06-28: US application US16/021,447, patent US10418921B2 (Active)
- 2018-07-06: DE application DE102018211148.0A, patent DE102018211148A1 (Pending)
- 2018-07-13: CN application CN201810771452.8A, patent CN109274314B (Active)
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180231943A1 (en) * | 2015-11-30 | 2018-08-16 | Omron Corporation | Control device |
US10571874B2 (en) * | 2015-11-30 | 2020-02-25 | Omron Corporation | Control device for performing learning control |
US10444733B2 (en) * | 2017-04-07 | 2019-10-15 | Fanuc Corporation | Adjustment device and adjustment method |
US10691091B2 (en) * | 2017-09-15 | 2020-06-23 | Fanuc Corporation | Controller and machine learning device |
US20200372413A1 (en) * | 2018-03-15 | 2020-11-26 | Omron Corporation | Learning device, learning method, and program therefor |
US12051013B2 (en) * | 2018-03-15 | 2024-07-30 | Omron Corporation | Learning device, learning method, and program therefor for shorten time in generating appropriate teacher data |
US11023827B2 (en) * | 2018-03-19 | 2021-06-01 | Fanuc Corporation | Machine learning device, servo control device, servo control system, and machine learning method for suppressing variation in position error using feedforward control |
TWI715425B (en) * | 2018-10-12 | 2021-01-01 | 日商三菱電機股份有限公司 | Positioning control device and positioning method |
US11654556B2 (en) * | 2019-01-16 | 2023-05-23 | Fanuc Corporation | Determination apparatus for determining an operation of an industrial robot |
CN113325804A (en) * | 2021-06-08 | 2021-08-31 | 中国科学院数学与系统科学研究院 | Q learning extended state observer design method of motion control system |
Also Published As
Publication number | Publication date |
---|---|
US10418921B2 (en) | 2019-09-17 |
JP2019021024A (en) | 2019-02-07 |
JP6538766B2 (en) | 2019-07-03 |
CN109274314A (en) | 2019-01-25 |
DE102018211148A1 (en) | 2019-01-24 |
CN109274314B (en) | 2020-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10418921B2 (en) | | Machine learning device, servo motor control device, servo motor control system, and machine learning method |
US10564619B2 (en) | | Machine learning device, servo control device, servo control system, and machine learning method |
US10747193B2 (en) | | Machine learning apparatus, servo control apparatus, servo control system, and machine learning method |
US11023827B2 (en) | | Machine learning device, servo control device, servo control system, and machine learning method for suppressing variation in position error using feedforward control |
US10444733B2 (en) | | Adjustment device and adjustment method |
US10824121B2 (en) | | Machine learning device, servo motor controller, servo motor control system, and machine learning method |
JP6490131B2 (en) | | Machine learning device, servo control device, servo control system, and machine learning method |
US11087509B2 (en) | | Output device, control device, and evaluation function value output method |
US10901396B2 (en) | | Machine learning device, control device, and machine learning method |
US10877442B2 (en) | | Machine learning device, control device, and machine learning method |
JP6474456B2 (en) | | Machine learning apparatus, servo control system, and machine learning method |
US11126149B2 (en) | | Control parameter adjusting device and adjusting method using machine learning |
US11256220B2 (en) | | Machine learning device, control device and machine learning method |
US11029650B2 (en) | | Machine learning device, control system, and machine learning method |
US11507885B2 (en) | | Machine learning device, control device, and machine learning search range setting method |
WO2021251226A1 (en) | | Control assist device, control device, and control assist method |
US10684594B2 (en) | | Machine learning device, servo motor controller, servo motor control system, and machine learning method |
US11243501B2 (en) | | Machine learning device, control system, and machine learning |
US10901374B2 (en) | | Machine learning device, control device, and machine learning method |
CN115437238B (en) | | Variable-length PD type iterative learning control method for direct-current motor driven single-rod system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: FANUC CORPORATION, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: OHO, YUSAKU; SONODA, NAOTO; REEL/FRAME: 046229/0120. Effective date: 20180621 |
| FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |