US20180164756A1 - Control system and machine learning device - Google Patents
- Publication number
- US20180164756A1 (application US 15/838,510)
- Authority
- US
- United States
- Prior art keywords
- machine
- learning
- section
- adjustment
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G05B13/0265—Adaptive control systems, the criterion being a learning criterion
- G05B13/027—Adaptive control systems, the criterion being a learning criterion using neural networks only
- G05B19/18—Numerical control [NC]
- G05B19/4083—Adapting programme, configuration
- G05B19/414—Structure of the control system, e.g. common controller or multiprocessor systems, interface to servo, programmable interface controller
- G05B19/4155—Numerical control [NC] characterised by programme execution
- G06N3/006—Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G05B2219/33034—Online learning, training
- G05B2219/33056—Reinforcement learning, agent acts, receives reward, emotion, action selective
- G05B2219/42018—Pid learning controller, gains adapted as function of previous error
- G05B2219/42152—Learn, self, auto tuning, calibrating, environment adaptation, repetition
Definitions
- the present invention relates to a control system and a machine learning device and, in particular, to a controller and a machine learning device that perform machine learning to optimize servo gain in machine control inside a facility.
- the setting of the servo gain of a controller largely affects the action of a machine to be controlled and directly affects the quality and the productivity of a workpiece.
- Servo gain is often set and adjusted case by case as needed, using an adjustment tool for each machine, under machining conditions such as the workpiece, the tool, the required accuracy, the cycle time, the weight of a mold, and the viscosity of a resin, so optimizing the servo gain takes time. Further, in some situations the gain is desirably switched to an optimum value during operation, case by case according to the operating state.
- Japanese Patent Application Laid-open No. 3-259303 discloses a system that performs an adjustment in the control of a machine action using a neural network.
- Japanese Patent Application Laid-open No. 2006-302282 discloses a technology for acquiring the performance data of a separate robot and optimizing a control program including control gain according to the performance goal of a user.
- a machine learning device is introduced into a control apparatus or a field computer placed at a rank higher than a plurality of controllers, and performs machine learning so that a priority factor is optimized based on per-axis information (position deflection, cycle time, motor load, power consumption, speed fluctuation rate, or the like) and operating-condition information (motor characteristics, machine rigidity, workpiece type, tool and jig used, weight and shape of a mold, type and viscosity of a resin, or the like) collected from the controller of each machine.
- an estimated initial value of the optimum gain is set in a controller according to the operating conditions of each machine and the latest data of a value function, and a reward based on per-axis information obtained from the controller is calculated and used in the learning of the machine learning device.
- a plurality of value functions used in machine learning may be stored in advance according to at least a priority factor, and an optimum one of the value functions may be selectively used from a database according to the situation.
- a control system has at least one machine that machines a workpiece and a high-order apparatus that adjusts servo gain used in machining by the machine.
- the control system includes: a machine learning device that performs machine learning of an adjustment of the servo gain of the machine, wherein the machine learning device has: a state observation section that observes machine information on the machine as state data; a determination data acquisition section that acquires information on machining by the machine as determination data; a reward calculation section that calculates a reward based on the determination data and preset reward conditions; a learning section that performs the machine learning of the adjustment of the servo gain of the machine; a decision making section that determines an action of adjustment of the servo gain of the machine, based on the state data and a machine learning result of the adjustment of the servo gain of the machine by the learning section; and a gain changing section that changes the servo gain of the machine, based on the action of adjustment of the servo gain determined by the decision making section; and the learning section performs the machine learning of the adjustment of the servo gain of the machine, based on the state data, the action of adjustment, and the reward calculated after the action of adjustment.
- the control system further includes a value function switching determination section that switches a value function used in the machine learning and the determination of the action of adjustment, based on a priority factor preset to the machine. In addition, a positive reward or a negative reward is calculated based on a reward condition set correspondingly to the priority factor.
- the control system is connected to at least one other high-order apparatus and mutually exchanges or shares the machine learning result with the other high-order apparatus.
- a machine learning device performs machine learning of an adjustment of servo gain used in machining by at least one machine that machines a workpiece.
- the machine learning device includes: a state observation section that observes machine information on the machine as state data; a determination data acquisition section that acquires information on machining by the machine as determination data; a reward calculation section that calculates a reward based on the determination data and preset reward conditions; a learning section that performs the machine learning of the adjustment of the servo gain of the machine; a decision making section that determines an action of adjustment of the servo gain of the machine, based on the state data and a machine learning result of the adjustment of the servo gain of the machine by the learning section; and a gain changing section that changes the servo gain of the machine, based on the action of adjustment of the servo gain determined by the decision making section, wherein the learning section performs the machine learning of the adjustment of the servo gain of the machine, based on the state data, the action of adjustment, and the reward calculated after the action of adjustment.
- the machine learning device further includes: a value function switching determination section that switches a value function used in the machine learning and the determination of the action of adjustment, based on a priority factor preset to the machine.
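The value function switching described above can be sketched as a lookup keyed by the priority factor. This is an illustrative assumption, not the patent's concrete implementation; the factor names and the tabular form of the value functions are hypothetical.

```python
# Hypothetical sketch: one value function (here a tabular Q-function) per
# priority factor, selected before learning or before determining an action.
# The factor names ("accuracy", "cycle_time", "power") are illustrative only.

class ValueFunctionSwitcher:
    def __init__(self, priority_factors):
        # One independent Q-table per priority factor.
        self.tables = {f: {} for f in priority_factors}

    def select(self, priority_factor):
        # Return the value function matching the machine's preset factor.
        return self.tables[priority_factor]

switcher = ValueFunctionSwitcher(["accuracy", "cycle_time", "power"])
q = switcher.select("accuracy")
q[("state0", "raise_gain")] = 0.5   # learning updates go to this table only
```

Selecting a table per factor keeps the value estimates learned under different priority factors from interfering with one another.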
- FIG. 1 is a diagram for describing the basic concept of a reinforcement learning algorithm
- FIG. 2 is a schematic diagram showing a neuron model
- FIG. 3 is a schematic diagram showing a neural network having weights of three layers
- FIG. 4 is a diagram showing an image on the machine learning of a control system according to an embodiment of the present invention.
- FIG. 5 is a schematic function block diagram of the control system according to the embodiment of the present invention.
- FIG. 6 is a flowchart showing the flow of the machine learning according to the embodiment of the present invention.
- a machine learning device acting as artificial intelligence is introduced into a high-order apparatus (such as a control apparatus or a field computer) placed at a rank higher than one or more controllers that control respective machines, and performs machine learning of the adjustment of the servo gain used by the respective controllers to control the machines, with respect to information on the respective axes, machine operating conditions, and priority factors.
- machine learning will be briefly described.
- machine learning is realized in such a way that useful rules, knowledge expressions, determination criteria, or the like are extracted by analysis from sets of data input to a device that performs the machine learning (hereinafter called a machine learning device), the determination results of the extraction are output, and learning of the knowledge is performed.
- the methods are roughly classified into “supervised learning,” “unsupervised learning,” and “reinforcement learning.”
- there is also “deep learning,” by which the extraction of feature amounts per se is learned.
- “supervised learning” is a model by which sets of input and result (label) data are given to a machine learning device in large amounts so that it learns the features of those data sets and estimates results from inputs, i.e., a method by which the relationship between inputs and results may be learned inductively.
- the method may be realized using an algorithm such as a neural network that will be described later.
- “unsupervised learning” is a learning method by which a device, receiving only large amounts of input data, learns how the input data is distributed and applies compression, classification, shaping, or the like to the input data even though corresponding supervised output data is not given.
- the features of the data sets may be arranged into clusters of similar items. Using this result, some criterion is set and outputs are allocated so as to optimize it, whereby the prediction of outputs may be realized.
- “reinforcement learning” is a method by which not only determinations and classifications but also actions are learned, so that optimum actions are learned in consideration of the interactions that actions give to the environment, i.e., learning to maximize the rewards that will be obtained in the future.
- a machine learning device may start learning in a state in which the machine learning device does not completely know or imperfectly knows results brought about by actions.
- a machine learning device may also start learning from a desirable start point, using an initial state in which prior learning (by a method such as the above supervised learning or inverse reinforcement learning) has been performed so as to imitate human actions.
- the present invention employs, as the principal learning algorithm of a machine learning device, the algorithm of reinforcement learning by which the machine learning device is given rewards to automatically learn actions to achieve a goal.
- FIG. 1 is a diagram for describing the basic concept of a reinforcement learning algorithm.
- learning by an agent (machine learning device) acting as the learning subject and its actions are advanced by interactions with an environment (control target system) acting as the control target. More specifically, the following interactions are performed between the agent and the environment.
- (1) The agent observes an environmental state s_t at a certain time.
- (2) The agent selects and performs an action a_t that it is allowed to take, based on the observation result and past learning.
- (3) The environmental state changes to a next state s_t+1 as a result of the action a_t.
- (4) The agent accepts a reward r_t+1 based on the state change brought about by the action a_t.
- (5) The agent advances the learning based on the state s_t, the action a_t, the reward r_t+1, and past learning results.
- the agent does not understand the standard of value determination for selecting the optimum action a_t with respect to the environmental state s_t in action selection (2) above. Therefore, the agent selects various actions a_t in a certain state s_t and learns to select a better action, i.e., the standard of an appropriate value determination, based on the rewards r_t+1 given for the actions a_t at that time.
- the agent acquires the mapping of an observed state s_t, an action a_t, and a reward r_t+1 as reference information for determining the amount of reward that it is allowed to obtain in the future. For example, when the number of states the agent is allowed to take at each time is m and the number of actions it is allowed to take is n, the agent obtains a two-dimensional m × n array, in which the rewards r_t+1 corresponding to pairs of states s_t and actions a_t are stored, by repeatedly performing actions.
- the agent updates the value function (evaluation function) while repeatedly performing actions to learn an optimum action corresponding to a state.
- a “state value function” is a value function indicating to what degree a certain state s_t is valuable.
- the state value function is expressed as a function using a state as an argument and updated based on a reward obtained with respect to an action in a certain state, a value of a future state changed with the action, or the like in learning from repeated actions.
- the update formula of the state value function is defined according to the reinforcement learning algorithm. For example, in temporal-difference (TD) learning, one of the reinforcement learning algorithms, the state value function is updated by Formula 1 below. Note that in Formula 1, α is called a learning coefficient, γ is called a discount rate, and they are defined to fall within 0 < α ≤ 1 and 0 < γ ≤ 1, respectively.

  V(s_t) ← V(s_t) + α [ r_t+1 + γ V(s_t+1) − V(s_t) ]   (Formula 1)
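As an illustration, the TD(0) update of Formula 1 can be sketched with a tabular state value function; the dictionary representation and the default α and γ values are assumptions for the example.

```python
# TD(0) update of a tabular state value function V, per Formula 1:
#   V(s_t) <- V(s_t) + alpha * (r_t+1 + gamma * V(s_t+1) - V(s_t))

def td0_update(V, s, s_next, reward, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update; V maps states to values."""
    td_error = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return V[s]

V = {}
v0 = td0_update(V, "s0", "s1", reward=1.0)  # moves V("s0") toward the reward
```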
- an “action value function” is a value function indicating to what degree an action a_t is valuable in a certain state s_t.
- the action value function is expressed as a function using a state and an action as arguments and updated based on a reward obtained with respect to an action in a certain state, an action value of a future state changed with the action, or the like in learning from repeated actions.
- the update formula of the action value function is defined according to the reinforcement learning algorithm. For example, in Q-learning, one of the typical reinforcement learning algorithms, the action value function is updated by Formula 2 below. Note that in Formula 2, α is called a learning coefficient, γ is called a discount rate, and they are defined to fall within 0 < α ≤ 1 and 0 < γ ≤ 1, respectively.

  Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t+1 + γ max_a Q(s_t+1, a) − Q(s_t, a_t) ]   (Formula 2)
- Formula 2 expresses a method for updating the evaluation value Q(s_t, a_t) of an action a_t in a state s_t based on the reward r_t+1 returned as a result of the action a_t.
- Q(s_t, a_t) is increased if the sum of the reward r_t+1 and the evaluation value Q(s_t+1, max(a)) of the best action max(a) in the next state reached as a result of the action a_t is larger than the evaluation value Q(s_t, a_t) of the action a_t in the state s_t, and is decreased if not. That is, the value of a certain action in a certain state is made closer to the reward immediately returned as a result of the action plus the value of the best action in the next state that follows.
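The Q-learning update of Formula 2 can likewise be sketched in tabular form; the dictionary keyed by (state, action) pairs and the default coefficients are assumptions for the example.

```python
# Q-learning update of a tabular action value function, per Formula 2:
#   Q(s_t, a_t) <- Q(s_t, a_t)
#                  + alpha * (r_t+1 + gamma * max_a Q(s_t+1, a) - Q(s_t, a_t))

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Move Q[(s, a)] toward the reward plus the best next-state value."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions), default=0.0)
    td_error = reward + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]

Q = {("s1", "up"): 2.0}                      # value learned for the next state
qv = q_update(Q, "s0", "up", reward=1.0, s_next="s1", actions=["up", "down"])
```

Because the update looks at the best action in the next state, the value of a good follow-up state propagates backward to the state-action pair that led to it.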
- an action a_t by which the reward over the future (r_t+1 + r_t+2 + . . . ) becomes maximum in a current state s_t (an action for changing to the most valuable state when a state value function is used, or the most valuable action in the state when an action value function is used) is selected using a value function (evaluation function) generated by past learning.
- the above update formula may be realized by adjusting parameters of an approximate function based on a method such as stochastic gradient descent.
- a supervised learning device such as a neural network may be used.
- the neural network is constituted by a calculation unit, a memory, or the like that realizes a neural network following a neuron model as shown in, for example, FIG. 2 .
- FIG. 2 is a schematic diagram showing a neuron model.
- a neuron outputs an output y with respect to a plurality of inputs x (here, inputs x_1 to x_3 as an example).
- a weight w (w_1 to w_3) is placed on each of the inputs x_1 to x_3.
- the neuron outputs the output y expressed by Formula 3 below. Note that in Formula 3 the input x, the output y, and the weight w are all vectors, θ indicates a bias, and f_k indicates an activation function.

  y = f_k ( Σ_i x_i w_i − θ )   (Formula 3)
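A single neuron following Formula 3 can be sketched as follows; the choice of a sigmoid for the activation function f_k is an illustrative assumption.

```python
import math

# Single-neuron forward pass, per Formula 3: y = f_k(sum_i(x_i * w_i) - theta).
# A sigmoid stands in for the activation function f_k here.

def neuron(x, w, theta):
    """Weighted sum of the inputs minus the bias, passed through the activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

y = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], theta=0.3)
```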
- FIG. 3 is a schematic diagram showing a neural network having weights of three layers D_1 to D_3.
- a plurality of inputs x (here, inputs x_1 to x_3 as an example) is input from the left side of the neural network, and results y (here, results y_1 to y_3 as an example) are output from the right side of the neural network.
- weights are placed correspondingly on the inputs x_1 to x_3.
- the weights placed on the inputs are collectively indicated as w_1.
- the neurons N_11 to N_13 output z_11 to z_13, respectively.
- z_11 to z_13 are collectively indicated as a feature vector z_1, and may be regarded as a vector obtained by extracting the feature amounts of the input vector.
- the feature vector z_1 is the feature vector between the weight w_1 and the weight w_2.
- weights are placed correspondingly on z_11 to z_13.
- the weights placed on the feature vectors are collectively indicated as w_2.
- the neurons N_21 and N_22 output z_21 and z_22, respectively.
- z_21 and z_22 are collectively indicated as a feature vector z_2.
- the feature vector z_2 is the feature vector between the weight w_2 and the weight w_3.
- weights are placed correspondingly on the feature vectors z_21 and z_22.
- the weights placed on the feature vectors are collectively indicated as w_3.
- the neurons N_31 to N_33 output the results y_1 to y_3, respectively.
- the action of the neural network includes a learning mode and a prediction mode.
- a learning data set is used to learn the weight w in the learning mode, and the parameters are used to determine the action of a machining machine in the prediction mode (here, “prediction” is only for the sake of convenience, but various tasks such as detection, classification, and deduction may be performed).
- back propagation is a method for adjusting (learning) the weight of each neuron so as to reduce the difference between the output y obtained when the input x is given and the true (supervised) output y.
- the neural network may have three or more layers (called deep learning). It is possible to automatically obtain a calculation unit that extracts the features of inputs on a step-by-step basis and performs the regression of a result only from supervised data.
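A minimal sketch of such a network with three weight layers, trained by back propagation with gradient descent, might look as follows. The layer sizes, tanh activations, learning rate, and toy fitting target are all assumptions for illustration.

```python
import numpy as np

# Three weight layers w1, w2, w3 (matching FIG. 3), trained by back
# propagation on a toy regression target. Sizes and target are illustrative.

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.5, size=(3, 3))   # inputs x_1..x_3 -> z_11..z_13
w2 = rng.normal(scale=0.5, size=(3, 2))   # feature vector z_1 -> z_21, z_22
w3 = rng.normal(scale=0.5, size=(2, 3))   # feature vector z_2 -> y_1..y_3

def forward(x):
    z1 = np.tanh(x @ w1)                  # feature vector z_1
    z2 = np.tanh(z1 @ w2)                 # feature vector z_2
    return z1, z2, z2 @ w3                # linear output layer

x = rng.normal(size=(16, 3))              # toy input batch
t = x.copy()                              # toy target: reproduce the input

losses = []
lr = 0.1
for _ in range(200):
    z1, z2, y = forward(x)
    err = y - t
    losses.append(float((err ** 2).mean()))
    # Back propagation: push the output error through each layer in turn.
    g3 = z2.T @ err / len(x)
    d2 = (err @ w3.T) * (1 - z2 ** 2)     # d/dz tanh(z) = 1 - tanh(z)^2
    g2 = z1.T @ d2 / len(x)
    d1 = (d2 @ w2.T) * (1 - z1 ** 2)
    g1 = x.T @ d1 / len(x)
    w1 -= lr * g1
    w2 -= lr * g2
    w3 -= lr * g3
```

The squared error should shrink over the iterations as each weight layer is adjusted against its share of the output error.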
- the above value function (evaluation function) may be stored as a neural network to advance the learning while the steps (1) to (5) of the above reinforcement learning are repeatedly performed.
- a machine learning device may generally be adapted to a new environment by performing additional learning. Accordingly, as in the present invention, when learning is applied to the adjustment of the servo gain used by each controller to control a machine, additional learning under new machining preconditions may be performed on the basis of past learning of servo gain adjustment. Thus, it becomes possible to perform the learning of the adjustment of servo gain in a short period of time.
- reinforcement learning may employ a system in which a plurality of agents is connected to each other via a network or the like, and information on states s, actions a, rewards r, or the like is shared between the agents and applied to each learning, whereby each of the agents performs distributed reinforcement learning in consideration of the environments of the other agents and is able to learn efficiently.
- likewise, when a plurality of agents (machine learning devices) incorporated in a plurality of environments (numerical controllers of lathe machining machines) performs distributed machine learning in a state of being connected to each other via a network or the like, the learning of the adjustment of the machining path of a turning cycle command and of machining conditions in the numerical controllers of the lathe machining machines may be performed efficiently.
- FIG. 4 is a diagram showing an image on the machine learning of the adjustment of servo gain used to control a machine by each controller in a control system into which a machine learning device according to an embodiment of the present invention is introduced. Note that in FIG. 4 , only configurations necessary for describing the machine learning in the control system of the embodiment are shown.
- information on each axis and machine operating conditions acquired from a machine 3, which constitute state information, are input to a machine learning device 20 as information used by the machine learning device 20 to specify the environment (the state s_t described in 1. Machine Learning).
- the machine learning device 20 outputs the action of adjustment of the servo gain used to control the machine by each controller as an output to the environment (the action a_t described in 1. Machine Learning).
- the above state information is defined by the information on each axis and the machine operating conditions acquired from the machine 3 . Further, the above action of adjustment may be defined by an adjustment amount of the servo gain used to control the machine output from the machine learning device 20 .
- a condition (positive/negative reward) defined by a priority factor and an operating result of the machine 3 is employed as the reward (the reward r_t described in 1. Machine Learning) to be given to the machine learning device 20.
- an operator may appropriately set as to which data is used to determine a reward.
- the machine learning device 20 performs machine learning based on state information (input data), an action of adjustment (output data), and a reward described above.
- a state s_t is defined by the combination of input data at a certain time t, the adjustment of the servo gain performed with respect to the defined state s_t is equivalent to an action a_t, and the value evaluated and calculated based on the data on the machine operating result newly obtained as a result of the servo gain adjustment by the action a_t is equivalent to a reward r_t+1.
- the state s_t, the action a_t, and the reward r_t+1 are applied to the update formula of the value function (evaluation function) corresponding to the machine learning algorithm to advance the learning.
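One way the reward might be computed from the machining result and the priority factor is sketched below; the factor names, measured quantities, and comparison rules are illustrative assumptions, since the patent leaves the concrete reward conditions to preset operator settings.

```python
# Hypothetical reward calculation for one servo gain adjustment step.
# Field names and the +/-1 reward values are assumptions for illustration.

def calc_reward(result, priority_factor):
    """Positive reward when the prioritized quantity improves, negative otherwise."""
    if priority_factor == "cycle_time":
        # Shorter machining cycle than the previous one -> positive reward.
        return 1.0 if result["cycle_time"] < result["prev_cycle_time"] else -1.0
    if priority_factor == "accuracy":
        # Smaller position deflection than before -> positive reward.
        return 1.0 if result["position_deflection"] < result["prev_position_deflection"] else -1.0
    return 0.0  # unknown factor: no reward signal

r = calc_reward({"cycle_time": 9.5, "prev_cycle_time": 10.0}, "cycle_time")
```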
- FIG. 5 is the function block diagram of the control system according to the embodiment.
- the machine learning device 20 corresponds to the agent and configurations such as the machine 3 other than the machine learning device 20 correspond to the environment.
- the control system 1 is constituted by a high-order apparatus 2 having the machine learning device 20 and at least one machine 3 .
- the high-order apparatus 2 is an apparatus placed in a rank higher than controllers that control the machines 3 , and examples of the high-order apparatus 2 include a control apparatus, a field computer, a host computer, or the like.
- Each of the machines 3 acting as a facility inside a factory has a machine information output section 30 , a machining information output section 31 , and a servo gain setting section 32 .
- the machine information output section 30 acquires information on the machine 3 such as temperature and rigidity of the machine 3 , a type of a workpiece to be machined, a tool and a jig used in machining, a type of a mold, and types of a resin and clay used in machining and outputs the acquired information to the high-order apparatus 2 .
- the machining information output section 31 acquires information on machining such as a position deflection rate and a fluctuation rate of an axis, machining cycle time, a maximum motor load value, consumption power, and a speed and a fluctuation rate of a motor and outputs the acquired information to the high-order apparatus 2 .
- the servo gain setting section 32 sets servo gain such as current loop gain, speed loop gain, position loop gain, and pressure control proportional/integral gain.
- the machine information output section 30 acquires information on the machine 3 from a setting memory (not shown) of the machine 3 , a sensor (not shown) provided in each section of the machine 3 , or the like and outputs the acquired information on the machine 3 in response to a request from the high-order apparatus 2 .
- the machining information output section 31 monitors an output or the like of a sensor or a servo motor (not shown) provided in each section of the machine 3 when machining is performed, generates information on the machining based on data monitored at, for example, a timing at which one-cycle machining is completed, and outputs the generated information on the machining to the high-order apparatus 2 .
- the servo gain setting section 32 sets the servo gain of a servo motor provided in the machine 3 in response to a request from the high-order apparatus 2 .
- for the setting of the servo gain, current loop gain, speed loop gain, position loop gain, pressure control proportional/integral gain, or the like may be set.
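- As an illustrative sketch only (the class, identifier spellings, and validation rules below are assumptions, not details of the embodiment), the servo gain setting section 32 could be modeled as a small container that accepts the gain types listed above:

```python
from dataclasses import dataclass, field

# Gain types named in the text; the identifier spellings are assumptions.
SUPPORTED_GAINS = {
    "current_loop_gain",
    "speed_loop_gain",
    "position_loop_gain",
    "pressure_control_pi_gain",
}

@dataclass
class ServoGainSettingSection:
    """Holds the servo gain values currently applied to one machine."""
    gains: dict = field(default_factory=dict)

    def set_gain(self, name: str, value: float) -> None:
        # Reject unknown gain types so a typo cannot silently create an
        # unused parameter, and require a positive gain value.
        if name not in SUPPORTED_GAINS:
            raise ValueError(f"unsupported gain type: {name}")
        if value <= 0:
            raise ValueError("gain must be positive")
        self.gains[name] = value

section = ServoGainSettingSection()
section.set_gain("speed_loop_gain", 120.0)
print(section.gains["speed_loop_gain"])  # 120.0
```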
- the machine learning device 20 that performs machine learning performs the action of adjustment of the servo gain of the machine 3 when machining by the machine 3 is started, and performs the learning of the action of adjustment when the machining by the machine 3 with the servo gain adjusted by the action of adjustment is completed.
- the machine learning device 20 that performs machine learning has a state observation section 21 , a determination data acquisition section 22 , a learning section 23 , a value function switching determination section 24 , a value function updating section 25 , a reward calculation section 26 , a decision making section 27 , and a gain changing section 28 .
- the machine learning device 20 may be incorporated in the high-order apparatus 2 as shown in FIG. 5 or may be constituted by a personal computer or the like connected to the high-order apparatus 2 .
- the state observation section 21 is a function unit that observes machine information output from the machine information output section 30 provided in the machine 3 as state data and acquires the observed information inside the machine learning device 20 .
- the state observation section 21 outputs the observed state data to the learning section 23 .
- the state observation section 21 may temporarily store the observed state data on a memory (not shown) to be managed.
- the state data observed by the state observation section 21 may be data acquired by the latest machining operation of the machine 3 or may be data acquired by the past machining operation of the machine 3 .
- the determination data acquisition section 22 is a function unit that acquires machining information output from the machining information output section 31 provided in the machine 3 inside the machine learning device 20 as determination data.
- the determination data acquisition section 22 outputs the acquired determination data to the learning section 23 .
- the determination data acquisition section 22 may temporarily store the acquired determination data on the memory (not shown) to be managed together with state data acquired by the state observation section 21 .
- the determination data acquired by the determination data acquisition section 22 may be data acquired by the latest machining operation of the machine 3 or may be data acquired by the past machining operation of the machine 3 .
- the learning section 23 performs the machine learning (reinforcement learning) of the action of adjustment of servo gain with respect to machine information and machining information for each priority factor based on state data observed by the state observation section 21 , determination data acquired by the determination data acquisition section 22 , and a reward calculated by the reward calculation section 26 that will be described later.
- a state st is defined by the combination of state data at a certain time t. The determination of the action of adjustment of the servo gain of the machine 3 by the decision making section 27 that will be described later, and the adjustment of the servo gain of the machine 3 by the gain changing section 28 that will be described later according to the defined state st, are equivalent to an action at. A value calculated by the reward calculation section 26 that will be described later, based on the determination data acquired by the determination data acquisition section 22 as a result of the adjustment of the servo gain of the machine 3 and the machining by the machine 3, is equivalent to a reward rt+1.
- a value function used in the learning is determined according to an applied learning algorithm. For example, when Q-learning is used, it is only necessary to update an action value function Q(s t , a t ) according to the above Formula 2 to advance the learning.
- the value function switching determination section 24 performs the determination of the action of adjustment of servo gain with respect to the machine 3 and the switching of a value function used in machine learning based on a result of the action of adjustment of the servo gain with respect to the machine 3 based on the priority factor of each machine 3 set by a user.
- in a value function storage section 40 provided in the memory (not shown) of the machine learning device 20, a plurality of value functions, different for each priority factor of the machine, is stored in advance.
- the value function switching determination section 24 selectively switches a value function to be used by the learning section 23 , the value function updating section 25 , and the decision making section 27 according to a priority factor set in the machine 3 that performs the action of adjustment of servo gain (or the machine 3 that performs the machine learning of the action of adjustment of the servo gain).
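- The switching performed by the value function switching determination section 24 can be sketched as a simple lookup keyed by priority factor; here a dict of independent Q-tables stands in for the value function storage section 40, and the factor and action names are assumptions:

```python
from collections import defaultdict

class ValueFunctionStore:
    """One independent action value function (Q-table) per priority factor."""

    def __init__(self, priority_factors):
        self.tables = {f: defaultdict(float) for f in priority_factors}

    def select(self, priority_factor):
        # Return the value function matching the machine's priority factor.
        return self.tables[priority_factor]

store = ValueFunctionStore(["quality", "productivity", "energy_saving"])
q = store.select("quality")
q[("state0", "raise_speed_loop_gain")] = 0.5
# Updating the "quality" table leaves the other tables untouched.
print(store.select("productivity")[("state0", "raise_speed_loop_gain")])  # 0.0
```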
- the value function updating section 25 stores a result of machine learning performed by the learning section 23 in the value function storage section 40 after applying the same to a value function selected by the value function switching determination section 24 .
- the value function acting as the learning result stored in the value function storage section 40 by the value function updating section 25 is used in machine learning and the determination of the action of adjustment of servo gain from the next time.
- a learning result may also be stored in such a way that a value function corresponding to the machine learning algorithm to be used is held as an approximate function or an arrangement, or in a supervised learning device such as a support vector machine (SVM) or a neural network with a multiple-value output.
- the reward calculation section 26 performs the calculation of a reward to be used in machine learning based on reward conditions preset on the memory (not shown) and determination data acquired by the determination data acquisition section 22 .
- Reward 1 Improvement in Machining Quality (Positive/Negative Reward)
- when the machining accuracy of a workpiece improves, a positive reward is given.
- when the machining accuracy worsens, a negative reward is given according to the degree. Note that as for giving a negative reward, a large negative reward may be given when the machining accuracy is too low, and a small negative reward may be given when the machining accuracy is excessively high.
- Reward 3 Energy-Saving Performance (Positive/Negative Reward)
- the above reward conditions are preferably used in combination according to the priority factor rather than being used singly.
- for example, when the priority factor is set to an improvement in machining quality, merely setting a reward condition on the improvement in the machining quality does not suffice. That is, if reward conditions on an improvement in productivity and on energy-saving performance are also set at the same time, and the amount of reward obtained when the reward condition on the improvement in the machining quality is satisfied is set larger than the amount of reward obtained when the reward conditions on the improvement in the productivity and the energy-saving performance are satisfied, the selection of an action of adjustment that maintains minimum productivity and energy-saving performance while prioritizing the machining quality may be learned. The same applies to a case in which the priority factor is set to an improvement in productivity or to energy-saving performance.
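- A minimal sketch of such a weighted combination of reward conditions follows; the weights, condition names, and threshold logic are illustrative assumptions, with the priority factor simply receiving a larger weight than the other conditions:

```python
def calc_reward(determination, priority_factor):
    """Sum per-condition rewards, weighting the priority factor highest."""
    weights = {"quality": 1.0, "productivity": 1.0, "energy_saving": 1.0}
    weights[priority_factor] = 5.0  # the priority condition pays the most

    reward = 0.0
    # Reward condition on machining quality.
    reward += weights["quality"] * (1.0 if determination["accuracy_ok"] else -1.0)
    # Reward condition on productivity.
    reward += weights["productivity"] * (1.0 if determination["cycle_time_improved"] else -1.0)
    # Reward condition on energy-saving performance.
    reward += weights["energy_saving"] * (1.0 if determination["power_reduced"] else -1.0)
    return reward

r = calc_reward(
    {"accuracy_ok": True, "cycle_time_improved": False, "power_reduced": True},
    priority_factor="quality",
)
print(r)  # 5.0 - 1.0 + 1.0 = 5.0
```

With this shape of weighting, an action that sacrifices the priority factor cannot be compensated by satisfying the other conditions, which is the effect the passage above describes.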
- the decision making section 27 determines the action of adjustment of the servo gain of the machine 3 based on a learning result learned by the learning section 23 (and stored in the value function storage section 40 ) and state data observed by the state observation section 21 .
- the determination of the action of adjustment of the servo gain here is equivalent to an action a used in machine learning.
- the selectable actions may be actions by which a plurality of types of servo gain is adjusted at the same time or may be actions by which the servo gain of a plurality of servo motors provided in the machine 3 is adjusted at the same time.
- the above ε greedy method may be employed to select a random action with a constant probability for the purpose of advancing the learning of the learning section 23.
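- The ε greedy selection mentioned above can be sketched as follows; with probability ε a random adjustment action is chosen to keep exploring, and otherwise the action with the highest learned value is taken (the action names are illustrative):

```python
import random

# Candidate adjustment actions; the names are illustrative.
ACTIONS = ["raise_speed_loop_gain", "lower_speed_loop_gain", "keep_gains"]

def select_action(q_table, state, epsilon=0.1, rng=random):
    """ε greedy: explore with probability ε, otherwise exploit the Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # random action to advance the learning
    # Best-known action for this state (missing entries count as 0.0).
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))

q = {("s0", "raise_speed_loop_gain"): 0.8, ("s0", "keep_gains"): 0.2}
print(select_action(q, "s0", epsilon=0.0))  # always exploits: raise_speed_loop_gain
```

In practice ε is commonly decreased as learning progresses so that exploration gradually gives way to exploitation.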
- the gain changing section 28 instructs the servo gain setting section 32 of the machine 3 to adjust servo gain based on the action of adjustment of the servo gain of the machine 3 determined by the decision making section 27 .
- Step SA 01 When the machine learning starts, the state observation section 21 observes machine data output from the machine 3 as state data.
- Step SA 02 The learning section 23 specifies a current state s t based on the state data observed by the state observation section 21 .
- Step SA 03 The decision making section 27 selects an action a t (action of adjustment of the servo gain of the machine 3 ) based on a past learning result and the state s t specified in step SA 02 .
- Step SA 04 The gain changing section 28 performs the action a t selected in step SA 03 .
- Step SA 05 The state observation section 21 observes machine information on the machine 3 as state information, and the determination data acquisition section 22 acquires machining information on the machine 3 as determination data. At this stage, the state of the machine 3 changes with a temporal transition from time t to time t+1 as a result of the action a t performed in step SA 04 .
- Step SA 06 The reward calculation section 26 calculates a reward r t+1 based on the determination data acquired in step SA 05 .
- Step SA 07 The learning section 23 advances the machine learning based on the state s t specified in step SA 02 , the action a t selected in step SA 03 , and the reward r t+1 calculated in step SA 06 and then returns to step SA 02 .
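- The steps SA01 to SA07 above can be sketched as a loop; the callable arguments are hypothetical stand-ins for the sections of the embodiment (state observation, decision making, gain changing, determination data acquisition, reward calculation, and learning):

```python
def learning_loop(observe_state, select_action, apply_gain,
                  acquire_determination, calc_reward, update, cycles=10):
    """One pass of SA01/SA02, then repeated SA03-SA07 iterations."""
    state = observe_state()                        # SA01/SA02: observe and specify st
    for _ in range(cycles):
        action = select_action(state)              # SA03: choose the adjustment action at
        apply_gain(action)                         # SA04: perform the action
        next_state = observe_state()               # SA05: observe machine information
        determination = acquire_determination()    # SA05: acquire determination data
        reward = calc_reward(determination)        # SA06: calculate the reward rt+1
        update(state, action, reward, next_state)  # SA07: advance the machine learning
        state = next_state                         # back to SA02

# Toy run with stub components, counting learning updates.
calls = []
learning_loop(
    observe_state=lambda: "s",
    select_action=lambda s: "a",
    apply_gain=lambda a: None,
    acquire_determination=lambda: {},
    calc_reward=lambda d: 1.0,
    update=lambda *args: calls.append(args),
    cycles=3,
)
print(len(calls))  # 3
```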
- the gain changing section 28 adjusts the servo gain of the machine 3 based on the decision making of the decision making section 27 , the machine 3 is controlled by the adjusted servo gain to machine a workpiece, state data is observed by the state observation section 21 , determination data is acquired by the determination data acquisition section 22 , and the machine learning is repeatedly performed. Thus, a more excellent learning result may be acquired.
- the machine learning device 20 may also be operated so as not to perform new learning, using, as it is, learning data that has been sufficiently subjected to the machine learning.
- a machine learning device 20 that has completed the machine learning may be attached to other high-order apparatuses 2 and operated using the learning data obtained when the sufficient machine learning was performed.
- the machine learning device 20 of the high-order apparatus 2 may perform the machine learning alone. However, when the high-order apparatus 2 provided in each of a plurality of control systems 1 further has a unit used to communicate with an outside, it becomes possible to send/receive and share a value function stored in each of the value function storage sections 40 . Thus, the machine learning may be more efficiently performed. For example, parallel learning is advanced between a plurality of the high-order apparatuses 2 in such a way that state data, determination data, and value functions acting as learning results are exchanged between the high-order apparatuses 2 while adjustment targets and adjustment amounts different between the plurality of high-order apparatuses 2 are fluctuated within a prescribed range. Thus, the learning may be more efficiently performed.
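- Since the text only states that value functions are sent, received, and shared between high-order apparatuses, the way shared tables are combined is unspecified; the element-wise averaging below is purely an illustrative assumption:

```python
def merge_value_functions(local_q, remote_q):
    """Combine a value function received from another apparatus into ours."""
    merged = dict(local_q)
    for key, remote_value in remote_q.items():
        if key in merged:
            # Average entries both sides have seen (assumed policy).
            merged[key] = (merged[key] + remote_value) / 2.0
        else:
            # Adopt entries only the remote apparatus has learned.
            merged[key] = remote_value
    return merged

local_q = {("s0", "a0"): 0.4}
remote_q = {("s0", "a0"): 0.8, ("s1", "a1"): 0.3}
merged = merge_value_functions(local_q, remote_q)
print(sorted(merged))
```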
- communication may be performed via a management apparatus (not shown), the high-order apparatuses 2 may directly communicate with each other, or a cloud may be used.
- a communication unit with a faster communication speed is preferably provided.
- the relationships between the respective function units provided in the high-order apparatus 2 described in the above embodiment are not limited to those shown in the function block diagram of FIG. 5 . That is, functions may be divided in any unit or any hierarchical relationship may be established between the functions so long as configurations equivalent to the functions of the respective function units are provided.
- the above embodiment describes the adjustment of the servo gain of the servo motor provided in the one machine 3 .
- learning may be performed in such a way that the servo gain of a plurality of machines 3 arranged in a factory is adjusted at the same time to attain an improvement in overall energy-saving performance in the factory.
- a plurality of actions different in the combination of adjustment amounts or the like is registered in the action pattern storage section 41 with consideration given to the combination of the adjustment of the servo gain of the plurality of machines 3 as an action.
- the decision making section 27 determines an action so that consumption power obtained from the plurality of machines 3 becomes small, and the learning section 23 learns the determined action.
- machine learning that achieves the above object may be performed.
- the above embodiment describes the configuration in which a value function is switched for each priority factor by the value function switching determination section 24 .
- the priority factor may be added to the input data of the learning section 23 to omit the value function switching determination section 24 .
- in this case, the efficiency of the machine learning for each priority factor decreases, but the same effect may be obtained when the machine learning is performed over a longer period of time.
Description
- The present invention relates to a control system and a machine learning device and, in particular, to a controller and a machine learning device that perform machine learning to optimize servo gain in machine control inside a facility.
- The setting of the servo gain of a controller largely affects the action of a machine to be controlled and directly affects the quality and the productivity of a workpiece. Servo gain is often set and adjusted on a case-by-case basis according to a need using an adjustment tool for each machine under machining conditions such as a workpiece, a tool, accuracy, cycle time, weight of a mold, and viscosity of a resin, and the optimization of the servo gain requires time. Further, in some situations, gain is desirably switched to an optimum one during an operation on a case-by-case basis according to an operating state.
- As the servo gain of a controller, various types such as current loop gain, speed loop gain, position loop gain, and pressure control proportional/integral gain exist. Conventionally, there has been a need to separately set gain at an optimum value using an adjustment tool or the like depending on various conditions such as rigidity of a machine, a load inertia, a type of a tool, and an action mode. In addition, there has been a need to readjust gain to an optimum one depending on a priority factor such as machining accuracy and a speed.
- As a related art relating to the adjustment of servo gain, Japanese Patent Application Laid-open No. 3-259303 discloses a system that performs an adjustment in the control of a machine action using a neural network. In addition, Japanese Patent Application Laid-open No. 2006-302282 discloses a technology for acquiring the performance data of a separate robot and optimizing a control program including control gain according to the performance goal of a user.
- In the adjustment of servo gain, it is often difficult to estimate optimum gain before an operation due to the action conditions and the action environments of a machine to be controlled, such as rigidity of the machine, a target workpiece, a tool, a jig, weight of a mold, and the effect of the viscosity of a resin during injection molding. In addition, it is often difficult to estimate optimum gain before an operation since a priority factor for setting gain, such as a shape error, productivity, consumption power, and a load on a machine, differs on a case-by-case basis. Such problems may not be addressed by the technologies disclosed in Japanese Patent Application Laid-open No. 3-259303 and Japanese Patent Application Laid-open No. 2006-302282.
- In view of the above problems, it is an object of the present invention to provide a controller and a machine learning device that perform machine learning to optimize the servo gain of a machine inside a facility in accordance with action conditions, action environments, and a priority factor of the machine.
- In the present invention, a machine learning device is introduced into a control apparatus or a field computer placed in a rank higher than a plurality of controllers to perform machine learning so that a priority factor is optimized based on information (position deflection, cycle time, a motor load, consumption power, a speed fluctuation rate, or the like) on each axis and information (characteristics of a motor, rigidity of the machine, a type of a workpiece, a used tool, a jig, weight and a shape of a mold, a type and viscosity of a resin, or the like) on the operating conditions of each machine collected from the controller of each machine. In the control system of the present invention, an estimated initial value of optimum gain is set in a controller according to the operating conditions of each machine and the latest data of a value function, and a reward based on information on each axis obtained from the controller is calculated and used in the learning of the machine learning device. In the control system of the present invention, a plurality of value functions used in machine learning may be stored in advance according to at least a priority factor, and an optimum one of the value functions may be selectively used from a database according to a situation.
- A control system according to an embodiment of the present invention has at least one machine that machines a workpiece and a high-order apparatus that adjusts servo gain used in machining by the machine. The control system includes: a machine learning device that performs machine learning of an adjustment of the servo gain of the machine, wherein the machine learning device has: a state observation section that observes machine information on the machine as state data; a determination data acquisition section that acquires information on machining by the machine as determination data; a reward calculation section that calculates a reward based on the determination data and preset reward conditions; a learning section that performs the machine learning of the adjustment of the servo gain of the machine; a decision making section that determines an action of adjustment of the servo gain of the machine, based on the state data and a machine learning result of the adjustment of the servo gain of the machine by the learning section; and a gain changing section that changes the servo gain of the machine, based on the action of adjustment of the servo gain determined by the decision making section; and the learning section performs the machine learning of the adjustment of the servo gain of the machine, based on the state data, the action of adjustment, and the reward calculated after the action of adjustment.
- The control system further includes a value function switching determination section that switches a value function used in the machine learning and the determination of the action of adjustment, based on a priority factor preset to the machine. In addition, a positive reward or a negative reward is calculated based on a reward condition set correspondingly to the priority factor. In addition, the control system is connected to at least one other high-order apparatus and mutually exchanges or shares the machine learning result with the other high-order apparatus. A machine learning device according to another embodiment of the present invention performs machine learning of an adjustment of servo gain used in machining by at least one machine that machines a workpiece. The machine learning device includes: a state observation section that observes machine information on the machine as state data; a determination data acquisition section that acquires information on machining by the machine as determination data; a reward calculation section that calculates a reward based on the determination data and preset reward conditions; a learning section that performs the machine learning of the adjustment of the servo gain of the machine; a decision making section that determines an action of adjustment of the servo gain of the machine, based on the state data and a machine learning result of the adjustment of the servo gain of the machine by the learning section; and a gain changing section that changes the servo gain of the machine, based on the action of adjustment of the servo gain determined by the decision making section, wherein the learning section performs the machine learning of the adjustment of the servo gain of the machine, based on the state data, the action of adjustment, and the reward calculated after the action of adjustment.
The machine learning device further includes: a value function switching determination section that switches a value function used in the machine learning and the determination of the action of adjustment, based on a priority factor preset to the machine. According to an embodiment of the present invention, it is possible to estimate the combination of gain that improves a priority factor with respect to each machine that is to be controlled and apply an estimated result to machine control by a controller to improve the priority factor in the machine in real time and automatically. In addition, since an operator has no need to adjust gain for each machine and may automatically optimize the gain of all machines in an edge environment and in a unified way, time and effort for adjusting the gain may be eliminated. Moreover, since a value function is updated by the operating conditions of a separate machine and an estimated result of optimum gain and the shared value function may be used in the learning of another machine, the optimum gain may be automatically and efficiently estimated and set.
- The above and other objects and features of the present invention will become apparent from the descriptions of the following embodiments with reference to the accompanying drawings, in which:
- FIG. 1 is a diagram for describing the basic concept of a reinforcement learning algorithm;
- FIG. 2 is a schematic diagram showing a neuron model;
- FIG. 3 is a schematic diagram showing a neural network having weights of three layers;
- FIG. 4 is a diagram showing an image of the machine learning of a control system according to an embodiment of the present invention;
- FIG. 5 is a schematic function block diagram of the control system according to the embodiment of the present invention; and
- FIG. 6 is a flowchart showing the flow of the machine learning according to the embodiment of the present invention.
- Hereinafter, a description will be given of embodiments of the present invention with reference to the drawings.
- In the present invention, a machine learning device acting as artificial intelligence is introduced into a high-order apparatus (such as a control apparatus and a field computer) placed in a rank higher than one or more controllers that control respective machines, to perform the machine learning of the adjustment of servo gain used in the control of the machines by the respective controllers with respect to information on respective axes, machine operating conditions, and priority factors. Thus, the combination of gain by which the priority factors are improved may be automatically calculated.
- Hereinafter, a description will be briefly given of machine learning to be introduced into the present invention.
- The machine learning is realized in such a way that useful rules, knowledge expressions, determination criteria, or the like are extracted by analysis from sets of data input to a device that performs the machine learning (hereinafter called a machine learning device), determination results of the extraction are output, and learning of knowledge is performed. Although the machine learning is performed according to various methods, the methods are roughly classified into "supervised learning," "unsupervised learning," and "reinforcement learning." In addition, in order to realize such methods, there is a method called "deep learning" by which the extraction of feature amounts per se is learned.
- The “supervised learning” is a model by which sets of input and result (label) data are given to a machine learning device in large amounts to learn the features of the data sets and estimate results from inputs, i.e., a method by which to learn the relationship between inputs and results may be inductively obtained. The method may be realized using an algorithm such as a neural network that will be described later.
- The “unsupervised learning” is a learning method by which a device learns, with the reception of only large amounts of input data, as to how the input data is distributed and applies compression, classification, shaping, or the like to the input data even if corresponding supervised output data is not given. The features of the data sets may be arranged in clusters as two of a kind. Using the results, any standard is set to allocate outputs so as to be optimized. Thus, the prediction of the outputs may be realized. In addition, as an intermediate problem setting between the “unsupervised learning” and the “supervised learning,” there is a method called “semi-supervised learning” in which some parts are given sets of input and output data while the other part is given only input data. In an embodiment, since data that may be acquired even if a machining machine does not actually operate is used in the unsupervised learning, learning may be efficiently performed.
- The “reinforcement learning” is a method by which to learn not only determinations or classifications but also actions to perform learning of optimum actions in consideration of interactions given to environments by actions, i.e., learning to maximize rewards that will be obtained in the future. In the reinforcement learning, a machine learning device may start learning in a state in which the machine learning device does not completely know or imperfectly knows results brought about by actions. In addition, a machine learning device may start learning from a desirable start point in an initial state in which prior learning (a method such as the above supervised learning and inverse reinforcement learning) is performed in such a way as to imitate human's actions.
- Note that when machine learning is applied to a machining machine, it is necessary to consider the fact that results may be obtained as data only after the machining machine actually operates, i.e., searching of optimum actions is performed by a trial and error approach. In view of the above circumstances, the present invention employs, as the principal learning algorithm of a machine learning device, the algorithm of reinforcement learning by which the machine learning device is given rewards to automatically learn actions to achieve a goal.
- FIG. 1 is a diagram for describing the basic concept of a reinforcement learning algorithm. In reinforcement learning, the learning and actions of an agent are advanced by interactions between the agent (machine learning device) acting as the learning subject and the environment (control target system) acting as the control target. More specifically, the following interactions are performed between the agent and the environment:
- (1) The agent observes an environmental state st at a certain time.
- (2) The agent selects and performs an action at that it is allowed to take, based on the observation result and past learning.
- (3) The environmental state st changes to the next state st+1 according to the action at and some rule of the environment.
- (4) The agent accepts a reward rt+1 based on the state change as a result of the action at.
- (5) The agent advances the learning based on the state st, the action at, the reward rt+1, and a past learning result.
- At the initial stage of the reinforcement learning, the agent does not understand the standard of a value determination for selecting the optimum action at with respect to the environmental state st in the above action selection (2). Therefore, the agent selects various actions at in a certain state st and learns the selection of a better action, i.e., the standard of an appropriate value determination, based on the rewards rt+1 given with respect to the actions at at that time.
- In the learning of the above interaction (5), the agent acquires the mapping of an observed state st, an action at, and a reward rt+1 as reference information for determining the amount of reward it is allowed to obtain in the future. For example, when the number of states that the agent is allowed to take at each time is m and the number of actions that the agent is allowed to take is n, the agent obtains, by repeatedly performing actions, a two-dimensional arrangement of m×n in which rewards rt+1 corresponding to pairs of states st and actions at are stored.
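- The m-by-n arrangement described above, holding a reward for each pair of a state and an action, can be written down directly (the sizes m and n are arbitrary here):

```python
m, n = 4, 3                                  # m states, n actions
reward_map = [[0.0] * n for _ in range(m)]   # stores rt+1 for each (st, at) pair

def record_reward(state_idx, action_idx, reward):
    # Keep the reward observed for this state-action pair.
    reward_map[state_idx][action_idx] = reward

record_reward(2, 1, 0.7)   # reward observed after action 1 in state 2
print(reward_map[2][1])    # 0.7
```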
- Then, with a value function (evaluation function) indicating to what degree a state or an action selected based on the above acquired mapping is valuable, the agent updates the value function (evaluation function) while repeatedly performing actions to learn an optimum action corresponding to a state.
- A “state value function” is a value function indicating to what degree a certain state st is valuable. The state value function is expressed as a function using a state as an argument and updated based on a reward obtained with respect to an action in a certain state, a value of a future state changed with the action, or the like in learning from repeated actions. The update formula of the state value function is defined according to a reinforcement learning algorithm. For example, in temporal-difference (TD) learning indicating as one of reinforcement learning algorithms, the state value function is updated by the following
Formula 1. Note that inFormula 1, α is called a learning coefficient, γ is called a discount rate, and the learning coefficient and the discount rate are defined to fall within 0<α≤1 and 0<γ≤1, respectively. -
V(st)←V(st)+α[rt+1+γV(st+1)−V(st)] [Math. 1] - In addition, an “action value function” is a value function indicating to what degree an action at is valuable in a certain state st. The action value function is expressed as a function using a state and an action as arguments and, in learning from repeated actions, is updated based on the reward obtained with respect to an action in a certain state, the action value of the future state reached by the action, or the like. The update formula of the action value function is defined according to the reinforcement learning algorithm. For example, in Q-learning, one of the typical reinforcement learning algorithms, the action value function is updated by the following
Formula 2. Note that in Formula 2, α is called a learning coefficient, γ is called a discount rate, and they are defined to fall within 0<α≤1 and 0<γ≤1, respectively. -
Q(st, at)←Q(st, at)+α[rt+1+γ maxa Q(st+1, a)−Q(st, at)] [Math. 2] -
Formula 2 expresses a method for updating an evaluation value Q(st, at) of an action at in a state st based on the reward rt+1 returned as a result of the action at. Formula 2 indicates that Q(st, at) is increased if the evaluation value Q(st+1, max(a)) of the best action max(a) in the next state reached via the reward rt+1 and the action at is larger than the evaluation value Q(st, at) of the action at in the state st, and that Q(st, at) is decreased otherwise. That is, the value of a certain action in a certain state is made closer to the value of the reward immediately returned as a result of the action plus the value of the best action in the next state accompanying the action. - In Q-learning, such an update is repeatedly performed so that Q(st, at) finally converges to the expected value E[Σγtrt] (the expected value taken when the state changes according to the optimum actions; since this expected value is of course unknown, it must be learned by search).
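The update of Formula 2 can be written out as a short function. This is a generic Q-learning step; the list-based state/action encoding and the values of α and γ are chosen for illustration:

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step per Formula 2:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * max(Q[s_next])   # reward plus best next-state value
    Q[s][a] += alpha * (td_target - Q[s][a])  # move Q(s,a) toward the target
    return Q[s][a]
```

If the target exceeds the current Q(s, a), the value is increased; otherwise it is decreased, exactly as the text above describes.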
- Further, in the action selection of (2) above, an action at that maximizes the reward over the future (rt+1+rt+2+ . . . ) in the current state st (an action for changing to the most valuable state when a state value function is used, or the most valuable action in that state when an action value function is used) is selected using the value function (evaluation function) generated by past learning. Note that during learning, the agent may select a random action with a constant probability in the action selection of (2) above for the purpose of advancing the learning (ε-greedy method).
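The ε-greedy selection described above can be sketched in a few lines; the action values and the value of ε are illustrative:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the action with the highest learned value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε = 0 this reduces to purely greedy selection from the value function; a small positive ε keeps the learning advancing by occasionally trying other actions.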
- Note that in order to store a value function (evaluation function) as a learning result, there are a method of retaining the values of all the pairs (s, a) of states and actions in table form (an action value table) and a method of preparing a function that approximates the above value function. In the latter method, the above update formula may be realized by adjusting the parameters of the approximate function with a method such as stochastic gradient descent. For the approximate function, a supervised learning device such as a neural network may be used.
- The neural network is realized by, for example, a calculation unit, a memory, and the like, following a neuron model as shown in FIG. 2. FIG. 2 is a schematic diagram showing a neuron model. - As shown in
FIG. 2, a neuron outputs an output y with respect to a plurality of inputs x (here, inputs x1 to x3 as an example). A corresponding weight w (w1 to w3) is placed on each of the inputs x1 to x3. Thus, the neuron outputs the output y expressed by the following Formula 3. Note that in Formula 3, the input x, the output y, and the weight w are all vectors. In addition, θ indicates a bias, and fk indicates an activation function. -
y = fk(Σi=1n xiwi − θ) [Math. 3] - Next, a description will be given, with reference to
FIG. 3 , of a neural network having weights of three layers in which the above neurons are combined together.FIG. 3 is a schematic diagram showing a neural network having weights of three layers D1 to D3. As shown inFIG. 3 , a plurality of inputs x (here, inputs x1 to x3 as an example) is input from the left side of the neural network, and results y (here, results y1 to y3 as an example) are output from the right side of the neural network. - Specifically, when the inputs x1 to x3 are input to three neurons N11 to N13 respectively, weights are placed correspondingly on the inputs x1 to x3. The weights placed on the inputs are collectively indicated as w1. The neurons N11 to N13 output z11 to z13, respectively.
- z11 to z13 are collectively indicated as a feature vector z1, which may be regarded as a vector obtained by extracting the feature amounts of the input vector. The feature vector z1 is the feature vector between the weight w1 and the weight w2.
- When z11 to z13 are input to two neurons N21 and N22 respectively, weights are placed correspondingly on z11 to z13. The weights placed on the feature vectors are collectively indicated as w2. The neurons N21 and N22 output z21 and z22, respectively. z21 and z22 are collectively indicated as a feature vector z2. The feature vector z2 is a feature vector between the weight w2 and a weight w3.
- When the feature vectors z21 and z22 are input to three neurons N31 to N33 respectively, weights are placed correspondingly on the feature vectors z21 and z22. The weights placed on the feature vectors are collectively indicated as w3.
- Finally, the neurons N31 to N33 output the results y1 to y3, respectively.
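The three-layer forward pass described above can be sketched as follows. The weight values, the 3-3-2-3 layer shape of FIG. 3, and the choice of a sigmoid for the activation fk are illustrative assumptions:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def layer(x, weights, theta=0.0):
    """One layer of neurons: each neuron outputs f(sum_i x_i*w_i - theta),
    matching Formula 3 (sigmoid assumed for the activation fk)."""
    return [sigmoid(sum(xi * wi for xi, wi in zip(x, w)) - theta) for w in weights]

def forward(x, w1, w2, w3):
    z1 = layer(x, w1)     # feature vector z1 (outputs of N11-N13)
    z2 = layer(z1, w2)    # feature vector z2 (outputs of N21-N22)
    return layer(z2, w3)  # results y1-y3 (outputs of N31-N33)

# Illustrative weights with the 3-3-2-3 shape of FIG. 3.
w1 = [[0.1, 0.2, 0.3]] * 3   # weights w1: 3 neurons x 3 inputs
w2 = [[0.4, 0.1, 0.2]] * 2   # weights w2: 2 neurons x 3 features
w3 = [[0.3, 0.2]] * 3        # weights w3: 3 neurons x 2 features
y = forward([1.0, 0.5, -0.5], w1, w2, w3)
```

Each intermediate list plays the role of the feature vectors z1 and z2 between the weight layers.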
- The operation of the neural network includes a learning mode and a prediction mode. A learning data set is used to learn the weights w in the learning mode, and the learned parameters are used to determine the action of the machining machine in the prediction mode (here, “prediction” is only a matter of convenience; various tasks such as detection, classification, and deduction may be performed).
- It is possible to immediately learn data obtained while a controller actually controls the machine in the prediction mode and reflect the learned data in the next action (online learning), or it is possible to perform collective learning using a previously collected data group and thereafter run in a detection mode with those parameters at all times (batch learning). It is also possible to adopt an intermediate mode, i.e., a learning mode performed every time data is accumulated to a certain degree.
- Learning of the weights w1 to w3 is made possible by error back propagation. Error information enters from the right side and flows to the left side. Back propagation is a method for adjusting (learning) each weight so as to reduce, for each neuron, the difference between the output y obtained when an input x is given and the true output y (teacher data).
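As a minimal illustration of the weight-adjustment idea behind back propagation (a single linear neuron with a squared-error loss, not the full multi-layer algorithm; the learning rate is an assumption):

```python
def backprop_step(x, w, y_true, lr=0.1):
    """One gradient step for a single linear neuron y = w*x,
    reducing the squared error (y - y_true)^2."""
    y = w * x
    grad = 2.0 * (y - y_true) * x  # d(error)/dw
    return w - lr * grad           # adjust the weight against the gradient
```

Repeating such steps moves the weight so that the neuron's output approaches the teacher output, which is the per-neuron adjustment back propagation performs layer by layer.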
- The number of layers of the neural network may be increased to three or more (so-called deep learning). A calculation unit that extracts the features of the inputs in a stepwise fashion and regresses a result may thereby be obtained automatically from teacher data alone.
- When such a neural network is used as the approximate function, the above value function (evaluation function) may be stored as the neural network, and the learning may be advanced while (1) to (5) of the above reinforcement learning are repeatedly performed.
- Generally, even when placed in a new environment after completing learning in a certain environment, a machine learning device may adapt to the new environment by performing additional learning. Accordingly, as in the present invention, by applying machine learning to the adjustment of the servo gain used by each controller to control a machine, additional learning under new machining preconditions can build on the past learning of servo gain adjustment. Thus, it becomes possible to learn the adjustment of servo gain in a short period of time.
- In addition, reinforcement learning may employ a system in which a plurality of agents is connected to each other via a network or the like, and information on states s, actions a, rewards r, and the like is shared between the agents and applied to each learning. Each agent then performs dispersed reinforcement learning in consideration of the environments of the other agents and is thus able to learn efficiently. Also in the present invention, when a plurality of agents (machine learning devices) incorporated in a plurality of environments (controllers of machines) performs dispersed machine learning in a state of being connected to each other via a network or the like, the learning of the adjustment of the servo gain used to control the machines may be performed efficiently.
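The sharing of learning between connected agents can be sketched as follows; the shared table, the state labels, and the update values are illustrative assumptions:

```python
from collections import defaultdict

# Shared action value table: all agents read and write the same Q,
# so experience gathered by one agent benefits the others.
shared_Q = defaultdict(lambda: [0.0, 0.0])

def agent_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Q-learning update applied by any agent to the shared table."""
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])

# Two agents in different environments contribute their experience.
agent_update(shared_Q, "s0", 1, 1.0, "s1")  # experience from agent A
agent_update(shared_Q, "s0", 1, 1.0, "s1")  # experience from agent B
```

Because the second update starts from the value the first one wrote, each agent's learning advances faster than it would alone.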
- Note that although various methods such as Q-learning, the SARSA method, TD learning, and the AC (actor-critic) method are commonly known as reinforcement learning algorithms, any of these reinforcement learning algorithms may be applied to the present invention. Since each of these reinforcement learning algorithms is commonly known, its detailed description is omitted in this specification.
- Hereinafter, a description will be given, based on a specific embodiment, of the control system of the present invention into which a machine learning device is introduced.
-
FIG. 4 is a diagram showing an image of the machine learning of the adjustment of servo gain used to control a machine by each controller in a control system into which a machine learning device according to an embodiment of the present invention is introduced. Note that in FIG. 4, only the configurations necessary for describing the machine learning in the control system of the embodiment are shown.
- In the embodiment, information on each axis and machine operating conditions acquired from a machine 3, indicating state information, are input to a machine learning device 20 as information used by the machine learning device 20 to specify an environment (the state st described in 1. Machine Learning).
- In the embodiment, the machine learning device 20 outputs the action of adjustment of the servo gain used to control the machine by each controller as an output to the environment (the action at described in 1. Machine Learning).
- In a control system 1 according to the embodiment, the above state information is defined by the information on each axis and the machine operating conditions acquired from the machine 3. Further, the above action of adjustment may be defined by an adjustment amount of the servo gain, used to control the machine, output from the machine learning device 20.
- In addition, in the embodiment, a condition (positive/negative reward) defined by a priority factor and an operating result of the machine 3 is employed as the reward (the reward rt described in 1. Machine Learning) given to the machine learning device 20. Note that an operator may appropriately set which data is used to determine a reward.
- Moreover, in the embodiment, the machine learning device 20 performs machine learning based on the state information (input data), the action of adjustment (output data), and the reward described above. In the machine learning, a state st is defined by the combination of input data at a certain time t, the adjustment of servo gain performed with respect to the defined state st is equivalent to an action at, and a value evaluated and calculated based on data on the machine operating result newly obtained as a result of the servo gain adjustment by the action at is equivalent to the reward rt+1. As in 1. Machine Learning described above, a state st, an action at, and a reward rt+1 are applied to the update formula of the value function (evaluation function) corresponding to the machine learning algorithm to advance the learning.
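The st → at → rt+1 cycle just described can be sketched as a minimal loop. The discretized gain levels, the toy machine model, and the reward rule below are simplified stand-ins for the real machine 3 and its reward conditions, not the embodiment itself:

```python
import random

random.seed(0)

# Hypothetical, highly simplified stand-in for the machine 3: the servo
# gain can be raised or lowered, and the operating result rewards gains
# close to an ideal value (unknown to the agent).
GAINS = [0.2, 0.4, 0.6, 0.8]   # discretized servo gain levels (states)
ACTIONS = [-1, 0, +1]          # lower / keep / raise the gain index
IDEAL = 0.6

Q = [[0.0] * len(ACTIONS) for _ in GAINS]

def reward(gain):
    """Positive reward near the ideal gain, negative otherwise."""
    return 1.0 if abs(gain - IDEAL) < 0.11 else -1.0

s = 0
for _ in range(500):
    # select an adjustment action (epsilon-greedy)
    a = random.randrange(len(ACTIONS)) if random.random() < 0.2 else \
        max(range(len(ACTIONS)), key=lambda i: Q[s][i])
    s_next = min(max(s + ACTIONS[a], 0), len(GAINS) - 1)  # apply adjustment
    r = reward(GAINS[s_next])                             # observe result
    Q[s][a] += 0.5 * (r + 0.9 * max(Q[s_next]) - Q[s][a]) # update per Formula 2
    s = s_next
```

After repeated cycles, the learned values favor keeping the gain at the level that yields good operating results.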
-
FIG. 5 is the function block diagram of the control system according to the embodiment. When the configurations shown in FIG. 5 are compared with the elements of the reinforcement learning shown in FIG. 1, the machine learning device 20 corresponds to the agent, and the configurations other than the machine learning device 20, such as the machine 3, correspond to the environment.
- The control system 1 according to the embodiment is constituted by a high-order apparatus 2 having the machine learning device 20 and at least one machine 3. Further, the high-order apparatus 2 is an apparatus placed in a rank higher than the controllers that control the machines 3; examples of the high-order apparatus 2 include a control apparatus, a field computer, and a host computer.
- Each of the machines 3, acting as a facility inside a factory according to the embodiment, has a machine information output section 30, a machining information output section 31, and a servo gain setting section 32. The machine information output section 30 acquires information on the machine 3, such as the temperature and rigidity of the machine 3, the type of workpiece to be machined, the tool and jig used in machining, the type of mold, and the types of resin and clay used in machining, and outputs the acquired information to the high-order apparatus 2. The machining information output section 31 acquires information on machining, such as the position deflection rate and fluctuation rate of an axis, the machining cycle time, a maximum motor load value, the consumption power, and the speed and fluctuation rate of a motor, and outputs the acquired information to the high-order apparatus 2. The servo gain setting section 32 sets servo gain such as current loop gain, speed loop gain, position loop gain, and pressure control proportional/integral gain.
- The machine information output section 30 acquires the information on the machine 3 from a setting memory (not shown) of the machine 3, a sensor (not shown) provided in each section of the machine 3, or the like, and outputs the acquired information on the machine 3 in response to a request from the high-order apparatus 2.
- The machining information output section 31 monitors the output or the like of a sensor or a servo motor (not shown) provided in each section of the machine 3 when machining is performed, generates the information on the machining based on the data monitored at, for example, the timing at which one machining cycle is completed, and outputs the generated information on the machining to the high-order apparatus 2.
- The servo gain setting section 32 sets the servo gain of a servo motor provided in the machine 3 in response to a request from the high-order apparatus 2. As the servo gain, current loop gain, speed loop gain, position loop gain, pressure control proportional/integral gain, or the like may be set.
- The machine learning device 20, which performs the machine learning, performs the action of adjustment of the servo gain of the machine 3 when machining by the machine 3 is started, and performs the learning of that action of adjustment when the machining by the machine 3 with the adjusted servo gain is completed.
- The machine learning device 20 has a state observation section 21, a determination data acquisition section 22, a learning section 23, a value function switching determination section 24, a value function updating section 25, a reward calculation section 26, a decision making section 27, and a gain changing section 28. The machine learning device 20 may be incorporated in the high-order apparatus 2 as shown in FIG. 5, or may be constituted by a personal computer or the like connected to the high-order apparatus 2.
- The state observation section 21 is a function unit that observes the machine information output from the machine information output section 30 provided in the machine 3 as state data and takes the observed information into the machine learning device 20. The state observation section 21 outputs the observed state data to the learning section 23. The state observation section 21 may temporarily store the observed state data in a memory (not shown) to be managed. The state data observed by the state observation section 21 may be data acquired by the latest machining operation of the machine 3 or data acquired by a past machining operation of the machine 3.
- The determination data acquisition section 22 is a function unit that takes the machining information output from the machining information output section 31 provided in the machine 3 into the machine learning device 20 as determination data. The determination data acquisition section 22 outputs the acquired determination data to the learning section 23. The determination data acquisition section 22 may temporarily store the acquired determination data in the memory (not shown) to be managed together with the state data acquired by the state observation section 21. The determination data acquired by the determination data acquisition section 22 may be data acquired by the latest machining operation of the machine 3 or data acquired by a past machining operation of the machine 3.
- The learning section 23 performs the machine learning (reinforcement learning) of the action of adjustment of servo gain with respect to the machine information and the machining information, for each priority factor, based on the state data observed by the state observation section 21, the determination data acquired by the determination data acquisition section 22, and the reward calculated by the reward calculation section 26 described later. In the machine learning performed by the learning section 23, a state st is defined by the combination of state data at a certain time t; the determination of the action of adjustment of the servo gain of the machine 3 by the decision making section 27 described later and the adjustment of the servo gain of the machine 3 by the gain changing section 28 described later, according to the defined state st, are equivalent to an action at; and the value calculated by the reward calculation section 26 described later, based on the determination data acquired by the determination data acquisition section 22 as a result of the adjustment of the servo gain of the machine 3 and the subsequent machining by the machine 3, is equivalent to the reward rt+1. The value function used in the learning is determined according to the applied learning algorithm. For example, when Q-learning is used, it is only necessary to update the action value function Q(st, at) according to the above Formula 2 to advance the learning.
- The value function switching determination section 24 switches the value function used in the determination of the action of adjustment of servo gain with respect to the machine 3, and in the machine learning based on the result of that action, according to the priority factor of each machine 3 set by a user. In a value function storage section 40 provided in the memory (not shown) of the machine learning device 20, a plurality of value functions, one for each priority factor of the machine, is stored in advance. The value function switching determination section 24 selectively switches the value function to be used by the learning section 23, the value function updating section 25, and the decision making section 27 according to the priority factor set in the machine 3 that performs the action of adjustment of servo gain (or the machine 3 that performs the machine learning of that action). By switching the value function for each priority factor with the value function switching determination section 24 as described above, an improvement in the efficiency of the machine learning may be achieved.
- The value function updating section 25 applies the result of the machine learning performed by the learning section 23 to the value function selected by the value function switching determination section 24 and stores it in the value function storage section 40. The value function stored as the learning result in the value function storage section 40 by the value function updating section 25 is used in the machine learning and in the determination of the action of adjustment of servo gain from the next time onward. As described above, a learning result may be stored in such a way that the value function corresponding to the machine learning algorithm to be used is stored as a supervised learning device such as a support vector machine (SVM) or a neural network serving as an approximate function, as an array, or as a multiple-value output.
- The reward calculation section 26 calculates the reward to be used in the machine learning based on reward conditions preset in the memory (not shown) and the determination data acquired by the determination data acquisition section 22.
- Hereinafter, a description will be given of an example of the reward conditions set in the embodiment. Note that the following reward conditions are given only as examples and may be changed in terms of design. Alternatively, various other reward conditions may be set.
- When the priority factor of the machine 3 is set to an improvement in machining quality and the machining accuracy of a machined workpiece falls within a preset proper range, a positive reward is given. On the other hand, when the machining accuracy falls outside the preset proper range (when the machining accuracy is too bad or too good), a negative reward is given according to the degree. Note that as for giving a negative reward, a large negative reward may be given when the machining accuracy is too bad and a small negative reward may be given when the machining accuracy is too good.
- When the priority factor of the machine 3 is set to an improvement in productivity and the cycle time does not deviate largely from a preset prescribed reference value, a small positive reward is given. When the cycle time is shorter than the preset prescribed reference value, a positive reward is given according to the degree. On the other hand, when the cycle time is longer than the preset prescribed reference value, a negative reward is given according to the degree.
- When the priority factor of the machine 3 is set to energy-saving performance and the consumption power does not deviate largely from a preset prescribed reference value, a small positive reward is given. When the consumption power is smaller than the preset prescribed reference value, a positive reward is given according to the degree. On the other hand, when the consumption power is larger than the preset prescribed reference value, a negative reward is given according to the degree.
- The above reward conditions are preferably used in combination according to the priority factor rather than singly. For example, when the priority factor is set to an improvement in machining quality, merely setting a reward condition on machining quality does not suffice. Instead, if reward conditions on productivity and energy-saving performance are also set at the same time, and the reward obtained when the machining quality condition is satisfied is set larger than the rewards obtained when the productivity and energy-saving conditions are satisfied, the selection of an adjustment action that maintains minimum productivity and energy-saving performance while prioritizing machining quality may be learned. The same applies to cases in which the priority factor is set to an improvement in productivity or to energy-saving performance.
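The combination of reward conditions described above can be sketched as follows for the case in which the priority factor is machining quality; the weights, reference values, and function shape are illustrative assumptions, not values from the specification:

```python
def combined_reward(accuracy_ok, cycle_time, power, ref_time=10.0, ref_power=5.0):
    """Combine the three reward conditions when the priority factor is
    machining quality: the quality term dominates, while the productivity
    and energy terms keep minimum performance (weights are illustrative)."""
    r = 3.0 if accuracy_ok else -3.0               # dominant quality term
    r += 1.0 if cycle_time <= ref_time else -1.0   # productivity term
    r += 1.0 if power <= ref_power else -1.0       # energy-saving term
    return r
```

Because the quality term outweighs the other two, a result that fails the quality condition yields a negative total even when productivity and consumption power are good, so quality remains prioritized.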
- The decision making section 27 determines the action of adjustment of the servo gain of the machine 3 based on the learning result learned by the learning section 23 (and stored in the value function storage section 40) and the state data observed by the state observation section 21. The determination of the action of adjustment of the servo gain here is equivalent to the action a used in machine learning. The action of adjustment of the servo gain may be performed in such a way that the selection of a type of gain to be adjusted (current loop gain, speed loop gain, position loop gain, or pressure control proportional/integral gain) and an adjustment degree for the selected gain type are combined, the respective combinations are stored and prepared in an action pattern storage section 41 as selectable actions (for example, action 1 = the current loop gain is set at XX, action 2 = the speed loop gain is set at +YY, . . . ), and the action by which the largest reward will be obtained in the future is selected based on the past learning result. The selectable actions may be actions that adjust a plurality of types of servo gain at the same time, or actions that adjust the servo gain of a plurality of servo motors provided in the machine 3 at the same time. In addition, the above ε-greedy method may be employed to select a random action with a constant probability for the purpose of advancing the learning of the learning section 23.
- Then, the gain changing section 28 instructs the servo gain setting section 32 of the machine 3 to adjust the servo gain based on the action of adjustment of the servo gain of the machine 3 determined by the decision making section 27.
- A description will be given, with reference to the flowchart of FIG. 6, of the flow of the machine learning performed by the learning section 23.
- Step SA01. When the machine learning starts, the state observation section 21 observes machine data output from the machine 3 as state data.
- Step SA02. The learning section 23 specifies the current state st based on the state data observed by the state observation section 21.
- Step SA03. The decision making section 27 selects an action at (an action of adjustment of the servo gain of the machine 3) based on the past learning result and the state st specified in step SA02.
- Step SA04. The gain changing section 28 performs the action at selected in step SA03.
- Step SA05. The state observation section 21 observes machine information on the machine 3 as state information, and the determination data acquisition section 22 acquires machining information on the machine 3 as determination data. At this stage, the state of the machine 3 changes with the temporal transition from time t to time t+1 as a result of the action at performed in step SA04.
- Step SA06. The reward calculation section 26 calculates the reward rt+1 based on the determination data acquired in step SA05.
- Step SA07. The learning section 23 advances the machine learning based on the state st specified in step SA02, the action at selected in step SA03, and the reward rt+1 calculated in step SA06, and then the process returns to step SA02. - As described above, the
gain changing section 28 adjusts the servo gain of the machine 3 based on the decision making of the decision making section 27, the machine 3 is controlled with the adjusted servo gain to machine a workpiece, state data is observed by the state observation section 21, determination data is acquired by the determination data acquisition section 22, and the machine learning is repeated. In this way, a more excellent learning result may be acquired.
- When the servo gain of the machine 3 is actually adjusted using learning data that has undergone sufficient machine learning as described above, the machine learning device 20 may be operated without performing new learning, using that sufficiently trained learning data as it is.
- In addition, a machine learning device 20 that has completed the machine learning (or a machine learning device 20 into which the completed learning data of another machine learning device 20 has been copied) may be attached to another high-order apparatus 2 and operated using the learning data obtained through the sufficient machine learning.
- The machine learning device 20 of the high-order apparatus 2 may perform the machine learning alone. However, when the high-order apparatus 2 provided in each of a plurality of control systems 1 further has a unit for communicating with the outside, it becomes possible to send/receive and share the value functions stored in the respective value function storage sections 40, so that the machine learning may be performed more efficiently. For example, parallel learning may be advanced between a plurality of high-order apparatuses 2 by exchanging state data, determination data, and value functions acting as learning results between the high-order apparatuses 2 while varying the adjustment targets and adjustment amounts, which differ between the high-order apparatuses 2, within a prescribed range. Thus, the learning may be performed more efficiently.
- In order to exchange state data and learning data between a plurality of high-order apparatuses 2 as described above, communication may be performed via a management apparatus (not shown), the high-order apparatuses 2 may communicate directly with each other, or a cloud may be used. However, for handling large amounts of data, a communication unit with a faster communication speed is preferably provided.
- The embodiment of the present invention is described above. However, the present invention is not limited only to the example of the above embodiment and may be carried out in various aspects with appropriate modifications.
- For example, the relationships between the respective function units provided in the high-order apparatus 2 described in the above embodiment are not limited to those shown in the function block diagram of FIG. 5. That is, the functions may be divided in any unit, or any hierarchical relationship may be established between the functions, so long as configurations equivalent to the functions of the respective function units are provided.
- In addition, the above embodiment describes the adjustment of the servo gain of the servo motor provided in one machine 3. However, learning may also be performed in such a way that, for example, the servo gain of a plurality of machines 3 arranged in a factory is adjusted at the same time to attain an improvement in overall energy-saving performance in the factory. In this case, with the combined adjustment of the servo gain of the plurality of machines 3 regarded as one action, a plurality of actions differing in the combination of adjustment amounts or the like is registered in the action pattern storage section 41. Then, the decision making section 27 determines an action so that the consumption power obtained from the plurality of machines 3 becomes small, and the learning section 23 learns the determined action. Thus, machine learning that achieves the above object may be performed.
- Moreover, the above embodiment describes the configuration in which the value function is switched for each priority factor by the value function switching determination section 24. However, the priority factor may instead be added to the input data of the learning section 23, and the value function switching determination section 24 may be omitted. The efficiency of the machine learning for each priority factor then decreases, but the same effect may be obtained when the machine learning is performed over a longer period of time. - The embodiment of the present invention is described above. However, the present invention is not limited only to the example of the above embodiment and may be carried out in various aspects with appropriate modifications.
Claims (6)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016-242572 | 2016-12-14 | ||
JP2016242572A JP6457472B2 (en) | 2016-12-14 | 2016-12-14 | Control system and machine learning device |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180164756A1 true US20180164756A1 (en) | 2018-06-14 |
US10564611B2 US10564611B2 (en) | 2020-02-18 |
Family
ID=62201952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/838,510 Active 2038-01-25 US10564611B2 (en) | 2016-12-14 | 2017-12-12 | Control system and machine learning device |
Country Status (4)
Country | Link |
---|---|
US (1) | US10564611B2 (en) |
JP (1) | JP6457472B2 (en) |
CN (1) | CN108227482B (en) |
DE (1) | DE102017011544A1 (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018126796A (en) * | 2017-02-06 | 2018-08-16 | セイコーエプソン株式会社 | Control device, robot, and robot system |
JP7060546B2 (en) * | 2018-07-10 | 2022-04-26 | ファナック株式会社 | Tooth contact position adjustment amount estimation device, machine learning device, robot system and tooth contact position adjustment amount estimation system |
JP6740290B2 (en) * | 2018-07-17 | 2020-08-12 | ファナック株式会社 | Machine learning device, control device, and machine learning method |
JP6773738B2 (en) * | 2018-09-19 | 2020-10-21 | ファナック株式会社 | State judgment device and state judgment method |
JP6806746B2 (en) * | 2018-09-21 | 2021-01-06 | ファナック株式会社 | Motor control device |
JP6860541B2 (en) | 2018-10-29 | 2021-04-14 | ファナック株式会社 | Output device, control device, and evaluation function value output method |
JP6849643B2 (en) * | 2018-11-09 | 2021-03-24 | ファナック株式会社 | Output device, control device, and evaluation function and machine learning result output method |
US10739755B1 (en) * | 2019-01-31 | 2020-08-11 | Baker Hughes Oilfield Operations Llc | Industrial machine optimization |
JP7336856B2 (en) * | 2019-03-01 | 2023-09-01 | 株式会社Preferred Networks | Information processing device, method and program |
JP7302226B2 (en) | 2019-03-27 | 2023-07-04 | 株式会社ジェイテクト | SUPPORT DEVICE AND SUPPORT METHOD FOR GRINDER |
WO2020216452A1 (en) * | 2019-04-26 | 2020-10-29 | Siemens Aktiengesellschaft | State analysis of a system |
JP7410365B2 (en) * | 2019-07-10 | 2024-01-10 | 国立研究開発法人 海上・港湾・航空技術研究所 | Part placement system and part placement program |
CN110370076A (en) * | 2019-08-08 | 2019-10-25 | 合肥学院 | Free form surface in-situ measuring method based on machining tool |
JP6856162B2 (en) * | 2019-09-24 | 2021-04-07 | ダイキン工業株式会社 | Control system |
AU2020392948A1 (en) * | 2019-11-26 | 2022-07-14 | Daikin Industries, Ltd. | Machine learning device, demand control system, and air-conditioner control system |
JP7331660B2 (en) * | 2019-11-26 | 2023-08-23 | 横河電機株式会社 | Apparatus, method and program |
CN111046156B (en) * | 2019-11-29 | 2023-10-13 | 支付宝(杭州)信息技术有限公司 | Method, device and server for determining rewarding data |
CN115280077B (en) * | 2020-03-27 | 2024-03-08 | 三菱电机株式会社 | Learning device and reasoning device for air conditioner control |
WO2022003833A1 (en) * | 2020-06-30 | 2022-01-06 | 三菱電機株式会社 | Positioning control device and machine learning device |
JP2022070134A (en) * | 2020-10-26 | 2022-05-12 | 株式会社神戸製鋼所 | Machine learning method, machine learning device, machine learning program, communication method, and resin processing device |
CN114609976B (en) * | 2022-04-12 | 2024-08-30 | 天津航天机电设备研究所 | Homography and Q learning-based calibration-free visual servo control method |
AT526214A1 (en) * | 2022-05-23 | 2023-12-15 | Fill Gmbh | Optimizing a numerical control of a machine tool |
JP2023184198A (en) * | 2022-06-17 | 2023-12-28 | 株式会社日立製作所 | Federated learning system and federated learning method |
CN116599767B (en) * | 2023-07-12 | 2023-11-03 | 深圳市光网世纪科技有限公司 | Network threat monitoring system based on machine learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3259303B2 (en) * | 1992-01-28 | 2002-02-25 | セイコーエプソン株式会社 | Liquid crystal display |
US20050127051A1 (en) * | 2003-03-14 | 2005-06-16 | Hitachi Via Mechanics Ltd. | Laser machining apparatus |
US20070203871A1 (en) * | 2006-01-23 | 2007-08-30 | Tesauro Gerald J | Method and apparatus for reward-based learning of improved systems management policies |
US20080091446A1 (en) * | 2006-10-17 | 2008-04-17 | Sun Microsystems, Inc. | Method and system for maximizing revenue generated from service level agreements |
US20090099985A1 (en) * | 2007-10-11 | 2009-04-16 | Tesauro Gerald J | Method and apparatus for improved reward-based learning using adaptive distance metrics |
US20090187641A1 (en) * | 2006-03-29 | 2009-07-23 | Cong Li | Optimization of network protocol options by reinforcement learning and propagation |
US20120101960A1 (en) * | 2010-10-22 | 2012-04-26 | Chassang Sylvain | Method and system for the acquisition, exchange and usage of financial information |
JP5969676B1 (en) * | 2015-09-30 | 2016-08-17 | ファナック株式会社 | Machine learning device and machine learning method for optimizing frequency of tool correction of machine tool, and machine tool including the machine learning device |
US20170372226A1 (en) * | 2016-06-22 | 2017-12-28 | Microsoft Technology Licensing, Llc | Privacy-preserving machine learning |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2851901B2 (en) * | 1990-03-09 | 1999-01-27 | 株式会社日立製作所 | Learning control device |
JP3135738B2 (en) * | 1993-03-18 | 2001-02-19 | 三菱電機株式会社 | Numerical control unit |
JP2006302282A (en) | 2005-04-15 | 2006-11-02 | Fanuc Robotics America Inc | Method for optimizing robot program and robot control system |
JP4211831B2 (en) * | 2006-09-14 | 2009-01-21 | トヨタ自動車株式会社 | HYBRID VEHICLE, HYBRID VEHICLE CONTROL METHOD, AND COMPUTER-READABLE RECORDING MEDIUM CONTAINING PROGRAM FOR CAUSING COMPUTER TO EXECUTE THE CONTROL METHOD |
JP5461049B2 (en) * | 2009-04-07 | 2014-04-02 | 株式会社デンソー | Engine control device |
JP5758728B2 (en) * | 2011-07-26 | 2015-08-05 | 株式会社日立ハイテクノロジーズ | Charged particle beam equipment |
JP5733166B2 (en) * | 2011-11-14 | 2015-06-10 | 富士通株式会社 | Parameter setting apparatus, computer program, and parameter setting method |
CN103399488B (en) * | 2013-07-31 | 2018-01-09 | 中国人民解放军国防科学技术大学 | Multiple Model Control Method based on self study |
JP6308150B2 (en) * | 2015-03-12 | 2018-04-11 | トヨタ自動車株式会社 | Exhaust gas purification device for internal combustion engine |
JP5997330B1 (en) * | 2015-07-31 | 2016-09-28 | ファナック株式会社 | Machine learning apparatus capable of determining whether or not spindle replacement is required, spindle replacement determination apparatus, control apparatus, machine tool and production system, and machine learning method |
- 2016
- 2016-12-14 JP JP2016242572A patent/JP6457472B2/en active Active
- 2017
- 2017-12-12 US US15/838,510 patent/US10564611B2/en active Active
- 2017-12-13 DE DE102017011544.3A patent/DE102017011544A1/en active Pending
- 2017-12-14 CN CN201711337999.9A patent/CN108227482B/en active Active
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10466658B2 (en) | 2017-01-24 | 2019-11-05 | Fanuc Corporation | Numerical controller and machine learning device |
US11032014B2 (en) | 2017-05-12 | 2021-06-08 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
US11664910B2 (en) | 2017-05-12 | 2023-05-30 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
US10541765B1 (en) | 2017-05-12 | 2020-01-21 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
US10396919B1 (en) * | 2017-05-12 | 2019-08-27 | Virginia Tech Intellectual Properties, Inc. | Processing of communications signals using machine learning |
US11510136B2 (en) * | 2018-01-12 | 2022-11-22 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods and apparatus for roaming between wireless communications networks |
US10574054B2 (en) * | 2018-01-16 | 2020-02-25 | Total Solar International | Systems and methods for controlling and managing thermostatically controlled loads |
US11221611B2 (en) * | 2018-01-24 | 2022-01-11 | Milwaukee Electric Tool Corporation | Power tool including a machine learning block |
US20220128973A1 (en) * | 2018-01-24 | 2022-04-28 | Milwaukee Electric Tool Corporation | Power tool including a machine learning block |
CN110658785A (en) * | 2018-06-28 | 2020-01-07 | 发那科株式会社 | Output device, control device, and method for outputting evaluation function value |
US11087509B2 (en) * | 2018-06-28 | 2021-08-10 | Fanuc Corporation | Output device, control device, and evaluation function value output method |
US11318565B2 (en) | 2018-08-24 | 2022-05-03 | Fanuc Corporation | Machining condition adjustment device and machine learning device |
US12005582B2 (en) | 2018-10-02 | 2024-06-11 | Fanuc Corporation | Controller and control system |
EP3651081A1 (en) * | 2018-11-09 | 2020-05-13 | Siemens Aktiengesellschaft | Tuning of axis control of multi-axis machines |
US11675331B2 (en) | 2018-11-09 | 2023-06-13 | Siemens Aktiengesellschaft | Tuning of axis control of multi-axis machines |
WO2020094779A1 (en) * | 2018-11-09 | 2020-05-14 | Siemens Aktiengesellschaft | Tuning of axis control of multi-axis machines |
US11481630B2 (en) | 2019-02-28 | 2022-10-25 | Fanuc Corporation | Machining condition adjustment device and machining condition adjustment system |
US12111621B2 (en) | 2019-07-23 | 2024-10-08 | Milwaukee Electric Tool Corporation | Power tool including a machine learning block for controlling a seating of a fastener |
US20210173357A1 (en) * | 2019-12-10 | 2021-06-10 | Canon Kabushiki Kaisha | Control method, control apparatus, mechanical equipment, and recording medium |
US11740592B2 (en) * | 2019-12-10 | 2023-08-29 | Canon Kabushiki Kaisha | Control method, control apparatus, mechanical equipment, and recording medium |
US11960267B2 (en) | 2020-04-24 | 2024-04-16 | Yokogawa Electric Corporation | Control apparatus, control method, and storage medium |
CN118409560A (en) * | 2024-07-02 | 2024-07-30 | 杭州励贝电液科技有限公司 | Rotary wheel hydraulic servo position control method and system for steel cylinder necking machine |
Also Published As
Publication number | Publication date |
---|---|
JP2018097680A (en) | 2018-06-21 |
DE102017011544A1 (en) | 2018-06-14 |
CN108227482B (en) | 2020-05-29 |
US10564611B2 (en) | 2020-02-18 |
CN108227482A (en) | 2018-06-29 |
JP6457472B2 (en) | 2019-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10564611B2 (en) | Control system and machine learning device | |
US10466658B2 (en) | Numerical controller and machine learning device | |
CN108241342B (en) | Numerical controller and machine learning device | |
US10589368B2 (en) | Machine learning device having function of adjusting welding positions of core in wire electric discharge machine | |
KR102168264B1 (en) | Wire electric discharge machine having movable axis abnormal load warning function | |
US20170060104A1 (en) | Numerical controller with machining condition adjustment function which reduces chatter or tool wear/breakage occurrence | |
US20170090459A1 (en) | Machine tool for generating optimum acceleration/deceleration | |
US10698380B2 (en) | Numerical controller | |
US20170090452A1 (en) | Machine tool for generating speed distribution | |
US10353351B2 (en) | Machine learning system and motor control system having function of automatically adjusting parameter | |
US9952574B2 (en) | Machine learning device, motor control system, and machine learning method for learning cleaning interval of fan motor | |
US10459424B2 (en) | Numerical controller for controlling tapping | |
US20190018392A1 (en) | Control apparatus and learning device | |
KR102382047B1 (en) | Automatic learning tuning system of motor controller using PSO | |
JP2019184575A (en) | Measurement operation parameter adjustment device, machine learning device, and system | |
WO2024180656A1 (en) | Learning device, control device, control system, learning method, and storage medium | |
US20190287189A1 (en) | Part supply amount estimating device and machine learning device |
Legal Events
- FEPP (Fee payment procedure): ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
- AS (Assignment): Owner name: FANUC CORPORATION, JAPAN. ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YUMAGUCHI, TAKEHIRO;REEL/FRAME:044879/0217. Effective date: 20170927
- AS (Assignment): Owner name: FANUC CORPORATION, JAPAN. CORRECTIVE ASSIGNMENT TO CORRECT THE INVENTOR NAME PREVIOUSLY RECORDED ON REEL 044879 FRAME 0217. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:YAMAGUCHI, TAKEHIRO;REEL/FRAME:045611/0897. Effective date: 20170927
- STPP (Patent application and granting procedure in general): NON FINAL ACTION MAILED
- STPP (Patent application and granting procedure in general): RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
- STPP (Patent application and granting procedure in general): FINAL REJECTION MAILED
- STPP (Patent application and granting procedure in general): RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER
- STPP (Patent application and granting procedure in general): NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
- STPP (Patent application and granting procedure in general): AWAITING TC RESP., ISSUE FEE NOT PAID
- STPP (Patent application and granting procedure in general): NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
- STCF (Patent grant): PATENTED CASE
- MAFP (Maintenance fee payment): PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4