WO2022180785A1 - Learning Device, Learning Method, and Storage Medium - Google Patents
Learning Device, Learning Method, and Storage Medium
- Publication number: WO2022180785A1
- Application: PCT/JP2021/007341
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- robot
- state
- learning
- function
- controller
- Prior art date
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/39—Robotics, robotics to robotics hand
- G05B2219/39244—Generic motion control operations, primitive skills each for special task
Definitions
- the present disclosure relates to the technical field of a learning device, a learning method, and a storage medium for learning about robot motions.
- Non-Patent Document 1 discloses a level set estimation method (LSE: Level Set Estimation), which is an estimation method using Gaussian process regression based on the concept of Bayesian optimization.
- Non-Patent Document 2 discloses truncated variance reduction (TRUVAR), which is another technique for estimating a level set function.
- One of the purposes of the present disclosure is to provide a learning device, a learning method, and a storage medium that are capable of suitably learning about the executable states of robot motions, in view of the above-described problems.
- One aspect of the learning device includes: optimization problem calculation means for setting an optimization problem using an evaluation function that evaluates reachability to a target state, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the motion of the robot, and for calculating the function value of the evaluation function that is the solution of the optimization problem; and
- executable state set learning means for learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- One aspect of the learning method is a learning method in which a computer: sets an optimization problem using an evaluation function that evaluates reachability to a target state, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the motion of the robot; calculates the function value of the evaluation function that is the solution of the optimization problem; and learns, based on the function value, an executable state set of the motion of the robot executed by the controller.
- Another aspect of the learning method is a learning method in which a computer, for a system whose state is changed by a robot operating according to control parameters, determines the control parameters for moving from a first state to a second state using a first model expressing the relationship between a plurality of states and the control parameters, and determines a second model for evaluating initial states from which a desired state of the system is reachable, based on the first state and the determined control parameters.
- One aspect of the storage medium is a storage medium storing a program for causing a computer to execute processes of: setting an optimization problem using an evaluation function that evaluates reachability to a target state, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the motion of the robot; calculating the function value of the evaluation function that is the solution of the optimization problem; and learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- FIG. 1 shows the configuration of a robot control system according to a first embodiment.
- FIG. 2A shows the hardware configuration of the learning device; FIG. 2B shows the hardware configuration of the robot controller.
- FIG. 3A is a diagram showing a robot (manipulator) that grips an object, and the object to be gripped, in real space; FIG. 3B is a diagram representing the state shown in FIG. 3A in an abstract space.
- FIG. 4 is a block configuration diagram showing a control system related to skill execution.
- FIG. 5 is an example of the functional blocks of the learning device relating to updating of the skill database.
- FIG. 6 is an example of the functional blocks of a skill learning unit.
- An example of a flowchart showing update processing of the skill database by the learning device.
- An example of the functional blocks of a skill learning unit in a modified example.
- A schematic configuration of a learning device according to a second embodiment, and an example of a flowchart executed by the learning device in the second embodiment.
- FIG. 1 shows the configuration of a robot control system 100 according to the first embodiment.
- a robot control system 100 mainly includes a learning device 1 , a storage device 2 , a robot controller 3 , a measuring device 4 and a robot 5 .
- the learning device 1 performs data communication with the storage device 2 via a communication network or by direct wireless or wired communication.
- the robot controller 3 performs data communication with the storage device 2, the measuring device 4, and the robot 5 via a communication network or direct wireless or wired communication.
- The learning device 1 obtains, by self-supervised learning, motions of the robot 5 for executing a given task, and learns a set of states in which each motion can be executed. In this case, the learning device 1 performs, for each motion, learning of a skill obtained by modularizing a specific motion of the robot 5 (including learning of the set of states in which the skill can be executed). The learning device 1 then registers a tuple of information on the learned skill (also referred to as a "skill tuple") in the skill database 24 stored in the storage device 2. A skill tuple contains the various pieces of information necessary to execute the motion to be modularized. In this case, the learning device 1 generates skill tuples based on the detailed system model information 21, the low-level controller information 22, and the target parameter information 23 stored in the storage device 2.
- the storage device 2 stores information that the learning device 1 and the robot controller 3 refer to.
- the storage device 2 stores at least detailed system model information 21 , low-level controller information 22 , target parameter information 23 and skill database 24 .
- The storage device 2 may be an external storage device such as a hard disk connected to or built into the learning device 1 or the robot controller 3, may be a storage medium such as a flash memory, or may be a server device or the like that performs data communication with the learning device 1 and the robot controller 3. Further, the storage device 2 may be composed of a plurality of storage devices, each holding the information described above in a distributed manner.
- the detailed system model information 21 is information representing a detailed model of the robot 5 and the operating environment in the actual system in which the robot 5 operates (also referred to as a "detailed system model").
- the detailed system model information 21 may be a differential or difference equation representing a detailed system model, or may be a physical simulator.
- The low-level controller information 22 is information about the low-level controller that generates inputs for controlling the actual motion of the robot 5 based on the parameters output by the high-level controller. For example, when the high-level controller generates a trajectory for the robot 5, the low-level controller may generate a control input that makes the motion of the robot 5 follow that trajectory; for instance, it may perform servo control by PID (Proportional-Integral-Derivative) control based on the parameters output by the high-level controller.
- the target parameter information 23 is information representing parameters related to states or conditions to be satisfied for each skill to be learned.
- The target parameter information 23 includes: target state information representing a target state (for example, in the case of a gripping motion, information on stable gripping conditions such as form closure and force closure); execution time information representing a time limit on execution; and general constraint condition information representing other general constraints (for example, restrictions on the movable range of the robot 5, speed restrictions, and input restrictions).
- the skill database 24 is a database of skill tuples prepared for each skill.
- A skill tuple includes at least information about a high-level controller for executing the target skill, information about a low-level controller for executing the target skill, and a set of states in which the target skill can be executed (an "executable state set"). An executable state set is defined in an abstract space that abstracts the real space.
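The patent does not fix a concrete data layout for a skill tuple, but its contents can be sketched as a simple record. The field names below, and the toy one-dimensional membership function, are illustrative assumptions rather than part of the disclosure:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SkillTuple:
    # Illustrative schema: the document only lists the kinds of
    # information a skill tuple holds, not concrete field names.
    name: str
    high_level_controller: Any                      # maps a state to control parameters
    low_level_controller: Any                       # maps (state, parameters) to an input u
    executable_state_set: Callable[[list], float]   # level-set style function: feasible iff value <= 0

    def is_executable(self, x_abstract: list) -> bool:
        # A state belongs to the executable state set when the
        # (approximated) level set function is at most zero.
        return self.executable_state_set(x_abstract) <= 0.0

# Toy example: a "grasp" skill executable when the end effector is
# within 0.5 of the object in a 1-D abstract space.
grasp = SkillTuple(
    name="grasp",
    high_level_controller=None,
    low_level_controller=None,
    executable_state_set=lambda x: abs(x[0] - x[1]) - 0.5,
)
print(grasp.is_executable([0.2, 0.4]))  # True
print(grasp.is_executable([0.0, 1.0]))  # False
```

The `is_executable` check mirrors the sign convention used later in the document: membership in the executable state set corresponds to a non-positive level-set value.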
- the set of executable states can be expressed using, for example, Gaussian process regression or an approximation function of the level set function estimated by the level set estimation method.
- Whether the executable state set includes a certain state can be determined by whether the value of the Gaussian process regression for that state (for example, the mean value) or the value of the approximation function for that state satisfies the constraint condition to be satisfied.
- The robot controller 3 formulates a motion plan for the robot 5 based on the measurement signal supplied by the measuring device 4, the skill database 24, and the like, generates a control command for causing the robot 5 to execute the planned motion, and supplies the control command to the robot 5.
- In this case, the robot controller 3 converts the task to be executed by the robot 5 into a sequence, for each time step, of tasks that the robot 5 can accept. The robot controller 3 then controls the robot 5 based on control commands corresponding to the generated sequence.
- a control command corresponds to a control input output by a low-level controller.
- the measuring device 4 is one or more sensors such as a camera, a range sensor, a sonar, or a combination thereof that detect the state in the workspace where the task by the robot 5 is executed.
- the measurement device 4 supplies the generated measurement signal to the robot controller 3 .
- the measurement device 4 may be a self-propelled or flying sensor (including a drone) that moves within the work space.
- the measuring device 4 may also include sensors provided on the robot 5 and sensors provided on other objects in the work space.
- the measurement device 4 may also include a sensor that detects sound within the work space. In this way, the measuring device 4 may include various sensors that detect conditions within the work space, and may include sensors provided at arbitrary locations.
- the robot 5 performs work related to designated tasks based on control commands supplied from the robot controller 3 .
- the robot 5 is, for example, a robot that operates in various factories such as an assembly factory and a food factory, or a physical distribution site.
- the robot 5 may be a vertical articulated robot, a horizontal articulated robot, or any other type of robot.
- the robot 5 may supply a status signal to the robot controller 3 indicating the status of the robot 5 .
- This state signal may be an output signal of a sensor that detects the state (position, angle, etc.) of the entire robot 5 or of a specific part such as a joint, or may be a signal indicating the progress of the motion of the robot 5.
- the configuration of the robot control system 100 shown in FIG. 1 is an example, and various modifications may be made to the configuration.
- the robot controller 3 and the robot 5 may be configured integrally.
- at least any two of the learning device 1, the storage device 2, and the robot controller 3 may be integrated.
- FIG. 2A shows the hardware configuration of the learning device 1.
- the learning device 1 includes a processor 11, a memory 12, and an interface 13 as hardware.
- Processor 11 , memory 12 and interface 13 are connected via data bus 10 .
- the processor 11 functions as a controller (arithmetic device) that controls the entire learning device 1 by executing programs stored in the memory 12 .
- the processor 11 is, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a TPU (Tensor Processing Unit).
- Processor 11 may be composed of a plurality of processors.
- Processor 11 is an example of a computer.
- the memory 12 is composed of various volatile and nonvolatile memories such as RAM (Random Access Memory), ROM (Read Only Memory), and flash memory.
- The memory 12 also stores a program for executing the processing that the learning device 1 executes. Note that part of the information stored in the memory 12 may instead be stored in one or more external storage devices (for example, the storage device 2) that can communicate with the learning device 1, or in a storage medium detachable from the learning device 1.
- The interface 13 is an interface for electrically connecting the learning device 1 and other devices. These interfaces may be wireless interfaces such as network adapters for wirelessly transmitting and receiving data to and from other devices, or hardware interfaces for connecting to other devices via cables or the like. For example, the interface 13 performs interface operations with input devices such as a touch panel, buttons, a keyboard, or a voice input device that receive user input (external input), display devices such as a display or a projector, and sound output devices such as a speaker.
- the hardware configuration of the learning device 1 is not limited to the configuration shown in FIG. 2(A).
- the learning device 1 may incorporate at least one of a display device, an input device, and a sound output device.
- the learning device 1 may be configured including a storage device 2 .
- FIG. 2(B) shows the hardware configuration of the robot controller 3.
- the robot controller 3 includes a processor 31, a memory 32, and an interface 33 as hardware.
- Processor 31 , memory 32 and interface 33 are connected via data bus 30 .
- the processor 31 functions as a controller (arithmetic device) that performs overall control of the robot controller 3 by executing programs stored in the memory 32 .
- the processor 31 is, for example, a processor such as a CPU, GPU, or TPU.
- the processor 31 may be composed of multiple processors.
- the memory 32 is composed of various volatile and nonvolatile memories such as RAM, ROM, and flash memory.
- the memory 32 also stores a program for executing the process executed by the robot controller 3 .
- Part of the information stored in the memory 32 may instead be stored in one or more external storage devices (for example, the storage device 2) that can communicate with the robot controller 3, or in a storage medium detachable from the robot controller 3.
- the interface 33 is an interface for electrically connecting the robot controller 3 and other devices. These interfaces may be wireless interfaces such as network adapters for wirelessly transmitting and receiving data to and from other devices, or hardware interfaces for connecting to other devices via cables or the like.
- the hardware configuration of the robot controller 3 is not limited to the configuration shown in FIG. 2(B).
- the robot controller 3 may incorporate at least one of a display device, an input device, and a sound output device. Further, the robot controller 3 may be configured including the storage device 2 .
- The robot controller 3 formulates a motion plan for the robot 5 in an abstract space based on the skill tuples. The abstract space targeted in the motion planning of the robot 5 is therefore described below.
- FIG. 3(A) is a diagram showing a robot (manipulator) 5 that grips an object and a grip target object 6 in real space.
- FIG. 3B is a diagram representing the state shown in FIG. 3A in an abstract space.
- In real space, determining how the robot 5 should grip the gripping target object 6 requires strict calculations that take into account the shape of the end effector of the robot 5, the geometric shape of the gripping target object 6, the gripping position and orientation of the robot 5, object characteristics, and the like.
- Therefore, the robot controller 3 formulates a motion plan in an abstract space in which the state of each object, such as the robot 5 and the gripping target object 6, is represented abstractly (in simplified form). In the example of FIG. 3B, an abstract model 5x corresponding to the end effector of the robot 5, an abstract model 6x corresponding to the gripping target object 6, and a region in which the gripping motion of the gripping target object 6 by the robot 5 can be executed (see dashed frame 60) are defined.
- the state of the robot in the abstract space is abstractly (simplifiedly) expressed as the state of the end effector.
- the state of each object corresponding to the operation target object or the environmental object is also represented abstractly, for example, in a coordinate system based on a reference object such as a workbench.
- The robot controller 3 in this embodiment uses skills to formulate a motion plan in an abstract space that abstracts the actual system.
- For example, the robot controller 3 formulates a motion plan for executing a skill that performs gripping within the grippable region (dashed frame 60) defined in the abstract space, and generates a control command for the robot 5 based on the motion plan.
- The state x′ is represented as a vector (an abstract state vector), which may include, for example, a vector representing the state of the end effector of the robot 5, a vector representing the state of the operation target object, and a vector representing the state of an environmental object. In other words, the state x′ is defined as a state vector abstractly representing the states of some elements in the real system.
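As an illustration, an abstract state vector of this kind can be formed by stacking simplified states of the relevant elements, expressed relative to a reference object such as a workbench. The position-only representation and the function name below are hypothetical:

```python
# Hypothetical construction of an abstract state vector x': concatenate
# position-only states of the end effector and the target object, both
# re-expressed in the coordinate system of a reference object (workbench).
def abstract_state(end_effector_pos, object_pos, workbench_origin):
    to_ref = lambda p: [a - b for a, b in zip(p, workbench_origin)]
    return to_ref(end_effector_pos) + to_ref(object_pos)

x_prime = abstract_state([1.0, 2.0, 0.5], [1.2, 2.1, 0.0], [1.0, 2.0, 0.0])
print(x_prime)  # 6-dimensional abstract state vector
```

Orientation, velocity, or contact information could be appended in the same way; the point is only that x′ abstracts a subset of the full real-system state x.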
- FIG. 4 is a block configuration diagram showing a control system for skill execution.
- the processor 31 of the robot controller 3 functionally has a motion planner 34 , a high-level controller 35 and a low-level controller 36 .
- the system 50 corresponds to an actual system.
- In FIG. 4, a balloon representing a diagram (see FIG. 3B) exemplifying the abstract space targeted by the motion planning unit 34 is displayed in association with the motion planning unit 34, a balloon representing a diagram (see FIG. 3A) exemplifying the real system corresponding to the system 50 is displayed in association with the system 50, and a balloon representing information about the skill executable state set is displayed in association with the high-level control unit 35.
- the motion planning unit 34 formulates a motion plan for the robot 5 based on the state x' in the abstract system and the skill database 24.
- The motion planning unit 34 expresses the target state by, for example, a logical formula based on temporal logic.
- The motion planning unit 34 may express the logical formula using arbitrary temporal logic such as linear temporal logic, MTL (Metric Temporal Logic), or STL (Signal Temporal Logic). The motion planning unit 34 then converts the generated logical formula into a sequence for each time step (an action sequence). This action sequence includes, for example, information about the skill to use at each time step.
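The result of this conversion can be pictured as a per-time-step assignment of skills. The sketch below only illustrates the shape of such an action sequence; deriving it from an actual temporal-logic specification would require an optimization- or automata-based planner, which is outside the scope of this sketch:

```python
# Illustrative stand-in for the motion planning unit's output: an action
# sequence assigning a skill name (or None for "idle") to each time step.
def plan_to_sequence(plan, horizon):
    """plan: list of (start_step, end_step, skill_name), half-open step ranges."""
    sequence = [None] * horizon
    for start, end, skill in plan:
        for t in range(start, min(end, horizon)):
            sequence[t] = skill
    return sequence

seq = plan_to_sequence([(0, 3, "reach"), (3, 5, "grasp")], horizon=6)
print(seq)  # ['reach', 'reach', 'reach', 'grasp', 'grasp', None]
```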
- Let "ĝ" denote an approximation function of the level set function that can determine whether or not a state belongs to the executable state set Ξ0′ of the skill.
- The robot controller 3 can then determine whether a state x0′ belongs to the executable state set Ξ0′ by determining whether ĝ(x0′) ≤ 0 is satisfied.
- the condition can be said to represent a constraint that determines the feasibility of a state.
- the function “ ⁇ ” can be said to be a model that can evaluate whether a given goal state can be reached from some initial state x 0 ′.
- Let the goal state set, which is the set of goal states in the abstract space after execution of the target skill, be "Ξ′d", and let the length of time required to execute the target skill (the execution time length) be "T". Writing the state T time after the initial state x0′ as "x′(T)", if the initial state belongs to the executable state set, then x′(T) ∈ Ξ′d can be realized by using the low-level controller 36.
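A minimal numerical illustration of this membership test, assuming a toy one-dimensional closed-loop system (the contraction dynamics and the goal-set radius below are invented for illustration only):

```python
import math

# Toy assumption: the low-level controller contracts the state toward 0,
# so x(T) = x0 * exp(-K*T).  Goal set: |x(T)| <= 0.1, i.e. g(x') = |x'| - 0.1.
K, T = 1.0, 2.0

def x_final(x0):
    return x0 * math.exp(-K * T)

# Level-set style test for the initial states from which the goal set is
# reachable: g^(x0) = |x0| * exp(-K*T) - 0.1, so g^(x0) <= 0 exactly when
# the final state lands in the goal set.
def g_hat(x0):
    return abs(x0) * math.exp(-K * T) - 0.1

for x0 in (0.5, 1.0):
    feasible = g_hat(x0) <= 0
    reaches_goal = abs(x_final(x0)) <= 0.1
    print(x0, feasible, reaches_goal)
```

For this toy system the membership test and the actual rollout agree: x0 = 0.5 is in the executable set and reaches the goal, while x0 = 1.0 is not and does not.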
- The approximation function ĝ is obtained through learning by the learning device 1, as will be described later.
- the low-level controller 36 generates an input “u” based on the control parameter ⁇ generated by the high-level controller 35 and the current real system state x obtained from the system 50 .
- The low-level control unit 36 generates the input u as a control command based on the low-level controller "ΦL" included in the skill tuple, as shown in the following equation:
- u = ΦL(x, θ)
- Note that the low-level controller ΦL is not limited to the form of the above equation, and controllers having various forms may be used.
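As one concrete (assumed) instance of such a controller, the control parameter θ can pack a setpoint and feedback gains, with ΦL realized as a simple PD/servo law; the state and parameter layout below is hypothetical:

```python
# A minimal sketch of a low-level controller computing u = Phi_L(x, theta):
# here a PD servo whose setpoint and gains are the control parameters
# theta supplied by the high-level controller.
def phi_L(x, theta):
    """x = (position, velocity); theta = (setpoint, kp, kd)."""
    pos, vel = x
    setpoint, kp, kd = theta
    return kp * (setpoint - pos) - kd * vel

u = phi_L(x=(0.2, 0.1), theta=(1.0, 4.0, 0.5))
print(u)  # approximately 3.15
```

The same interface also covers PID servo control as described for the low-level controller information 22; an integral term would simply add internal state to the controller.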
- The low-level control unit 36 acquires the state x of the robot 5 and of the environment using any state recognition technique, based on the measurement signal output by the measuring device 4 (which may include a signal from the robot 5).
- the operator “ ⁇ ” represents differentiation with respect to time or difference with respect to time.
- FIG. 5 is an example of functional blocks of the learning device 1 relating to update of the skill database.
- the processor 11 of the learning device 1 functionally has an abstract system model setting unit 14 , a skill learning unit 15 , and a skill tuple generating unit 16 .
- FIG. 5 shows an example of data exchanged between the blocks, but the exchanged data is not limited to this example. The same applies to the other functional block diagrams described later.
- the abstract system model setting unit 14 sets an abstract system model based on the detailed system model information 21.
- This abstract system model is a simplified model for motion planning having an abstract state vector x′ constructed based on the state x of the detailed system model (the model corresponding to the system 50 in FIG. 4) specified by the detailed system model information 21. In this case, the abstract system model setting unit 14 calculates the abstract system model from the detailed system model, for example, based on an algorithm stored in advance in the storage device 2 or the like. Note that if information about the abstract system model is stored in advance in the storage device 2 or the like, the abstract system model setting unit 14 may acquire that information from the storage device 2 or the like.
- the abstract system model setting unit 14 supplies information about the set abstract system model to the skill learning unit 15 and the skill tuple generation unit 16, respectively.
- The skill learning unit 15 learns the skill to be generated based on the abstract system model set by the abstract system model setting unit 14, the detailed system model information 21, the low-level controller information 22, and the target parameter information 23 (including the target state information, execution time information, and general constraint condition information).
- Specifically, the skill learning unit 15 learns the executable state set Ξ0′ of the skill executed by the low-level controller ΦL, and also learns the high-level controller ΦH that, for states included in the learned executable state set Ξ0′, outputs a value of the control parameter θ of the low-level controller ΦL (a value that satisfies a judgment condition of being suitable, for example, the optimum value). Detailed processing of the skill learning unit 15 will be described later.
- The skill tuple generation unit 16 generates, as a skill tuple, a tuple of the information about the executable state set Ξ0′ learned by the skill learning unit 15, the information about the high-level controller ΦH, the information about the abstract system model set by the abstract system model setting unit 14, the low-level controller information 22, and the target parameter information 23.
- the skill tuple generator 16 then registers the generated skill tuple in the skill database 24 .
- Each component of the abstract system model setting unit 14, the skill learning unit 15, and the skill tuple generation unit 16 can be realized, for example, by the processor 11 executing a program. Further, each component may be realized by recording the necessary programs in an arbitrary nonvolatile storage medium and installing them as necessary. Note that at least part of each of these components is not limited to being implemented by software, and may be realized by any combination of hardware, firmware, and software. At least part of each of these components may also be implemented using a user-programmable integrated circuit such as an FPGA (Field-Programmable Gate Array) or a microcontroller; in this case, the integrated circuit may be used to realize a program corresponding to the above components.
- Alternatively, each component may be composed of an ASSP (Application Specific Standard Product), an ASIC (Application Specific Integrated Circuit), or a quantum computer control chip.
- each component may be realized by various hardware. The above also applies to other embodiments described later.
- each of these components may be implemented by cooperation of a plurality of computers using, for example, cloud computing technology.
- FIG. 6 is an example of the functional blocks of the skill learning unit 15.
- the skill learning unit 15 functionally includes an optimal control problem calculation unit 51 , a level set learning unit 52 , a level set approximation unit 53 and a high level controller learning unit 54 .
- The optimal control problem calculation unit 51 determines the reachability of the solution to the goal state set Ξ′d when the initial state in the abstract space is x0′. In this case, it is assumed that the state at the time when T time has elapsed from the initial state x0′ is "x′(T)" and that the target state set Ξ′d is given as the set of states satisfying g(x′) ≤ 0. The optimal control problem calculation unit 51 then sets an optimal control problem (optimization problem) for minimizing the evaluation function g(x′(T)).
- The optimal control problem calculation unit 51 obtains the value of the control parameter θ that is the solution of the optimal control problem (a value that satisfies the conditions for being judged suitable as a solution, for example, the optimum value; hereinafter also called the "optimal control parameter θ*") and the corresponding function value "g*". When the function value g* satisfies "g* ≤ 0", the optimal control problem calculation unit 51 determines that the transition from the initial state x0′ to the target state set Ξ′d is feasible.
- The initial state x0′ is specified by the level set learning unit 52 and the high-level controller learning unit 54, as will be described later. Details of the processing of the optimal control problem calculation unit 51 are further described in the section "(6-2) Details of the optimal control problem calculation unit".
- The level set learning unit 52 learns a level set function representing, in the abstract space, the executable state set Ξ0′ of the low-level controller of the target skill.
- Specifically, the level set learning unit 52 requests the optimal control problem calculation unit 51 to solve the optimal control problem with a specified initial state x0′, and learns the level set function based on a plurality of combinations of a specified initial state x0′ and the function value g* supplied as a response from the optimal control problem calculation unit 51.
- For example, the level set learning unit 52 uses the level set estimation method to identify, as the level set function, a function "gGP" obtained through Gaussian process regression. Details of the processing of the level set learning unit 52 are further described in the section "(6-3) Details of the level set learning unit".
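A minimal sketch of this learning step, assuming a one-dimensional abstract state and a toy ground-truth g*. The kernel length scale, jitter, and query points are arbitrary choices, and a full level set estimation method would additionally use the GP variance to select informative queries:

```python
import math

# Gaussian process regression of the level-set function g_GP from a few
# (initial state, g*) pairs, with an RBF kernel and a small pure-Python
# linear solver.  Toy ground truth: g*(x0) = |x0| - 1 (feasible iff |x0| <= 1).
def rbf(a, b, ls=0.7):
    return math.exp(-(a - b) ** 2 / (2 * ls ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Queried initial states and the g* values returned for them.
X = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [abs(v) - 1.0 for v in X]

K = [[rbf(a, b) + (1e-6 if i == j else 0.0) for j, b in enumerate(X)]
     for i, a in enumerate(X)]
alpha = solve(K, y)

def g_gp(x_query):
    # GP posterior mean: k(x*, X) @ K^{-1} y
    return sum(rbf(x_query, xi) * a for xi, a in zip(X, alpha))

print(g_gp(0.2) <= 0, g_gp(1.8) <= 0)  # inside vs. outside the level set
```

States with g_gp ≤ 0 are classified as belonging to the executable state set, matching the membership convention used elsewhere in the document.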
- the level-set approximation unit 53 determines an approximation function g ⁇ (also referred to as a "level-set approximation function g ⁇ ") obtained by simplifying the level-set function by polynomial approximation or the like, taking into consideration the calculation cost of the level-set function in the motion plan. do.
- the level set approximation unit 53 determines the level set approximation function ĝ so that its inner set "ĝ(x0′) ≤ 0" and the inner set of the level set function g_GP, "g_GP(x0′) ≤ 0", satisfy the relationship g_GP(x0′) ≤ ĝ(x0′) ≤ 0; any state judged executable by ĝ is then also executable under g_GP.
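As a purely illustrative sketch of this step (not part of the disclosed embodiment), the following code fits a least-squares polynomial to a one-dimensional level set function on a grid and then shifts it upward by the worst-case underestimate on that grid, so that the inner set of the approximation is contained in the inner set of the original function. The function names, the one-dimensional setting, and the grid-based conservatism are assumptions made for illustration only.

```python
def fit_poly_conservative(g_gp, grid, degree=2):
    """Least-squares polynomial fit g_hat to the level set function
    g_gp on a grid, shifted up by the worst-case underestimate so
    that g_gp(x) <= g_hat(x) holds on the grid; g_hat(x) <= 0 then
    implies g_gp(x) <= 0 there (a crude inner-set guarantee)."""
    n = degree + 1
    # Normal equations A c = b of the least-squares problem.
    A = [[sum(x ** (i + j) for x in grid) for j in range(n)] for i in range(n)]
    b = [sum(g_gp(x) * x ** i for x in grid) for i in range(n)]
    # A is symmetric positive definite for a grid with enough distinct
    # points, so plain Gaussian elimination without pivoting suffices.
    for col in range(n):
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c2 in range(col, n):
                A[r][c2] -= f * A[col][c2]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):           # back substitution
        s = b[r] - sum(A[r][c2] * coef[c2] for c2 in range(r + 1, n))
        coef[r] = s / A[r][r]
    poly = lambda x: sum(cc * x ** i for i, cc in enumerate(coef))
    shift = max(g_gp(x) - poly(x) for x in grid)   # worst underestimate
    return lambda x: poly(x) + shift               # conservative g_hat

# Toy level set function with feasible set |x| <= 1.
g_hat = fit_poly_conservative(lambda x: x * x - 1.0,
                              [i * 0.5 - 2.0 for i in range(9)])
```

The conservative shift trades some feasible volume for safety: the approximation may reject a few truly executable states but never accepts an inexecutable one on the sampled grid.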
- the optimal control problem calculation unit 51 determines reachability to the target state set χ′d in the abstract state by means of the optimal control problem.
- the target state set χ′d is expressed by the following equation using the evaluation function g.
- the optimum control problem calculation unit 51 sets, as an optimum control problem, the problem of whether the transition from the initial state x0′ in the abstract system to the target state set χ′d can be realized in the actual system. Specifically, the optimum control problem calculation unit 51 sets the optimum control problem represented by the following Equation (2), which obtains a control parameter α that minimizes the evaluation function g.
- c is a function representing a constraint condition and is a function specified based on the target parameter information 23 .
- T represents the length of the execution time.
- x(t) represents the state x at the point of time t elapsed from the state x0 in the real system corresponding to the initial state x0'.
- Equation (2) is a model representing the relationship between the first state (x0′), the second state (x(T)), and the control parameter (α).
- the process of obtaining the function value g * can also be said to be the process of determining the function value g * using a model (equation (2)) including constraints on state changes.
- the solution of the model shown in Equation (2) does not have to be a mathematically optimal solution, and may be any value that satisfies the determination conditions for determining that it is a solution.
- the optimal control problem calculation unit 51 may solve the problem using any optimal control algorithm, for example the direct collocation method or differential dynamic programming (DDP).
- the optimal control problem calculation unit 51 may also use a model-free optimal control method, such as path integral control, to solve the optimal control problem shown in Equation (2).
- the optimum control problem calculation unit 51 obtains the control parameter α according to the problem of minimizing the evaluation function g under the constraint represented by the function c.
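The minimization can be sketched, under strong simplifying assumptions, as a constrained search over candidate control parameters. The code below uses naive random search purely as a stand-in for a real solver (direct collocation, DDP, or path integral control would be used in practice); all names and the toy problem are illustrative, not from the disclosure.

```python
import random

def solve_optimal_control(g, c, sample_alpha, n_iter=2000, seed=0):
    """Search for a control parameter alpha that minimizes the
    evaluation function g subject to the constraint c(alpha) <= 0.
    A stand-in for any solver of the problem in Equation (2)."""
    rng = random.Random(seed)
    best_alpha, best_g = None, float("inf")
    for _ in range(n_iter):
        alpha = sample_alpha(rng)
        if c(alpha) > 0:          # constraint violated: reject sample
            continue
        val = g(alpha)
        if val < best_g:
            best_alpha, best_g = alpha, val
    return best_alpha, best_g

# Toy instance: g(alpha) = (alpha - 1)^2 - 0.5 with |alpha| <= 2.
alpha_star, g_star = solve_optimal_control(
    g=lambda a: (a - 1.0) ** 2 - 0.5,
    c=lambda a: abs(a) - 2.0,
    sample_alpha=lambda rng: rng.uniform(-3.0, 3.0),
)
feasible = g_star <= 0.0   # g* <= 0: the target state set is reachable
```

The final sign test mirrors the feasibility judgment above: a returned g* at or below zero indicates that the target state set is reachable from the chosen initial state.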
- as the detailed model, a physical simulator is used that can acquire information on the state x, the input u, and the contact force F, which is the force with which the grip target object 6 is gripped.
- the target state information of the target parameter information 23 is information regarding stable gripping conditions such as form closure and force closure, and is expressed by the following formula: g(x, F) ≤ 0
- the execution time information of the target parameter information 23 includes information designating the upper limit "Tmax" (T ≤ Tmax) of the execution time length T of the skill.
- the general constraint information of the target parameter information 23 includes information representing the following constraint expression regarding the state x, the input u, and the contact force F: c(x, u, F) ≤ 0
- this constraint expression comprehensively represents the upper limit of the contact force F, "Fmax" (F ≤ Fmax), the limit of the movable range (or speed), "xmax" (|x| ≤ xmax), the upper limit of the input u, "umax" (|u| ≤ umax), and the like.
- the low-level controller πL is, for example, a PID-based servo controller.
- the input u and the target trajectory xrd (here, a polynomial) are expressed by the following equations.
- the control parameter α to be determined as the optimum control parameter consists of the coefficients of the target trajectory polynomial and the gains of the PID control, and is expressed as follows.
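A minimal sketch of such a low-level controller is shown below: a discrete-time PID servo tracking a polynomial target trajectory, with the polynomial coefficients and PID gains playing the role of the control parameter α. The concrete gains, time step, and function names are illustrative assumptions, not values from the disclosure.

```python
def make_low_level_controller(poly_coeffs, kp, ki, kd, dt=0.01):
    """Low-level controller pi_L: a PID servo tracking a polynomial
    target trajectory x_rd(t). The control parameter alpha is the
    tuple (poly_coeffs, kp, ki, kd) chosen by the optimization."""
    state = {"integral": 0.0, "prev_err": None}

    def x_rd(t):
        # Polynomial target trajectory: sum_i c_i * t^i.
        return sum(c * t ** i for i, c in enumerate(poly_coeffs))

    def pi_L(x, t):
        err = x_rd(t) - x
        state["integral"] += err * dt
        prev = state["prev_err"] if state["prev_err"] is not None else err
        deriv = (err - prev) / dt
        state["prev_err"] = err
        return kp * err + ki * state["integral"] + kd * deriv

    return pi_L, x_rd

# Ramp target x_rd(t) = t with illustrative gains.
pi_L, x_rd = make_low_level_controller([0.0, 1.0], kp=2.0, ki=0.1, kd=0.05)
u0 = pi_L(x=0.0, t=0.5)   # first call: error 0.5, derivative term zero
```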
- the level set learning unit 52 learns the level set function representing the executable state set χ0′ of the low-level controller from multiple pairs of an initial state x0′ and the function value g* obtained by solving the optimal control problem for various initial states x0′.
- the level set learning unit 52 uses a level set estimation method, which is an estimation method using Gaussian process regression based on the concept of Bayesian optimization, to perform the processing procedure for determining the executable state set χ0′.
- the level set function g GP may be defined using the mean value function of the Gaussian process obtained through the level set estimation method, or may be defined as a combination of the mean value function and the variance function.
- the level set learning unit 52 instructs the optimum control problem calculation unit 51 to solve the optimum control problem for a designated initial state x0′, and updates the level set function based on the pair of the designated initial state x0′ and the function value g* that is the solution of the optimum control problem.
- the level set learning unit 52 first randomly specifies the initial state x 0 ′ to be specified, and then determines the initial state x 0 ′ to be specified next based on Gaussian process regression.
- the level set estimation method allows efficient learning of the level set function. Details of the level set estimation method are disclosed in, for example, Non-Patent Document 1. According to this method, the initial states x0′ are sampled effectively for estimating the level set function, and the approximation function of the level set function can be suitably calculated from a small number of samples of the initial state x0′.
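The active sampling idea can be sketched as follows, with a from-scratch one-dimensional Gaussian process standing in for the level set estimation method of Non-Patent Document 1. The query rule (pick the candidate initial state whose predicted sign relative to the g = 0 level is most ambiguous), the kernel, and all names are simplifications assumed for illustration only.

```python
import math

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel for a 1-D abstract state."""
    return math.exp(-((a - b) ** 2) / (2.0 * ls ** 2))

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    b = b[:]
    m = len(b)
    for col in range(m):
        p = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = b[r] - sum(A[r][c] * x[c] for c in range(r + 1, m))
        x[r] = s / A[r][r]
    return x

def gp_posterior(X, y, xq, noise=1e-4):
    """Posterior mean and variance of a GP with RBF kernel at xq."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    kq = [rbf(xq, X[i]) for i in range(n)]
    w = _solve(K, y)
    v = _solve(K, kq)
    mean = sum(kq[i] * w[i] for i in range(n))
    var = rbf(xq, xq) - sum(kq[i] * v[i] for i in range(n))
    return mean, max(var, 0.0)

def learn_level_set(g_star, candidates, n_iter=10):
    """Actively estimate the level set function: repeatedly ask the
    optimal-control solver g_star about the candidate initial state
    whose sign relative to the g = 0 level the GP is least sure of."""
    X = [candidates[0], candidates[len(candidates) // 2], candidates[-1]]
    y = [g_star(x) for x in X]
    for _ in range(n_iter):
        def ambiguity(x):
            m, v = gp_posterior(X, y, x)
            return abs(m) / (math.sqrt(v) + 1e-9)
        xq = min((x for x in candidates if x not in X), key=ambiguity)
        X.append(xq)
        y.append(g_star(xq))
    return lambda x: gp_posterior(X, y, x)[0]   # posterior mean as g_GP

# Hypothetical solver: feasible initial states are |x0'| <= 1.
g_GP = learn_level_set(lambda x: x * x - 1.0,
                       [i * 0.2 - 2.0 for i in range(21)])
```

Each query is expensive (it solves one optimal control problem), which is why the sampling rule concentrates evaluations near the boundary of the executable set rather than over the whole state space.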
- the level set function may be obtained using TRUVAR, which is an estimation method using Gaussian process regression.
- TRUVAR is disclosed in Non-Patent Document 2.
- the level set function may be any model that evaluates the initial states from which the desired state can be reached. It can also be said that the parameters of the model are determined based on pairs of the initial state x0′ and the function value g*, which is the solution of the optimal control problem. By determining the model, an initial state from which a certain desired state can be reached can be evaluated, so that control parameters that achieve the desired state of the system can be determined.
- the model determines whether or not a desired state can be reached from a certain state; if the desired state can be reached, the robot is controlled so as to operate according to the control parameters representing the action to be performed in that state.
- the high-level controller learning unit 54 learns the high-level controller ⁇ H using an arbitrary learning model used in machine learning.
- the high-level controller learning unit 54 trains the learning model so as to output α*i when x0i′ is given as input data.
- the learning model in this case may be any machine learning model, such as a neural network, Gaussian process regression, or support vector regression.
- the high-level controller learning unit 54 preferably selects the pairs (x0i′, α*i) used as learning samples from the combinations of the initial states x0′ that the level set learning unit 52 specified to the optimal control problem calculation unit 51 when learning the level set function and the optimal control parameters α* that are the solutions of those optimal control problems. In this case, the high-level controller learning unit 54 uses the approximation function ĝ supplied from the level set approximation unit 53 to select, as learning samples, combinations of an initial state x0′ satisfying "ĝ(x0′) ≤ 0" and the corresponding optimal control parameter α*.
- the high-level controller learning unit 54 instructs the optimum control problem calculation unit 51 to calculate an optimum control problem specifying an initial state x0′ further selected from the executable state set χ0′.
- the optimum control parameter α*, which is the solution of the optimum control problem based on that initial state x0′, may then be obtained from the optimum control problem calculation unit 51.
- the high-level controller learning unit 54 adds the pair of the designated initial state x0′ and the optimum control parameter α* obtained from the optimum control problem calculation unit 51 to the learning samples, and trains the high-level controller πH so as to satisfy the relationship shown in Equation (1).
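As a hedged sketch of this supervised step, the code below fits a one-dimensional linear map from initial states x0′ to optimal control parameters α*. The disclosure permits any regressor (neural network, Gaussian process regression, support vector regression); the linear model and sample values here are purely illustrative.

```python
def learn_high_level_controller(samples):
    """Fit pi_H: x0' -> alpha* by ordinary least squares over pairs
    (x0_i', alpha*_i) gathered from solved optimal control problems.
    Any regressor could replace this linear sketch."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sa = sum(a for _, a in samples)
    sxx = sum(x * x for x, _ in samples)
    sxa = sum(x * a for x, a in samples)
    slope = (n * sxa - sx * sa) / (n * sxx - sx * sx)
    intercept = (sa - slope * sx) / n
    return lambda x0: slope * x0 + intercept

# Hypothetical samples where the optimum happens to be alpha* = 2*x0' + 1.
pi_H = learn_high_level_controller([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

Once fitted, πH replaces repeated online optimization: at execution time the controller simply maps the observed initial state to a control parameter.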
- FIG. 7 is an example of a flowchart showing the update processing of the skill database 24 by the learning device 1.
- the learning device 1 executes the processing of this flowchart for each skill to be generated.
- the abstract system model setting unit 14 of the learning device 1 sets an abstract system model based on the detailed system model information 21 (step S11).
- the optimal control problem calculation unit 51 of the skill learning unit 15 sets the optimum control problem shown in Equation (2) based on the detailed system model indicated by the detailed system model information 21, the abstract system model set in step S11, the low-level controller indicated by the low-level controller information 22, and the target parameters indicated by the target parameter information 23, and calculates the solution of the optimum control problem (step S12).
- the optimal control problem calculation unit 51 sets an optimal control problem for each initial state x0′ specified by the level set learning unit 52 and the high-level controller learning unit 54, and calculates the function value g* and the optimum control parameter α*.
- the level set learning unit 52 of the skill learning unit 15 estimates the level set function of the executable state set χ0′ of the low-level controller of the target skill, based on the solutions of the optimal control problems calculated in step S12 (step S13).
- the level set learning unit 52 instructs the optimum control problem calculation unit 51 to solve the optimum control problem for each specified initial state x0′, and computes the level set function gGP from multiple pairs of the function value g* obtained in response and the specified initial state x0′.
- the level set approximation unit 53 of the skill learning unit 15 calculates a level set approximation function ĝ that approximates the level set function estimated in step S13 (step S14).
- the high-level controller learning unit 54 of the skill learning unit 15 learns the high-level controller ⁇ H based on the state elements within the level set specified by the level set approximation function (step S15).
- the high-level controller πH that satisfies the relationship of Equation (1) is learned based on a plurality of pairs of an initial state x0′ belonging to the executable state set χ0′ identified by the approximation function ĝ and the corresponding optimal control parameter α*.
- the skill tuple generation unit 16 generates, as a skill tuple, the set of the abstract system model, the high-level controller, the low-level controller, the target parameter information, and the level set approximation function (step S16).
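A skill tuple of this kind might be represented, purely as an illustrative assumption about its shape, by a small container type such as the following; the field names and the executability test via the level set approximation function are sketched from the description above, not taken from the disclosure verbatim.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SkillTuple:
    """One skill database entry; field names are illustrative only."""
    abstract_model: Any                 # abstract system model
    high_level_controller: Callable     # pi_H: x0' -> alpha
    low_level_controller: Callable      # pi_L: (x, alpha) -> u
    target_params: dict                 # target/constraint information
    level_set_fn: Callable              # g_hat: x0' -> float

    def executable(self, x0):
        # g_hat(x0') <= 0 means the skill can start from x0'.
        return self.level_set_fn(x0) <= 0.0

skill = SkillTuple(
    abstract_model=None,
    high_level_controller=lambda x0: 2.0 * x0 + 1.0,
    low_level_controller=lambda x, alpha: alpha - x,
    target_params={"T_max": 5.0},
    level_set_fn=lambda x0: abs(x0) - 1.0,   # executable iff |x0'| <= 1
)
```

At planning time, a motion planner can query `executable` cheaply to decide whether the skill may be invoked from a given abstract state, without solving any optimal control problem online.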
- the learning device 1 can suitably learn the executable state set χ0′ of the low-level controller of the target skill, and can also suitably learn the high-level controller πH necessary for executing the target skill.
- the learning device 1 can simultaneously acquire information on the skill execution controller and the skill executable area, and can suitably construct the skill database 24 that can be utilized for motion planning of the robot 5 .
- the skill database 24 can be suitably used when performing tasks including complicated actions such as assembly and tool use.
- the level set approximation unit 53 need not exist in the functional blocks of the skill learning unit 15 shown in FIG. 6.
- FIG. 8 is an example of functional blocks of a skill learning unit 15A in a modified example.
- the skill learning unit 15A includes an optimal control problem calculation unit 51, a level set learning unit 52, and a high-level controller learning unit 54.
- the level set learning unit 52 supplies the level set function g GP to the high-level controller learning unit 54 and also outputs it to the skill tuple generating unit 16 as the level set function output by the skill learning unit 15A.
- the high-level controller learning unit 54 uses the level set function gGP output by the level set learning unit 52 to select, as learning samples, initial states x0′ satisfying gGP(x0′) ≤ 0, and learns the high-level controller πH.
- the learning device 1 may generate a skill tuple without calculating the approximation function ĝ of the level set function gGP learned by the level set learning unit 52.
- (Second modification) When the parameters of the low-level controller are already determined, the learning device 1 may calculate the function gGP (or ĝ) representing the processing procedure for determining the executable state set χ0′. In this modification, there need not be a high-level controller that determines the parameters of the low-level controller.
- the optimal control problem calculation unit 51 sets an optimal control problem (optimization problem) that minimizes the evaluation function g, based on the system model, the controller corresponding to the low-level controller, the target parameters, and the information on the initial state x0′ specified by the level set learning unit 52, and calculates the function value g* that is the solution of the set optimal control problem.
- the level set learning unit 52 calculates the level set function g GP based on the set of the initial state x 0 ′ and the function value g * .
- the learning device 1 can preferably generate information on the executable state set ⁇ 0 ′ and include it in the skill tuple registered in the skill database 24 .
- FIG. 9 shows a schematic configuration diagram of a learning device 1X according to the second embodiment.
- the learning device 1X mainly has an optimization problem calculation means 51X and an executable state set learning means 52X. Note that the learning device 1X may be composed of a plurality of devices.
- the optimization problem calculation means 51X sets an optimization problem using an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model of the system in which the robot works, a controller for the robot, and target parameters for the operation of the robot, and calculates the function value of the evaluation function that is the solution of the optimization problem.
- the "controller” is, for example, the low-level controller in the first embodiment (including modifications; the same applies hereinafter).
- the "evaluation function for evaluating the reachability to the target state” is, for example, the evaluation function g in the first embodiment (including modifications; the same shall apply hereinafter).
- a "function value” is, for example, the function value g * in the first embodiment.
- the optimization problem calculation means 51X can be, for example, the optimum control problem calculation unit 51 in the first embodiment.
- the executable state set learning means 52X learns the executable state set of the robot motion executed by the controller based on the function value.
- the executable state set may be learned as a function (for example, the level set function in the first embodiment).
- the executable state set learning means 52X can be, for example, the level set learning section 52 in the first embodiment.
- FIG. 10 is an example of a flowchart in the second embodiment.
- the optimization problem calculation means 51X sets an optimization problem using an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model of the system in which the robot works, a controller for the robot, and target parameters for the operation of the robot (step S21).
- the optimization problem calculation means 51X calculates the function value of the evaluation function that is the solution to the optimization problem (step S22).
- the executable state set learning means 52X learns the executable state set of the robot motion executed by the controller, based on the function value that is the solution of the optimization problem (step S23).
- the learning device 1X can suitably identify a skill executable state set through learning when, for example, a robot motion executed by a controller is modularized as a skill.
- [Appendix 1] A learning device comprising: optimization problem calculation means for setting, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state, and for calculating the function value of the evaluation function that is the solution of the optimization problem; and executable state set learning means for learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- [Appendix 2] The learning device according to appendix 1, wherein the executable state set learning means learns a level set function representing the executable state set, based on a plurality of pairs of the function value and an initial state set in the optimization problem.
- [Appendix 3] The learning device according to appendix 2, further comprising level set approximation means for calculating a level set approximation function that approximates the level set function.
- [Appendix 4] The learning device according to appendix 2 or 3, wherein the executable state set learning means designates the initial state by sampling based on Gaussian process regression, and learns the level set function based on the designated initial state and the function value that is the solution of the optimization problem for the designated initial state.
- [Appendix 5] The learning device according to any one of appendices 1 to 4, wherein the controller includes a low-level controller that generates a control command for the robot and a high-level controller that outputs control parameters for operating the low-level controller; the optimization problem calculation means calculates the control parameters and the function value that are the solution of an optimal control problem set based on the abstract system model, the detailed system model, the low-level controller, and the target parameters; and the learning device further comprises high-level controller learning means for learning the high-level controller based on states included in the learned executable state set.
- [Appendix 6] The learning device according to appendix 5, wherein the high-level controller learning means learns the high-level controller based on pairs of a state included in the executable state set and the control parameter that is the solution of the optimal control problem when that state is set as the initial state of the optimal control problem.
- [Appendix 7] The learning device according to any one of appendices 1 to 6, wherein the evaluation function is a function that evaluates the reachability to a state in an abstract space, and the executable state set learning means learns the executable state set in the abstract space.
- [Appendix 8] The learning device according to any one of appendices 1 to 7, further comprising skill tuple generation means for generating a skill tuple for the motion of the robot based on the learned executable state set.
- [Appendix 9] A learning method in which a computer: sets, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculates the function value of the evaluation function that is the solution of the optimization problem; and learns, based on the function value, an executable state set of the motion of the robot executed by the controller.
- [Appendix 10] A storage medium storing a program that causes a computer to execute processes of: setting, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculating the function value of the evaluation function that is the solution of the optimization problem; and learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- [Appendix 11] A learning method in which a computer: for a system whose state is changed by a robot operating according to control parameters, determines the control parameters for moving from a first state to a second state by using a first model representing the relationship between a plurality of the states and the control parameters; and determines, based on the first state and the determined control parameters, a second model that evaluates initial states from which a desired state of the system can be reached.
- non-transitory computer-readable media include various types of tangible storage media.
- examples of non-transitory computer-readable media include magnetic storage media (e.g., floppy disks, magnetic tapes, hard disk drives), magneto-optical storage media (e.g., magneto-optical discs), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
- the program may also be delivered to the computer on various types of transitory computer readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can deliver the program to the computer via wired channels, such as wires and optical fibers, or wireless channels.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Manipulator (AREA)
Abstract
Description
An optimization problem calculation means that sets, based on an abstract system model and a detailed system model of a system in which a robot performs work, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state, and calculates the function value of the evaluation function that is the solution of the optimization problem; and
an executable state set learning means that learns, based on the function value, an executable state set of the motion of the robot executed by the controller.
A learning device comprising the above means.
A learning method in which a computer: sets, based on an abstract system model and a detailed system model of a system in which a robot performs work, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculates the function value of the evaluation function that is the solution of the optimization problem; and learns, based on the function value, an executable state set of the motion of the robot executed by the controller.
A learning method in which a computer: for a system whose state is changed by a robot operating according to control parameters, determines the control parameters for moving from a first state to a second state by using a first model representing the relationship between a plurality of the states and the control parameters; and determines, based on the first state and the determined control parameters, a second model that evaluates initial states from which a given desired state of the system can be reached.
A storage medium storing a program that causes a computer to: set, based on an abstract system model and a detailed system model of a system in which a robot performs work, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculate the function value of the evaluation function that is the solution of the optimization problem; and learn, based on the function value, an executable state set of the motion of the robot executed by the controller.
(1) System configuration
FIG. 1 shows the configuration of a robot control system 100 according to the first embodiment. The robot control system 100 mainly includes a learning device 1, a storage device 2, a robot controller 3, a measurement device 4, and a robot 5. The learning device 1 performs data communication with the storage device 2 via a communication network or by direct wireless or wired communication. The robot controller 3 likewise performs data communication with the storage device 2, the measurement device 4, and the robot 5 via a communication network or by direct wireless or wired communication.
FIG. 2(A) shows the hardware configuration of the learning device 1. The learning device 1 includes, as hardware, a processor 11, a memory 12, and an interface 13, which are connected via a data bus 10.
Based on skill tuples, the robot controller 3 formulates a motion plan for the robot 5 in an abstract space. The abstract space targeted in the motion planning of the robot 5 is therefore described first.
FIG. 4 is a block diagram showing a control system related to skill execution. Functionally, the processor 31 of the robot controller 3 includes a motion planning unit 34, a high-level control unit 35, and a low-level control unit 36. The system 50 corresponds to the actual system. For convenience of explanation, FIG. 4 shows a balloon illustrating the abstract space targeted by the motion planning unit 34 (see FIG. 3(B)) in association with the motion planning unit 34, and a balloon illustrating the real system corresponding to the system 50 (see FIG. 3(A)) in association with the system 50. Similarly, FIG. 4 shows a balloon representing information on the skill executable state set in association with the high-level control unit 35.
α=πH(x0’) (1)
ĝ(x0′) ≤ 0
It becomes possible to judge executability by determining whether the above condition is satisfied. In other words, this condition can be said to represent a constraint for judging the feasibility of a given state. Alternatively, the function "ĝ" can be said to be a model capable of evaluating whether a given target state can be reached from an initial state x0′. In this case, letting "χ′d" be the target state set, which is the set of target states in the abstract space after execution of the target skill, "T" be the time length required to execute the target skill (execution time length), and "x′(T)" be the state at the time point T after the initial state x0′, then x′(T) ∈ χ′d can be realized by using the low-level controller 36. The approximation function ĝ is obtained through learning by the learning device 1, as described later.
u = πL(x, α)
Note that the low-level controller πL is not limited to the form of the above equation and may be a controller having various forms.
FIG. 5 is an example of functional blocks of the learning device 1 related to updating the skill database. Functionally, the processor 11 of the learning device 1 includes an abstract system model setting unit 14, a skill learning unit 15, and a skill tuple generation unit 16. Although FIG. 5 shows an example of the data exchanged between the blocks, the data exchange is not limited to this example. The same applies to the diagrams of other functional blocks described later.
Next, the details of the processing executed by the skill learning unit 15 shown in FIG. 5 are described.
FIG. 6 is an example of functional blocks of the skill learning unit 15. Functionally, the skill learning unit 15 includes an optimal control problem calculation unit 51, a level set learning unit 52, a level set approximation unit 53, and a high-level controller learning unit 54.
gGP(x0′) ≤ ĝ(x0′) ≤ 0
The calculation of the optimal control problem by the optimal control problem calculation unit 51 is now described specifically. The optimal control problem calculation unit 51 judges, by means of the optimal control problem, the reachability to the target state set χ′d in the abstract state. In this case, the target state set χ′d is expressed by the following equation using the evaluation function g.
g(x, F) ≤ 0
c(x, u, F) ≤ 0
For example, this constraint expression comprehensively represents the upper limit of the contact force F, "Fmax" (F ≤ Fmax), the limit of the movable range (or speed), "xmax" (|x| ≤ xmax), the upper limit of the input u, "umax" (|u| ≤ umax), and the like.
Next, the learning by the level set learning unit 52 is described. Letting g*(x0′) be the function that outputs the solution g* of the optimal control problem corresponding to an abstract state x0′, the executable state set χ0′ of the target skill is defined as follows.
Next, the learning of the high-level controller πH by the high-level controller learning unit 54 is described.
FIG. 7 is an example of a flowchart showing the update processing of the skill database 24 by the learning device 1. The learning device 1 executes the processing of the flowchart for each skill to be generated.
Next, modifications of the embodiment described above are explained. The following modifications may be applied to the embodiment described above in any combination.
The level set approximation unit 53 need not exist in the functional blocks of the skill learning unit 15 shown in FIG. 6.
When the parameters of the low-level controller are already determined, the learning device 1 may calculate the function gGP (or ĝ) representing the processing procedure for determining the executable state set χ0′, based on the processing of the optimal control problem calculation unit 51 and the level set learning unit 52 (and the level set approximation unit 53). In this modification, there need not be a high-level controller that determines the parameters of the low-level controller.
FIG. 9 shows a schematic configuration diagram of a learning device 1X according to the second embodiment. The learning device 1X mainly includes an optimization problem calculation means 51X and an executable state set learning means 52X. Note that the learning device 1X may be composed of a plurality of devices.
2 Storage device
3 Robot controller
4 Measurement device
5 Robot
100 Robot control system
Claims (11)
- 1. A learning device comprising: optimization problem calculation means for setting, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state, and for calculating the function value of the evaluation function that is the solution of the optimization problem; and executable state set learning means for learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- 2. The learning device according to claim 1, wherein the executable state set learning means learns a level set function representing the executable state set, based on a plurality of pairs of the function value and an initial state set in the optimization problem.
- 3. The learning device according to claim 2, further comprising level set approximation means for calculating a level set approximation function that approximates the level set function.
- 4. The learning device according to claim 2 or 3, wherein the executable state set learning means designates the initial state by sampling based on Gaussian process regression, and learns the level set function based on the designated initial state and the function value that is the solution of the optimization problem for the designated initial state.
- 5. The learning device according to any one of claims 1 to 4, wherein the controller includes a low-level controller that generates a control command for the robot and a high-level controller that outputs control parameters for operating the low-level controller; the optimization problem calculation means calculates the control parameters and the function value that are the solution of an optimal control problem set based on the abstract system model, the detailed system model, the low-level controller, and the target parameters; and the learning device further comprises high-level controller learning means for learning the high-level controller based on states included in the learned executable state set.
- 6. The learning device according to claim 5, wherein the high-level controller learning means learns the high-level controller based on pairs of a state included in the executable state set and the control parameter that is the solution of the optimal control problem when that state is set as the initial state of the optimal control problem.
- 7. The learning device according to any one of claims 1 to 6, wherein the evaluation function is a function that evaluates the reachability to a state in an abstract space, and the executable state set learning means learns the executable state set in the abstract space.
- 8. The learning device according to any one of claims 1 to 7, further comprising skill tuple generation means for generating a skill tuple for the motion of the robot based on the learned executable state set.
- 9. A learning method in which a computer: sets, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculates the function value of the evaluation function that is the solution of the optimization problem; and learns, based on the function value, an executable state set of the motion of the robot executed by the controller.
- 10. A storage medium storing a program that causes a computer to execute processes of: setting, based on an abstract system model and a detailed system model of a system in which a robot works, a controller for the robot, and target parameters for the operation of the robot, an optimization problem using an evaluation function that evaluates reachability to a target state; calculating the function value of the evaluation function that is the solution of the optimization problem; and learning, based on the function value, an executable state set of the motion of the robot executed by the controller.
- 11. A learning method in which a computer: for a system whose state is changed by a robot operating according to control parameters, determines the control parameters for moving from a first state to a second state by using a first model representing the relationship between a plurality of the states and the control parameters; and determines, based on the first state and the determined control parameters, a second model that evaluates initial states from which a desired state of the system can be reached.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023501951A JPWO2022180785A5 (ja) | 2021-02-26 | 学習装置、学習方法及びプログラム | |
EP21927886.8A EP4300224A4 (en) | 2021-02-26 | 2021-02-26 | LEARNING DEVICE, LEARNING METHOD AND STORAGE MEDIUM |
US18/278,305 US20240123614A1 (en) | 2021-02-26 | 2021-02-26 | Learning device, learning method, and recording medium |
PCT/JP2021/007341 WO2022180785A1 (ja) | 2021-02-26 | 2021-02-26 | 学習装置、学習方法及び記憶媒体 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/007341 WO2022180785A1 (ja) | 2021-02-26 | 2021-02-26 | 学習装置、学習方法及び記憶媒体 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022180785A1 true WO2022180785A1 (ja) | 2022-09-01 |
Family
ID=83049002
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/007341 WO2022180785A1 (ja) | 2021-02-26 | 2021-02-26 | 学習装置、学習方法及び記憶媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240123614A1 (ja) |
EP (1) | EP4300224A4 (ja) |
WO (1) | WO2022180785A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024180656A1 (ja) * | 2023-02-28 | 2024-09-06 | 日本電気株式会社 | 学習装置、制御装置、制御システム、学習方法および記憶媒体 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018190241A (ja) * | 2017-05-09 | 2018-11-29 | オムロン株式会社 | タスク実行システム、タスク実行方法、並びにその学習装置及び学習方法 |
JP2019153246A (ja) * | 2018-03-06 | 2019-09-12 | オムロン株式会社 | 情報処理装置、情報処理方法、及びプログラム |
-
2021
- 2021-02-26 US US18/278,305 patent/US20240123614A1/en active Pending
- 2021-02-26 EP EP21927886.8A patent/EP4300224A4/en active Pending
- 2021-02-26 WO PCT/JP2021/007341 patent/WO2022180785A1/ja active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018190241A (ja) * | 2017-05-09 | 2018-11-29 | オムロン株式会社 | タスク実行システム、タスク実行方法、並びにその学習装置及び学習方法 |
JP2019153246A (ja) * | 2018-03-06 | 2019-09-12 | オムロン株式会社 | 情報処理装置、情報処理方法、及びプログラム |
Non-Patent Citations (4)
Title |
---|
A. GOTOVOSN. CASATIG. HITZA. KRAUSE: "Active learning for level set estimation", INT. JOINT. CONF. ART. INTEL., 2013 |
ILIJA BOGUNOVICONATHAN SCARLETTANDREAS KRAUSEVOLKAN CEVHER: "Truncated variance reduction: A unified approach to Bayesian optimization and level-set estimation", IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NIPS, 2016, pages 1507 - 1515 |
See also references of EP4300224A4 |
TOUSSAINT, MARC: "Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning", PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 25 July 2015 (2015-07-25), pages 1930 - 1936, XP055965160 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024180656A1 (ja) * | 2023-02-28 | 2024-09-06 | 日本電気株式会社 | 学習装置、制御装置、制御システム、学習方法および記憶媒体 |
Also Published As
Publication number | Publication date |
---|---
JPWO2022180785A1 (ja) | 2022-09-01 |
EP4300224A1 (en) | 2024-01-03 |
US20240123614A1 (en) | 2024-04-18 |
EP4300224A4 (en) | 2024-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---
Whitney et al. | Reducing errors in object-fetching interactions through social feedback | |
US11235461B2 (en) | Controller and machine learning device | |
JP2007299365A (ja) | Data processing device, data processing method, and program | |
US20210107144A1 (en) | Learning method, learning apparatus, and learning system | |
US20210107142A1 (en) | Reinforcement learning for contact-rich tasks in automation systems | |
CN110809505A (zh) | Method and device for performing movement control of a robot arm | |
Chen et al. | Offset-free model predictive control of a soft manipulator using the Koopman operator | |
JP2021060988A (ja) | System and method for policy optimization using a quasi-Newton trust region method | |
Khansari-Zadeh et al. | Learning to play minigolf: A dynamical system-based approach | |
Navarro-Alarcon et al. | A Lyapunov-stable adaptive method to approximate sensorimotor models for sensor-based control | |
WO2022180785A1 (ja) | Learning device, learning method, and storage medium | |
JP2020196102A (ja) | Control device, system, learning device, and control method | |
CN114529010A (zh) | Robot autonomous learning method, apparatus, device, and storage medium | |
Ma et al. | A Human-Robot Collaboration Controller Utilizing Confidence for Disagreement Adjustment | |
Cursi et al. | Task accuracy enhancement for a surgical macro-micro manipulator with probabilistic neural networks and uncertainty minimization | |
US20240202569A1 (en) | Learning device, learning method, and recording medium | |
Sugimoto et al. | Trajectory-model-based reinforcement learning: Application to bimanual humanoid motor learning with a closed-chain constraint | |
Mainampati et al. | Implementation of human in the loop on the TurtleBot using reinforced learning methods and robot operating system (ROS) | |
Kasaei et al. | A Data-efficient Neural ODE Framework for Optimal Control of Soft Manipulators | |
CN115421387A (zh) | Variable impedance control system and control method based on inverse reinforcement learning | |
US20230364792A1 (en) | Operation command generation device, operation command generation method, and storage medium | |
Mao et al. | Co-active learning to adapt humanoid movement for manipulation | |
JP2020203374A (ja) | Robot control device, learning device, and inference device | |
WO2023166573A1 (ja) | Learning device, control device, learning method, and storage medium | |
WO2023166574A1 (ja) | Learning device, control device, learning method, and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21927886; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 2023501951; Country of ref document: JP |
| WWE | Wipo information: entry into national phase | Ref document number: 18278305; Country of ref document: US |
| WWE | Wipo information: entry into national phase | Ref document number: 2021927886; Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2021927886; Country of ref document: EP; Effective date: 20230926 |