WO2017134735A1 - Robot system, robot optimization system, and robot operation plan learning method - Google Patents


Publication number
WO2017134735A1
WO2017134735A1 (PCT/JP2016/052979)
Authority
WO
WIPO (PCT)
Prior art keywords: trajectory, robot, evaluation, unit, cost
Prior art date
Application number
PCT/JP2016/052979
Other languages
French (fr)
Japanese (ja)
Inventor
祐太 是枝
敬介 藤本
潔人 伊藤
宣隆 木村
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Priority to PCT/JP2016/052979 priority Critical patent/WO2017134735A1/en
Publication of WO2017134735A1 publication Critical patent/WO2017134735A1/en

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric

Definitions

  • The present invention relates to a robot motion planning method and motion planning system.
  • Robot motion generation is most easily performed by teaching playback.
  • In teaching playback, either a remote controller called a teaching pendant is used, or a person directly grasps and moves the robot, in order to record a trajectory (a set of postures interpolating between the initial posture and the target posture); this is the teaching step. The recorded trajectory is then reproduced faithfully (playback) (Patent Document 1).
  • A motion plan is a technology that determines the trajectory of the robot's motion, based on some criterion, taking the initial posture, the target posture, and the surrounding environment as inputs.
  • For a vehicle-type robot, the motion planning problem is that of searching for a route from one point to another.
  • For an industrial arm-type robot, the motion planning problem is that of searching for how to move into a specified posture.
  • A motion planning method currently in use samples a large number of postures interpolating between the initial posture and the target posture, and determines the trajectory so as to minimize the cost given by a predetermined cost function (Patent Document 2).
  • Alternatively, pairs of the robot's current posture and the action to be taken in that posture can be generated without explicitly obtaining a motion trajectory (Patent Document 3).
  • As a prior art disclosing a method for supporting the generation of robot motion by means such as learning, there is a pet robot that optimizes, from operator evaluations, the selection of the reaction the operator expects (Patent Document 4).
  • Patent Document 1: JP 2006-346792 A; Patent Document 2: JP 2015-160253 A; Patent Document 3: JP 2005-56185 A; Patent Document 4: JP 11-175132 A
  • In teaching playback, the initial posture and the target posture are fixed at teaching time. Consequently, when the shape or position of the work object or the work environment changes, the teaching must be redone, and the method cannot substitute for a motion planning method.
  • In the pet robot above, an operation is merely selected from a predetermined list, and no method is disclosed for acquiring operations that are not explicitly given.
  • One aspect of the present invention provides a robot system comprising: a trajectory generation unit that generates one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the robot's start state; a trajectory feature extraction unit that extracts, from a trajectory candidate, motion features characterizing that candidate; a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of a trajectory candidate from its motion features; a trajectory cost calculation unit that calculates, from the motion features and based on the trajectory selection criterion, a trajectory cost serving as an index of the appropriateness of each candidate; a trajectory calculation unit that uses the trajectory cost to determine the candidate to be adopted as the robot's motion trajectory and outputs an operation signal; an operation unit that operates the robot based on the operation signal; an evaluation input unit that receives, as an ordinal scale, the operator's evaluation of the robot's operation result; an evaluation interpretation unit that, based on the trend of the input evaluations, determines an update amount for the trajectory cost of the candidate corresponding to the evaluated operation result; and a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after being updated by the update amount.
  • Another aspect of the present invention provides a robot optimization system comprising: a trajectory generation unit that generates one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the robot's start state; a trajectory feature extraction unit that extracts, from a trajectory candidate, motion features characterizing that candidate; a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of a trajectory candidate from its motion features; a trajectory cost calculation unit that calculates a trajectory cost serving as an index of the appropriateness of each candidate; a trajectory calculation unit that uses the trajectory cost to determine the candidate to be adopted as the robot's motion trajectory; a simulation unit that calculates, based on a predetermined calculation model, at least one of the robot motion itself and the influence of the robot motion on a predetermined virtual physical environment; a display unit that visualizes, as a simulation result, at least one of the robot motion and its influence on the virtual physical environment; an evaluation input unit that receives, as an ordinal scale, the operator's evaluation of the simulation result shown on the display unit; an evaluation interpretation unit that determines an update amount for the trajectory cost of the candidate corresponding to the evaluated result; and a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after being updated by the update amount.
  • Another aspect of the present invention is a robot operation plan learning method comprising: a trajectory generation process for generating one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the robot's start state; a trajectory feature extraction process for extracting, from a trajectory candidate, motion features characterizing that candidate; a process for calculating, from the motion features and based on a trajectory selection criterion for evaluating the appropriateness of trajectory candidates, a trajectory cost serving as an index of the appropriateness of each candidate; a process for receiving an evaluation input as an ordinal scale; and a learning process for changing the trajectory selection criterion based on the evaluation input.
  • FIG. 1: configuration diagram of an embodiment of the present invention; FIG. 2: perspective view of a SCARA robot; FIG. 3: block diagram of the trajectory generation unit; FIG. 4: block diagram of the trajectory cost calculation unit; FIG. 5: block diagram showing the candidate trajectory recording unit; FIG. 6: table showing the data structure of the candidate trajectory recording unit; FIG. 7: block diagram of the trajectory calculation unit; FIG. 8: block diagram of the operation unit
  • FIG. 9: perspective view showing a situation in which the evaluation input unit is used; FIG. 10: plan view of an evaluation input unit of a different configuration
  • FIG. 11: block diagram of the automatic evaluation unit; FIG. 12: table showing the data structure of the criterion recording unit
  • FIG. 13: block diagram of the evaluation interpretation unit; FIG. 14: block diagram of the initial function (cost initialization) unit; FIG. 15: plan view showing the operation of a vehicle-type robot; FIG. 16: block diagram of a different configuration; FIG. 17: block diagram of the trajectory calculation unit in that configuration
  • Notations such as "first", "second", and "third" are attached to identify constituent elements and do not necessarily limit their number or order.
  • A number identifying a component is used per context, and a number used in one context does not necessarily indicate the same configuration in another context. Further, nothing precludes a component identified by one number from also functioning as a component identified by another number.
  • FIG. 2 is a perspective view of an example of the SCARA robot.
  • the SCARA robot is, for example, a robot 200 having four joints as shown in FIG.
  • the shape of the robot is not limited to the configuration shown in FIG. 2, and may have five or more joints, or may be provided with other driving units such as a gripper.
  • the robot 200 can constitute all or part of the operation unit 106. Alternatively, the robot 200 may be remote and operate according to a command from the operation unit 106.
  • FIG. 1 shows an example of a system for controlling the robot 200 having the configuration shown in FIG.
  • This system can be configured, for example, by a server 201 connected to the robot 200 of FIG. 2 directly or via a network.
  • the server 201 includes a processing device 202, a storage device 203, an input device 204, and an output device 205.
  • the storage device 203 can be configured by a known magnetic disk device, a semiconductor memory, or a combination thereof.
  • functions such as calculation and control are realized, in cooperation with other hardware, by the processing device 202 executing a program (software) stored in the storage device 203.
  • A program executed by the server 201, its function, or a means for realizing the function may be referred to as a "function", "means", "part", "unit", "module", or the like.
  • A function by which the storage device 203 of the server 201 stores specific data, or a means for realizing that function, may be referred to as a "recording unit".
  • the trajectory generation unit 101 generates one or more trajectory candidates that the robot 200 can take in order to reach a predetermined target state (target position) from a predetermined start state.
  • the trajectory candidates are selected so as to satisfy conditions such as the robot 200 not colliding with surrounding objects or with itself, and each joint following the robot's motion model. This will be described in detail later with reference to FIG.
  • the trajectory feature extraction unit 102 extracts the motion features of each candidate trajectory (an index that well represents the trajectory properties) by referring to the values of variables that define the trajectory.
  • the trajectory selection criterion recording unit 104 records, as trajectory selection criteria, a calculation criterion for calculating, as a quantitative value, whether or not the trajectory is favorable for the operator from the motion characteristics of the robot motion.
  • the calculation standard can be expressed by, for example, a weight parameter between each neuron of the neural network. However, it is not limited to the above as long as it is a criterion for evaluating the appropriateness from the motion characteristics of the trajectory.
  • the trajectory cost calculation unit 103 receives the motion features extracted by the trajectory feature extraction unit 102 from the candidate trajectories generated by the trajectory generation unit 101.
  • the trajectory cost calculation unit 103 calculates the appropriateness of the candidate trajectory as a quantitative numerical value by applying the trajectory selection criterion recorded in the trajectory selection criterion recording unit 104 to the input motion feature, and calculates this value. Output as cost.
  • the trajectory cost calculation unit 103 will be described in detail later with reference to FIG.
  • the trajectory selection criterion recording unit 104 records conversion parameters for calculating the appropriateness of trajectory candidates from the motion features that the trajectory feature extraction unit 102 extracts from the candidates generated by the trajectory generation unit 101.
  • the trajectory cost calculation unit 103 calculates a trajectory cost that is an index of the appropriateness of trajectory candidates based on the conversion parameter.
  • the trajectory calculation unit 105 determines the trajectory of the robot 200 so as to reduce the cost, and outputs a motor signal that moves the robot 200 according to the trajectory. This will be described in detail later with reference to FIG.
  • the operation unit 106 drives the four joints of the SCARA robot 200 based on the motor signal. Based on the operation of the robot 200 generated by the operation unit, for example, the operator gives an input of “good” or “bad” by the evaluation input unit 107. This will be described in detail later with reference to FIG.
  • the evaluation interpretation unit 108 newly determines a cost corresponding to the motion feature of the motion that has been evaluated by using the tendency of the evaluation based on a plurality of evaluations from the operator.
  • the new cost output by the evaluation interpretation unit 108 increases the cost of the feature of the motion trajectory that received the “bad” evaluation, and decreases the cost of the feature of the motion trajectory that received the “good” rating.
  • the cost of the motion trajectory that was irrelevant to the evaluation is determined so as not to change.
  • the learning unit 109 changes the trajectory selection criterion of the trajectory selection criterion recording unit 104 so that the cost calculated by the trajectory cost calculation unit 103 matches the cost calculated by the evaluation interpretation unit 108. By repeating the increase / decrease in costs and learning, the trajectory selection criteria for the motion features that show a strong evaluation tendency are changed.
  • the motion feature input to the trajectory cost calculation unit 103 is, for example, a distance x between the tip of the robot and the operation target.
  • the cost calculated by the trajectory cost calculation unit 103 can be calculated by c (x) which is a function for calculating the cost.
  • the trajectory selection criterion recording unit 104 stores the definition of c(x) as the calculation criterion. For example, if the distance x is 10 mm, 20 mm, or 30 mm, the cost for each is c(10), c(20), or c(30). Here, assume the costs calculated by the trajectory cost calculation unit 103 satisfy c(10) < c(20) < c(30).
  • the cost indicates that the smaller value is the “good” trajectory.
  • suppose the operator's evaluation, input via the evaluation input unit 107, ranks 20 mm best, followed by 10 mm and then 30 mm.
  • the operator's evaluation is an order scale indicating the order of candidates.
  • the learning unit 109 corrects the calculation criterion c (x) of the trajectory selection criterion recording unit 104 so as to approach the operator's evaluation.
  • c(x) is changed so that c(20) < c(10) < c(30).
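As an illustration only (not from the patent text), the reordering above — nudging a tabulated cost function c(x) until its ordering matches the operator's ordinal evaluation — can be sketched in Python. The concrete cost values and the fixed-step update rule are assumptions:

```python
# Toy sketch: adjust tabulated costs c(x) until their ordering matches
# the operator's ranking (here: 20 mm best, then 10 mm, then 30 mm).

def reorder_costs(costs, ranking, step=0.1, max_iter=1000):
    """Nudge costs so that costs[ranking[i]] < costs[ranking[i+1]]."""
    costs = dict(costs)
    for _ in range(max_iter):
        changed = False
        for better, worse in zip(ranking, ranking[1:]):
            if costs[better] >= costs[worse]:
                # Decrease the cost of the preferred candidate,
                # increase the cost of the less preferred one.
                costs[better] -= step
                costs[worse] += step
                changed = True
        if not changed:
            break
    return costs

# Initially c(10) < c(20) < c(30), but the operator ranks 20 mm best.
c = {10: 1.0, 20: 2.0, 30: 3.0}
c = reorder_costs(c, ranking=[20, 10, 30])
assert c[20] < c[10] < c[30]
```

A real implementation would change the parameters of the learned cost function rather than a lookup table, but the ordinal-to-cost idea is the same.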
  • the robot 200 having three joints and one push-down member (pusher) in FIG. 2 will be described as an example.
  • the trajectory generation unit 101 generates trajectories connecting the current posture [θ0^0, θ1^0, θ2^0, t^0] to the target posture [θ0^L, θ1^L, θ2^L, t^L] given by the operator.
  • these postures and trajectories may have differential elements (velocity, acceleration) of the respective variables and conditions other than the posture, and are not limited to the above as long as they are elements for constituting the trajectory of the robot 200.
  • the target posture may be a set of a plurality of postures. The setting of the target state is not limited to that given by the operator, and may be automatically given from a robot control system or the like.
  • Sampling of postures need not be random; sampling at regular intervals or sampling based on a predetermined rule such as a quasi-random sequence may also be used.
  • the posture pair generation unit 302 generates a plurality of posture pairs from postures that are in each other's neighborhood (Equation 3). [Equation 3]
  • the definition of the neighborhood may be, for example, the L1 norm. The threshold ε is set in advance by the operator or the like. The larger this set value ε, the greater the number of posture pairs, leading to longer calculation times; the smaller ε, the fewer the posture pairs, at the risk that a good trajectory cannot be obtained.
  • as a guideline, ε may be chosen so that each posture forms about 10 posture pairs on average. The value setting is not limited to this example.
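The sampling and pair-generation steps above can be sketched as follows; the joint limits, sample count, and threshold ε below are illustrative assumptions, not values from the patent:

```python
import random

# Sketch: uniformly sample joint postures, then form posture pairs
# whose L1 distance falls below a neighborhood threshold epsilon.

def sample_postures(n, limits):
    """limits: list of (lo, hi) ranges, one per joint variable."""
    return [tuple(random.uniform(lo, hi) for lo, hi in limits)
            for _ in range(n)]

def l1(a, b):
    """L1 (primary) norm used as the neighborhood distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def posture_pairs(postures, eps):
    pairs = []
    for i, p in enumerate(postures):
        for q in postures[i + 1:]:
            if l1(p, q) < eps:   # neighboring postures become a pair
                pairs.append((p, q))
    return pairs

random.seed(0)
# Three rotational joints plus a pusher extension (assumed ranges).
postures = sample_postures(200, limits=[(-3.14, 3.14)] * 3 + [(0.0, 10.0)])
pairs = posture_pairs(postures, eps=2.0)
```

In practice ε would be tuned as described above so that the average number of pairs per posture is moderate.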
  • the trajectory feature extraction unit 102 creates a feature quantity q from a trajectory by aggregating the posture values contained in it.
  • the feature quantity q is at least one of, or a combination of: the Euclidean distance d between the hand (pusher tip) positions of the two postures, the change amount Δθi of each joint angle between the two postures, the velocity ωi of each joint between the two postures, and the minimum distance l (clearance) from the robot to an obstacle.
  • as long as a feature represents characteristics of the trajectory relevant to the robot's operation, various features can be adopted; they are not limited to those above.
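A minimal sketch of these feature quantities is shown below. The helpers `hand_pos` and `clearance` are hypothetical stand-ins for the robot's forward kinematics and obstacle-distance computation, and the planar two-link geometry is an assumption for illustration:

```python
import math

def hand_pos(posture):
    """Hypothetical planar forward kinematics for a 2-link arm."""
    t0, t1 = posture[0], posture[0] + posture[1]
    return (math.cos(t0) + math.cos(t1), math.sin(t0) + math.sin(t1))

def clearance(posture, obstacles):
    """Minimum hand-to-obstacle distance (a stand-in for full-body clearance)."""
    x, y = hand_pos(posture)
    return min(math.hypot(x - ox, y - oy) for ox, oy in obstacles)

def motion_features(p_a, p_b, dt, obstacles):
    xa, ya = hand_pos(p_a)
    xb, yb = hand_pos(p_b)
    d = math.hypot(xb - xa, yb - ya)             # hand Euclidean distance
    dtheta = [b - a for a, b in zip(p_a, p_b)]   # joint-angle changes
    omega = [dth / dt for dth in dtheta]         # joint velocities
    l = min(clearance(p_a, obstacles), clearance(p_b, obstacles))
    return {"d": d, "dtheta": dtheta, "omega": omega, "clearance": l}

q = motion_features((0.0, 0.0), (0.1, 0.2), dt=0.5, obstacles=[(2.5, 0.0)])
```

Each posture pair thus yields one feature vector q for the cost calculation that follows.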
  • the trajectory cost calculation unit 103 will be described with reference to FIG.
  • the input preprocessing unit 401 applies PCA (principal component analysis) to the feature quantity q to decorrelate the input.
  • the calculation unit 402 takes the decorrelated feature values as input and, using the neural-network weight parameters recorded in the trajectory selection criterion recording unit 104, outputs a numerical value (cost) corresponding to the feature quantity q through the neural network.
  • the cost calculation model used by the calculation unit 402 is not limited to a neural network; any function whose parameters can change the contribution of each component of the feature quantity q, such as a linear combination of the components or random forest regression, may be used.
  • for a linear combination, the trajectory selection criterion recording unit 104 records the coefficient of each feature; for random forest regression, it records the configuration of the decision trees. In this way, a cost is calculated for each trajectory candidate.
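A minimal sketch of the cost step: each candidate's feature vector is scored by a tiny one-hidden-layer network whose weights play the role of the recorded trajectory selection criterion. The network shape, sizes, and random features below are assumptions, not the patent's actual model (and the PCA preprocessing is omitted here):

```python
import math
import random

random.seed(0)
HIDDEN = 8

def init_weights(n_features):
    """Weights standing in for the recorded trajectory selection criterion."""
    W1 = [[random.gauss(0, 0.5) for _ in range(HIDDEN)]
          for _ in range(n_features)]
    b1 = [0.0] * HIDDEN
    w2 = [random.gauss(0, 0.5) for _ in range(HIDDEN)]
    return W1, b1, w2

def trajectory_cost(q, W1, b1, w2):
    """One hidden tanh layer, then a linear readout to a scalar cost."""
    h = [math.tanh(sum(qi * W1[i][j] for i, qi in enumerate(q)) + b1[j])
         for j in range(HIDDEN)]
    return sum(hj * w2j for hj, w2j in zip(h, w2))

W1, b1, w2 = init_weights(4)
candidates = [[random.gauss(0, 1) for _ in range(4)] for _ in range(50)]
costs = [trajectory_cost(q, W1, b1, w2) for q in candidates]
best = min(range(len(costs)), key=costs.__getitem__)  # lowest-cost candidate
```

Swapping the network for a linear combination would amount to returning the dot product of q with the recorded coefficients.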
  • FIG. 5 shows a modification of the embodiment of FIG.
  • the present embodiment may include a candidate trajectory recording unit 501 as shown in FIG.
  • the trajectory generation unit 101 generates a candidate trajectory for each execution, but the candidate trajectory recording unit 501 records the trajectory candidates and feature quantities q calculated in advance by the trajectory generation unit 101 and the trajectory feature extraction unit 102.
  • the trajectory cost calculation unit 103 uses the recorded trajectory candidates and the feature quantity q for cost calculation.
  • FIG. 6 shows the data structure of the candidate trajectory recording unit 501.
  • Reference numeral 601 in FIG. 6A denotes a posture data structure sampled by the posture sampling unit 301.
  • Each row of 601 indicates all variables of one posture. For example, in posture A, the angle of joint 0 is 0.1, the angle of joint 1 is 2.1, the angle of joint 2 is 0.5, and the pusher elongation t is 5.0.
  • FIG. 6B records a posture 604 that forms a posture pair with each posture 603, and a memory address 605 that records data of the posture pair.
  • posture B is connected to postures A, D, and E, and information on the connection is recorded in memories 0001, 0002, and 0003, respectively.
  • 606 records the feature quantity 607 of the posture pair.
  • for example, the connection from posture B to posture A is stored in memory 0001; for this pair, the Euclidean distance d of the hand position between the two postures is 0.6, and the angle change amount Δθ0 of joint 0 between the two postures is −0.1.
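The record of Fig. 6 can be mirrored with plain dictionaries, using keys in place of raw memory addresses. The values for posture A and the B–A pair come from the text above; the remaining values (posture B's angles, etc.) are illustrative assumptions:

```python
# Posture table (Fig. 6A): one row of joint variables per posture.
postures = {
    "A": {"joint0": 0.1, "joint1": 2.1, "joint2": 0.5, "pusher_t": 5.0},
    "B": {"joint0": 0.3, "joint1": 1.9, "joint2": 0.4, "pusher_t": 4.8},  # assumed
}

# Adjacency (Fig. 6B): which postures form pairs with each posture.
pairs = {"B": ["A", "D", "E"]}

# Per-pair feature record (Fig. 6C): keyed by (from, to) instead of a
# memory address such as 0001.
pair_features = {
    ("B", "A"): {"d": 0.6, "dtheta0": -0.1},
}

assert postures["A"]["pusher_t"] == 5.0
assert "A" in pairs["B"]
assert pair_features[("B", "A")]["d"] == 0.6
```

Precomputing and storing this structure is what lets the trajectory cost calculation unit reuse candidate trajectories and feature quantities across executions.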
  • the trajectory calculation unit 105 will be described with reference to FIG.
  • the minimum cost route search unit 701 searches for a trajectory that minimizes the cost.
  • the collision determination unit 702 is used to determine whether or not the robot is in contact with an obstacle, and routes that contact an obstacle are excluded from the search.
  • the operation characteristic recording unit 703 records various operation characteristics of the robot 200 necessary for converting the trajectory into a drive signal for the actuator of the robot 200. It is assumed that the robot operation characteristic data is acquired in advance and stored in the storage device 203.
  • the operation signal generation unit 704 receives the trajectory of the minimum cost, uses the operation characteristics of the robot 200, and outputs a PWM (pulse width modulation) signal as an operation signal for operating the robot 200.
  • the operation signal is not limited to the PWM signal as long as it is a signal for driving the actuator of the robot.
  • FIG. 7 is used again to describe the trajectory calculation unit 105 in a configuration in which the trajectory is a set of posture pairs.
  • the cost of a candidate trajectory can be regarded as a graph in which the vertices of posture are connected by edges with weights of costs.
  • the minimum cost route search unit 701 determines the trajectory using the Dijkstra method so that the total cost is minimized. The search is not limited to the Dijkstra method; any minimum-cost search algorithm may be used.
  • the collision determination unit 702 is used to determine whether or not the object is in contact with the obstacle, and the route that is in contact with the obstacle is excluded from the search.
  • the operation characteristic recording unit 703 and the operation signal generation unit 704 have the same configuration as described above.
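Viewing postures as vertices and posture pairs as cost-weighted edges, the minimum-cost route search above can be sketched with a standard Dijkstra implementation. The small graph and its edge costs are hypothetical:

```python
import heapq

def dijkstra(edges, start, goal):
    """edges: {node: [(neighbor, cost), ...]}; returns (total_cost, path)."""
    frontier = [(0.0, start, [start])]   # priority queue ordered by cost
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(frontier, (cost + w, nxt, path + [nxt]))
    return float("inf"), []              # goal unreachable

# Hypothetical posture graph; edge weights are trajectory costs of pairs.
edges = {
    "start": [("A", 1.0), ("B", 4.0)],
    "A": [("B", 1.0), ("goal", 5.0)],
    "B": [("goal", 1.0)],
}
cost, path = dijkstra(edges, "start", "goal")
```

Edges whose posture pair contacts an obstacle would simply be omitted from `edges` by the collision determination step before the search.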
  • the sensor unit 801 observes the state of the robot with encoders of each joint of the robot 200. However, the sensor unit may be either an internal sensor or an external sensor.
  • the controller unit 802 receives the difference between the state of the robot 200 and the command signal, and determines the motor output by PID control (Proportional-Integral-Derivative Controller).
  • the actuator unit 803 inputs the motor output and drives the actuator of the robot 200.
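The sensing–control loop above can be sketched as a minimal PID controller driving a joint toward a commanded angle. The gains, time step, and first-order plant model are illustrative assumptions, not values from the patent:

```python
class PID:
    """Proportional-Integral-Derivative controller on a scalar error."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Drive a toy first-order joint model toward a commanded angle of 1.0 rad:
# the sensor reads `angle`, the controller output acts as joint velocity.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
angle = 0.0
for _ in range(2000):
    output = pid.step(1.0 - angle)   # error = command - sensed state
    angle += output * 0.01           # toy plant: velocity proportional to output
```

In the actual system the controller output would be converted to a PWM drive signal for the actuator rather than applied to a toy plant.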
  • FIG. 9 shows a situation where evaluation is performed by the evaluation input unit 107 (901).
  • the evaluation input unit 901 is a switch having a binary input (Equation 4) of “good” and “bad”. [Equation 4]
  • the operator 900 can evaluate by directly viewing the operation of the robot 902 on site. Alternatively, if the robot 902 is in a remote place, evaluation can be performed by transmitting audio and video data acquired at the remote site and having the operator monitor it.
  • the robot 200 (902) may have various shapes such as the one shown in FIG. 2 and the one shown in FIG.
  • the evaluation value is not limited to 2 as long as it is an ordinal scale.
  • for example, an operation by which the robot 902 correctly accomplishes its purpose can be regarded as appropriate, while an operation by which the robot 902 cannot complete its purpose, or an unfavorable operation such as one that does not secure sufficient clearance, can be regarded as inappropriate.
  • the evaluation input unit 107 (901) in a different configuration will be described with reference to FIG.
  • the evaluation input unit 107 receives a ranking of operations, obtained by the operator 900 observing the operation of the robot 902 while referring to past operations. For example, in a three-step evaluation, a smaller number means a better operation. In the example of FIG. 10, the evaluation of Motion A, evaluated in the past, can be referred to, and Motion D can be evaluated by comparison with it. The ranking may also include equality and inequality signs.
  • the evaluation input unit 107 may receive an operation ranking input obtained by the operator 900 comparing the operations of two or more robots 902 having the same specifications.
  • FIG. 11 shows an example of a different configuration that does not require operator involvement.
  • an automatic evaluation unit 1100 is provided.
  • the motion sensing unit 1101 is an optical three-dimensional motion measurement device that measures the tip position of the robot 902, for example.
  • the motion sensing unit is not limited to the optical three-dimensional motion measurement device as long as it is a sensor that measures the robot 902 itself.
  • the environment sensing unit 1102 is, for example, a camera that observes a change in the position of the operation target of the robot 902. For example, the environment sensing unit 1102 determines whether the part is correctly inserted into a different part.
  • environment sensing is not limited to a camera as long as it is a sensor that measures the environment surrounding the robot 902. In the above example, two types of sensing units are provided, but only one of the sensing units may be provided, or another sensing unit may be added.
  • FIG. 12 shows an example of the data structure of the criterion recording unit 1103. It records a criterion by which the operation is evaluated as "good" ("RESULT" value "1") when both criterion "A" 1201 (whether the tip position of the robot 902 is more than a predetermined distance from the work target) and criterion "B" 1202 (whether the component is correctly inserted into the mating component) are satisfied.
  • the criterion may include an arbitrary calculation formula or conditional branch. The operation can be evaluated by applying information that can be acquired from various sensing units to the conditions defined by the determination criteria.
  • the determination unit 1104 outputs an evaluation value of the operation based on the inputs from the operation sensing unit 1101 and the environment sensing unit 1102 and the determination standard 1200.
  • the evaluation value may be either an order scale or a ranking of actions.
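A minimal sketch of this rule-based automatic evaluation follows; the threshold, field names, and the binary "good"/"bad" output are illustrative assumptions mirroring the Fig. 12 example:

```python
def criterion_a(tip_distance, threshold=0.05):
    """Criterion A: tip kept more than `threshold` from the work target."""
    return tip_distance > threshold

def criterion_b(inserted):
    """Criterion B: the component was correctly inserted."""
    return inserted

def evaluate(tip_distance, inserted):
    # "good" (1) only when both recorded criteria hold, otherwise "bad" (0).
    return 1 if criterion_a(tip_distance) and criterion_b(inserted) else 0

assert evaluate(0.10, True) == 1    # both criteria satisfied -> good
assert evaluate(0.10, False) == 0   # insertion failed -> bad
assert evaluate(0.01, True) == 0    # clearance criterion failed -> bad
```

Arbitrary calculation formulas or conditional branches, as noted above, would slot in as additional criterion functions combined in `evaluate`.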
  • the evaluation interpretation unit 108 will be described with reference to FIG.
  • the evaluation organizing unit 1301 divides the operation/evaluation pairs produced by the operator 900 into two groups based on a user-set threshold. In this embodiment, the two groups are "good" and "bad", matching the operator's evaluations. If the evaluations are, for example, "very good", "good", "bad", and "very bad", they need only be divided into two groups such as {"very good", "good"} and {"bad", "very bad"}.
  • the cost update amount determination unit 1302 determines the cost difference r′k before and after learning so that the cost decreases for a "good" evaluation and increases for a "bad" evaluation. Letting f be the neural network function before the update and f′ the network after the update, r′k is defined by (Equation 5). [Equation 5]
  • if the cost associated with these motion features were simply decreased for every "good" evaluation and increased for every "bad" evaluation, learning would change unstably.
  • for example, the change amount Δθi of the joint angle between two postures can be regarded as a motion feature universally involved in the costs of various operations. Therefore, in addition to decreasing the cost for a "good" evaluation and increasing it for a "bad" evaluation, the cost update amount is determined so that costs related to motion features irrelevant to the operator's evaluation, neither "good" nor "bad", do not change.
  • for example, for neural network weights irrelevant to the evaluation, a cost update amount that cancels the update is determined.
  • in the configuration in which the evaluation input unit outputs rankings, the evaluation interpretation unit 108 takes the set of evaluation orders as input and, using the Schulze method, outputs an evaluation order in which cycles and contradictions have been resolved.
  • the rearrangement of evaluation orders is not limited to the Schulze method; any voting method that can resolve cycles and contradictions in the set of evaluation orders may be used.
  • the learning unit 109 updates the trajectory selection criterion recording unit using the steepest descent method so as to minimize (Equation 7), computed from the difference r′k. [Equation 7]
  • the trajectory selection criterion recording unit may also be updated using different optimization algorithms such as AdaGrad or RMSProp.
  • in the configuration in which the evaluation unit outputs rankings, the learning unit 109 takes the set of ranked feature-quantity pairs {[q g , q b ] 0 , [q g , q b ] 1 , ...} output by the evaluation interpretation unit, and applies the steepest descent method to the weights of the neural network so as to minimize (Equation 8), in which a sigmoid function is used. [Equation 8]
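The pairwise ranking idea can be sketched as follows: for each ranked pair, the better trajectory's features q_g should receive a lower cost than q_b, so a sigmoid of the cost margin is minimized by gradient descent. A linear cost model stands in for the patent's neural network, and the exact loss form is an assumption (a standard sigmoid pairwise loss), not (Equation 8) itself:

```python
import math
import random

random.seed(0)

def cost(w, q):
    """Linear cost model standing in for the neural network."""
    return sum(wi * qi for wi, qi in zip(w, q))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(pairs, n_features, lr=0.5, epochs=200):
    w = [0.0] * n_features
    for _ in range(epochs):
        for q_g, q_b in pairs:
            # loss = sigmoid(cost(q_g) - cost(q_b)); drive the margin negative
            margin = cost(w, q_g) - cost(w, q_b)
            s = sigmoid(margin)
            grad_margin = s * (1 - s)          # d(loss)/d(margin)
            for i in range(n_features):
                w[i] -= lr * grad_margin * (q_g[i] - q_b[i])
    return w

# Hypothetical ranked pairs: first element ranked better than second.
pairs = [([0.2, 0.1], [0.9, 0.8]), ([0.1, 0.3], [0.7, 0.6])]
w = train(pairs, n_features=2)
for q_g, q_b in pairs:
    assert cost(w, q_g) < cost(w, q_b)   # better trajectory now costs less
```

Replacing steepest descent with AdaGrad or RMSProp, as the text notes, would only change how the per-weight step size is computed.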
  • FIG. 14 is a diagram showing another modification of the embodiment.
  • a cost initialization unit 1401 may be provided that takes an operator-defined initial objective function as input and outputs initial values for the trajectory selection criterion recording unit.
  • for example, the initial objective function is the total movement amount of the joints (Equation 9). [Equation 9]
  • the initial objective function is not limited to the movement amount of the joints; any mapping from the feature quantity to a real number may be used.
  • the trajectory selection criterion is trained using the learning unit so that the output for a feature quantity q i becomes f init (q i ).
  • (Equation 10) is the planar coordinate, and (Equation 11) is the orientation of the robot. [Equation 10] [Equation 11]
  • the trajectory generation unit 101, the trajectory cost calculation unit 103, the trajectory selection criterion recording unit 104, the evaluation input unit 107, the evaluation interpretation unit 108, and the learning unit 109 can be realized by the corresponding processing of the first embodiment (FIG. 1).
  • the trajectory feature extraction unit 1602 creates a feature quantity q by aggregating posture values included in the trajectory from the trajectory.
  • the feature quantity q is the Euclidean distance (Equation 12) between the two postures, the turning radius R of the trajectory, and the speeds in each of the two postures.
  • the feature is not limited to the above feature as long as it is information representing the characteristics of the trajectory involved in the operation of the robot. [Equation 12]
  • The trajectory calculation unit 1605 will be described with reference to FIG.
  • The minimum cost path search unit 1701 searches the candidate trajectories generated by the trajectory generation unit 101 for the trajectory with the minimum cost as calculated by the trajectory cost calculation unit 103, and outputs that trajectory.
  • The collision determination unit 1702 determines whether the robot is in contact with an obstacle, and routes that contact an obstacle are excluded from the search.
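The minimum-cost search with collision exclusion can be sketched as a Dijkstra-style search over a posture graph. The graph interface (`neighbors`, `edge_cost`, `in_collision`) is an assumed abstraction for illustration, not the patent's actual data structures.

```python
import heapq

def min_cost_trajectory(start, goal, neighbors, edge_cost, in_collision):
    """Minimum-cost path search with collision exclusion (sketch of units
    1701/1702). `neighbors(n)` lists adjacent postures, `edge_cost(a, b)`
    is the cost of moving between them, and `in_collision(n)` stands in
    for the collision determination unit: colliding postures are simply
    never expanded."""
    frontier = [(0.0, start, [start])]   # (accumulated cost, posture, path)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt in neighbors(node):
            if nxt in visited or in_collision(nxt):
                continue  # routes through obstacles are excluded from the search
            heapq.heappush(frontier, (cost + edge_cost(node, nxt), nxt, path + [nxt]))
    return None  # no collision-free trajectory found
```

On a 3×3 grid with the center cell blocked, the search returns a four-step path around the obstacle.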
  • The simulation process 1611 simulates the operation of the robot when it would execute, in the real world, the trajectory calculated by the trajectory calculation unit 1605.
  • The simulation outputs, for example, the robot's motion itself and the influence of that motion on a predetermined virtual physical environment as simulation results.
  • The simulation is not limited to the robot's motion: it may cover any event expected when the robot operates in the real world, such as the deviation between the trajectory calculated by the trajectory calculation unit 1605 and the simulated trajectory due to disturbances acting on the robot, or the robot's impact on the environment.
  • The display unit 1612 presents the trajectory calculated by the trajectory calculation unit 1605 to the operator on a display.
  • For example, a trajectory on a plane can be displayed as shown in FIG.
  • The vehicle-type robot starts from position 1501, follows a trajectory passing through position 1503, and arrives at position 1502.
  • The display form may instead be converted into a stereoscopic image and displayed.
  • The example of the vehicle-type robot has been described, but the configuration is equally applicable to a SCARA robot as shown in FIG.
  • With this configuration, the same effects as in the first embodiment can be obtained without actually preparing or operating the robot.
  • The server 201 described in the above embodiments may be configured as a single computer, or any part of the input device 204, the output device 205, the processing device 202, and the storage device 203 may be configured as separate computers connected via a network; the idea of the invention is equivalent and unchanged.
  • Functions equivalent to those configured in software can also be realized in hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). Such embodiments are also included within the scope of the present invention.
  • The present invention is not limited to the above-described embodiments and includes various modifications.
  • Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment.
  • Trajectory generation unit: 101
  • Trajectory feature extraction unit: 102
  • Trajectory cost calculation unit: 103
  • Trajectory selection criterion recording unit: 104
  • Trajectory calculation unit: 105
  • Operation unit: 106
  • Evaluation input unit: 107
  • Evaluation interpretation unit: 108
  • Learning unit: 109
  • Robot: 200
  • Server: 201

Abstract

Disclosed is a robot operation plan learning method which comprises: a trajectory generation process for generating one or more candidates for the trajectory to be tracked by a robot so that the robot is brought into a predetermined target position from a start position; a trajectory characteristic extraction process for extracting an operational characteristic, serving as a trajectory candidate property, from each of the trajectory candidates; a trajectory cost calculation process for calculating trajectory costs, which serve as an index of the appropriateness of the candidate trajectory from operational characteristics, on the basis of a trajectory selection reference for calculating the appropriateness of the trajectory candidate from the operational characteristic; a trajectory calculation process for determining a trajectory candidate to be adopted as the operation trajectory of the robot using the trajectory cost; a demonstration process for operating the robot on the basis of the robot operation trajectory determined by the trajectory calculation process, and/or a simulation process for performing a simulation of the operation of the robot on the basis of the robot operation trajectory; and a learning process for receiving an evaluation, by ordinal scale, performed on the demonstration process and/or the simulation process, and changing the trajectory selection reference on the basis of the evaluation input.

Description

Robot system, robot optimization system, and robot motion plan learning method
The present invention relates to a robot motion planning method and motion planning system.
Robot motion generation is most simply performed by teaching playback. In teaching playback, a remote controller called a teaching pendant is used, or a person directly grasps and moves the robot, to record a trajectory (a set of postures interpolating between an initial posture and a target posture) (teaching); the robot then faithfully reproduces the trajectory (playback) (Patent Document 1).
To cope with changes in the environment or the target posture, a technique called motion planning is known, which dynamically generates a trajectory according to the environment, initial posture, and target posture. Motion planning is a technique that determines the trajectory of the robot's motion based on some criterion, taking the initial posture, the target posture, and information about the surrounding environment as input. For a vehicle-type robot, the motion planning problem is that of finding a route from one point to another; for an industrial arm-type robot, it is that of finding how to move the arm to reach a specified posture.
The motion planning method currently in use samples an enormous number of postures interpolating between the initial and target postures and determines a trajectory that minimizes a cost based on a predetermined cost function (Patent Document 2). An approach based on reinforcement learning generates pairs of the robot's current posture and the action to be taken at that time, without explicitly obtaining a motion trajectory (Patent Document 3).
As prior art disclosing a method of supporting robot motion generation by means such as learning, there is a pet robot that optimizes, from operator evaluations, the selection of the reaction the operator expects (Patent Document 4).
Patent Document 1: JP 2006-346792 A
Patent Document 2: JP 2015-160253 A
Patent Document 3: JP 2005-56185 A
Patent Document 4: JP H11-175132 A
In the motion planning method based on posture sampling disclosed in Patent Document 2, the cost function must be described, using mathematical formulas and control syntax, for each shape and position of the work object. Since innumerable motion trajectories are conceivable for the robot, the criterion to be described has an extremely high degree of freedom. Determining an optimal cost function and describing it in the form of a cost function therefore required advanced expertise and trial-and-error adjustment of the cost function.
In the motion planning method based on reinforcement learning disclosed in Patent Document 3, the initial posture and the target posture are set in a fixed manner. Therefore, when the shape or position of the work object or the work environment changes, the setup must be redone, and this method cannot substitute for the motion planning method described above.
In the method of optimizing behavior from external evaluation disclosed in Patent Document 4, actions are only selected from a predetermined list, and no method is disclosed for acquiring actions that are not explicitly given.
One aspect of the present invention for solving the above problems is a robot system comprising: a trajectory generation unit that generates one or more candidate trajectories the robot can take to reach a predetermined target state from its start state; a trajectory feature extraction unit that extracts, from the candidate trajectories, the motion features characterizing them; a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of a candidate trajectory from its motion features; a trajectory cost calculation unit that calculates, from the motion features and based on the trajectory selection criterion, a trajectory cost serving as an index of the appropriateness of each candidate trajectory; a trajectory calculation unit that determines, using the trajectory costs, the candidate trajectory to adopt as the robot's motion trajectory and outputs an operation signal; an operation unit that operates the robot based on the operation signal; an evaluation input unit that receives the operator's evaluation, on an ordinal scale, of the robot's operation result; an evaluation interpretation unit that determines, from the tendency of the input evaluations, an update amount for the trajectory cost of the candidate trajectory corresponding to the evaluated operation result; and a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after being updated by the update amount.
Another aspect of the present invention is a robot optimization system comprising: a trajectory generation unit that generates one or more candidate trajectories the robot can take to reach a predetermined target state from its start state; a trajectory feature extraction unit that extracts, from the candidate trajectories, the motion features characterizing them; a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of a candidate trajectory from its motion features; a trajectory cost calculation unit that calculates, from the motion features and based on the trajectory selection criterion, a trajectory cost serving as an index of the appropriateness of each candidate trajectory; a trajectory calculation unit that determines, using the trajectory costs, the candidate trajectory to adopt as the robot's motion trajectory; a simulation unit that calculates, based on a predetermined computation model, at least one of the robot's motion along the motion trajectory and the influence of that motion on a predetermined virtual physical environment; a display unit that visualizes, as a simulation result, at least one of the robot's motion and the influence of that motion on the predetermined virtual physical environment; an evaluation input unit that receives the operator's evaluation, on an ordinal scale, of the simulation result shown on the display unit; an evaluation interpretation unit that determines, from the tendency of the input evaluations, an update amount for the trajectory cost of the candidate trajectory corresponding to the evaluated result; and a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after being updated by the update amount.
Another aspect of the present invention is a robot operation plan learning method that performs: a trajectory generation process that generates one or more candidate trajectories the robot can take to reach a predetermined target state from its start state; a trajectory feature extraction process that extracts, from the candidate trajectories, the motion features characterizing them; a trajectory cost calculation process that calculates, from the motion features and based on a trajectory selection criterion for calculating the appropriateness of a candidate trajectory from its motion features, a trajectory cost serving as an index of that appropriateness; a trajectory calculation process that determines, using the trajectory costs, the candidate trajectory to adopt as the robot's motion trajectory; at least one of a demonstration process that operates the robot based on the motion trajectory determined by the trajectory calculation process and a simulation process that simulates the robot's operation based on the motion trajectory; and a learning process that receives an evaluation, on an ordinal scale, of at least one of the demonstration process and the simulation process, and changes the trajectory selection criterion based on the evaluation input.
This provides a method for easily setting how the robot should move, independently of the target posture, without requiring an explicit description of the robot's operation procedure.
FIG. 1: Configuration diagram of an embodiment of the present invention
FIG. 2: Overall perspective view of a SCARA robot
FIG. 3: Block diagram showing the trajectory generation unit
FIG. 4: Block diagram showing the trajectory cost calculation unit
FIG. 5: Block diagram showing the candidate trajectory recording unit
FIG. 6: Table showing the data structure of the candidate trajectory recording unit
FIG. 7: Block diagram showing the trajectory calculation unit
FIG. 8: Block diagram showing the operation unit
FIG. 9: Perspective view showing a situation in which the evaluation input unit is used
FIG. 10: Plan view of an evaluation input unit in a different configuration
FIG. 11: Block diagram showing the automatic evaluation unit
FIG. 12: Table showing the data structure of the judgment criterion record
FIG. 13: Block diagram showing the evaluation interpretation unit
FIG. 14: Block diagram showing the initial function unit
FIG. 15: Plan view showing the operation of a vehicle-type robot
FIG. 16: Block diagram showing a different configuration
FIG. 17: Block diagram showing the trajectory calculation unit
Embodiments will be described in detail with reference to the drawings. However, the present invention is not to be construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the idea or gist of the present invention.
In the configurations of the invention described below, the same reference numerals are used in common across different drawings for identical portions or portions having similar functions, and redundant description may be omitted.
In this specification and the like, notations such as "first", "second", and "third" are attached to identify constituent elements and do not necessarily limit their number or order. A number identifying a constituent element is used per context, and a number used in one context does not necessarily denote the same configuration in another context. Further, a constituent element identified by one number is not precluded from also serving the function of a constituent element identified by another number.
The position, size, shape, range, and the like of each configuration shown in the drawings may not represent the actual position, size, shape, range, and the like, in order to facilitate understanding of the invention. For this reason, the present invention is not necessarily limited to the position, size, shape, range, and the like disclosed in the drawings.
<1. Overall configuration>
A SCARA (Selective Compliance Assembly Robot Arm: horizontal articulated) robot system according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2.
FIG. 2 is a perspective view of an example of a SCARA robot. The SCARA robot is, for example, a robot 200 having four joints as shown in FIG. 2. The shape of the robot is not limited to the configuration of FIG. 2; it may have five or more joints, or may include other drive parts such as a gripper. The robot 200 can constitute all or part of the operation unit 106. Alternatively, the robot 200 may be remote and operate according to commands from the operation unit 106.
FIG. 1 shows an example of a system that controls the robot 200 configured as in FIG. 2. This system can be configured, for example, by a server 201 connected to the robot 200 of FIG. 2 directly or via a network. As is well known, the server 201 includes a processing device 202, a storage device 203, an input device 204, and an output device 205. The storage device 203 can be configured by a known magnetic disk device, a semiconductor memory, or a combination thereof.
In this embodiment, functions such as calculation and control are realized by the processing device 202 executing programs (software) stored in the storage device 203, performing the defined processing in cooperation with other hardware. A program executed by the server 201, its function, or the means for realizing that function may be referred to as a "function", "means", "part", "unit", "module", and so on. A function by which the storage device 203 of the server 201 stores specific data, or the means for realizing it, may be referred to as a "recording unit".
The trajectory generation unit 101 generates one or more candidate trajectories that the robot 200 can take to reach a predetermined target state (target position) from a predetermined start state. The candidate trajectories are chosen to satisfy conditions such as that the robot 200 does not collide with surrounding objects or with itself and that each joint follows the robot's motion model. Details are given later with reference to FIG. 3.
The trajectory feature extraction unit 102 extracts the motion features of each candidate trajectory (indices that well represent the properties of the trajectory) by referring to the values of the variables that define the trajectory.
The trajectory selection criterion recording unit 104 records, as the trajectory selection criterion, a calculation rule for computing, as a quantitative value, how appropriate a trajectory is — that is, whether it is a motion the operator finds favorable — from the motion features of the robot's trajectory. The calculation rule can be expressed, for example, by the weight parameters between the neurons of a neural network; however, it is not limited to this, as long as it is a criterion for evaluating appropriateness from the motion features of a trajectory.
The trajectory cost calculation unit 103 takes as input the motion features extracted by the trajectory feature extraction unit 102 from the candidate trajectories generated by the trajectory generation unit 101. By applying the trajectory selection criterion recorded in the trajectory selection criterion recording unit 104 to the input motion features, the trajectory cost calculation unit 103 computes the appropriateness of each candidate trajectory as a quantitative value and outputs this value as the cost. The trajectory cost calculation unit 103 is described in detail later with reference to FIG. 4.
That is, the trajectory selection criterion recording unit 104 records the conversion parameters for calculating the appropriateness of the candidate trajectories from the motion features extracted by the trajectory feature extraction unit 102 from the candidates generated by the trajectory generation unit 101. The trajectory cost calculation unit 103 calculates the trajectory cost, an index of the appropriateness of each candidate, based on these conversion parameters.
The trajectory calculation unit 105 determines the trajectory of the robot 200 so that the cost is minimized, and outputs motor signals that make the robot 200 move along the trajectory. Details are given later with reference to FIG. 7.
The operation unit 106 drives the four joints of the SCARA robot 200 based on the motor signals. Based on the motion of the robot 200 produced by the operation unit, the operator gives an input such as "good" or "bad" through the evaluation input unit 107. Details are given later with reference to FIG. 8.
Based on multiple evaluations from the operator, the evaluation interpretation unit 108 uses the tendency of the evaluations to determine new costs corresponding to the motion features of the evaluated motions. As a concrete example, the new costs output by the evaluation interpretation unit 108 are determined so that the cost of the features of motion trajectories rated "bad" increases, the cost of the features of motion trajectories rated "good" decreases, and the cost of the features of motion trajectories unrelated to the evaluation does not change.
The learning unit 109 changes the trajectory selection criterion in the trajectory selection criterion recording unit 104 so that the cost calculated by the trajectory cost calculation unit 103 matches the cost calculated by the evaluation interpretation unit 108. By repeating this increase and decrease of costs and learning, the trajectory selection criterion is changed for the motion features in which the evaluation tendency appears strongly.
The function of the learning unit 109 is explained below with a simple example. Suppose the motion feature input to the trajectory cost calculation unit 103 is, for example, the distance x between the tip of the robot and the object to be manipulated. The cost computed by the trajectory cost calculation unit 103 is given by a cost function c(x), and the trajectory selection criterion recording unit 104 stores the definition of c(x) as the calculation rule. For example, if the distance x takes the values 10 mm, 20 mm, and 30 mm, the corresponding costs are c(10), c(20), and c(30). Suppose the costs calculated by the trajectory cost calculation unit 103 satisfy c(10) < c(20) < c(30); a smaller cost indicates a "better" trajectory. Suppose, on the other hand, that in the operator's evaluation entered through the evaluation input unit 107, 20 mm is best, followed by 10 mm and then 30 mm. The operator's evaluation is an ordinal scale indicating the order of the candidates. In this case, the learning unit 109 corrects the calculation rule c(x) in the trajectory selection criterion recording unit 104 so that it approaches the operator's evaluation: in this example, c(x) is changed so that c(20) < c(10) < c(30). This is a simple example; much more complex indices can also be adopted as motion features.
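The c(x) example above can be sketched in code: the operator's ordinal ranking is turned into new target costs by reassigning the existing cost values in the operator's preferred order. This is one plausible update rule for illustration; the patent does not prescribe this specific one.

```python
def retarget_costs(costs, operator_ranking):
    """Turn an operator's ordinal ranking into updated target costs
    (minimal sketch of the evaluation interpretation step).

    costs:            dict x -> current cost c(x)
    operator_ranking: list of x from best to worst
    Returns new target costs: the sorted current cost values, reassigned
    so that the operator's best candidate gets the smallest cost.
    """
    sorted_values = sorted(costs.values())
    return {x: sorted_values[rank] for rank, x in enumerate(operator_ranking)}
```

With the costs c(10) = 1, c(20) = 2, c(30) = 3 and the ranking 20 mm > 10 mm > 30 mm, the result satisfies c(20) < c(10) < c(30), which is the correction described in the paragraph above; the learning unit would then refit the stored criterion to these targets.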
<2. Trajectory generation unit>
The robot 200 of FIG. 2, which has three joints and one push-down member (pusher), is described as an example. The trajectory generation unit 101 generates a plurality of trajectories {[θ 0 i , θ 1 i , θ 2 i , t i ] | i = 0, 1, …, L} connecting the current posture [θ 0 0 , θ 1 0 , θ 2 0 , t 0 ] and the target posture [θ 0 L , θ 1 L , θ 2 L , t L ] given by the operator. These postures and trajectories may also include differential elements of the respective variables (velocity, acceleration) and conditions other than posture; they are not limited to the above, as long as they are elements for constituting the trajectory of the robot 200. The target posture may also be a set of postures. The target state need not be set by the operator; it may be given automatically, for example from a robot control system.
A configuration example of the trajectory generation unit 101 is described with reference to FIG. 3. The posture sampling unit 301 randomly samples a posture x = [θ 0 , θ 1 , θ 2 , t] from the configuration space, the set of postures the robot can take.
In this embodiment, the robot 200 of FIG. 2 is used as the example. As shown in FIG. 2, θ i is the angle of each joint (Equation 1), and t is the pushing amount of the robot tip (Equation 2).
[Equation 1]
[Equation 2]
The sampling of postures need not be random; it may, for example, be sampling at regular intervals or sampling based on a predetermined rule such as a quasi-random sequence.
The posture pair generation unit 302 generates a plurality of posture pairs from postures in the neighborhood (Equation 3).
[Equation 3]
However, the neighborhood may also be defined using, for example, the L1 norm. ε is a threshold set in advance by the operator or the like. The larger ε is, the more posture pairs there are, which increases computation time; the smaller it is, the fewer posture pairs there are, and a good trajectory may not be obtained. As a guideline, ε may be set, for example, so that each posture belongs to about 10 pairs on average, though the setting is not limited to this example.
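The sampling and pairing steps above can be sketched as follows, assuming uniform sampling within per-component limits and a Euclidean neighborhood test; the function names and limits are illustrative, not taken from the patent.

```python
import random
import math

def sample_postures(n, limits):
    """Posture sampling unit 301 (sketch): draw n random postures from the
    configuration space, each component uniform within its (lo, hi) limits."""
    return [tuple(random.uniform(lo, hi) for lo, hi in limits) for _ in range(n)]

def posture_pairs(postures, eps):
    """Posture pair generation unit 302 (sketch): pair postures whose
    Euclidean distance is below the operator-set threshold eps."""
    pairs = []
    for i, a in enumerate(postures):
        for b in postures[i + 1:]:
            if math.dist(a, b) < eps:  # neighborhood test (Equation 3)
                pairs.append((a, b))
    return pairs
```

A very large ε pairs every posture with every other; ε = 0 yields no pairs, matching the trade-off described above.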
 <3. Trajectory feature extraction unit>
 The trajectory feature extraction unit 102 creates a feature quantity q from a trajectory by aggregating the values of the postures it contains. In this embodiment, the feature quantity q is at least one of, or a combination of, the Euclidean distance d between the hand (pusher tip) positions of the two postures, the change δi in each joint angle between the two postures, the velocity νi of each joint between the two postures, and the minimum distance l (clearance) from the robot to an obstacle. However, any information that represents characteristics of the trajectory relevant to the robot's motion may be adopted as a feature; the features are not limited to those above.
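A minimal sketch of this feature extraction for one posture pair follows; the forward-kinematics model is a placeholder assumption (a real implementation would use the robot's actual kinematics), and the time step and clearance are supplied externally.

```python
import math

def forward_kinematics(posture):
    """Placeholder hand-position model (illustrative assumption only)."""
    th0, th1, th2, t = posture
    x = math.cos(th0) + math.cos(th0 + th1)
    y = math.sin(th0) + math.sin(th0 + th1)
    return (x, y, t)

def extract_features(x, x2, dt=1.0, clearance=None):
    """Build the feature quantity q for a posture pair: Euclidean hand
    distance d, joint-angle changes delta_i, joint velocities nu_i, and
    optionally the minimum obstacle distance l (clearance)."""
    p, p2 = forward_kinematics(x), forward_kinematics(x2)
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p2)))
    deltas = [b - a for a, b in zip(x[:3], x2[:3])]   # delta_i
    nus = [dlt / dt for dlt in deltas]                # nu_i
    q = [d] + deltas + nus
    if clearance is not None:
        q.append(clearance)                           # l
    return q
```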
 <4. Trajectory cost calculation unit>
 The trajectory cost calculation unit 103 will be described with reference to FIG. 4. The input preprocessing unit 401 applies PCA (principal component analysis) to the feature quantity q to decorrelate the input. The calculation unit 402 takes the decorrelated features as input and, using the neural network weight parameters recorded in the trajectory selection criterion recording unit 104, outputs via the neural network a numerical value (cost) corresponding to the feature quantity q. However, the cost calculation model used by the calculation unit 402 is not limited to a neural network; any function whose parameters can change the contribution of each feature value, such as a linear combination of the values of q or random forest regression, may be used. For example, if the processing of the calculation unit 402 is a linear combination, the trajectory selection criterion recording unit 104 records the coefficient of each feature, and if it is random forest regression, it records the structure of the decision trees. In this way a cost is calculated for each trajectory candidate.
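The following is a minimal Python sketch of the calculation unit 402 with a one-hidden-layer network. Mean-centering stands in for the PCA decorrelation of unit 401, and all weight shapes are illustrative assumptions, not the patent's implementation.

```python
import math

def neural_cost(q, mean, W1, b1, w2, b2):
    """Return the scalar cost of one feature vector q.

    Mean-centering stands in for the decorrelation of input preprocessing
    unit 401; W1, b1, w2, b2 play the role of the weight parameters held
    in the trajectory selection criterion recording unit 104."""
    z = [qi - mi for qi, mi in zip(q, mean)]                  # preprocessing
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)  # hidden layer
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2           # output (cost)
```

Replacing the hidden layer with a plain dot product gives the linear-combination alternative mentioned above.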
 <5. Candidate trajectory recording unit (optional)>
 FIG. 5 shows a modification of the embodiment of FIG. 1, which may include a candidate trajectory recording unit 501 as illustrated. In the configuration of FIG. 4, the trajectory generation unit 101 generates candidate trajectories on every run; the candidate trajectory recording unit 501 instead records trajectory candidates and feature quantities q computed in advance by the trajectory generation unit 101 and the trajectory feature extraction unit 102, and during robot operation the trajectory cost calculation unit 103 uses the recorded candidates and feature quantities q for cost calculation.
 FIG. 6 shows the data structure of the candidate trajectory recording unit 501. Reference numeral 601 in FIG. 6(a) denotes the data structure of the postures sampled by the posture sampling unit 301. Each row of 601 lists all variables of one posture. For example, in posture A the angle of joint 0 is 0.1, the angle of joint 1 is 2.1, the angle of joint 2 is 0.5, and the pusher extension t is 5.0.
 Table 602 in FIG. 6(b) records, for each posture 603, the postures 604 paired with it and the memory addresses 605 at which the data of those pairs are recorded. For example, posture B is connected to postures A, D, and E, and the information on those connections is recorded at memory addresses 0001, 0002, and 0003, respectively.
 Table 606 in FIG. 6(c) records the feature quantities 607 of the posture pairs. For example, the connection from posture B to posture A is stored at memory address 0001; the Euclidean distance d of the hand positions between the two postures is 0.6, and the change δ0 of the angle of joint 0 between the two postures is −0.1.
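An in-memory analogue of the record of FIG. 6 can be sketched with dictionaries standing in for raw memory addresses; the values below mirror the examples in the text (601: postures, 602: adjacency, 606: pair features) and are otherwise illustrative.

```python
postures = {                   # 601: posture -> [theta0, theta1, theta2, t]
    "A": [0.1, 2.1, 0.5, 5.0],
    "B": [0.2, 2.0, 0.5, 5.0],
}
adjacency = {                  # 602: posture -> {paired posture: record id}
    "B": {"A": "0001", "D": "0002", "E": "0003"},
}
pair_features = {              # 606: record id -> feature quantity q
    "0001": {"d": 0.6, "delta0": -0.1},
}

def features_of(src, dst):
    """Look up the feature quantity recorded for the pair (src, dst)."""
    return pair_features[adjacency[src][dst]]
```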
 <6. Trajectory calculation unit>
 The trajectory calculation unit 105 will be described with reference to FIG. 7. The cost-minimum path search unit 701 searches for the trajectory with the minimum cost. During the search, the collision determination unit 702 is used to determine, for each path, whether the robot would contact an obstacle, and paths with obstacle contact are excluded from the search.
 The operation characteristic recording unit 703 records the various operating characteristics of the robot 200 needed to convert a trajectory into drive signals for the actuators of the robot 200. The operating characteristic data of the robot are acquired in advance and stored in the storage device 203.
 The operation signal generation unit 704 takes the minimum-cost trajectory as input and, using the operating characteristics of the robot 200, outputs a PWM (pulse width modulation) signal as the operation signal for driving the robot 200. The operation signal is not limited to a PWM signal as long as it can drive the actuators of the robot.
 FIG. 7 is used again to describe the trajectory calculation unit 105 in a configuration in which a trajectory is a set of posture pairs. The candidate trajectories and their costs can be regarded as a graph whose vertices are postures and whose edges are weighted by costs. The cost-minimum path search unit 701 determines a trajectory that minimizes the total cost using Dijkstra's algorithm, although any minimum-cost search algorithm may be used instead. As above, the collision determination unit 702 is used to determine, for each path, whether the robot would contact an obstacle, and paths with obstacle contact are excluded from the search. The operation characteristic recording unit 703 and the operation signal generation unit 704 are configured as described above.
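The graph search described above can be sketched as follows: a standard Dijkstra search over the posture graph, with a pluggable collision predicate playing the role of collision determination unit 702. The edge representation is an illustrative assumption.

```python
import heapq

def min_cost_path(edges, start, goal, collides=lambda u, v: False):
    """Dijkstra search over a posture graph (the role of cost-minimum
    path search unit 701). `edges` maps a node to {neighbor: cost};
    edges for which `collides` reports obstacle contact are excluded.
    Returns (total cost, node path) or None if the goal is unreachable."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        c, u = heapq.heappop(heap)
        if u == goal:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return c, path[::-1]
        if c > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in edges.get(u, {}).items():
            if collides(u, v):
                continue  # exclude edges that contact an obstacle
            nc = c + w
            if nc < dist.get(v, float("inf")):
                dist[v], prev[v] = nc, u
                heapq.heappush(heap, (nc, v))
    return None
```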
 <7. Operation unit>
 An example of the operation unit 106 will be described with reference to FIG. 8. The sensor unit 801 observes the state of the robot with the encoder of each joint of the robot 200; the sensor unit may be either an internal or an external sensor. The controller unit 802 takes the difference between the state of the robot 200 and the command signal as input and determines the motor output by PID (proportional-integral-derivative) control. The actuator unit 803 takes the motor output as input and drives the actuators of the robot 200.
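The PID control performed by controller unit 802 can be sketched as a discrete-time controller; the gains and time step below are illustrative assumptions.

```python
class PID:
    """Discrete PID controller of the kind used by controller unit 802:
    motor output computed from the error between commanded and observed
    joint state."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, target, measured):
        """One control step: return the motor output for this sample."""
        error = target - measured
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None \
            else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```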
 <8. Evaluation input unit>
 FIG. 9 shows a situation in which evaluation is performed with the evaluation input unit 107 (901). In this example, the evaluation input unit 901 is a switch with the binary input (Equation 4) of “good” and “bad”.
[Equation 4] rk ∈ {0, 1}
 The evaluation input unit 901 receives, for each candidate trajectory, the evaluation input rk obtained by the operator 900 observing the motion of the robot 200 (902) (in this example the binary values “good” (rk = 1) and “bad” (rk = 0)), thereby obtaining pairs of motions and evaluations.
 The operator 900 can evaluate by directly watching the motion of the robot 902 on site. Alternatively, the robot 902 may be at a remote location, and audio and video data acquired there may be transmitted and monitored by the operator for evaluation. The robot 200 (902) may take various forms, such as those shown in FIG. 2 and FIG. 9.
 The evaluation value is not limited to two levels as long as it is an ordinal scale. As an example of the operator 900's appropriateness judgment, motions with which the robot 902 correctly accomplishes its objective may be rated appropriate, and undesirable motions, such as those with which the robot 902 cannot accomplish the objective or does not secure sufficient clearance, may be rated inappropriate.
 The evaluation input unit 107 (901) in a different configuration will be described with reference to FIG. 10. Here the evaluation input unit 107 receives a ranking of motions obtained by the operator 900 observing the motion of the robot 902 while referring to past motions. For example, in a three-level evaluation, a smaller number means “better”. In the example of FIG. 10, the evaluation of motion Motion D can be made by referring to, and comparing against, the previously evaluated motion Motion A. The ranking may contain equalities as well as inequalities. The evaluation input unit 107 may also receive a ranking obtained by the operator 900 comparing the motions of two or more robots 902 of the same specification.
 <9. Automatic evaluation unit>
 FIG. 11 shows an example of a different configuration that makes operator involvement unnecessary by providing an automatic evaluation unit 1100. The motion sensing unit 1101 is, for example, an optical three-dimensional motion measurement device that measures the tip position of the robot 902; any sensor that measures the robot 902 itself may be used. The environment sensing unit 1102 is, for example, a camera that observes changes in the position of the object the robot 902 manipulates, for instance to judge whether a part has been correctly inserted into another part; any sensor that measures the surroundings of the robot 902 may be used. Although two types of sensing units are provided in this example, either one alone may be used, and further sensing units may be added.
 FIG. 12 shows an example of the data structure of the judgment criterion recording unit 1103. It records a judgment rule that outputs the evaluation “good” (the value “1” under “RESULT”) when both criterion “A” 1201, whether the tip position of the robot 902 is at least a predetermined distance from the work target, and criterion “B” 1202, whether the part is correctly inserted into the other part, are satisfied. A judgment criterion may contain arbitrary formulas and conditional branches. A motion can be evaluated by applying the information obtainable from the various sensing units to the conditions defined in the judgment criteria.
 The judgment unit 1104 outputs an evaluation value of the motion based on the inputs from the motion sensing unit 1101 and the environment sensing unit 1102 and on the judgment criteria 1200. The evaluation value may be either an ordinal scale value or a ranking of motions.
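A minimal sketch of the rule-based judgment of FIG. 12 follows; the field names and the clearance value are illustrative assumptions, not the patent's data format.

```python
def auto_evaluate(measurements, min_clearance=0.05):
    """Rule-based judgment in the spirit of judgment unit 1104: output
    1 ('good') only when criterion A (tip at least a given distance from
    the work target) and criterion B (part correctly inserted) both
    hold, else 0 ('bad')."""
    a = measurements["tip_distance"] >= min_clearance  # criterion A
    b = measurements["inserted"]                       # criterion B
    return 1 if (a and b) else 0
```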
 <10. Evaluation interpretation unit>
 The evaluation interpretation unit 108 will be described with reference to FIG. 13. The evaluation organizing unit 1301 divides the pairs of motions and operator 900 evaluations into two groups based on a user-set threshold. In this embodiment the two groups, “good” and “bad”, coincide with the operator 900's ratings. When the ratings are, for example, “very good”, “good”, “bad”, and “very bad”, the threshold can be decided from the meanings of the words, splitting them into a “very good”/“good” group and a “bad”/“very bad” group.
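The threshold split performed by the evaluation organizing unit 1301 can be sketched as follows, with ratings encoded as numbers (an illustrative encoding; ratings at or above the threshold are treated as the "good" group):

```python
def split_evaluations(pairs, threshold):
    """Split (motion, rating) pairs into a 'good' group and a 'bad'
    group by a numeric threshold (the role of evaluation organizing
    unit 1301)."""
    good = [(m, r) for m, r in pairs if r >= threshold]
    bad = [(m, r) for m, r in pairs if r < threshold]
    return good, bad
```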
 The cost update amount determination unit 1302 determines the difference r′k between the costs before and after learning so that the cost decreases for a “good” evaluation and increases for a “bad” evaluation. Letting f denote the neural network before the update and f′ the network after the update, r′k is defined by (Equation 5).
[Equation 5] r′k = f′(qk) − f(qk)
 However, some motion features are universally involved in the costs of many different motions; if the cost were simply decreased for “good” evaluations and increased for “bad” ones, the costs associated with such features would fluctuate unstably. In this embodiment, the change δi in joint angle between two postures is one example of such a universally involved motion feature. Therefore, in addition to decreasing the cost for “good” evaluations and increasing it for “bad” ones, the cost update amounts are determined so that the costs of motion features that were neither “good” nor “bad”, that is, irrelevant to the operator's evaluation, do not change. For example, the update amounts are chosen so that, at neural network weights irrelevant to the evaluation, the updates cancel out. Since a change in the output acts linearly on each weight, when “good” and “bad” evaluations have been input Ng and Nb times respectively, r′k is determined by (Equation 6) with a predetermined learning rate α so that the update amounts Δw cancel at weights w irrelevant to the evaluation.
[Equation 6] r′k = −α/Ng (when rk = 1), r′k = +α/Nb (when rk = 0)
 In a configuration in which the evaluation input unit outputs rankings, the evaluation interpretation unit 108 takes the set of evaluation orders as input and outputs, using the Schulze method, an evaluation order in which cycles and contradictions have been resolved. The resolution of the evaluation order is not limited to the Schulze method; any election method that can resolve cycles and contradictions in a set of evaluation orders may be used.
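The Schulze method mentioned above can be sketched compactly: count pairwise preferences over the ranked ballots, compute strongest-path strengths, and order candidates by how many others they beat. This is a minimal illustration (ballots are strict rankings, best first; ties in the final order are broken arbitrarily).

```python
def schulze_ranking(ballots, candidates):
    """Order candidates with the Schulze method from ranked ballots,
    resolving cycles and contradictions among the individual orderings
    (the role of evaluation interpretation unit 108 here)."""
    # Pairwise preference counts d[a][b]: ballots ranking a above b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    for ballot in ballots:
        for i, a in enumerate(ballot):
            for b in ballot[i + 1:]:
                d[a][b] += 1
    # Strongest path strengths p[a][b] (Floyd-Warshall style).
    p = {a: {b: d[a][b] if d[a][b] > d[b][a] else 0 for b in candidates}
         for a in candidates}
    for k in candidates:
        for a in candidates:
            if a == k:
                continue
            for b in candidates:
                if b in (a, k):
                    continue
                p[a][b] = max(p[a][b], min(p[a][k], p[k][b]))
    # Score: number of candidates beaten by strongest path.
    wins = {a: sum(1 for b in candidates if a != b and p[a][b] > p[b][a])
            for a in candidates}
    return sorted(candidates, key=lambda a: -wins[a])
```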
 <11. Learning unit>
 Finally, the learning unit 109 updates the trajectory selection criterion recording unit using steepest descent so as to minimize (Equation 7) given the differences r′k.
[Equation 7] Σk (f′(qk) − f(qk) − r′k)2
 The update of the trajectory selection criterion recording unit may instead use a different optimization algorithm such as AdaGrad or RMSProp.
 In a configuration in which the evaluation unit outputs rankings, the learning unit 109 takes the ranked combinations of feature quantities {[qg, qb]0, [qg, qb]1, …} output by the evaluation interpretation unit and updates each weight of the neural network using steepest descent so as to minimize (Equation 8), where σ is the sigmoid function.
[Equation 8] Σj σ(f(qg,j) − f(qb,j))
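A minimal sketch of this ranking-based learning follows, using a linear cost f(q) = w · q in place of the neural network. The sigmoid loss form and the learning schedule are assumptions for illustration; minimizing the loss drives the cost of the better trajectory below that of the worse one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rank_train(pairs, w, lr=0.5, epochs=200):
    """Steepest descent on sum_j sigmoid(f(q_g) - f(q_b)) for a linear
    cost f(q) = w . q (a stand-in for the neural network). `pairs` holds
    (q_g, q_b) tuples where q_g is the better-ranked feature vector."""
    f = lambda q: sum(wi * qi for wi, qi in zip(w, q))
    for _ in range(epochs):
        for qg, qb in pairs:
            s = sigmoid(f(qg) - f(qb))
            g = s * (1.0 - s)                       # derivative of sigmoid
            for i in range(len(w)):
                w[i] -= lr * g * (qg[i] - qb[i])    # gradient step
    return w
```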
 <12. Cost initialization unit>
 FIG. 14 shows another modification of the embodiment. As shown in FIG. 14, a cost initialization unit 1401 may be provided that takes an operator-defined initial objective function as input and outputs the initial values of the trajectory selection criterion recording unit. The initial objective function is the total movement of the joints (Equation 9).
[Equation 9] finit(q) = Σi Δθi + Δt
 Here Δθi and Δt are the changes between the two postures, Δθi = |θ′i − θi| and Δt = |t′ − t|. The initial objective function is not limited to joint movement; any mapping from feature quantities to real numbers may be used. Using the feature quantities q, the trajectory selection criterion is computed with the learning unit so that the output for a feature quantity qi becomes finit(qi).
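The initialization step can be sketched as a least-squares fit of the criterion to the initial objective on sampled features; a linear model w · q stands in for the neural network, and the feature layout (absolute joint changes plus |Δt|) is an illustrative assumption.

```python
def initial_objective(q):
    """Operator-defined initial objective (total joint travel), assuming
    q = [|dtheta_0|, |dtheta_1|, |dtheta_2|, |dt|]."""
    return sum(q)

def fit_initial_criterion(samples, lr=0.1, epochs=500):
    """Fit a linear criterion w so that f(q) = w . q reproduces the
    initial objective on the sampled features (the role of cost
    initialization unit 1401, sketched with a linear model)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for q in samples:
            err = sum(wi * qi for wi, qi in zip(w, q)) - initial_objective(q)
            for i in range(len(w)):
                w[i] -= lr * err * q[i]   # squared-error gradient step
    return w
```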
 With reference to FIG. 15, a car-type robot system on a plane according to an embodiment of the present invention will be described. As shown in FIG. 15, the car-type robot has a configuration space with the three degrees of freedom X = [x, y, θ], where (Equation 10) gives the planar coordinates and (Equation 11) the orientation of the robot.
[Equation 10] (x, y): the planar coordinates of the robot
[Equation 11] θ: the orientation of the robot
 An embodiment of the car-type robot system on a plane will be described with reference to FIG. 16. The trajectory generation unit 101, trajectory cost calculation unit 103, trajectory selection criterion recording unit 104, evaluation input unit 107, evaluation interpretation unit 108, and learning unit 109 can be implemented by performing the corresponding processing of the first embodiment (FIG. 1).
 The trajectory feature extraction unit 1602 creates a feature quantity q from a trajectory by aggregating the values of the postures it contains. In this embodiment, the feature quantity q consists of the Euclidean distance (Equation 12) between the two postures, the turning radius R of the trajectory, and the velocities ν0 and ν1 at the two postures. As before, any information representing characteristics of the trajectory relevant to the robot's motion may be used as a feature.
[Equation 12] d = √((x′ − x)2 + (y′ − y)2)
 The trajectory calculation unit 1605 will be described with reference to FIG. 17. The cost-minimum path search unit 1701 searches the candidate trajectories generated by the trajectory generation unit 101 for the trajectory whose cost, as calculated by the trajectory cost calculation unit 103, is minimal, and outputs that trajectory. During the search, the collision determination unit 1702 is used to determine, for each path, whether the robot would contact an obstacle, and paths with obstacle contact are excluded.
 The simulation process 1611 simulates the behavior of the robot if it executed, in the real world, the trajectory calculated by the trajectory calculation unit 1605. The simulation outputs, for example, the robot's motion itself or the influence of that motion on a predetermined virtual physical environment. The simulation output is not limited to the robot's motion; it may be any phenomenon expected when the robot operates in the real world, such as the deviation, caused by disturbances acting on the robot, between the trajectory calculated by the trajectory calculation unit 1605 and the trajectory realized in the simulation, or the influence of the robot on its environment.
 The display unit 1612 presents the trajectory calculated by the trajectory calculation unit 1605 to the operator on a display. As a display form, a trajectory on a plane can be shown as in FIG. 15: for example, the car-type robot departs from position 1501, follows a trajectory through position 1503, and arrives at position 1502. The display may also be converted into and shown as a stereoscopic image. Although the second embodiment has been described with the example of a car-type robot, it can likewise be carried out with a SCARA robot such as that of FIG. 2.
 According to the second embodiment, the same effects as in the first embodiment can be obtained without actually preparing or operating a robot.
 According to the embodiments of the present invention described above, by taking as input a simple evaluation of whether a motion is good or bad and updating the evaluation criterion of the system, even a worker without robotics knowledge can realize the intended kind of motion. Since an ordinal scale such as good/bad can be used for input, no complicated operation such as quantifying the input into numerical values is required.
 The server 201 described in the above embodiments may be configured as a single computer, or arbitrary parts of the input device 204, output device 205, processing device 202, and storage device 203 may be configured on other computers connected over a network. The idea of the invention is equivalent and unchanged.
 Functions equivalent to those configured in software in these embodiments can also be realized in hardware such as an FPGA (field-programmable gate array) or ASIC (application-specific integrated circuit). Such aspects are also included within the scope of the present invention.
 The present invention is not limited to the embodiments described above and includes various modifications. For example, part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, it is also possible to add, delete, or substitute configurations of other embodiments.
 The present invention can be used for robot motion planning methods and motion planning systems.
 Trajectory generation unit: 101
 Trajectory feature extraction unit: 102
 Trajectory cost calculation unit: 103
 Trajectory selection criterion recording unit: 104
 Trajectory calculation unit: 105
 Operation unit: 106
 Evaluation input unit: 107
 Evaluation interpretation unit: 108
 Learning unit: 109
 Robot: 200
 Server: 201

Claims (15)

  1.  A robot system comprising:
     a trajectory generation unit that generates one or more trajectory candidates the robot can take to reach a predetermined target state from a start state of the robot;
     a trajectory feature extraction unit that extracts, from the trajectory candidates, motion features characterizing the trajectory candidates;
     a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of the trajectory candidates from the motion features;
     a trajectory cost calculation unit that calculates, from the motion features and based on the trajectory selection criterion, a trajectory cost that is an index of the appropriateness of each trajectory candidate;
     a trajectory calculation unit that determines, using the trajectory cost, the trajectory candidate to be adopted as the motion trajectory of the robot, and outputs an operation signal;
     an operation unit that operates the robot based on the operation signal;
     an evaluation input unit that receives an input of an evaluation, based on an operator's ordinal scale, of the motion result of the robot;
     an evaluation interpretation unit that determines, from the tendency of the input evaluations, an update amount of the trajectory cost of the trajectory candidate corresponding to the evaluated motion result; and
     a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after being updated with the update amount.
  2.  The robot system according to claim 1, wherein
     the evaluation input unit receives an input of an ordering of the appropriateness of the motion results of a plurality of runs of the robot, and
     the evaluation interpretation unit determines a new evaluation order of the motion results being evaluated.
  3.  The robot system according to claim 1, wherein
     the trajectory generation unit samples candidate postures the robot can take, and generates the trajectory candidates from combinations of the sampled posture candidates.
  4.  The robot system according to claim 1, wherein
     the trajectory selection criterion recording unit records parameters for calculating the appropriateness of the trajectory candidates from the motion features extracted by the trajectory feature extraction unit from the trajectory candidates generated by the trajectory generation unit, and
     the trajectory cost calculation unit calculates the trajectory cost, which is an index of the appropriateness of the trajectory candidates, based on the parameters recorded in the trajectory selection criterion recording unit.
  5.  The robot system according to claim 4, wherein
     the trajectory selection criterion recording unit records weight parameters of a neural network as the parameters, and
     the trajectory cost calculation unit outputs the trajectory cost by means of a neural network using the weight parameters.
  6.  The robot system according to claim 1, wherein the evaluation interpretation unit comprises:
     an evaluation organizing unit that splits the ordinal-scale evaluations input to the evaluation input unit at a predetermined threshold; and
     a cost update amount determination unit that determines the update amounts for the costs of the motion features evaluated through the evaluation input unit so that the trajectory cost decreases for the trajectory candidates whose evaluations fall on the side judged appropriate by the split, and increases for the trajectory candidates whose evaluations fall on the other side.
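The threshold split and update direction of claim 6 can be sketched as follows. The fixed step size and the dictionary interface are assumptions for illustration; the patent only specifies the direction of the update (decrease on the "appropriate" side, increase on the other).

```python
def cost_update_amounts(evaluations, threshold, step=0.1):
    # evaluations: {candidate_id: ordinal score from the operator}.
    # Scores above the threshold fall on the "appropriate" side, so their
    # trajectory cost should decrease; the rest should increase.
    updates = {}
    for cid, score in evaluations.items():
        updates[cid] = -step if score > threshold else +step
    return updates
```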
  7.  The robot system according to claim 1, further comprising
     an automatic evaluation unit that determines an evaluation of the robot motion produced by the operation unit and inputs the evaluation to the evaluation input unit.
  8.  The robot system according to claim 1, further comprising
     a candidate trajectory recording unit that computes the robot's trajectory candidates in advance and holds them.
  9.  The robot system according to claim 6, wherein
     the cost update amount determination unit further determines the update amounts for the costs of the motion features evaluated through the evaluation input unit so that the trajectory cost associated with motion features that were subject to evaluation but unrelated to the evaluation remains unchanged.
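Claim 9 can be read as masking the per-feature updates so that only features relevant to the operator's judgment move. A minimal sketch, assuming relevance is given as a set of feature names (how relevance is detected is not specified here):

```python
def masked_updates(feature_updates, relevant_features):
    # Features that were evaluated but are unrelated to the evaluation
    # keep their cost unchanged: their update amount is forced to zero.
    return {f: (u if f in relevant_features else 0.0)
            for f, u in feature_updates.items()}
```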
  10.  A robot optimization system comprising:
     a trajectory generation unit that generates one or more candidate trajectories the robot can take to reach a predetermined target state from its start state;
     a trajectory feature extraction unit that extracts, from the trajectory candidates, motion features characterizing the trajectory candidates;
     a trajectory selection criterion recording unit that records a trajectory selection criterion for calculating the appropriateness of the trajectory candidates from the motion features;
     a trajectory cost calculation unit that calculates, from the motion features and based on the trajectory selection criterion, a trajectory cost that is an index of the appropriateness of the trajectory candidates;
     a trajectory calculation unit that uses the trajectory cost to determine the candidate trajectory to adopt as the robot's motion trajectory;
     a simulation unit that calculates, based on a predetermined computational model, at least one of the motion of the robot executing the motion trajectory and the influence of that motion on a predetermined virtual physical environment;
     a display unit that visualizes, as a simulation result, at least one of the robot's motion and the influence of that motion on the predetermined virtual physical environment;
     an evaluation input unit that receives an operator's ordinal-scale evaluation of the simulation result shown on the display unit;
     an evaluation interpretation unit that determines, from the tendency of the input evaluations, the update amount of the trajectory cost of the trajectory candidate corresponding to the evaluated motion result; and
     a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the cost calculated by the trajectory cost calculation unit matches the trajectory cost after it has been updated by the update amount.
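The learning unit of claim 10 fits the recorded criterion so that the computed cost matches the updated target cost. A minimal sketch, assuming a linear cost model and plain gradient descent (the patent does not fix either choice; a neural network as in claim 5 would be trained analogously):

```python
def learn_selection_criterion(weights, features, target_cost, lr=0.01, iters=500):
    # Linear cost model: cost = w . f. Gradient steps on the squared error
    # move the computed cost toward the updated target cost.
    w = list(weights)
    for _ in range(iters):
        cost = sum(wi * fi for wi, fi in zip(w, features))
        err = cost - target_cost
        w = [wi - lr * err * fi for wi, fi in zip(w, features)]
    return w
```

After enough iterations the cost computed with the returned weights agrees with the target, which is exactly the matching condition the claim states for the trajectory selection criterion.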
  11.  A robot motion plan learning method comprising:
     a trajectory generation process that generates one or more candidate trajectories the robot can take to reach a predetermined target state from its start state;
     a trajectory feature extraction process that extracts, from the trajectory candidates, motion features characterizing the trajectory candidates;
     a trajectory cost calculation process that calculates, from the motion features and based on a trajectory selection criterion for calculating the appropriateness of the trajectory candidates from the motion features, a trajectory cost that is an index of the appropriateness of the candidate trajectories;
     a trajectory calculation process that uses the trajectory cost to determine the trajectory candidate to adopt as the robot's motion trajectory;
     at least one of a demonstration process that operates the robot based on the motion trajectory determined by the trajectory calculation process, and a simulation process that simulates the robot's motion based on the motion trajectory; and
     a learning process that receives an ordinal-scale evaluation input for at least one of the demonstration process and the simulation process, and changes the trajectory selection criterion based on the evaluation input.
  12.  The robot motion plan learning method according to claim 11, further comprising
     an evaluation interpretation process that determines, based on the ordinal-scale evaluation input, the update amount of the trajectory cost of the motion features corresponding to the evaluated motion result, wherein
     the learning process changes the trajectory selection criterion so that the cost calculated in the trajectory cost calculation process matches the updated cost determined in the evaluation interpretation process.
  13.  The robot motion plan learning method according to claim 11, wherein
     the trajectory generation process samples candidate postures that the robot can take, and generates the trajectory candidates from combinations of the sampled posture candidates.
  14.  The robot motion plan learning method according to claim 11, wherein
     the simulation process calculates, based on a predetermined computational model, at least one of the motion of the robot executing the motion trajectory and the influence of that motion on a predetermined virtual physical environment.
  15.  The robot motion plan learning method according to claim 11, wherein
     an automatic evaluation process is performed to provide the evaluation input, and
     the automatic evaluation process detects the state of the robot's motion along the motion trajectory and the state of the robot's operating environment, and evaluates the detection results against a predetermined criterion.
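The automatic evaluation process of claim 15 maps detected results to an ordinal score. A hypothetical sketch: the collision check, the goal-distance criterion, and the 1/3/5 scale below are illustrative assumptions, since the patent leaves the concrete judgment criterion open.

```python
def auto_evaluate(final_posture, goal_posture, collisions, tolerance=0.05):
    # Rate one executed run on an ordinal scale from detected results:
    # any collision -> 1 (worst); final posture within tolerance of the
    # goal -> 5 (best); otherwise -> 3 (neutral).
    if collisions:
        return 1
    dist = max(abs(a - b) for a, b in zip(final_posture, goal_posture))
    return 5 if dist <= tolerance else 3
```

Such a score could then be fed to the evaluation input in place of the operator's rating, as claim 7 describes for the system.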
PCT/JP2016/052979 2016-02-02 2016-02-02 Robot system, robot optimization system, and robot operation plan learning method WO2017134735A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/052979 WO2017134735A1 (en) 2016-02-02 2016-02-02 Robot system, robot optimization system, and robot operation plan learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/052979 WO2017134735A1 (en) 2016-02-02 2016-02-02 Robot system, robot optimization system, and robot operation plan learning method

Publications (1)

Publication Number Publication Date
WO2017134735A1 true WO2017134735A1 (en) 2017-08-10

Family

ID=59500337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/052979 WO2017134735A1 (en) 2016-02-02 2016-02-02 Robot system, robot optimization system, and robot operation plan learning method

Country Status (1)

Country Link
WO (1) WO2017134735A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019155554A (en) * 2018-03-14 2019-09-19 オムロン株式会社 Control device of robot
CN111195906A (en) * 2018-11-20 2020-05-26 西门子工业软件有限公司 Method and system for predicting motion trajectory of robot
CN112894822A (en) * 2021-02-01 2021-06-04 配天机器人技术有限公司 Robot motion trajectory planning method, robot and computer storage medium
CN113260936A (en) * 2018-12-26 2021-08-13 三菱电机株式会社 Mobile body control device, mobile body control learning device, and mobile body control method
US20220143829A1 (en) * 2020-11-10 2022-05-12 Kabushiki Kaisha Yaskawa Denki Determination of robot posture
CN113260936B (en) * 2018-12-26 2024-05-07 三菱电机株式会社 Moving object control device, moving object control learning device, and moving object control method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04112302A (en) * 1990-09-03 1992-04-14 Matsushita Electric Ind Co Ltd Fuzzy inference device
JPH0561844A (en) * 1991-08-01 1993-03-12 Fujitsu Ltd Self-learning processing system for adaptive data processor
JPH11175132A (en) * 1997-12-15 1999-07-02 Omron Corp Robot, robot system, learning method for robot, learning method for robot system, and recording medium
JP2002269530A (en) * 2001-03-13 2002-09-20 Sony Corp Robot, behavior control method of the robot, program and storage medium
JP2013193194A (en) * 2012-03-22 2013-09-30 Toyota Motor Corp Track generating apparatus, moving body, track generating method, and program


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019155554A (en) * 2018-03-14 2019-09-19 オムロン株式会社 Control device of robot
WO2019176477A1 (en) * 2018-03-14 2019-09-19 オムロン株式会社 Robot control device
US11673266B2 (en) 2018-03-14 2023-06-13 Omron Corporation Robot control device for issuing motion command to robot on the basis of motion sequence of basic motions
CN111195906A (en) * 2018-11-20 2020-05-26 西门子工业软件有限公司 Method and system for predicting motion trajectory of robot
CN111195906B (en) * 2018-11-20 2023-11-28 西门子工业软件有限公司 Method and system for predicting motion trail of robot
CN113260936A (en) * 2018-12-26 2021-08-13 三菱电机株式会社 Mobile body control device, mobile body control learning device, and mobile body control method
CN113260936B (en) * 2018-12-26 2024-05-07 三菱电机株式会社 Moving object control device, moving object control learning device, and moving object control method
US20220143829A1 (en) * 2020-11-10 2022-05-12 Kabushiki Kaisha Yaskawa Denki Determination of robot posture
US11717965B2 (en) * 2020-11-10 2023-08-08 Kabushiki Kaisha Yaskawa Denki Determination of robot posture
CN112894822A (en) * 2021-02-01 2021-06-04 配天机器人技术有限公司 Robot motion trajectory planning method, robot and computer storage medium
CN112894822B (en) * 2021-02-01 2023-12-15 配天机器人技术有限公司 Robot motion trail planning method, robot and computer storage medium

Similar Documents

Publication Publication Date Title
Long et al. Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning
Francis et al. Long-range indoor navigation with prm-rl
Kyrarini et al. Robot learning of industrial assembly task via human demonstrations
Bency et al. Neural path planning: Fixed time, near-optimal path generation via oracle imitation
Fu et al. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors
JP6951659B2 (en) Task execution system, task execution method, and its learning device and learning method
US9387589B2 (en) Visual debugging of robotic tasks
US20180036882A1 (en) Layout setting method and layout setting apparatus
Petrič et al. Smooth continuous transition between tasks on a kinematic control level: Obstacle avoidance as a control problem
WO2017134735A1 (en) Robot system, robot optimization system, and robot operation plan learning method
Frank et al. Efficient motion planning for manipulation robots in environments with deformable objects
WO2019009350A1 (en) Route output method, route output system and route output program
Kshirsagar et al. Specifying and synthesizing human-robot handovers
JP7295421B2 (en) Control device and control method
CN115605326A (en) Method for controlling a robot and robot controller
JP2020508888A (en) System, apparatus and method for robot to learn and execute skills
Frank et al. Using gaussian process regression for efficient motion planning in environments with deformable objects
Thomaz et al. Mobile robot path planning using genetic algorithms
JP2020196102A (en) Control device, system, learning device and control method
Sturm et al. Unsupervised body scheme learning through self-perception
Tarbouriech et al. Bi-objective motion planning approach for safe motions: Application to a collaborative robot
KR102332314B1 (en) A coordinate value calibration device between robot and camera and a calibration method thereof
JP7263987B2 (en) Control device, control method, and control program
Sturm et al. Adaptive body scheme models for robust robotic manipulation.
Maldonado-Valencia et al. Planning and visual-servoing for robotic manipulators in ROS

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16889225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16889225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP