WO2017134735A1

WO2017134735A1 - Robot system, robot optimization system, and robot operation plan learning method

Info

Publication number: WO2017134735A1
Application number: PCT/JP2016/052979
Authority: WO
Inventors: 祐太是枝; 敬介藤本; 潔人伊藤; 宣隆木村
Original assignee: 株式会社日立製作所
Priority date: 2016-02-02
Filing date: 2016-02-02
Publication date: 2017-08-10

Abstract

Disclosed is a robot operation plan learning method which comprises: a trajectory generation process for generating one or more candidates for the trajectory to be tracked by a robot so that the robot is brought into a predetermined target position from a start position; a trajectory characteristic extraction process for extracting an operational characteristic, serving as a trajectory candidate property, from each of the trajectory candidates; a trajectory cost calculation process for calculating trajectory costs, which serve as an index of the appropriateness of the candidate trajectory from operational characteristics, on the basis of a trajectory selection reference for calculating the appropriateness of the trajectory candidate from the operational characteristic; a trajectory calculation process for determining a trajectory candidate to be adopted as the operation trajectory of the robot using the trajectory cost; a demonstration process for operating the robot on the basis of the robot operation trajectory determined by the trajectory calculation process, and/or a simulation process for performing a simulation of the operation of the robot on the basis of the robot operation trajectory; and a learning process for receiving an evaluation, by ordinal scale, performed on the demonstration process and/or the simulation process, and changing the trajectory selection reference on the basis of the evaluation input.

Description

Robot system, robot optimization system, and robot motion plan learning method

The present invention relates to a robot motion planning method and motion planning system.

Robot movement generation is most easily performed by teaching playback. In teaching playback, a remote controller called teaching pendant is used, or a person directly grabs and operates the robot to record the trajectory (posture set that interpolates the initial posture and the target posture) (teaching). The trajectory is reproduced faithfully (playback) (Patent Document 1).

In order to respond to changes in the environment and changes in the target posture, a technique called motion planning that dynamically generates a trajectory according to the environment, initial posture, and target posture is known. The motion plan is a technology that determines the trajectory of the motion of the robot based on some criteria with the input of the initial posture, the target posture and the surrounding environment. The motion planning problem is a problem of searching for a route from a certain point to a different point in a car robot. The motion planning problem is a problem of searching for how to move to the specified posture in an industrial arm type robot.

The motion planning method currently used is a method of sampling an infinite number of postures that interpolate an initial posture and a target posture, and determining a trajectory so as to minimize the cost based on a predetermined cost function (Patent Document 2). . In the approach based on reinforcement learning, the current posture of the robot and the action pair to be taken at that time are generated without explicitly obtaining the motion trajectory (Patent Document 3).

As a prior art disclosing a method for supporting generation of a robot motion by means such as learning, there is a pet robot (Patent Document 4) that optimizes selection of a reaction expected by an operator from operator evaluation.

JP 2006-346792 A Japanese Patent Laying-Open No. 2015-160253 JP 2005-56185 A JP-A-11-175132

In the motion planning method based on the attitude sampling disclosed in Patent Document 2, it is necessary to describe the cost function using mathematical formulas and control syntax for each shape and position of the work object. Since the trajectory of the robot's movement is innumerable, the criteria for describing it are extremely flexible. In order to determine an optimal cost function and describe it in the form of a cost function, adjustment of the cost function based on advanced expertise and trial and error is essential.

In the motion planning method based on reinforcement learning disclosed in Patent Document 3, the initial posture and the target posture are fixedly set. Therefore, when the shape and position of the work object and the work environment change, it is necessary to redo the setting work, and the operation planning method cannot be substituted.

In the method of optimizing the behavior disclosed in Patent Document 4 from the external evaluation, the operation is only selected from a predetermined list, and the method for acquiring the operation not explicitly given is not disclosed.

One aspect of the present invention for solving the above problems is that a trajectory generation unit that generates one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the start state of the robot, and a trajectory candidate From the trajectory feature extraction unit that extracts the motion feature that is the characteristic of the trajectory candidate, the trajectory selection reference recording unit that records the trajectory selection criterion for calculating the appropriateness of the trajectory candidate from the motion feature, and the motion feature, Based on the trajectory selection criteria, a trajectory cost calculation unit that calculates a trajectory cost, which is an index of the appropriateness of trajectory candidates, and a trajectory candidate to be used as a robot motion trajectory are determined using the trajectory cost, and an operation signal is determined. An output trajectory calculation unit, an operation unit that operates the robot based on the operation signal, and an evaluation input unit that receives an input of an evaluation based on an operator's order scale for the operation result of the robot Based on the input evaluation trend, the evaluation interpretation unit that determines the update amount of the orbit cost of the orbit candidate corresponding to the operation result to be evaluated, and the orbit cost after being updated with the update amount The robot system includes a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the costs calculated by the calculation unit coincide.

Another aspect of the present invention provides a trajectory generation unit that generates one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the start state of the robot, and the trajectory candidates from the trajectory candidates. Based on the trajectory selection criteria, the trajectory selection extraction unit that extracts the trajectory feature that is the characteristic, the trajectory selection criteria recording unit that records the trajectory selection criteria for calculating the appropriateness of the trajectory candidate from the motion features, The trajectory cost calculation unit that calculates the trajectory cost that is an index of the appropriateness of the trajectory candidate, the trajectory calculation unit that determines the candidate trajectory to be adopted as the robot trajectory using the trajectory cost, and the motion trajectory A simulation unit that calculates at least one of the robot movement and the influence of the robot movement on the predetermined virtual physical environment based on a predetermined calculation model, and the robot movement And a display unit that visualizes at least one of the effects of robot motion on a predetermined virtual physical environment as a simulation result, and an evaluation input unit that receives an input of an evaluation based on an operator's order scale for the simulation result of the display unit. From the evaluation trend, the evaluation interpretation unit that determines the update amount of the orbit cost of the orbit candidate corresponding to the operation result that is the evaluation target, and the orbit cost after the update amount is updated by the orbit cost calculation unit The robot optimization system includes a learning unit that changes the trajectory selection criterion recorded in the trajectory selection criterion recording unit so that the calculated costs match.

Another aspect of the present invention provides a trajectory generation process for generating one or more trajectory candidates that the robot can take in order to reach a predetermined target state from the start state of the robot, and a trajectory candidate from a trajectory candidate. The trajectory cost, which is an index of the appropriateness of the candidate trajectory from the motion features, based on the trajectory feature extraction process for extracting the motion features that are characteristics and the trajectory selection criteria for calculating the appropriateness of the trajectory candidates from the motion features The trajectory cost calculation process for calculating the trajectory, the trajectory calculation process for determining the trajectory candidate to be adopted as the robot trajectory using the trajectory cost, and operating the robot based on the robot trajectory determined by the trajectory calculation process Demonstration process and at least one of simulation process for simulating robot movement based on robot trajectory and demonstration process And for at least one of the simulation process, receiving the evaluation input by ordinal scale, performing a learning process to change the trajectory selection criteria based on the evaluation input is an operation plan learning method of the robot.

∙ Provide a method for easily setting the robot movement method that does not depend on the target posture, without requiring explicit description of the robot operation procedure.

Configuration diagram of an embodiment of the present invention SCARA robot perspective view Block diagram showing the trajectory generator Block diagram showing the orbital cost calculator Block diagram showing the candidate trajectory recording unit Table showing the data structure of the candidate trajectory recording unit Block diagram showing the trajectory calculation unit Block diagram showing the operating unit The perspective view which shows the condition which uses an evaluation input part Plan view of evaluation input unit with different configuration Block diagram showing the automatic evaluation unit Table showing the data structure of the criteria record Block diagram showing the evaluation interpreter Block diagram showing the initial function part Plan view showing the operation of a car robot Block diagram showing configurations in different configurations Block diagram showing the trajectory calculation unit

Embodiments will be described in detail with reference to the drawings. However, the present invention is not construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the spirit or the spirit of the present invention.

In the structure of the invention described below, the same portions or portions having similar functions are denoted by the same reference numerals in different drawings, and redundant description may be omitted.

In this specification and the like, notations such as “first”, “second”, and “third” are attached to identify the constituent elements, and do not necessarily limit the number or order. In addition, a number for identifying a component is used for each context, and a number used in one context does not necessarily indicate the same configuration in another context. Further, it does not preclude that a component identified by a certain number also functions as a component identified by another number.

The position, size, shape, range, etc. of each component shown in the drawings and the like may not represent the actual position, size, shape, range, etc. in order to facilitate understanding of the invention. For this reason, the present invention is not necessarily limited to the position, size, shape, range, and the like disclosed in the drawings and the like.

<1. Overall configuration>
A SCARA (Selective Compliance Assembly Robot Arm: horizontal articulated) robot system according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2.

FIG. 2 is a perspective view of an example of the SCARA robot. The SCARA robot is, for example, a robot 200 having four joints as shown in FIG. The shape of the robot is not limited to the configuration shown in FIG. 2, and may have five or more joints, or may be provided with other driving units such as a gripper. The robot 200 can constitute all or part of the operation unit 106. Alternatively, the robot 200 may be remote and operate according to a command from the operation unit 106.

FIG. 1 shows an example of a system for controlling the robot 200 having the configuration shown in FIG. This system can be configured, for example, by a server 201 connected to the robot 200 of FIG. 2 directly or via a network. As is well known, the server 201 includes a processing device 202, a storage device 203, an input device 204, and an output device 205. The storage device 203 can be configured by a known magnetic disk device, a semiconductor memory, or a combination thereof.

In the present embodiment, functions such as calculation and control are realized in cooperation with other hardware by executing a program (software) stored in the storage device 203 by the processing device 202. The A program executed by the server 201, its function, or means for realizing the function may be referred to as “function”, “means”, “part”, “unit”, “module”, or the like. In addition, a function in which the storage device 203 of the server 201 stores specific data or a means for realizing the function may be referred to as a “recording unit”.

The trajectory generation unit 101 generates one or more trajectory candidates that the robot 200 can take in order to reach a predetermined target state (target position) from a predetermined start state. The trajectory candidates are selected so as to satisfy the conditions such that the robot 200 does not collide with surrounding objects and the robot 200 itself, and each joint follows the robot motion model. This will be described in detail later with reference to FIG.

The trajectory feature extraction unit 102 extracts the motion features of each candidate trajectory (an index that well represents the trajectory properties) by referring to the values of variables that define the trajectory.

The trajectory selection criterion recording unit 104 records, as trajectory selection criteria, a calculation criterion for calculating, as a quantitative value, whether or not the trajectory is favorable for the operator from the motion characteristics of the robot motion. . The calculation standard can be expressed by, for example, a weight parameter between each neuron of the neural network. However, it is not limited to the above as long as it is a criterion for evaluating the appropriateness from the motion characteristics of the trajectory.

The trajectory cost calculation unit 103 receives the motion features extracted by the trajectory feature extraction unit 102 from the candidate trajectories generated by the trajectory generation unit 101. The trajectory cost calculation unit 103 calculates the appropriateness of the candidate trajectory as a quantitative numerical value by applying the trajectory selection criterion recorded in the trajectory selection criterion recording unit 104 to the input motion feature, and calculates this value. Output as cost. The track cost calculation unit 103 will be described in detail later with reference to FIG.

That is, the trajectory selection reference recording unit 104 records the conversion parameters for calculating the appropriateness of the trajectory candidates from the motion features extracted by the trajectory feature extraction unit 102 from the trajectory candidates generated by the trajectory generation unit 101. Yes. The trajectory cost calculation unit 103 calculates a trajectory cost that is an index of the appropriateness of trajectory candidates based on the conversion parameter.

The trajectory calculation unit 105 determines the trajectory of the robot 200 so as to reduce the cost, and outputs a motor signal that moves the robot 200 according to the trajectory. This will be described in detail later with reference to FIG.

The operation unit 106 drives the four joints of the SCARA robot 200 based on the motor signal. Based on the operation of the robot 200 generated by the operation unit, for example, the operator gives an input of “good” or “bad” by the evaluation input unit 107. This will be described in detail later with reference to FIG.

The evaluation interpretation unit 108 newly determines a cost corresponding to the motion feature of the motion that has been evaluated by using the tendency of the evaluation based on a plurality of evaluations from the operator. As a specific example, the new cost output by the evaluation interpretation unit 108 increases the cost of the feature of the motion trajectory that received the “bad” evaluation, and decreases the cost of the feature of the motion trajectory that received the “good” rating. In addition, the cost of the motion trajectory that was irrelevant to the evaluation is determined so as not to change.

The learning unit 109 changes the trajectory selection criterion of the trajectory selection criterion recording unit 104 so that the cost calculated by the trajectory cost calculation unit 103 matches the cost calculated by the evaluation interpretation unit 108. By repeating the increase / decrease in costs and learning, the trajectory selection criteria for the motion features that show a strong evaluation tendency are changed.

Hereinafter, the function of the learning unit 109 will be described with a simple example. The motion feature input to the trajectory cost calculation unit 103 is, for example, a distance x between the tip of the robot and the operation target. The cost calculated by the trajectory cost calculation unit 103 can be calculated by c (x) which is a function for calculating the cost. The trajectory selection criterion recording unit 104 stores the definition of c (x) as a calculation criterion. For example, if the distance x is 10 mm, 20 mm, and 30 mm, the cost for each is c (10), c (20), and c (30). Here, it is assumed that the cost calculated by the trajectory cost calculation unit 103 is c (10) <c (20) <c (30). The cost indicates that the smaller value is the “good” trajectory. On the other hand, it is assumed that the operator's evaluation input by the evaluation input unit 107 is 20 mm, and the evaluation is “good” in the order of 10 mm and 30 mm thereafter. The operator's evaluation is an order scale indicating the order of candidates. In this case, the learning unit 109 corrects the calculation criterion c (x) of the trajectory selection criterion recording unit 104 so as to approach the operator's evaluation. In the above example, c (x) is changed so that c (20) <c (10) <c (30). Although the above example is a simple example, a more complicated index can be adopted as the evaluation feature.

<2. Orbit generator>
The robot 200 having three joints and one push-down member (pusher) in FIG. 2 will be described as an example. The trajectory generation unit 101 connects the current posture [θ ₀ ⁰ , θ ₁ ⁰ , θ ₂ ⁰ , t ⁰ ] to the target posture [θ ₀ ^L , θ ₁ ^L , θ ₂ ^L , t ^L ] given by the operator. A plurality of trajectories {[θ ₀ ⁱ , θ ₁ ⁱ , θ ₂ ⁱ , t ⁱ ] | i = 0, 1,..., L} are generated. However, these postures and trajectories may have differential elements (velocity, acceleration) of the respective variables and conditions other than the posture, and are not limited to the above as long as they are elements for constituting the trajectory of the robot 200. Further, the target posture may be a set of a plurality of postures. The setting of the target state is not limited to that given by the operator, and may be automatically given from a robot control system or the like.

A configuration example of the trajectory generation unit 101 will be described with reference to FIG. The posture sampling unit 301 randomly samples posture x = [θ ₀ , θ ₁ , θ ₂ , t] from a configuration space composed of a set of postures that the robot can take.

In this embodiment, the robot 200 in FIG. 2 is described as an example. As shown in FIG. 2, θ _i is the angle of each joint (Equation 1), and t is the pushing amount of the robot tip (Equation 2).
[Equation 1]

[Equation 2]

¡Sampling of postures need not be limited to random, such as sampling at regular intervals or sampling based on a predetermined rule such as a quasi-random number.

The posture pair generation unit 302 generates a plurality of posture pairs from neighboring (Equation 3) postures.
[Equation 3]

However, the definition of the neighborhood may be a primary norm. It is assumed that ε uses a threshold set in advance by an operator or the like. The larger this set value ε, the more the number of posture pairs increases, leading to an increase in calculation time, while the smaller the set value ε, the smaller the number of posture pairs and the better the trajectory cannot be obtained. As a reference for setting the value, for example, a value may be given such that the number of pairs of predetermined postures is about 10 on average. Note that the value setting example is not limited to the above example.

<3. Orbital feature extraction unit>
The trajectory feature extraction unit 102 creates a feature quantity q by aggregating posture values included in the trajectory from the trajectory. In this embodiment, the feature quantity q is the Euclidean distance d between the positions of the hand (pusher tip) between the two postures, the change amount δ _i of the joint angle between the two postures, the velocity ν _i of each joint between the two postures, the obstacle from the robot Or at least one selected from the minimum distance l (clearance) or a combination thereof. However, as long as the feature is information representing the characteristics of the trajectory involved in the robot operation, various features can be adopted without being limited to the above feature.

<4. Orbital Cost Calculator>
The trajectory cost calculation unit 103 will be described with reference to FIG. The input preprocessing unit 401 applies PCA (principal component analysis) to the feature quantity q, and performs decorrelation of the input. The calculation unit 402 receives the uncorrelated feature value as an input, and uses the neural network weight parameter recorded in the trajectory selection reference recording unit 104, and outputs a numerical value (cost) corresponding to the feature value q by the neural network. However, the cost calculation index used by the calculation unit 402 is not limited to the neural network as long as it is a function that can change the contribution rate of each numerical value of the feature value by a parameter such as linear combination of the numerical values of the feature value q or random forest regression. . For example, when the processing by the calculation unit 402 is linear combination, the trajectory selection reference recording unit 104 records the coefficient of each feature amount, and if it is random forest regression, records the configuration of the decision tree. In this way, the cost is calculated for each trajectory candidate.

<5. Candidate orbit recording part (option)>
FIG. 5 shows a modification of the embodiment of FIG. The present embodiment may include a candidate trajectory recording unit 501 as shown in FIG. In the configuration of FIG. 4, the trajectory generation unit 101 generates a candidate trajectory for each execution, but the candidate trajectory recording unit 501 records the trajectory candidates and feature quantities q calculated in advance by the trajectory generation unit 101 and the trajectory feature extraction unit 102. During the operation of the robot, the trajectory cost calculation unit 103 uses the recorded trajectory candidates and the feature quantity q for cost calculation.

FIG. 6 shows the data structure of the candidate trajectory recording unit 501. Reference numeral 601 in FIG. 6A denotes a posture data structure sampled by the posture sampling unit 301. Each row of 601 indicates all variables of one posture. For example, in posture A, the angle of joint 0 is 0.1, the angle of joint 1 is 2.1, the angle of joint 2 is 0.5, and the pusher elongation t is 5.0.

602 in FIG. 6B records a posture 604 that forms a posture pair with each posture 603, and a memory address 605 that records data of the posture pair. For example, posture B is connected to postures A, D, and E, and information on the connection is recorded in

memories

0001, 0002, and 0003, respectively.

In FIG. 6C, 606 records the feature quantity 607 of the posture pair. For example, the connection from posture B to posture A is stored in the memory 0001, the Euclidean distance d of the hand position between the two postures at that time is 0.6, and the angle change amount δ ₀ of the joint 0 between the two postures is −0.1. .

<6. Orbit calculation unit>
The trajectory calculation unit 105 will be described with reference to FIG. The minimum cost route search unit 701 searches for a trajectory that minimizes the cost. At this time, for each route, the collision determination unit 702 is used to determine whether or not the object is in contact with the obstacle, and the route with the obstacle is excluded from the search.

The operation characteristic recording unit 703 records various operation characteristics of the robot 200 necessary for converting the trajectory into a drive signal for the actuator of the robot 200. It is assumed that the robot operation characteristic data is acquired in advance and stored in the storage device 203.

The operation signal generation unit 704 receives the trajectory of the minimum cost, uses the operation characteristics of the robot 200, and outputs a PWM (pulse width modulation) signal as an operation signal for operating the robot 200. The operation signal is not limited to the PWM signal as long as it is a signal for driving the actuator of the robot.

FIG. 7 is used again to describe the trajectory calculation unit 105 in a configuration in which the trajectory is a set of posture pairs. The cost of a candidate trajectory can be regarded as a graph in which the vertices of posture are connected by edges with weights of costs. The minimum cost route search unit 701 determines the trajectory using the Dijkstra method so that the total cost is minimized. However, the Dijkstra method may be an arbitrary minimum cost search algorithm. At this time, for each route, the collision determination unit 702 is used to determine whether or not the object is in contact with the obstacle, and the route that is in contact with the obstacle is excluded from the search. The operation characteristic recording unit 703 and the operation signal generation unit 704 have the same configuration as described above.

<7. Operation part>
An example of the operation unit 106 will be described with reference to FIG. The sensor unit 801 observes the state of the robot with encoders of each joint of the robot 200. However, the sensor unit may be either an internal sensor or an external sensor. The controller unit 802 receives the difference between the state of the robot 200 and the command signal, and determines the motor output by PID control (Proportional-Integral-Derivative Controller). The actuator unit 803 inputs the motor output and drives the actuator of the robot 200.

<8. Evaluation input section>
FIG. 9 shows a situation where evaluation is performed by the evaluation input unit 107 (901). In this example, the evaluation input unit 901 is a switch having a binary input (Equation 4) of “good” and “bad”.
[Equation 4]

The evaluation input unit 901 is an evaluation input r _k (“good” (r _k = 1 in this example), “bad” (r _k = in this example) obtained by the operator 900 observing the operation of the robot 200 (902). By receiving the binary value of 0), a pair of action and evaluation is obtained.

The operator 900 can make an evaluation by directly viewing the operation of the robot 902 on site. Further, the robot 902 is in a remote place, and it is possible to perform evaluation by transmitting voice and video data acquired in the remote place and monitoring by an operator. The robot 200 (902) may have various shapes such as the one shown in FIG. 2 and the one shown in FIG.

評価 The evaluation value is not limited to 2 as long as it is an ordinal scale. As an example of appropriateness evaluation by the operator 900, an operation that the robot 902 can correctly accomplish the purpose is appropriate, and an operation that the robot 902 cannot complete the purpose or an unfavorable operation such as not securing sufficient clearance is inappropriate. It can be considered.

The evaluation input unit 107 (901) in a different configuration will be described with reference to FIG. The evaluation input unit 107 receives an operation ranking input obtained by the operator 900 observing the operation of the robot 902 by referring to the past operation. For example, the smaller the numerical value in a three-step evaluation, the better. In the example of FIG. 10, for example, the evaluation of motion Motion 動作 A evaluated in the past can be referred to, and compared with this, the motion Motion D can be evaluated. However, the ranking may include equal and inequality signs. The evaluation input unit 107 may receive an operation ranking input obtained by the operator 900 comparing the operations of two or more robots 902 having the same specifications.

<9. Automatic evaluation section>
FIG. 11 shows an example of a different configuration that does not require operator involvement. In this example, an automatic evaluation unit 1100 is provided. The motion sensing unit 1101 is an optical three-dimensional motion measurement device that measures the tip position of the robot 902, for example. However, the motion sensing unit is not limited to the optical three-dimensional motion measurement device as long as it is a sensor that measures the robot 902 itself. The environment sensing unit 1102 is, for example, a camera that observes a change in the position of the operation target of the robot 902. For example, the environment sensing unit 1102 determines whether the part is correctly inserted into a different part. However, environment sensing is not limited to a camera as long as it is a sensor that measures the environment surrounding the robot 902. In the above example, two types of sensing units are provided, but only one of the sensing units may be provided, or another sensing unit may be added.

FIG. 12 shows an example of the data structure of the criterion recording unit 1103. “Good” when both the criterion “A” 1201 indicating whether the tip position of the robot 902 is more than the predetermined distance from the work target and the criterion “B” 1202 indicating whether the component is correctly inserted into the component are satisfied. Criteria for evaluating ("RESULT" number "1") are recorded. The criterion may include an arbitrary calculation formula or conditional branch. The operation can be evaluated by applying information that can be acquired from various sensing units to the conditions defined by the determination criteria.

The determination unit 1104 outputs an evaluation value of the operation based on the inputs from the operation sensing unit 1101 and the environment sensing unit 1102 and the determination standard 1200. Here, the evaluation value may be either an order scale or a ranking of actions.

<10. Evaluation Interpretation Section>
The evaluation interpretation unit 108 will be described with reference to FIG. The evaluation organizing unit 1301 divides the operation / evaluation pair by the evaluation of the operator 900 into two groups based on a threshold set by the user. In this embodiment, the two groups are “good” and “bad”, which are the same as the evaluation of the operator 900. If the evaluation is “very good”, “good”, “bad”, “very bad”, etc., it is divided into two groups: “very good” “good” and “bad” “very bad”. You just have to decide.

The cost update amount determination unit 1302 determines the cost difference r ′ _k before and after learning so that the cost decreases with a “good” evaluation and increases with a “bad” evaluation. However, r _'k is updated before the neural network function f, and the neural network after the update f' r _'k is taken as is defined by equation (5).
[Equation 5]

However, because there are motion features that are universally involved in the costs of various motions, the cost associated with these motion features is simply reduced by a “good” evaluation and increased by a “bad” evaluation. Changes unstable. In the present embodiment, for example, the change amount δ _i of the joint angle between the two postures can be considered as the operation feature universally involved in the costs of various operations. Therefore, in addition to the cost being reduced by the “good” evaluation and the cost being increased by the “bad” evaluation, the cost related to the operation feature that is not “good” or “bad” that was not related to the operator's evaluation does not change. Determine the cost update amount. For example, with the weight of the neural network irrelevant to the evaluation, a cost update amount that cancels the update amount is determined. Since the change in output acts linearly on each weight, when the evaluation of “good” and “bad” is input N _g and N _b times, respectively, the update amount Δw cancels with the weight w of the neural network unrelated to the evaluation. Therefore, r ′ _k is determined by (Equation 6) using a predetermined learning rate α.
[Equation 6]

The evaluation interpretation unit 108 in the configuration in which the evaluation input unit outputs the ranking outputs the evaluation order in which the circulation and contradiction are arranged using the Schulze method with the set of evaluation orders as an input. However, the rearrangement of evaluation rank is not limited to Schulze method as long as it is an election method that can arrange the circulation and contradiction of the set of evaluation orders.

<11. Learning Department>
Finally, the learning unit 109 updates the trajectory selection reference recording unit using the steepest descent method so as to minimize (Equation 7) from the difference r ′ _k .
[Equation 7]

However, the trajectory selection reference recording unit may be updated using different optimization algorithms such as AdaGrad and RMSProp.

The learning unit 109 in the configuration in which the evaluation unit outputs rankings is output by the evaluation interpretation unit, and the combination of ranked feature quantities {[q _g , q _b ] ₀ , [q _g , q _b ] ₁ ,…} From (Equation 8), the steepest descent method is used to minimize each weight of the neural network.
Where σ is a sigmoid function.
[Equation 8]

<12. Cost initialization department>
FIG. 14 is a diagram showing another modification of the embodiment. In this embodiment, as shown in FIG. 14, a cost initialization unit 1401 that outputs an initial value of the trajectory selection reference recording unit with an operator-defined initial objective function as an input may be provided. The initial objective function is the total movement amount of the joint (Equation 9).
[Equation 9]

However, Δθ _i and Δt are the amount of change Δθ _i = | θ ′ _i −θ _i | and Δt = | t′−t | between the two postures. The initial objective function is not limited to the movement amount of the joint as long as it is a mapping from the feature value to the real number. Using the feature quantity q, the trajectory selection criterion is calculated using the learning unit so that the output for the feature quantity q _i is f _init (q _i ).

With reference to FIG. 15, a planar robotic system according to an embodiment of the present invention will be described. As shown in FIG. 15, the vehicle-type robot is a robot having a configuration space with three degrees of freedom of X = [x, y, θ]. However, (Equation 10) is a planar coordinate, and (Equation 11) is the direction of the robot.
[Equation 10]

[Equation 11]

Referring to FIG. 16, an embodiment of a planar robot system will be described. The trajectory generation unit 101, the trajectory cost calculation unit 103, the trajectory selection reference recording unit 104, the evaluation input unit 107, the evaluation interpretation unit 108, and the learning unit 109 are implemented by performing the corresponding processing of the first embodiment (FIG. 1). It becomes possible.

The trajectory feature extraction unit 1602 creates a feature quantity q by aggregating posture values included in the trajectory from the trajectory. In this embodiment, the feature quantity q is the Euclidean distance (Equation 12) between the two postures, the radius of rotation R of the orbit, and the speeds ν ₀ and ν ₁ in the two postures. However, the feature is not limited to the above feature as long as it is information representing the characteristics of the trajectory involved in the operation of the robot.
[Equation 12]

The trajectory calculation unit 1605 will be described with reference to FIG. The minimum cost path search unit 1701 searches for a trajectory with the minimum cost calculated by the trajectory cost calculation unit 103 from the candidate trajectories generated by the trajectory generation unit 101, and outputs the trajectory. At this time, for each route, the collision determination unit 1702 is used to determine whether or not the object is in contact with the obstacle, and the route that is in contact with the obstacle is excluded from the search.

The simulation process 1611 simulates the operation of the robot when the robot performs the trajectory calculated by the trajectory calculation unit 1605 in the real world. In the simulation, for example, the robot operation itself and the influence of the robot operation on a predetermined virtual physical environment are output as simulation results. In other words, the simulation is an event that is expected when the robot operates in the real world, such as the deviation between the trajectory calculated by the trajectory calculation unit 1605 and the trajectory in the simulation due to the disturbance that occurs in the robot, and the impact of the robot on the environment. If there is, it is not limited to the operation of the robot.

The display unit 1612 presents the trajectory calculated by the trajectory calculation unit 1605 to the operator on the display. As a display form on the display, a trajectory on a plane can be displayed as shown in FIG. For example, the car robot starts from the position 1501, takes a trajectory passing through the position 1503, and arrives at the position 1502. The display form may be converted into a stereoscopic image and displayed. In the second embodiment, the example of the vehicle type robot has been described. However, the SCARA robot as shown in FIG.

According to the second embodiment, the same effects as in the first embodiment can be obtained without actually preparing or operating the robot.

According to the embodiment of the present invention described above, a simple evaluation of whether the operation is good or bad is input, and the evaluation standard of the system is updated, so that even an operator who does not have the knowledge of the robot can realize the assumed movement method. There is an effect that can be done. Since an order scale such as good or bad can be used for input, a complicated operation for quantifying the input value is also unnecessary.

The server 201 described in the above embodiments may be configured by a single computer, or any part of the input device 204, the output device 205, the processing device 202, and the storage device 203 are connected via a network. You may comprise with another computer. The idea of the invention is equivalent and unchanged.

In this embodiment, functions equivalent to those configured by software can be realized by hardware such as FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit). Such an embodiment is also included in the scope of the present invention.

The present invention is not limited to the above-described embodiment, and includes various modifications. For example, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Further, it is possible to add, delete, and replace the configurations of other embodiments with respect to a part of the configurations of the embodiments.

It can be used for robot motion planning and motion planning systems.

Orbit generator: 101
Orbit feature extraction unit: 102
Orbit cost calculator: 103
Orbit selection reference recording unit: 104
Orbit calculation unit: 105
Operating part: 106
Input section: 107
Evaluation interpretation section: 108
Learning Department: 109
Robot: 200
Server: 201

Claims

A trajectory generation unit that generates one or more candidates of trajectories that the robot can take in order to reach a predetermined target state from the start state of the robot;
From the trajectory candidates, a trajectory feature extraction unit that extracts motion characteristics that are characteristics of the trajectory candidates;
A trajectory selection criterion recording unit that records trajectory selection criteria for calculating appropriateness of the trajectory candidates from the operation characteristics;
From the operation characteristics, based on the trajectory selection criteria, a trajectory cost calculation unit that calculates a trajectory cost that is an index of the appropriateness of the trajectory candidates;
A trajectory calculation unit that determines a trajectory candidate to be adopted as an operation trajectory of the robot using the trajectory cost, and outputs an operation signal;
An operation unit for operating the robot based on the operation signal;
An evaluation input unit for receiving an input of an evaluation based on an operator's order scale for the operation result of the robot;
An evaluation interpretation unit that determines an update amount of a trajectory cost of the trajectory candidate corresponding to the operation result that is an evaluation target from the input evaluation tendency;
A learning unit that changes the trajectory selection reference recorded in the trajectory selection reference recording unit so that the cost calculated by the trajectory cost calculation unit coincides with the trajectory cost after being updated with the update amount,
Robot system.
The robot system according to claim 1, wherein
The evaluation input unit
In response to the movement results of the robot a plurality of times, an input of the order of appropriateness of movement is received,
The evaluation interpretation unit is
Determining a new evaluation order of the operation results to be evaluated;
The robot system according to claim 1.
The trajectory generator is
Sampling the possible postures of the robot, and generating the trajectory candidates from the sampled combinations of the posture candidates;
The robot system according to claim 1.
The trajectory selection reference recording unit
From the motion features extracted by the trajectory feature extraction unit from the trajectory candidates generated by the trajectory generation unit, record parameters for calculating the appropriateness of the trajectory candidates,
The orbit cost calculator
The robot system according to claim 1, wherein the trajectory cost, which is an index of appropriateness of the trajectory candidate, is calculated based on the parameter recorded in the trajectory selection reference recording unit.
The trajectory selection reference recording unit
Record the neural network weight parameters as the parameters,
The orbit cost calculator
Using the weight parameter of the neural network, the neural network is used to output the trajectory cost.
The robot system according to claim 4.
The evaluation interpretation unit is
An evaluation organizing unit that divides the evaluation based on the order scale input to the evaluation input unit by a predetermined threshold;
Among the evaluations divided by the evaluation organizing unit, the trajectory candidate corresponding to the evaluation belonging to the side that is considered to be appropriate is reduced in the trajectory cost so that the trajectory cost is reduced. The candidate includes a cost update amount determination unit that determines the update amount of the cost of the operation feature that is the evaluation target in the evaluation input unit so that the trajectory cost increases.
The robot system according to claim 1.
The robot system according to claim 1,
A robot system including an automatic evaluation unit that determines an evaluation of a robot motion generated by the operation unit and inputs the evaluation to the evaluation input unit.
The robot system according to claim 1,
The robot system further includes a candidate trajectory recording unit that calculates and holds the robot trajectory candidates in advance.
The robot system according to claim 6, wherein
The cost update amount determination unit further includes:
The update amount of the cost of the motion feature that is the evaluation target is determined by the evaluation input unit so that the trajectory cost related to the motion feature that is the evaluation target in the evaluation input unit but is not related to the evaluation does not change. , Robot system.
A trajectory generator that generates one or more candidates of trajectories that the robot can take in order to reach a predetermined target state from the start state of the robot;
A trajectory feature extraction unit that extracts motion features that are characteristics of the trajectory candidates from the trajectory candidates;
A trajectory selection criterion recording unit that records trajectory selection criteria for calculating appropriateness of the trajectory candidates from the operation characteristics;
From the operation characteristics, based on the trajectory selection criteria, a trajectory cost calculation unit that calculates a trajectory cost that is an index of the appropriateness of the trajectory candidates;
A trajectory calculation unit that determines a candidate trajectory to be adopted as the motion trajectory of the robot using the trajectory cost;
A simulation unit that calculates at least one of the movement of the robot performing the movement trajectory and the influence of the movement of the robot on a predetermined virtual physical environment based on a predetermined calculation model;
A display unit for visualizing at least one of the movement of the robot and the influence of the movement of the robot on a predetermined virtual physical environment as a simulation result;
An evaluation input unit that receives an input of an evaluation based on an operator's order scale for the simulation result of the display unit;
An evaluation interpretation unit that determines an update amount of a trajectory cost of the trajectory candidate corresponding to the operation result that is an evaluation target from the input evaluation tendency;
A learning unit that changes the trajectory selection reference recorded in the trajectory selection reference recording unit so that the cost calculated by the trajectory cost calculation unit coincides with the trajectory cost after being updated with the update amount,
Robot optimization system.
A trajectory generation process for generating one or more candidates of trajectories that the robot can take in order to reach a predetermined target state from the start state of the robot;
A trajectory feature extraction process for extracting, from the trajectory candidates, motion features that are characteristics of the trajectory candidates;
Based on a trajectory selection criterion for calculating the appropriateness of the trajectory candidate from the motion feature, a trajectory cost calculation process for calculating a trajectory cost that is an index of the appropriateness of the candidate trajectory from the motion feature;
A trajectory calculation process for determining a trajectory candidate to be adopted as an operation trajectory of the robot using the trajectory cost;
At least one of a demonstration process for operating the robot based on the motion trajectory of the robot determined by the trajectory calculation process, and a simulation process for simulating the motion of the robot based on the motion trajectory of the robot;
A learning process that receives an evaluation input based on an order scale for at least one of the demonstration process and the simulation process, and changes the trajectory selection criterion based on the evaluation input.
Robot motion plan learning method.
Based on the evaluation input by the order scale, performing an evaluation interpretation process for determining an update amount of the trajectory cost of the motion feature corresponding to the motion result that is an evaluation target,
In the learning process,
Changing the trajectory selection criteria so that the cost calculated in the trajectory cost calculation process matches the updated cost determined in the evaluation interpretation process;
The robot motion plan learning method according to claim 11.
The trajectory generation process is
Sampling the possible postures of the robot, and generating the trajectory candidates from the sampled combinations of the posture candidates;
The robot motion plan learning method according to claim 11.
The simulation process includes
Calculating at least one of the movement of the robot performing the movement trajectory and the influence of the movement of the robot on a predetermined virtual physical environment based on a predetermined calculation model;
The robot motion plan learning method according to claim 11.
An automatic evaluation process is performed to perform the evaluation input,
The automatic evaluation process includes:
Detecting the state of operation of the robot performing the operation trajectory and the state of the operation environment of the robot, and evaluating the detection result based on a predetermined criterion;
The robot motion plan learning method according to claim 11.