WO2020246482A1 - Control device, system, learning device, and control method - Google Patents

Control device, system, learning device, and control method

Info

Publication number
WO2020246482A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
control device
moving body
model
Prior art date
Application number
PCT/JP2020/021831
Other languages
French (fr)
Japanese (ja)
Inventor
裕紀 森
亮太 鳥島
哲也 尾形
城志 高橋
大輔 岡野原
Original Assignee
株式会社Preferred Networks
Priority date
Filing date
Publication date
Application filed by 株式会社Preferred Networks
Publication of WO2020246482A1 publication Critical patent/WO2020246482A1/en

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00: Controls for manipulators
    • B25J 13/08: Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • B25J 19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/06: Safety devices
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02: Control of position or course in two dimensions
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • Embodiments of the present invention relate to control devices, systems, learning devices and control methods.
  • Marija Jegorova, Stephane Doncieux, and Timothy Hospedales, "Behavioural Repertoire via Generative Adversarial Policy Network", arXiv:1811.02945, 6 Mar 2019.
  • Oussama Khatib, "Real-time obstacle avoidance for manipulators and mobile robots", Proceedings of the 1985 IEEE International Conference on Robotics and Automation, Vol. 2, pp. 500-505, IEEE, 1985.
  • Sertac Karaman and Emilio Frazzoli, "Incremental sampling-based algorithms for optimal motion planning", Robotics: Science and Systems VI, Vol. 104, p. 2, 2010.
  • in addition to robots used in factories and warehouses, there is increasing demand for robots that work together with humans in living environments. These robots are expected to operate in environments where conditions such as lighting and obstacles change constantly, so a robot must be able to avoid obstacles in order to guarantee that it operates without damaging humans, objects in the environment, or the robot itself.
  • the problem to be solved by the invention is to find a trajectory that avoids obstacles more easily.
  • the control device includes an inference unit.
  • the inference unit uses a first model that receives, as input, data on a latent space representing latent features of at least one of the position and the posture of a moving body, and that outputs data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in the real space.
  • the inference unit inputs a plurality of first input data on the latent space to this first model and obtains a plurality of first output data output by the first model.
  • FIG. 1 is a diagram showing a hardware configuration example of a robot system including the control device of the present embodiment.
  • FIG. 2 is a diagram showing a configuration example of a robot that is a two-link arm robot.
  • FIG. 3 is a hardware block diagram of the control device.
  • FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device.
  • FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment.
  • FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
  • FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
  • FIG. 8 is a diagram showing an example of the mapping of a straight-line trajectory in the latent space to the joint angle space.
  • FIG. 9 is a diagram showing an example of the trajectory of the robot.
  • FIG. 10 is a diagram showing an example of the mapping of a straight-line trajectory in the latent representation to the joint angle space.
  • FIG. 11 is a diagram showing an example of a trajectory of a robot operated based on a trajectory in the joint angle space.
  • FIG. 12 is a diagram showing a configuration example of VAE.
  • FIG. 13 is a diagram for explaining an example of learning data.
  • in a trajectory plan that avoids obstacles, a method that requires no design by humans and has a low computational cost is desirable. In the present embodiment, therefore, the trajectory for avoiding obstacles is calculated using a generative model.
  • for example, GAN (Generative Adversarial Networks) can be used as a model including the generative model. Based on the manifold hypothesis, a GAN has the advantage that the training data can be compressed into a lower-dimensional latent representation (data represented on the latent space).
  • in the present embodiment, at least one of a position and a posture in which the robot avoids obstacles is captured in the latent space of the GAN, and the trajectory is specified on that latent space. This makes a trajectory plan that avoids obstacles easier to realize. Because obstacle avoidance is achieved with a simple design, not only engineers but also users without expert knowledge can, for example, handle the robot.
  • FIG. 1 is a diagram showing a hardware configuration example of the robot system 1 including the control device 100 of the present embodiment.
  • the robot system 1 includes a control device 100, a controller 200, a robot 300, and a sensor 400.
  • the robot 300 is an example of a moving body that moves while at least one of its position and its posture (its trajectory) is controlled by the control device 100.
  • the robot 300 includes, for example, a plurality of links, a plurality of joints, and a plurality of driving devices (motors and the like) for driving each of the joints.
  • in the following, the robot 300, a two-link arm robot having two joints and two links, will be described as an example.
  • FIG. 2 is a diagram showing a configuration example of a robot 300 which is a two-link arm robot.
  • the robot 300 includes a base member 321 and two joints 301 and 302, and two links 311 and 312.
  • the joints 301 and 302 rotate about axes perpendicular to the plane of the page in FIG. 2.
  • the joint 301 rotates about an axis fixed to the base member 321.
  • the links 311 and 312 move according to the rotation of the joints 301 and 302.
  • FIG. 2 shows how the links 311 and 312 move as the joints 301 and 302 each rotate counterclockwise.
  • the applicable robot is not limited to this, and any robot (moving body) may be used.
  • for example, it may be a robot having three or more joints and links, a mobile manipulator, or a mobile cart.
  • the robot may also be provided with a drive device for translating the entire robot in an arbitrary direction in the real space.
  • the moving body may be an object whose overall position changes in this way, or an object such as the two-link arm robot of FIG. 2, in which part of the body is fixed in position and at least one of the position and the posture of the remaining parts changes.
  • the sensor 400 detects information to be used for controlling the operation of the robot 300.
  • the sensor 400 is, for example, an imaging device (camera) that captures images of the surroundings of the robot 300, a depth sensor that detects depth information to objects around the robot 300, or both.
  • the sensor 400 is not limited to these, and may be, for example, a sensor capable of acquiring information (position information) regarding the position of an obstacle.
  • the controller 200 controls the drive of the robot 300 in response to an instruction from the control device 100.
  • the controller 200 controls a drive device (motor or the like) that drives the joints of the robot 300 so as to rotate in the rotation direction and rotation speed specified by the control device 100.
  • the control device 100 is connected to the controller 200, the robot 300, and the sensor 400, and controls the entire robot system 1.
  • the control device 100 controls the operation of the robot 300.
  • the control of the motion of the robot 300 includes the calculation of the trajectory using the generative model.
  • the control device 100 outputs an operation command for operating the robot 300 according to the calculated trajectory to the controller 200.
  • the control device 100 may have a function of learning a generative model. In this case, the control device 100 also functions as a learning device for learning the generative model.
  • FIG. 3 is a hardware block diagram of the control device 100.
  • the control device 100 is realized by a hardware configuration similar to that of a general computer (information processing device) as shown in FIG.
  • the control device 100 may be realized by one computer as shown in FIG. 3, or may be realized by a plurality of computers operating in cooperation with each other.
  • the control device 100 includes a memory 204, one or more hardware processors 206, a storage device 208, an operation device 210, a display device 212, and a communication device 214. Each part is connected by a bus.
  • the memory 204 includes, for example, a ROM 222 and a RAM 224.
  • the ROM 222 stores the program used for controlling the control device 100, various setting information, and the like in a non-rewritable manner.
  • the RAM 224 is a volatile storage medium such as SDRAM (Synchronous Dynamic Random Access Memory).
  • the RAM 224 serves as a work area for one or more hardware processors 206.
  • One or more hardware processors 206 are connected to memory 204 (ROM 222 and RAM 224) via a bus.
  • the one or more hardware processors 206 may be, for example, one or a plurality of CPUs (Central Processing Units) or one or a plurality of GPUs (Graphics Processing Units). Further, the one or more hardware processors 206 may be a semiconductor device or the like including a dedicated processing circuit for realizing a neural network.
  • the one or more hardware processors 206 execute various processes in cooperation with the various programs stored in advance in the ROM 222 or the storage device 208, using a predetermined area of the RAM 224 as a work area, and comprehensively control the operation of each part constituting the control device 100. Further, the one or more hardware processors 206 control the operation device 210, the display device 212, the communication device 214, and the like in cooperation with programs stored in advance in the ROM 222 or the storage device 208.
  • the storage device 208 is a rewritable storage device such as a semiconductor storage medium (flash memory or the like) or a magnetically or optically recordable storage medium.
  • the storage device 208 stores a program used for controlling the control device 100, various setting information, and the like.
  • the operation device 210 is an input device such as a mouse or a keyboard.
  • the operation device 210 receives the information input from the user and outputs the received information to one or more hardware processors 206.
  • the display device 212 displays information to the user.
  • the display device 212 receives information or the like from one or more hardware processors 206, and displays the received information.
  • when information is output to the communication device 214, the storage device 208, or the like, the control device 100 does not have to include the display device 212.
  • the communication device 214 communicates with an external device and transmits / receives information via a network or the like.
  • the program executed by the control device 100 of the present embodiment is recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk), and is provided as a computer program product.
  • the program executed by the control device 100 of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided or distributed via a network such as the Internet. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided by incorporating it into a ROM or the like in advance.
  • the program executed by the control device 100 according to the present embodiment can make the computer function as each part of the control device 100 described later.
  • in this computer, the hardware processor 206 can read the program from a computer-readable storage medium onto the main memory and execute it.
  • the hardware configuration shown in FIG. 1 is an example, and is not limited to this.
  • One device may be configured to include a part or all of the control device 100, the controller 200, the robot 300, and the sensor 400.
  • the robot 300 may be configured to also include the functions of the control device 100, the controller 200, and the sensor 400.
  • the control device 100 may be configured to have one or both functions of the controller 200 and the sensor 400.
  • although the control device 100 can also function as a learning device, the control device 100 and the learning device may instead be realized by physically different devices.
  • FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device 100.
  • the control device 100 includes an acquisition unit 101, a learning unit 102, an inference unit 103, a movement control unit 104, and a storage unit 121.
  • the acquisition unit 101 acquires various information used in various processes executed by the control device 100.
  • the acquisition unit 101 acquires learning data for learning the generative model.
  • the learning data can be acquired by any method, but the acquisition unit 101 acquires, for example, the learning data created in advance from an external device via a network or the like, or from a storage medium.
  • the learning unit 102 learns the generative model (first model) using the learning data. The learning data is, for example, data indicating at least one of a position and a posture of the robot 300 that does not contact an obstacle in the real space.
  • when a GAN is used, the learning unit 102 trains the two neural networks constituting the GAN: a generator and a discriminator.
  • the generator corresponds to the generative model (first model): it receives input data on the latent space indicating latent features of at least one of the position and the posture of the moving body, and outputs output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in the real space. Here, output data indicating at least one of a position and a posture includes output data indicating a position, output data indicating a posture, and output data indicating both. The details of the learning method will be described later.
  • the inference unit 103 executes inference using the learned generative model. For example, the inference unit 103 inputs a plurality of input data (first input data) forming a line in the latent space to the generative model, and obtains a plurality of output data (first output data) output by the generative model.
  • the movement control unit 104 controls the movement of the robot 300.
  • the movement control unit 104 controls the movement of the robot 300 by using the output data obtained by the inference unit 103 as trajectory data indicating a trajectory in which the robot 300 does not come into contact with an obstacle in the real space. More specifically, the movement control unit 104 moves the robot 300 by generating an operation command for operating the robot 300 according to the trajectory data and transmitting the operation command to the controller 200.
  • the storage unit 121 stores various information used in the control device 100.
  • the storage unit 121 stores parameters (weighting factors, biases, etc.) of the neural network (generator and discriminator) constituting the GAN, and learning data for learning the neural network constituting the GAN.
  • the storage unit 121 is realized by, for example, the storage device 208 of FIG.
  • Each of the above units is realized by, for example, one or more hardware processors 206.
  • each of the above parts may be realized by having one or a plurality of CPUs execute a program, that is, by software.
  • Each of the above parts may be realized by a hardware processor such as a dedicated IC (Integrated Circuit), that is, hardware.
  • Each of the above parts may be realized by using software and hardware in combination. When a plurality of processors are used, each processor may realize one of each part, or may realize two or more of each part.
  • FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment.
  • the GAN includes two neural networks, a generator 501 and a discriminator 502.
  • the generator 501 takes as input a latent variable z in the low-dimensional latent space and outputs fake data (high-dimensional data) imitating the training data.
  • the generator 501 is trained so that the distribution of the fake data it outputs approaches the distribution of the true training data.
  • the discriminator 502 determines whether its input is true training data or fake data, and is trained to improve its discrimination accuracy.
  • when the two-link robot shown in FIG. 2 is used, the low-dimensional (latent space) data (latent variables) and the high-dimensional (real space) data (training data and fake data) can be defined, for example, as follows.
  • first, let the angles (joint angles) of the joints 301 and 302 be θ0 and θ1, respectively, and let the coordinates of the joint 302 and of the tip of the link 312 be (x0, y0) and (x1, y1), respectively.
  • the length (link length) of the links 311 and 312 is, for example, 1.
  • the high-dimensional data is expressed as six-dimensional data (θ0, θ1, x0, y0, x1, y1) including the angles of the two joints 301 and 302, the position of the joint 302, and the position of the tip of the link 312.
  • of this six-dimensional position-and-posture information, (x0, y0, x1, y1) can in principle be generated by forward kinematics once the two-dimensional information (θ0, θ1) is given. Therefore, the low-dimensional data (latent variable) can be defined as two-dimensional data (z0, z1).
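As a concrete illustration of the forward kinematics mentioned above, the mapping from (θ0, θ1) to (x0, y0, x1, y1) can be written in a few lines. The following Python sketch assumes link lengths of 1 and the angle convention suggested by FIG. 13 (θ0 = θ1 = 0 points both links straight down); the patent does not fix these details, so they are illustrative assumptions.

```python
import numpy as np

def forward_kinematics(theta0, theta1, l0=1.0, l1=1.0):
    """Map joint angles (degrees) of the two-link arm to (x0, y0, x1, y1).

    Assumed convention: theta0 = theta1 = 0 points both links straight
    down, matching the reference pose (0, -1.0), (0, -2.0) of FIG. 13.
    """
    a0 = np.deg2rad(theta0)
    a1 = np.deg2rad(theta0 + theta1)  # absolute angle of the second link
    x0, y0 = l0 * np.sin(a0), -l0 * np.cos(a0)            # joint 302
    x1, y1 = x0 + l1 * np.sin(a1), y0 - l1 * np.cos(a1)   # tip of link 312
    return x0, y0, x1, y1

print(forward_kinematics(0.0, 0.0))  # (0.0, -1.0, 0.0, -2.0)
```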
  • similarly, when there are n driven parts such as joints (where n is an integer with n ≥ 3), n-dimensional data (z0, z1, ..., zn-1) can be used as the latent variable.
  • the low-dimensional (latent space) data and the high-dimensional (real space) data as described above are examples, and are not limited to these.
  • a latent variable with a number of dimensions larger than the degree of freedom of the joint may be used.
  • the GAN shown in FIG. 5 can receive a specified condition (Condition), out of a plurality of conditions, as an additional input to each of the generator 501 and the discriminator 502. As a result, the generator 501 and the discriminator 502 can output data (fake data or a discrimination result) that depends on the condition.
  • a GAN capable of inputting conditions in this way may be called a Conditional GAN.
  • a GAN that does not take a condition as input may also be used.
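The patent does not disclose concrete network architectures, so the following PyTorch sketch is only one plausible realization of the generator 501 and discriminator 502 of FIG. 5: small fully connected networks over the 2-D latent variable and the 6-D data, with an assumed 16-dimensional condition vector (the occupancy grid sketched further below) concatenated to the input of each.

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM, C_DIM = 2, 6, 16  # latent, data, and condition sizes (assumed)

class Generator(nn.Module):
    """Generator 501: (latent variable z, condition) -> fake 6-D pose data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + C_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, X_DIM))

    def forward(self, z, c):
        return self.net(torch.cat([z, c], dim=1))

class Discriminator(nn.Module):
    """Discriminator 502: (data, condition) -> probability the data is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(X_DIM + C_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, x, c):
        return self.net(torch.cat([x, c], dim=1))
```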
  • the condition shown in FIG. 5 indicates that an obstacle indicated by a black rectangle exists within the movable range of the robot 300.
  • the condition can be specified by any method. For example, as shown in FIG. 5, the condition may be information indicating, for each of a plurality of areas (16 in FIG. 5) into which the movable range of the robot 300 is divided, whether an obstacle exists in that area (for example, 1 is specified for an area where an obstacle exists and 0 for an area where it does not).
  • One or both of the image of the surroundings of the robot 300 and the depth information to the objects around the robot 300 may be used as the conditions for obstacles. In this case, the image information and the depth information detected by the sensor 400 (imaging device, depth sensor) can be used as conditions for obstacles.
  • information indicating the position of the obstacle may or may not be explicitly given as a condition regarding the obstacle.
  • the sensor 400 capable of acquiring the position information of the obstacle may be used, and only the position information of the obstacle may be used as a condition regarding the obstacle.
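As one concrete, hypothetical encoding of such a condition, the movable range can be divided into a 4 x 4 grid and each cell flagged 1 if it overlaps an obstacle and 0 otherwise; the cell count, coordinate ranges, and rectangle representation below are illustrative assumptions.

```python
import numpy as np

def occupancy_condition(obstacles, lim=2.0, n=4):
    """Encode obstacles as a flat n*n grid of {0, 1} flags.

    `obstacles` is a list of axis-aligned rectangles (xmin, ymin, xmax, ymax);
    the square [-lim, lim]^2 covers the reach of the two-link arm.
    """
    edges = np.linspace(-lim, lim, n + 1)
    grid = np.zeros((n, n), dtype=np.float32)
    for (xmin, ymin, xmax, ymax) in obstacles:
        for i in range(n):
            for j in range(n):
                # does cell [edges[i], edges[i+1]] x [edges[j], edges[j+1]]
                # overlap the obstacle rectangle?
                if (edges[i] < xmax and xmin < edges[i + 1]
                        and edges[j] < ymax and ymin < edges[j + 1]):
                    grid[i, j] = 1.0
    return grid.reshape(-1)  # 16-dimensional condition vector when n = 4

cond = occupancy_condition([(0.5, 0.5, 1.5, 1.5)])  # one obstacle, upper right
```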
  • the manifold hypothesis holds that high-dimensional expressions in the real world can be expressed on a lower-dimensional manifold.
  • GAN is expected to acquire low-dimensional latent expressions from high-dimensional expressions based on the manifold hypothesis.
  • when applied to a trajectory plan that avoids obstacles, as in this embodiment, training the conditional generative model (generator 501) of the GAN causes it to acquire a low-dimensional latent representation of at least one of the positions and postures in which the robot 300 does not contact obstacles.
  • from a specified latent representation (data indicating at least one of a position and a posture on the latent space), the trained generator 501 can generate data indicating at least one of a position and a posture in which the robot avoids obstacles in the real space.
  • the GAN is trained so that adjacency of data (latent variables) in the latent space corresponds to adjacency of the generated data. Therefore, when a trajectory specified on the low-dimensional latent space (a sequence of adjacent points indicating at least one of positions and postures) is mapped by the generator 501, a trajectory of at least one of positions and postures in which the robot 300 avoids obstacles in the real space is obtained.
  • FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
  • the acquisition unit 101 acquires the learning data (step S101).
  • for example, the acquisition unit 101 acquires learning data that has been obtained from an external device via a network or the like and stored in the storage unit 121.
  • the learning process is repeatedly executed a plurality of times.
  • the acquisition unit 101 may acquire a part of the plurality of learning data as learning data (batch) used for each learning.
  • the learning unit 102 generates fake data by the GAN generator 501 (step S102).
  • the learning unit 102 inputs the generated fake data, or the learning data (true training data) acquired in step S101, into the discriminator 502, and obtains the discrimination result output by the discriminator 502 (step S103).
  • the learning unit 102 updates the parameters of the generator 501 and the discriminator 502 using the discrimination result (step S104). For example, the learning unit 102 updates the parameters of the generator 501 so as to minimize a loss function whose value becomes small when the discriminator 502 mistakes fake data for true training data. Likewise, the learning unit 102 updates the parameters of the discriminator 502 so as to minimize a loss function whose value becomes small when the discrimination results of the discriminator 502 are correct.
  • the learning unit 102 may use any algorithm for learning, and for example, learning can be performed using Adam (Adaptive moment estimation).
  • the learning unit 102 determines whether or not to end learning (step S105). For example, the learning unit 102 determines whether to end learning based on whether all the learning data have been processed, whether the improvement of the loss function has become smaller than a threshold value, or whether the number of training iterations has reached an upper limit.
  • if the learning is not completed (step S105: No), the process returns to step S101 and is repeated with new learning data. When it is determined that the learning is completed (step S105: Yes), the learning process ends.
  • the learning unit 102 may use a method for stabilizing learning, for example, applying normalization (Spectral Normalization or the like) to each layer of the generator 501 and the discriminator 502.
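A minimal sketch of one training iteration (steps S102 to S104), assuming the Generator and Discriminator modules sketched earlier, binary cross-entropy losses, and Adam; where stabilization is wanted, each Linear layer can additionally be wrapped with torch.nn.utils.spectral_norm, as the spectral-normalization option above suggests.

```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()  # modules from the earlier sketch
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_x, c):
    """One iteration: update the discriminator, then the generator."""
    b = real_x.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Step S102: generate fake data from random latent variables.
    z = torch.randn(b, Z_DIM)
    fake_x = G(z, c)

    # Steps S103-S104 (discriminator): score real data 1 and fake data 0.
    loss_d = bce(D(real_x, c), ones) + bce(D(fake_x.detach(), c), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step S104 (generator): try to make the discriminator score fakes as 1.
    loss_g = bce(D(fake_x, c), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```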
  • by the above learning process, a generative model (generator 501) is obtained that, for input data on the latent space, outputs output data indicating at least one of a position and a posture in which the robot 300 does not contact an obstacle in the real space. The control process described next uses the generator 501 obtained in this way.
  • FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
  • first, the inference unit 103 calculates the start position (movement start position) and the end position (movement end position) of the robot 300 in the latent space (step S201). It is assumed that the start position and the end position of the robot 300 in the real space are given in advance.
  • for example, the inference unit 103 randomly generates a latent variable z in the latent space, inputs it to the generator 501, and determines whether the obtained data matches the start position given in the real space. Here, a match includes not only the case where the values match exactly but also the case where the difference between them is within a threshold. If they match, the inference unit 103 takes the latent variable input to the generator 501 as the start position in the latent space; if they do not match, it randomly generates a latent variable z again and repeats the process. The inference unit 103 can estimate the end position in the latent space in the same way.
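The random search just described might look as follows; this is a sketch under the assumptions of the earlier snippets (trained Generator G, condition vector c, Z_DIM = 2), with `target` the 6-D start or end position given in the real space and `eps` the matching threshold.

```python
import torch

def find_latent(G, c, target, eps=0.05, max_tries=100_000):
    """Sample z at random until G(z, c) lands within eps of `target`."""
    target = torch.as_tensor(target, dtype=torch.float32).view(1, -1)
    with torch.no_grad():
        for _ in range(max_tries):
            z = torch.randn(1, Z_DIM)
            if torch.norm(G(z, c) - target) < eps:
                return z  # estimated start/end position in the latent space
    raise RuntimeError("no latent point found within the tolerance")
```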
  • the inference unit 103 may calculate (estimate) the start position and end position of the robot 300 in the latent space by using a model (second model) different from the generator 501.
  • for example, the learning unit 102 trains a neural network model (second model) that takes data in the real space (such as fake data generated by the generator 501) as input and outputs data in the latent space; this model may be trained simultaneously with the generator 501 and the discriminator 502, or independently of them.
  • the inference unit 103 inputs the start position and the end position given in the real space to the neural network model trained in this way, and takes the output data as the start position and the end position of the robot 300 in the latent space.
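One hedged way to realize this second model is a small regression network trained on (fake data, latent variable) pairs drawn from the generator itself; the patent only fixes the input and output roles, so the training scheme below is an assumption.

```python
import torch
import torch.nn as nn

inverse = nn.Sequential(            # second model: 6-D pose -> 2-D latent
    nn.Linear(X_DIM, 128), nn.ReLU(),
    nn.Linear(128, Z_DIM))
opt_inv = torch.optim.Adam(inverse.parameters(), lr=1e-3)

def inverse_step(G, c, batch=256):
    """Fit the inverse map on pairs (G(z, c), z) from the trained generator."""
    z = torch.randn(batch, Z_DIM)
    with torch.no_grad():
        x = G(z, c)                 # fake data generated by the generator 501
    loss = nn.functional.mse_loss(inverse(x), z)
    opt_inv.zero_grad(); loss.backward(); opt_inv.step()
    return loss.item()
```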
  • next, the inference unit 103 determines a trajectory connecting the start position and the end position in the latent space, for example a straight line (step S202), inputs a plurality of input data on the latent space corresponding to the determined trajectory to the generator 501, and obtains a plurality of output data output by the generator 501 (step S203).
  • This output data corresponds to a trajectory in which the robot 300 moves in real space without contacting an obstacle.
  • the movement control unit 104 controls the movement of the robot 300 so as to move according to the calculated trajectory (step S204).
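Putting steps S201 to S204 together, a sketch of the control-side computation: interpolate a straight line between the latent start and end points (the straight-line form of step S202 is an assumption consistent with FIG. 8), push every point through the generator, and hand the resulting joint-angle sequence to the movement control unit. The controller call at the end is a hypothetical API.

```python
import torch

def plan_trajectory(G, c, z_start, z_end, n_points=50):
    """Steps S202-S203: latent-space line -> obstacle-avoiding pose sequence."""
    alphas = torch.linspace(0.0, 1.0, n_points).view(-1, 1)
    z_line = (1 - alphas) * z_start + alphas * z_end  # straight latent path
    with torch.no_grad():
        poses = G(z_line, c.expand(n_points, -1))     # 6-D model outputs
    return poses[:, :2]                               # (theta0, theta1) rows

# Step S204 (schematic): send the joint angles to the controller in order.
# for theta0, theta1 in plan_trajectory(G, cond, z_s, z_e):
#     controller.rotate_joints(theta0, theta1)  # hypothetical controller API
```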
  • as described above, the trajectory of the robot 300 can be calculated using the generative model (generator 501) obtained by training the GAN. This method does not require designing a complicated function as the potential field method does. In addition, since the trajectory is calculated in a latent space of lower dimension than the training data, the computational cost can be reduced.
  • in one experiment, the training data is created as follows.
  • the range of the joint angle θ0 of the joint 301 is -90° to +90°.
  • the range of the joint angle θ1 of the joint 302 is 0° to +150°.
  • the step size of each joint angle is 1°. From the joint angles θ0 and θ1, the above (x0, y0, x1, y1), consisting of the coordinates of the joint 302 and of the tip of the link 312, can be obtained by forward kinematics.
  • the six-dimensional data (θ0, θ1, x0, y0, x1, y1) obtained in this way is used as the training data.
  • the batch size (the number of training data used in each iteration) is 2056, the optimization method is Adam, and training is run for 100,000 iterations.
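Under these conditions the training set is simply an exhaustive grid over the two joint angles; below is a sketch reusing the forward_kinematics helper from the earlier example (for the with-obstacle condition, poses intersecting the obstacle would additionally be filtered out, which is an assumption).

```python
import numpy as np

def build_training_data():
    """Grid over theta0 in [-90, 90] and theta1 in [0, 150], step 1 degree."""
    rows = []
    for t0 in range(-90, 91):
        for t1 in range(0, 151):
            x0, y0, x1, y1 = forward_kinematics(t0, t1)  # defined earlier
            rows.append((t0, t1, x0, y0, x1, y1))        # one 6-D sample
    return np.asarray(rows, dtype=np.float32)

data = build_training_data()  # 181 * 151 = 27331 samples
```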
  • FIG. 8 shows an example of the mapping to the joint angle space of a straight-line trajectory on the latent space, input together with condition 1 (no obstacles) to the generator 501 trained under the above assumptions.
  • the joint angle space is the two-dimensional space represented by the joint angles θ0 and θ1 of the six-dimensional data.
  • the upper part of FIG. 8 shows an example of a straight-line trajectory connecting the start position 801 and the end position 802 specified on the latent space.
  • the lower part of FIG. 8 shows an example of a trajectory connecting the start position 811 and the end position 812 mapped on the joint angle space.
  • FIG. 9 is a diagram showing an example of a trajectory of a robot (simulator) operated based on the trajectory in the joint angle space of FIG.
  • FIG. 10 is a diagram showing an example of the mapping to the joint angle space of a straight-line trajectory on the latent representation, input together with condition 2 (with obstacles) to the generator 501.
  • FIG. 11 is a diagram showing an example of a trajectory of a robot (simulator) operated based on the trajectory in the joint angle space of FIG.
  • adjacency on the latent representation corresponds to adjacency of at least one of the robot's position and posture, so a smooth trajectory specified on the latent representation becomes a smooth trajectory in the joint angle space as well, and at least one of the robot's position and posture changes smoothly.
  • the robot can move while avoiding the area where the obstacle 1101 is located.
  • in the lower part of FIG. 8, the generated joint angles are similar to the joint angles of FIG. 10, which were generated when information with obstacles was input as the condition. Since a Conditional GAN uses the same neural network (generative model) for different conditions, one possible cause is that the output is affected by the other condition. When only two conditions are used, as in this example, the two conditions may influence each other so that similar output data is output. If the number of conditions is increased, a generative model trained so as not to be affected by specific conditions (that is, with high generalization performance) is expected to be obtained. In that case, the mapping to the joint angle space of a straight-line trajectory on the latent space input together with condition 1 (without obstacles) may become a mapping without the distortion shown in the lower part of FIG. 8.
  • the generative model is not limited to the generator included in a GAN. Any generative model that can obtain a low-dimensional latent representation from the training data may be used. For example, a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model may be used instead of GAN.
  • FIG. 12 is a diagram showing a configuration example of VAE.
  • six-dimensional data (θ0, θ1, x0, y0, x1, y1) is input to the encoder 1201, and the encoder 1201 outputs a latent variable z in the latent space.
  • a variable z' obtained by attaching a condition to the latent variable z is input to the decoder 1202, and the decoder 1202 generates and outputs new six-dimensional data.
  • in this case, the decoder 1202 is used as the generative model.
  • the start position and end position in the latent space can be obtained by inputting the start position and end position of the robot in the real space to the encoder 1201. That is, the encoder 1201 can be used as a model (second model) for estimating the start position and end position of the robot in the latent space.
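A minimal conditional-VAE sketch matching FIG. 12: the encoder 1201 maps the 6-D data to a 2-D latent distribution, the condition is concatenated to the sampled z to form z', and the decoder 1202 reconstructs 6-D data. The layer sizes and Gaussian reparameterization are assumptions; the patent fixes only the input and output roles.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=6, z_dim=2, c_dim=16, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())  # encoder 1201
        self.mu = nn.Linear(h, z_dim)
        self.logvar = nn.Linear(h, z_dim)
        self.dec = nn.Sequential(               # decoder 1202: z' -> 6-D data
            nn.Linear(z_dim + c_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x, c):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # sample z
        z_cond = torch.cat([z, c], dim=1)  # z' = z with the condition attached
        return self.dec(z_cond), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence to the standard normal prior."""
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```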
  • FIG. 13 is a diagram for explaining an example of learning data for learning a model used for trajectory planning of a two-link arm robot.
  • in FIG. 13, the circles represent, for example, the positions of the joints 301 and 302, or of the tip of the link 312, of the two-link arm robot of FIG. 2.
  • the coordinates (0, 0) correspond to the position of the joint 301, and the lengths of the links 311 and 312 are 1.
  • FIG. 13 shows the change in the position of each part of the robot when the angles of the joints 301 and 302 are each changed in constant increments, starting from the state in which the joint 302 and the tip of the link 312 are at coordinates (0, -1.0) and (0, -2.0), respectively.
  • Six-dimensional data corresponding to each position shown in FIG. 13 is used as learning data.
  • for each training iteration, a plurality of learning data given as a batch are used. For example, a fixed number of training data randomly selected from the entire training data set are used as the batch for each iteration.
  • with such random selection, however, the learning result may not be stable. For example, if the selected batches are biased, the latent space obtained after learning may also be biased.
  • the learning data may therefore be selected so as to avoid such problems and make the learning result more stable.
  • for example, the acquisition unit 101 may acquire one or more learning data from each of a plurality of data groups, each of which includes one or more learning data, and use them as the batch for each training iteration.
  • for learning data such as that shown in FIG. 13, the acquisition unit 101 may, for example, randomly select one or more learning data from each of a plurality of data groups into which the learning data are classified according to values (coordinate values) indicating at least one of position and posture. More specifically, for example, the space of the position coordinates (x, y) shown in FIG. 13 may be divided into a plurality of areas, and the learning data classified into a data group for each area.
  • the acquisition unit 101 acquires learning data (batch) to be used for each learning by selecting one or more learning data from each data group classified in this way.
  • further, the acquisition unit 101 may preferentially select learning data including positions closer to an obstacle. For example, when the learning data are classified into a plurality of data groups as described above, the acquisition unit 101 may acquire more learning data from the area containing the obstacle, or from the areas adjacent to it, than from the other areas. This makes it possible to learn at least one of the positions and postures for avoiding obstacles more efficiently.
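A sketch of this stratified batch selection, assuming the 6-D array produced by the earlier dataset example: samples are grouped by the grid cell containing the tip position (x1, y1), batches are spread across the occupied cells, and cells marked as near an obstacle are oversampled. The grid size and weighting are illustrative assumptions.

```python
import numpy as np

def stratified_batch(data, batch_size=2056, n=4, lim=2.0, hot_cells=(), boost=3):
    """Draw a batch spread across the occupied cells of an n*n grid over the
    tip position (x1, y1); cells in `hot_cells` (e.g. those containing or
    adjacent to an obstacle) receive `boost` times the sampling weight."""
    ix = np.clip(((data[:, 4] + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    iy = np.clip(((data[:, 5] + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    cell_of = ix * n + iy
    cells = np.unique(cell_of)
    weights = np.array([boost if c in hot_cells else 1 for c in cells], float)
    quotas = np.random.multinomial(batch_size, weights / weights.sum())
    picks = []
    for c, m in zip(cells, quotas):
        if m > 0:
            members = np.flatnonzero(cell_of == c)
            picks.append(np.random.choice(members, size=m,
                                          replace=m > len(members)))
    return data[np.concatenate(picks)]

batch = stratified_batch(data)  # `data` from the earlier grid example
```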
  • in the present specification, an expression such as "at least one of a, b and c" is an expression that includes not only the combinations a, b, c, a-b, a-c, b-c and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b and a-a-b-b-c-c. It also covers configurations that include elements other than a, b and c, such as the combination a-b-c-d. Similarly, an expression such as "at least one of a, b or c" covers the same combinations.
  • Reference signs: 1 robot system, 100 control device (learning device), 101 acquisition unit, 102 learning unit, 103 inference unit, 104 movement control unit, 121 storage unit, 200 controller, 204 memory, 206 hardware processor, 208 storage device, 210 operation device, 212 display device, 214 communication device, 222 ROM, 224 RAM, 300 robot, 400 sensor

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The control device according to one embodiment of the present invention is provided with an inference unit. The inference unit feeds multiple pieces of first input data on a latent space to a first model and obtains multiple pieces of first output data outputted by the first model, where the first model receives, as input, data on the latent space indicating latent characteristics of the position and/or posture of a mobile object, and outputs data indicating a position and/or posture of the mobile object that does not contact an obstacle in the real space.

Description

制御装置、システム、学習装置および制御方法Control devices, systems, learning devices and control methods
 本発明の実施形態は、制御装置、システム、学習装置および制御方法に関する。 Embodiments of the present invention relate to control devices, systems, learning devices and control methods.
 工場および倉庫などで利用されるロボットに加え、生活環境で人間と協働作業するロボットの需要が高まっている。これらのロボットは、照明および障害物などの条件が常に変化する環境での動作が想定される。従って、人間、環境の障害物、および、ロボット自身が損傷しないようにロボットが動作することを保証するために、障害物を回避する能力をロボットが備えることが必要である。 In addition to robots used in factories and warehouses, there is an increasing demand for robots that collaborate with humans in the living environment. These robots are expected to operate in an environment where conditions such as lighting and obstacles are constantly changing. Therefore, it is necessary for the robot to have the ability to avoid obstacles in order to ensure that the robot operates without damaging humans, environmental obstacles, and the robot itself.
 発明が解決しようとする課題は、障害物を回避する軌道をより容易に求めることにある。 The problem to be solved by the invention is to more easily find a trajectory to avoid obstacles.
 実施形態にかかる制御装置は、推論部を備える。推論部は、移動体の位置および姿勢の少なくとも一方の潜在的な特徴を示す潜在空間上の入力データを入力し、実空間上で障害物に接触しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する第1モデルに対して、潜在空間上の複数の第1入力データを入力し、第1モデルが出力する複数の第1出力データを得る。 The control device according to the embodiment includes an inference unit. The inference unit inputs input data on the latent space showing the potential features of at least one of the position and orientation of the moving body, and indicates at least one of the positions and postures of the moving body that does not come into contact with obstacles in the real space. A plurality of first input data on the latent space are input to the first model that outputs the output data, and a plurality of first output data output by the first model is obtained.
図1は、本実施形態の制御装置を含むロボットシステムのハードウェア構成例を示す図である。FIG. 1 is a diagram showing a hardware configuration example of a robot system including the control device of the present embodiment. 図2は、2リンクアームロボットであるロボットの構成例を示す図である。FIG. 2 is a diagram showing a configuration example of a robot that is a two-link arm robot. 図3は、制御装置のハードウェアブロック図である。FIG. 3 is a hardware block diagram of the control device. 図4は、制御装置の機能構成の一例を示す機能ブロック図である。FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device. 図5は、本実施形態で用いるGANの構成例を示す図である。FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment. 図6は、本実施形態における学習処理の一例を示すフローチャートである。FIG. 6 is a flowchart showing an example of the learning process in the present embodiment. 図7は、本実施形態における制御処理の一例を示すフローチャートである。FIG. 7 is a flowchart showing an example of the control process in the present embodiment. 図8は、潜在空間上での直線軌道の関節角度空間への写像の一例を示す図である。FIG. 8 is a diagram showing an example of mapping a straight line trajectory to the joint angle space on the latent space. 図9は、ロボットの軌跡の例を示す図である。FIG. 9 is a diagram showing an example of the trajectory of the robot. 図10は、潜在表現上での直線軌道の関節角度空間への写像の一例を示す図である。FIG. 10 is a diagram showing an example of mapping a straight line trajectory to a joint angle space on a latent representation. 図11は、関節角度空間上の軌道を元に動作させたロボットの軌跡の例を示す図である。FIG. 11 is a diagram showing an example of a trajectory of a robot operated based on a trajectory in the joint angle space. 図12は、VAEの構成例を示す図である。FIG. 12 is a diagram showing a configuration example of VAE. 図13は、学習データの例を説明するための図である。FIG. 13 is a diagram for explaining an example of learning data.
 以下、図面を参照しながら実施形態について詳細に説明する。 Hereinafter, the embodiment will be described in detail with reference to the drawings.
 障害物を回避する軌道計画では、人間による設計が不要であり、計算コストがより小さい障害物回避の方法が望まれる。そこで、本実施形態では、生成モデルを用いて障害物を回避する軌道を計算する。例えばGAN(Generative Adversarial Networks)を、生成モデルを含むモデルとして用いることができる。GANは、多様体仮説に基づいて、学習データをより低次元の潜在表現(潜在空間上で表されるデータ)に落とし込める利点がある。本実施形態では、ロボットが障害物を回避する位置および姿勢の少なくとも一方をGANに含まれる潜在空間に獲得させ、潜在空間上で軌道を指定する。これにより、障害物を回避する軌道計画がより容易に実現可能となる。簡単な設計により障害物回避を実現することができるため、例えば技術者だけでなく、熟練した知識のないユーザでもロボットを扱うことが可能となる。 In the trajectory plan to avoid obstacles, it is not necessary to design by humans, and a method of avoiding obstacles with a smaller calculation cost is desired. Therefore, in the present embodiment, the trajectory for avoiding obstacles is calculated using the generative model. For example, GAN (Generative Adversarial Networks) can be used as a model including a generative model. GAN has an advantage that the training data can be dropped into a lower-dimensional latent representation (data represented on the latent space) based on the manifold hypothesis. In the present embodiment, at least one of the position and the posture in which the robot avoids the obstacle is acquired in the latent space included in the GAN, and the trajectory is specified in the latent space. This makes it easier to realize a trajectory plan that avoids obstacles. Obstacle avoidance can be realized by a simple design, so that not only engineers but also users without skilled knowledge can handle the robot, for example.
 図1は、本実施形態の制御装置100を含むロボットシステム1のハードウェア構成例を示す図である。図1に示すように、ロボットシステム1は、制御装置100と、コントローラ200と、ロボット300と、センサ400と、を備えている。 FIG. 1 is a diagram showing a hardware configuration example of the robot system 1 including the control device 100 of the present embodiment. As shown in FIG. 1, the robot system 1 includes a control device 100, a controller 200, a robot 300, and a sensor 400.
 ロボット300は、制御装置100によって位置および姿勢の少なくとも一方(軌道)が制御されて移動する移動体の例である。ロボット300は、例えば、複数のリンク、複数の関節、および、関節それぞれを駆動する複数の駆動装置(モータなど)を備える。以下では、2つの関節および2つのリンクを備える2リンクアームロボットであるロボット300を例に説明する。 The robot 300 is an example of a moving body in which at least one of the position and the posture (orbit) is controlled by the control device 100 and moves. The robot 300 includes, for example, a plurality of links, a plurality of joints, and a plurality of driving devices (motors and the like) for driving each of the joints. In the following, a robot 300, which is a two-link arm robot having two joints and two links, will be described as an example.
 図2は、2リンクアームロボットであるロボット300の構成例を示す図である。図2に示すように、ロボット300は、ベース部材321と、2つの関節301、302と、2つのリンク311、312と、を備えている。関節301、302は、図2の紙面と垂直な方向の軸回りに回転する。関節301は、ベース部材321に固定された軸回りに回転する。リンク311、312は、関節301、302の回転に応じて移動する。図2では、関節301、302がそれぞれ反時計回りに回転することによりリンク311、312が移動する様子が示されている。 FIG. 2 is a diagram showing a configuration example of a robot 300 which is a two-link arm robot. As shown in FIG. 2, the robot 300 includes a base member 321 and two joints 301 and 302, and two links 311 and 312. The joints 301 and 302 rotate about an axis in the direction perpendicular to the paper surface of FIG. The joint 301 rotates about an axis fixed to the base member 321. The links 311 and 312 move according to the rotation of the joints 301 and 302. FIG. 2 shows how the links 31 and 312 move as the joints 301 and 302 rotate counterclockwise, respectively.
 適用可能なロボット(移動体)はこれに限られず、どのようなロボット(移動体)であってもよい。例えば、3つ以上の関節およびリンクを備えるロボット、モバイルマニピュレータ、および、移動台車であってもよい。また、ロボット全体を実空間内の任意の方向に平行移動させるための駆動装置を備えるロボットであってもよい。移動体は、このように全体の位置が変化する物体でもよいし、図2のリンクアームロボットのように、一部の位置が固定され、他の部分の位置および姿勢の少なくとも一方が変化する物体でもよい。 The applicable robot (moving body) is not limited to this, and any robot (moving body) may be used. For example, it may be a robot having three or more joints and links, a mobile manipulator, and a mobile trolley. Further, the robot may be provided with a drive device for moving the entire robot in parallel in an arbitrary direction in the real space. The moving body may be an object whose overall position changes in this way, or an object such as the link arm robot of FIG. 2 in which a part of the position is fixed and at least one of the position and the posture of the other part changes. It may be.
 図1に戻り、センサ400は、ロボット300の動作の制御に用いるための情報を検知する。センサ400は、例えば、ロボット300の周囲の画像を撮像する撮像装置(カメラ)、および、ロボット300の周囲の物体までの深度情報を検知する深度センサ(デプスセンサ)の両方または一方である。センサ400はこれらに限られるものではなく、例えば、障害物の位置に関する情報(位置情報)を取得可能なセンサであってもよい。 Returning to FIG. 1, the sensor 400 detects information to be used for controlling the operation of the robot 300. The sensor 400 is, for example, both or one of an imaging device (camera) that captures an image of the surroundings of the robot 300 and a depth sensor (depth sensor) that detects depth information up to an object around the robot 300. The sensor 400 is not limited to these, and may be, for example, a sensor capable of acquiring information (position information) regarding the position of an obstacle.
 コントローラ200は、制御装置100からの指示に応じて、ロボット300の駆動を制御する。例えばコントローラ200は、制御装置100から指定された回転方向および回転速度で回転するように、ロボット300の関節を駆動する駆動装置(モータなど)を制御する。 The controller 200 controls the drive of the robot 300 in response to an instruction from the control device 100. For example, the controller 200 controls a drive device (motor or the like) that drives the joints of the robot 300 so as to rotate in the rotation direction and rotation speed specified by the control device 100.
 制御装置100は、コントローラ200、ロボット300、および、センサ400に接続され、ロボットシステム1の全体を制御する。例えば制御装置100は、ロボット300の動作を制御する。ロボット300の動作の制御には、生成モデルを用いた軌道の計算が含まれる。制御装置100は、計算した軌道に従ってロボット300を動作させるための動作指令を、コントローラ200に出力する。制御装置100は、生成モデルを学習する機能を備えてもよい。この場合、制御装置100は、生成モデルを学習する学習装置としても機能する。 The control device 100 is connected to the controller 200, the robot 300, and the sensor 400, and controls the entire robot system 1. For example, the control device 100 controls the operation of the robot 300. The control of the motion of the robot 300 includes the calculation of the trajectory using the generative model. The control device 100 outputs an operation command for operating the robot 300 according to the calculated trajectory to the controller 200. The control device 100 may have a function of learning a generative model. In this case, the control device 100 also functions as a learning device for learning the generative model.
 図3は、制御装置100のハードウェアブロック図である。制御装置100は、一例として、図3に示すような一般のコンピュータ(情報処理装置)と同様のハードウェア構成により実現される。制御装置100は、図3に示すような1つのコンピュータにより実現されてもよいし、協働して動作する複数のコンピュータにより実現されてもよい。 FIG. 3 is a hardware block diagram of the control device 100. As an example, the control device 100 is realized by a hardware configuration similar to that of a general computer (information processing device) as shown in FIG. The control device 100 may be realized by one computer as shown in FIG. 3, or may be realized by a plurality of computers operating in cooperation with each other.
 制御装置100は、メモリ204と、1または複数のハードウェアプロセッサ206と、記憶装置208と、操作装置210と、表示装置212と、通信装置214とを備える。各部は、バスにより接続される。 The control device 100 includes a memory 204, one or more hardware processors 206, a storage device 208, an operation device 210, a display device 212, and a communication device 214. Each part is connected by a bus.
 メモリ204は、例えば、ROM222と、RAM224とを含む。ROM222は、制御装置100の制御に用いられるプログラムおよび各種設定情報等を書き換え不可能に記憶する。RAM224は、SDRAM(Synchronous Dynamic Random Access Memory)等の揮発性の記憶媒体である。RAM224は、1または複数のハードウェアプロセッサ206の作業領域として機能する。 The memory 204 includes, for example, a ROM 222 and a RAM 224. The ROM 222 stores the program used for controlling the control device 100, various setting information, and the like in a non-rewritable manner. The RAM 224 is a volatile storage medium such as SDRAM (Synchronous Dynamic Random Access Memory). The RAM 224 serves as a work area for one or more hardware processors 206.
 1または複数のハードウェアプロセッサ206は、メモリ204(ROM222およびRAM224)にバスを介して接続される。1または複数のハードウェアプロセッサ206は、例えば、1または複数のCPU(Central Processing Unit)であってもよいし、1または複数のGPU(Graphics Processing Unit)であってもよい。また、1または複数のハードウェアプロセッサ206は、ニューラルネットワークを実現するための専用の処理回路を含む半導体装置等であってもよい。 One or more hardware processors 206 are connected to memory 204 (ROM 222 and RAM 224) via a bus. The one or more hardware processors 206 may be, for example, one or a plurality of CPUs (Central Processing Units) or one or a plurality of GPUs (Graphics Processing Units). Further, the one or more hardware processors 206 may be a semiconductor device or the like including a dedicated processing circuit for realizing a neural network.
 1または複数のハードウェアプロセッサ206は、RAM224の所定領域を作業領域としてROM222または記憶装置208に予め記憶された各種プログラムとの協働により各種処理を実行し、制御装置100を構成する各部の動作を統括的に制御する。また、1または複数のハードウェアプロセッサ206は、ROM222または記憶装置208に予め記憶されたプログラムとの協働により、操作装置210、表示装置212、および、通信装置214等を制御する。 The one or more hardware processors 206 execute various processes in cooperation with various programs stored in ROM 222 or the storage device 208 in advance using a predetermined area of the RAM 224 as a work area, and operate each part constituting the control device 100. Is controlled comprehensively. Further, one or more hardware processors 206 control the operation device 210, the display device 212, the communication device 214, and the like in cooperation with the program stored in the ROM 222 or the storage device 208 in advance.
 記憶装置208は、フラッシュメモリ等の半導体による記憶媒体、あるいは、磁気的または光学的に記録可能な記憶媒体等の書き換え可能な記録装置である。記憶装置208は、制御装置100の制御に用いられるプログラムおよび各種設定情報等を記憶する。 The storage device 208 is a rewritable recording device such as a semiconductor storage medium such as a flash memory or a magnetically or optically recordable storage medium. The storage device 208 stores a program used for controlling the control device 100, various setting information, and the like.
 操作装置210は、マウスおよびキーボード等の入力デバイスである。操作装置210は、ユーザから操作入力された情報を受け付け、受け付けた情報を1または複数のハードウェアプロセッサ206に出力する。 The operation device 210 is an input device such as a mouse and a keyboard. The operation device 210 receives the information input from the user and outputs the received information to one or more hardware processors 206.
 表示装置212は、情報をユーザに表示する。表示装置212は、1または複数のハードウェアプロセッサ206から情報等を受け取り、受け取った情報を表示する。なお、通信装置214または記憶装置208等に情報を出力する場合、制御装置100は、表示装置212を備えなくてもよい。 The display device 212 displays information to the user. The display device 212 receives information or the like from one or more hardware processors 206, and displays the received information. When outputting information to the communication device 214, the storage device 208, or the like, the control device 100 does not have to include the display device 212.
 通信装置214は、外部の機器と通信して、ネットワーク等を介して情報を送受信する。 The communication device 214 communicates with an external device and transmits / receives information via a network or the like.
 本実施形態の制御装置100で実行されるプログラムは、インストール可能な形式または実行可能な形式のファイルでCD-ROM、フレキシブルディスク(FD)、CD-R、DVD(Digital Versatile Disk)等のコンピュータで読み取り可能な記録媒体に記録されてコンピュータプログラムプロダクトとして提供される。 The program executed by the control device 100 of the present embodiment is a file in an installable format or an executable format on a computer such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk). It is recorded on a readable recording medium and provided as a computer program product.
 また、本実施形態の制御装置100で実行されるプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。また、本実施形態の制御装置100で実行されるプログラムをインターネット等のネットワーク経由で提供または配布するように構成してもよい。また、本実施形態の制御装置100で実行されるプログラムを、ROM等に予め組み込んで提供するように構成してもよい。 Further, the program executed by the control device 100 of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided or distributed via a network such as the Internet. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided by incorporating it into a ROM or the like in advance.
 本実施形態にかかる制御装置100で実行されるプログラムは、コンピュータを後述する制御装置100の各部として機能させうる。このコンピュータは、ハードウェアプロセッサ206がコンピュータ読取可能な記憶媒体からプログラムを主記憶装置上に読み出して実行することができる。 The program executed by the control device 100 according to the present embodiment can make the computer function as each part of the control device 100 described later. The computer can read and execute a program on the main memory from a computer-readable storage medium by the hardware processor 206.
 図1に示すハードウェア構成は一例であり、これに限られるものではない。制御装置100、コントローラ200、ロボット300、および、センサ400のうち一部または全部を、1つの装置が備えるように構成してもよい。例えば、ロボット300が、制御装置100、コントローラ200、および、センサ400の機能も備えるように構成してもよい。また、制御装置100が、コントローラ200およびセンサ400の一方または両方の機能も備えるように構成してもよい。また、図1では制御装置100が学習装置としても機能しうることを記載しているが、制御装置100と学習装置とを物理的に異なる装置により実現してもよい。 The hardware configuration shown in FIG. 1 is an example, and is not limited to this. One device may be configured to include a part or all of the control device 100, the controller 200, the robot 300, and the sensor 400. For example, the robot 300 may be configured to also include the functions of the control device 100, the controller 200, and the sensor 400. Further, the control device 100 may be configured to have one or both functions of the controller 200 and the sensor 400. Further, although it is described in FIG. 1 that the control device 100 can also function as a learning device, the control device 100 and the learning device may be realized by physically different devices.
 次に、制御装置100の機能構成について説明する。図4は、制御装置100の機能構成の一例を示す機能ブロック図である。図4に示すように、制御装置100は、取得部101と、学習部102と、推論部103と、移動制御部104と、記憶部121と、を備えている。 Next, the functional configuration of the control device 100 will be described. FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device 100. As shown in FIG. 4, the control device 100 includes an acquisition unit 101, a learning unit 102, an inference unit 103, a movement control unit 104, and a storage unit 121.
 取得部101は、制御装置100が実行する各種処理で用いられる各種情報を取得する。例えば取得部101は、生成モデルを学習するための学習データを取得する。学習データの取得方法はどのような方法であってもよいが、取得部101は、例えば予め作成された学習データを、外部の装置からネットワークなどを介して、または、記憶媒体から取得する。 The acquisition unit 101 acquires various information used in various processes executed by the control device 100. For example, the acquisition unit 101 acquires learning data for learning the generative model. The learning data can be acquired by any method, but the acquisition unit 101 acquires, for example, the learning data created in advance from an external device via a network or the like, or from a storage medium.
 学習部102は、学習データを用いて生成モデル(第1モデル)を学習する。GANを用いる場合、学習部102は、GANを構成する生成器および識別器の2つのニューラルネットワークを学習する。 The learning unit 102 learns the generative model (first model) using the learning data. When GAN is used, the learning unit 102 learns two neural networks, a generator and a discriminator that constitute GAN.
 学習データは、例えば、実空間上で障害物に接触しないロボット300の位置および姿勢の少なくとも一方を示すデータである。このような学習データを用いて学習することにより、入力された潜在表現(潜在空間上のデータ)に対して、実空間上で障害物に接触(干渉)しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する生成器が得られる。なお、位置および姿勢の少なくとも一方を示す出力データは、位置を示す出力データ、姿勢を示す出力データ、および、位置および姿勢の両方を示す出力データ、を含む。生成器は、移動体の位置および姿勢の少なくとも一方の潜在的な特徴を示す潜在空間上の入力データを入力し、実空間上で障害物に接触しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する生成モデル(第1モデル)に相当する。学習方法の詳細は後述する。 The learning data is, for example, data indicating at least one of the position and posture of the robot 300 that does not come into contact with an obstacle in the real space. By learning using such learning data, at least one of the positions and postures of the moving body that does not come into contact with (interfere with) obstacles in the real space with respect to the input latent expression (data in the latent space). A generator is obtained that outputs the output data indicating. The output data indicating at least one of the position and the posture includes the output data indicating the position, the output data indicating the posture, and the output data indicating both the position and the posture. The generator inputs input data in latent space that indicates the potential features of at least one of the position and orientation of the moving object, and indicates at least one of the positions and orientations of the moving object that does not come into contact with obstacles in real space. It corresponds to the generative model (first model) that outputs the output data. The details of the learning method will be described later.
 推論部103は、学習された生成モデルを用いた推論を実行する。例えば推論部103は、生成モデルに対して、潜在空間上で線を構成する複数の入力データ(第1入力データ)を入力し、生成モデルが出力する複数の出力データ(第1出力データ)を得る。 The inference unit 103 executes inference using the learned generative model. For example, the inference unit 103 inputs a plurality of input data (first input data) forming a line in the latent space to the generative model, and outputs a plurality of output data (first output data) output by the generative model. obtain.
 The movement control unit 104 controls the movement of the robot 300. For example, the movement control unit 104 controls the movement of the robot 300 by using the output data obtained by the inference unit 103 as trajectory data indicating a trajectory along which the robot 300 does not contact an obstacle in the real space. More specifically, the movement control unit 104 moves the robot 300 by generating a motion command for operating the robot 300 according to the trajectory data and transmitting the motion command to the controller 200.
 The storage unit 121 stores various kinds of information used by the control device 100. For example, the storage unit 121 stores the parameters (weights, biases, and the like) of the neural networks (the generator and the discriminator) constituting the GAN, and the learning data for training those networks. The storage unit 121 is realized by, for example, the storage device 208 of FIG. 3.
 The above units (the acquisition unit 101, the learning unit 102, the inference unit 103, and the movement control unit 104) are realized by, for example, one or more hardware processors 206. For example, each unit may be realized by causing one or more CPUs to execute a program, that is, by software. Each unit may be realized by a hardware processor such as a dedicated IC (Integrated Circuit), that is, by hardware. Each unit may also be realized by a combination of software and hardware. When a plurality of processors are used, each processor may realize one of the units or two or more of them.
 Next, a configuration example of the GAN will be described. FIG. 5 is a diagram showing a configuration example of the GAN used in the present embodiment. As shown in FIG. 5, the GAN includes two neural networks: a generator 501 and a discriminator 502. The generator 501 outputs fake data (high-dimensional data) imitating the learning data in response to an input latent variable z in the low-dimensional latent space. The generator 501 is trained so that the distribution of the generated fake data approaches the distribution of the true learning data. The discriminator 502 discriminates whether input data is true learning data or fake data, and is trained to increase its discrimination accuracy.
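 For illustration, the following is a minimal PyTorch-style sketch of such a conditional generator and discriminator for the six-dimensional pose data and two-dimensional latent variable defined below. The layer widths, the ReLU activations, and the 16-dimensional condition vector (matching the 16-region map of FIG. 5) are assumptions made for the sketch, not details taken from the embodiment.

```python
import torch
import torch.nn as nn

LATENT_DIM = 2   # latent variable (z0, z1)
DATA_DIM = 6     # pose (theta0, theta1, x0, y0, x1, y1)
COND_DIM = 16    # binary obstacle map of FIG. 5 (4 x 4 regions), flattened

class Generator(nn.Module):
    """Maps a latent variable and an obstacle condition to a fake pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))

class Discriminator(nn.Module):
    """Outputs a logit: is this pose (under this condition) real data?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1))
```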
 When a two-link robot as shown in FIG. 2 is used, the low-dimensional (latent-space) data (latent variables) and the high-dimensional (real-space) data (learning data and fake data) are defined, for example, as follows.
 First, let the angles (joint angles) of the joints 301 and 302 be θ0 and θ1, respectively. Let the coordinates of the joint 302 and of the tip of the link 312 be (x0, y0) and (x1, y1), respectively. The lengths of the links 311 and 312 (link lengths) are, for example, 1. The high-dimensional data is represented as six-dimensional data (θ0, θ1, x0, y0, x1, y1) including the angles of the two joints 301 and 302, the position of the joint 302, and the position of the tip of the link 312. Of this six-dimensional position-and-posture information, (x0, y0, x1, y1) can in principle be generated by forward kinematics once the two-dimensional information (θ0, θ1) is given. Therefore, the low-dimensional data (latent variable) can be defined as two-dimensional data (z0, z1). Similarly, when there are n driving parts such as joints (n being an integer with n ≥ 3), n-dimensional data (z0, z1, ..., z(n-1)) can be used as the latent variable.
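 As a concrete sketch of how the high-dimensional data can be computed, the following function derives the six-dimensional sample from the two joint angles by forward kinematics, assuming unit link lengths, with θ0 measured at the base joint 301 and θ1 measured relative to the link 311; these angle conventions are assumptions for illustration.

```python
import math

def forward_kinematics(theta0_deg, theta1_deg, l0=1.0, l1=1.0):
    """Return (theta0, theta1, x0, y0, x1, y1) for a planar two-link arm."""
    t0 = math.radians(theta0_deg)
    t1 = math.radians(theta1_deg)
    x0 = l0 * math.cos(t0)            # position of joint 302
    y0 = l0 * math.sin(t0)
    x1 = x0 + l1 * math.cos(t0 + t1)  # position of the tip of link 312
    y1 = y0 + l1 * math.sin(t0 + t1)
    return (theta0_deg, theta1_deg, x0, y0, x1, y1)
```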
 The low-dimensional (latent-space) data and the high-dimensional (real-space) data described above are merely examples and are not limiting. For example, a latent variable whose dimensionality is larger than the number of degrees of freedom of the joints may be used.
 In the GAN shown in FIG. 5, a specified condition (Condition) out of a plurality of conditions can be input to each of the generator 501 and the discriminator 502. This allows the generator 501 and the discriminator 502 to output data (fake data or a discrimination result) according to the condition. A GAN that accepts condition inputs in this way is sometimes called a Conditional GAN. A GAN without condition inputs may also be used.
 The condition shown in FIG. 5 indicates that an obstacle, drawn as a black rectangle, exists within the movable range of the robot 300. The condition may be specified by any method. As shown in FIG. 5, a condition may be used that indicates, for each of a plurality of regions (rectangles) into which the movable range of the robot 300 is divided (16 regions in FIG. 5), whether an obstacle exists in that region (for example, 1 if an obstacle exists and 0 otherwise). One or both of an image capturing the surroundings of the robot 300 and depth information to objects around the robot 300 may also be used as the condition relating to obstacles. In that case, the image information and depth information detected by the sensor 400 (an imaging device and a depth sensor) can be used as the condition relating to obstacles. When image information or depth information is used, information indicating the position of the obstacle (position information) may or may not additionally be given explicitly as part of the condition. Alternatively, using a sensor 400 capable of acquiring the position information of obstacles, only the obstacle position information may be used as the condition relating to obstacles.
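 As one possible encoding of the grid condition described above, the sketch below marks which cells of a 4 × 4 grid (the 16 regions of FIG. 5) overlap an axis-aligned obstacle box; the workspace extent and the box representation are assumptions.

```python
import numpy as np

def occupancy_condition(obstacles, x_range=(-2.0, 2.0), y_range=(-2.0, 2.0),
                        nx=4, ny=4):
    """Binary condition vector: 1 where a grid cell contains an obstacle.
    Each obstacle is an axis-aligned box (xmin, ymin, xmax, ymax)."""
    grid = np.zeros((ny, nx), dtype=np.float32)
    xs = np.linspace(*x_range, nx + 1)
    ys = np.linspace(*y_range, ny + 1)
    for (bxmin, bymin, bxmax, bymax) in obstacles:
        for i in range(ny):
            for j in range(nx):
                # Does cell [xs[j], xs[j+1]] x [ys[i], ys[i+1]] overlap the box?
                if bxmin < xs[j + 1] and bxmax > xs[j] and \
                   bymin < ys[i + 1] and bymax > ys[i]:
                    grid[i, j] = 1.0
    return grid.flatten()

# Example: a single obstacle box in the upper-right part of the workspace.
cond = occupancy_condition([(0.5, 0.5, 1.5, 1.5)])
```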
 According to the manifold hypothesis, representations in the world can be expressed as lower-dimensional manifolds. Based on this hypothesis, a GAN is expected to acquire a low-dimensional latent representation from a high-dimensional representation. When applied to trajectory planning that avoids obstacles, as in the present embodiment, training the conditional generative model of the GAN (the generator 501) yields a low-dimensional latent representation of at least one of the position and the posture of the robot 300 that does not contact obstacles. The trained generator 501 can then generate, from a specified latent representation (data indicating at least one of the position and the posture in the latent space), data indicating at least one of a position and a posture with which the robot avoids obstacles in the real space.
 It is also known that a GAN learns such that the adjacency relationships of data (latent variables) in the latent space correspond to the adjacency relationships of the generated data. Therefore, when a trajectory specified in the low-dimensional latent space (a plurality of adjacent positions and/or postures) is mapped by the generator 501, a trajectory of at least one of the position and the posture with which the robot 300 avoids obstacles in the real space is obtained.
 Further, when a condition including information such as obstacle positions is input to the generator 501 and the discriminator 502, the position and/or posture output for the same latent representation is warped accordingly: depending on the obstacle positions included in the condition, at least one of a position and a posture that avoids the obstacles is output.
 Next, the learning process performed by the control device 100 according to the present embodiment configured as described above will be described. FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
 First, the acquisition unit 101 acquires learning data (step S101). The acquisition unit 101 acquires, for example, learning data that has been obtained from an external device via a network or the like and stored in the storage unit 121. The learning process is usually executed repeatedly a plurality of times. The acquisition unit 101 may acquire a subset of the learning data as the learning data (batch) used for each iteration.
 Next, the learning unit 102 generates fake data with the generator 501 of the GAN (step S102). The learning unit 102 inputs the generated fake data, or the learning data (true learning data) acquired in step S101, to the discriminator 502, and obtains the discrimination result output by the discriminator 502 (step S103).
 The learning unit 102 updates the parameters of the generator 501 and the discriminator 502 using the discrimination result (step S104). For example, the learning unit 102 updates the parameters of the generator 501 so as to minimize a loss function whose value becomes smaller the more often the discriminator 502 misidentifies fake data as true learning data. The learning unit 102 also updates the parameters of the discriminator 502 so as to minimize a loss function whose value becomes smaller the more accurate the discrimination results of the discriminator 502 are. Any algorithm may be used for the learning; for example, learning can be performed using Adam (Adaptive moment estimation).
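 One way to realize steps S102 to S104 is the standard adversarial update with binary cross-entropy losses and Adam, sketched below using the Generator and Discriminator from the earlier sketch; the embodiment does not fix the exact loss form, so this formulation is an assumption.

```python
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real, cond):
    """One adversarial update (steps S102-S104) on a batch of real poses."""
    n = real.shape[0]
    z = torch.randn(n, LATENT_DIM)

    # Discriminator update: push real poses toward 1 and fakes toward 0.
    fake = G(z, cond).detach()
    loss_d = (F.binary_cross_entropy_with_logits(D(real, cond), torch.ones(n, 1))
              + F.binary_cross_entropy_with_logits(D(fake, cond), torch.zeros(n, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: make the discriminator label generated poses as real.
    loss_g = F.binary_cross_entropy_with_logits(D(G(z, cond), cond), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```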
 The learning unit 102 determines whether to end the learning (step S105). For example, the learning unit 102 determines that the learning is finished when all the learning data has been processed, when the improvement of the loss function has become smaller than a threshold, or when the number of learning iterations has reached an upper limit.
 If the learning is not finished (step S105: No), the process returns to step S101 and is repeated for new learning data. If it is determined that the learning is finished (step S105: Yes), the learning process ends.
 It is known that GANs are difficult to train because the gradients often vanish or diverge during learning. The learning unit 102 may therefore use a technique for stabilizing the training, for example, applying normalization (such as Spectral Normalization) to each layer of the generator 501 and the discriminator 502.
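 In PyTorch, for example, such normalization could be applied by wrapping each linear layer when the network is constructed; a sketch assuming the Discriminator defined earlier:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def add_spectral_norm(model: nn.Module) -> nn.Module:
    """Wrap every linear layer with spectral normalization, bounding each
    layer's Lipschitz constant to stabilize adversarial training."""
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, spectral_norm(module))
        else:
            add_spectral_norm(module)  # recurse into nested containers
    return model

D = add_spectral_norm(Discriminator())  # apply before creating its optimizer
```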
 Through the learning process described above, a generative model (generator 501) is obtained that, for input data in the latent space, outputs output data indicating at least one of a position and a posture in which the robot 300 does not contact an obstacle in the real space. The generator 501 obtained in this way is used when computing the trajectory along which the robot 300 moves.
 Next, the control process of the robot 300 by the control device 100 according to the present embodiment will be described. FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
 First, the inference unit 103 computes the start position (movement start position) and the end position (movement end position) of the robot 300 in the latent space (step S201). It is assumed that the start position and the end position of the robot 300 in the real space are given in advance.
 For example, the inference unit 103 randomly generates a latent variable z in the latent space, inputs z to the generator 501, and determines whether the resulting data matches the start position given in the real space. A match may include not only the case where the values agree exactly but also the case where the difference between the values is within a threshold. If they match, the inference unit 103 takes the data input to the generator 501 as the start position in the latent space. If they do not match, a latent variable z is randomly generated again and the process is repeated. The inference unit 103 can estimate the end position in the latent space in the same manner.
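 A minimal sketch of this random search, reusing the trained Generator and a condition tensor built from the occupancy sketch above; `start_pose` and `end_pose` are assumed to be given 1 × 6 tensors of the real-space poses, and the tolerance and retry limit are assumptions:

```python
import torch

cond_t = torch.from_numpy(cond).unsqueeze(0)  # (1, COND_DIM) condition tensor

@torch.no_grad()
def find_latent(generator, cond, target_pose, tol=0.05, max_tries=100_000):
    """Randomly sample latent points until the generated pose falls within
    `tol` of the given real-space pose."""
    for _ in range(max_tries):
        z = torch.randn(1, LATENT_DIM)
        pose = generator(z, cond)
        if torch.norm(pose - target_pose) < tol:
            return z
    raise RuntimeError("no latent point found within tolerance")

z_start = find_latent(G, cond_t, start_pose)
z_end = find_latent(G, cond_t, end_pose)
```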
 The inference unit 103 may instead compute (estimate) the start and end positions of the robot 300 in the latent space using a model (second model) different from the generator 501. For example, the learning unit 102 trains a neural network model (second model) that receives data in the real space (such as fake data generated by the generator 501) and outputs data in the latent space, either simultaneously with or independently of the training of the generator 501 and the discriminator 502. The inference unit 103 inputs the start and end positions given in the real space to the neural network model trained in this way, and takes the output data as the start and end positions of the robot 300 in the latent space, respectively.
 The inference unit 103 determines a trajectory connecting the start position and the end position in the latent space (step S202). The trajectory may be any trajectory that connects the start position and the end position. When the two-dimensional latent space described above is used, the inference unit 103 may determine, for example, a line (straight line or curve) connecting the start position and the end position as the trajectory.
 The inference unit 103 inputs a plurality of input data in the latent space corresponding to the determined trajectory to the generator 501, and obtains a plurality of output data output by the generator 501 (step S203). This output data corresponds to a trajectory along which the robot 300 moves in the real space without contacting obstacles.
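 Steps S202 and S203 with a straight-line latent trajectory can be sketched as a linear interpolation followed by batch decoding; the number of interpolation points is an assumption:

```python
import torch

@torch.no_grad()
def decode_trajectory(generator, cond, z_start, z_end, n_points=50):
    """Linearly interpolate between two latent points and decode every
    intermediate point into a real-space pose."""
    alphas = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)  # (n, 1)
    zs = (1 - alphas) * z_start + alphas * z_end              # (n, LATENT_DIM)
    conds = cond.expand(n_points, -1)                         # repeat condition
    return generator(zs, conds)                               # (n, DATA_DIM)

trajectory = decode_trajectory(G, cond_t, z_start, z_end)
```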
 The movement control unit 104 controls the movement of the robot 300 so that it moves along the computed trajectory (step S204).
 As described above, according to the present embodiment, the trajectory of the robot 300 can be computed using the generative model (generator 501) obtained by GAN training. With this approach, designing a complicated function as in the potential method is unnecessary. In addition, since the trajectory is computed in a latent space of lower dimensionality than the learning data, the computation cost can be reduced.
 Next, specific examples of the movement control of the robot will be described with reference to FIGS. 8 to 11. In the following, an example is described in which the trajectory is computed using a simulator that simulates the robot 300, a two-link arm robot as shown in FIG. 2.
 First, the premises of the operation of the robot (simulator) will be described. The learning data is obtained as follows. The range of the joint angle θ0 of the joint 301 is -90° to +90°. The range of the joint angle θ1 of the joint 302 is 0° to +150°. The step size of each joint angle is 1°. From the joint angles θ0 and θ1, the coordinates (x0, y0, x1, y1), including those of the joint 302 and the tip of the link 312, can be obtained by forward kinematics. The six-dimensional data (θ0, θ1, x0, y0, x1, y1) obtained in this way is used as the learning data.
 As the condition, a map including obstacle information is given. The map divides the space in which the robot 300 exists into 8 × 4 = 32 regions and contains, for each region, binary obstacle information indicating the presence or absence of an obstacle (for example, 1 if an obstacle exists and 0 otherwise). Two conditions are used: the case without obstacles (no region contains an obstacle) and the case with obstacles (some region contains an obstacle). In the following, the case without obstacles may be referred to as condition 1 and the case with obstacles as condition 2. The batch size (the number of learning data per iteration) is 2056, the optimization method is Adam, and 100,000 training iterations are performed.
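 Under these premises, the learning data can be generated by enumerating the joint-angle grid with the forward-kinematics sketch given earlier; the optional collision filter `collides` is a hypothetical helper for preparing obstacle-conditioned data:

```python
import numpy as np

def build_dataset(collides=None):
    """Enumerate joint angles in 1-degree steps and keep non-colliding poses.
    `collides(pose) -> bool` is a hypothetical collision test for the
    current obstacle map; None keeps every pose."""
    data = []
    for t0 in range(-90, 91):      # theta0: -90 .. +90 degrees
        for t1 in range(0, 151):   # theta1:   0 .. +150 degrees
            pose = forward_kinematics(t0, t1)
            if collides is None or not collides(pose):
                data.append(pose)
    return np.asarray(data, dtype=np.float32)

dataset = build_dataset()  # condition 1 (no obstacles): 181 * 151 samples
```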
 FIG. 8 shows an example of the mapping, into the joint angle space, of a straight latent-space trajectory input together with condition 1 (no obstacles) to the generator 501 after training under the above premises. The joint angle space is the two-dimensional space represented by the joint angles θ0 and θ1 out of the six-dimensional data. The upper part of FIG. 8 shows an example of a straight trajectory connecting a start position 801 and an end position 802 specified in the latent space. The lower part of FIG. 8 shows an example of the trajectory connecting the mapped start position 811 and end position 812 in the joint angle space. FIG. 9 shows an example of the trajectory of the robot (simulator) operated based on the joint-angle-space trajectory of FIG. 8.
 FIG. 10 shows an example of the mapping, into the joint angle space, of a straight trajectory in the latent representation input to the generator 501 together with condition 2 (with obstacles). FIG. 11 shows an example of the trajectory of the robot (simulator) operated based on the joint-angle-space trajectory of FIG. 10.
 The adjacency relationships in the latent representation correspond to the adjacency relationships of at least one of the robot position and posture. A smooth trajectory specified in the latent representation therefore yields a smooth trajectory in the joint angle space, and at least one of the robot position and posture changes smoothly.
 As shown in FIG. 11, when obstacle information is input as the condition, joint angles that would collide with the obstacle 1101 are not generated; only joint angles that do not collide with the obstacle 1101 are generated. The robot can thus move while avoiding the region where the obstacle 1101 is located.
 As shown in FIG. 8, even when the no-obstacle information is input as the condition, the generated joint angles are similar to those of FIG. 10, where the obstacle information is input as the condition. Since a Conditional GAN uses the same neural network (generative model) for different conditions, influence from the other condition is considered one of the causes. When only two conditions are used, as in this example, the two conditions may influence each other, producing similar output data. If a larger number of conditions is used, a generative model trained so as not to be affected by any specific condition (with higher generalization performance) is expected to be obtained. For example, the mapping into the joint angle space of a straight latent-space trajectory input together with condition 1 (no obstacles) may then become a mapping free of the distortion seen in the lower part of FIG. 8.
(Modification 1)
 The generative model is not limited to the generator included in a GAN. Any generative model from which a low-dimensional latent representation of the learning data can be obtained may be used. For example, instead of a GAN, a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model may be used.
 FIG. 12 is a diagram showing a configuration example of a VAE. In the VAE, the six-dimensional data (θ0, θ1, x0, y0, x1, y1) is input to an encoder 1201, and the encoder 1201 outputs a latent variable z in the latent space. A variable z' obtained by attaching the condition to the latent variable z is input to a decoder 1202, which generates and outputs new six-dimensional data. In the case of a VAE, the decoder 1202 is used as the generative model.
 With a VAE, the start and end positions in the latent space can be obtained by inputting the start and end positions of the robot in the real space to the encoder 1201. That is, the encoder 1201 can be used as a model (second model) for estimating the start and end positions of the robot in the latent space.
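 A minimal conditional VAE in the same style as the earlier sketches could look as follows; the architecture is an assumption, and, as described above, the decoder serves as the generative model while the encoder serves as the second model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal conditional VAE for the 6-D pose data (dimensions as above)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(DATA_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT_DIM)
        self.logvar = nn.Linear(128, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z, cond):
        return self.dec(torch.cat([z, cond], dim=1))

    def forward(self, x, cond):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decode(z, cond), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence (the standard VAE objective)."""
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```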
(Modification 2)
 FIG. 13 is a diagram for explaining an example of learning data for training a model used for trajectory planning of the two-link arm robot. Each circle represents, for example, the position of the joint 301 or 302 or of the tip of the link 312 of the two-link arm robot of FIG. 2. In FIG. 13, the coordinates (0, 0) are the position of the joint 301, and the lengths of the links 311 and 312 are 1. FIG. 13 shows the change in the position of each part of the robot when the angles of the joints 301 and 302 are varied in constant steps, starting from the state in which the joint 302 and the tip of the link 312 are at the coordinates (0, -1.0) and (0, -2.0), respectively. The six-dimensional data corresponding to each position shown in FIG. 13 is used as the learning data.
 Each training iteration uses a plurality of learning data given as a batch. For example, a fixed number of learning data randomly selected from the entire set of learning data is used as the batch for each iteration.
 With such a method, the learning result may be unstable. For example, if the learning data in the upper-left part of FIG. 13 happen to be selected disproportionately, the latent space obtained after training may also be biased.
 The learning data may be selected so as to avoid such a problem and make the learning result more stable. For example, the acquisition unit 101 may acquire one or more learning data from each of a plurality of data groups, each containing one or more learning data, and use them as the learning data (batch) for each iteration. In the case of learning data as in FIG. 13, the acquisition unit 101 may, for example, randomly select one or more learning data from each of a plurality of data groups into which the learning data are classified according to values (coordinate values) indicating at least one of the position and the posture. More specifically, for example, the space of the position coordinates (x, y) shown in FIG. 13 is divided into a plurality of mesh regions, and the learning data whose tip-of-link-312 coordinates fall in each region are classified into the data group corresponding to that region. The acquisition unit 101 acquires the learning data (batch) used for each iteration by selecting one or more learning data from each of the data groups classified in this way.
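 One way to implement such region-wise batch construction is to bin the learning data by the tip position (x1, y1) and draw from every non-empty bin; the mesh resolution and per-cell count below are assumptions:

```python
import numpy as np

def stratified_batch(dataset, per_cell=2, nx=8, ny=8,
                     x_range=(-2.0, 2.0), y_range=(-2.0, 2.0), rng=None):
    """Sample up to `per_cell` poses from every mesh cell of the (x1, y1)
    tip-position space so the batch covers the workspace evenly.
    `dataset` columns: (theta0, theta1, x0, y0, x1, y1)."""
    rng = rng or np.random.default_rng()
    xs = np.digitize(dataset[:, 4], np.linspace(*x_range, nx + 1)[1:-1])
    ys = np.digitize(dataset[:, 5], np.linspace(*y_range, ny + 1)[1:-1])
    batch = []
    for i in range(nx):
        for j in range(ny):
            idx = np.where((xs == i) & (ys == j))[0]
            if len(idx) > 0:
                take = rng.choice(idx, size=min(per_cell, len(idx)), replace=False)
                batch.append(dataset[take])
    return np.concatenate(batch, axis=0)
```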
 Further, the acquisition unit 101 may preferentially select learning data containing positions closer to an obstacle. For example, when the learning data are classified into a plurality of data groups as described above, the acquisition unit 101 may acquire more learning data from the region containing the obstacle, or from regions adjacent to it, than from other regions. This makes it possible to learn at least one of the positions and postures that avoid the obstacle more efficiently.
 According to this modification, training can be performed using learning data acquired so as not to be biased, so a more uniform latent space can be generated.
 In the present specification, an expression such as "at least one of a, b, and c" covers not only the combinations a, b, c, a-b, a-c, b-c, and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b, and a-a-b-b-c-c. It also covers configurations including elements other than a, b, and c, such as the combination a-b-c-d. Similarly, an expression such as "at least one of a, b, or c" covers not only the combinations a, b, c, a-b, a-c, b-c, and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b, and a-a-b-b-c-c, as well as configurations including elements other than a, b, and c, such as the combination a-b-c-d.
 Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are also included in the invention described in the claims and its equivalents.
1 Robot system
100 Control device (learning device)
101 Acquisition unit
102 Learning unit
103 Inference unit
104 Movement control unit
121 Storage unit
200 Controller
204 Memory
206 Hardware processor
208 Storage device
210 Operation device
212 Display device
214 Communication device
222 ROM
224 RAM
300 Robot
400 Sensor

Claims (13)

  1.  A control device comprising:
      an inference unit that inputs a plurality of first input data in a latent space to a first model, the first model receiving input data in the latent space indicating latent features of at least one of a position and a posture of a moving body and outputting output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in a real space, and that obtains a plurality of first output data output by the first model.
  2.  The control device according to claim 1, further comprising:
      a movement control unit that controls movement of the moving body based on the plurality of first output data.
  3.  The control device according to claim 1 or 2, wherein the plurality of first input data include first input data corresponding to a movement start position and a movement end position of the moving body in the real space.
  4.  The control device according to any one of claims 1 to 3, wherein the plurality of first input data constitute a line connecting a movement start position and a movement end position in the latent space.
  5.  The control device according to claim 3 or 4, wherein the inference unit estimates the movement start position in the latent space corresponding to the movement start position of the moving body in the real space, and the movement end position in the latent space corresponding to the movement end position of the moving body in the real space, using the first model or a second model that receives input data indicating at least one of the position and the posture of the moving body in the real space and outputs output data in the latent space.
  6.  The control device according to any one of claims 1 to 5, wherein
      the first model is trained to receive the input data together with a condition relating to the obstacle and to output the output data, and
      the inference unit inputs the plurality of first input data to the first model together with the condition relating to the obstacle and obtains the plurality of first output data.
  7.  The control device according to claim 6, wherein the condition relating to the obstacle includes any one of image information, depth information, and position information of the obstacle.
  8.  The control device according to any one of claims 1 to 7, wherein the first model is a GAN (Generative Adversarial Network), a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model.
  9.  A system comprising:
      a sensor that acquires a condition relating to an obstacle;
      the control device according to any one of claims 1 to 8; and
      the moving body.
  10.  A learning device comprising:
      an acquisition unit that acquires one or more learning data indicating at least one of a position and a posture of a moving body that does not contact an obstacle in a real space; and
      a learning unit that, using the acquired learning data, trains a first model that receives input data in a latent space indicating latent features of at least one of the position and the posture of the moving body and outputs output data indicating at least one of a position and a posture of the moving body that does not contact the obstacle in the real space.
  11.  The learning device according to claim 10, wherein the acquisition unit acquires one or more learning data from each of a plurality of data groups each including the one or more learning data.
  12.  The learning device according to claim 11, wherein each of the plurality of data groups includes one or more learning data classified according to a value indicating at least one of the position and the posture.
  13.  A control method comprising:
      inputting a plurality of first input data in a latent space to a first model, the first model receiving input data in the latent space indicating latent features of at least one of a position and a posture of a moving body and outputting output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in a real space, and obtaining a plurality of first output data output by the first model.
PCT/JP2020/021831 2019-06-04 2020-06-02 Control device, system, learning device, and control method WO2020246482A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-104449 2019-06-04
JP2019104449A JP2020196102A (en) 2019-06-04 2019-06-04 Control device, system, learning device and control method

Publications (1)

Publication Number Publication Date
WO2020246482A1 true WO2020246482A1 (en) 2020-12-10

Family

ID=73649427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021831 WO2020246482A1 (en) 2019-06-04 2020-06-02 Control device, system, learning device, and control method

Country Status (2)

Country Link
JP (1) JP2020196102A (en)
WO (1) WO2020246482A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022163513A1 (en) * 2021-01-27 2022-08-04 富士フイルム株式会社 Learned model generation method, machine learning system, program, and medical image processing device
JP6955733B1 (en) * 2021-02-17 2021-10-27 株式会社エクサウィザーズ Information processing equipment, information processing methods, and programs
JP2023062782A (en) * 2021-10-22 2023-05-09 川崎重工業株式会社 Robot data processing server and interference data providing method
KR102624237B1 (en) * 2023-08-03 2024-01-15 주식회사 아임토리 Domain Adaptation Device and Method for Robot Arm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04340603A (en) * 1991-05-17 1992-11-27 Mitsubishi Electric Corp Method for controlling manipulator considering obstacle avoidance
JP2005125475A (en) * 2003-10-24 2005-05-19 Sunao Kitamura Architecture capable of learning best course by single successful trial
WO2019004350A1 (en) * 2017-06-29 2019-01-03 株式会社 Preferred Networks Data discriminator training method, data discriminator training device, program and training method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04340603A (en) * 1991-05-17 1992-11-27 Mitsubishi Electric Corp Method for controlling manipulator considering obstacle avoidance
JP2005125475A (en) * 2003-10-24 2005-05-19 Sunao Kitamura Architecture capable of learning best course by single successful trial
WO2019004350A1 (en) * 2017-06-29 2019-01-03 株式会社 Preferred Networks Data discriminator training method, data discriminator training device, program and training method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ARIKI, YUKA ET AL.: "Kullback-Leibler dynamic Imitation learning using a shared latent space", PREPRINTS OF THE 34TH ACADEMIC LECTURE CONFERENCE OF THE ROBOTICS SOCIETY OF JAPAN DVD-ROM 2016, THE ROBOTICS SOCIETY OF JAPAN, vol. 34, 7 July 2019 (2019-07-07), XP009527102 *
SHINDOH, TOMONORI: "Pioneering application of GAN of deep learning generative model, Ricoh develops a new algorithm for imitative learning of robot", NIKKEI ROBOTICS, vol. 42, pages 5 - 14, XP009527099, ISSN: 2189-5783 *
TORISHIMA, RYOTA; MORI, HIROKI; TAKAHASHI, KUNIYUKI; OKANOHARA, DAISUKE; OGATA, TETSUYA: "1P2-A14 Conditional generative adversarial networks: Obstacle avoidance for robot arm by conditional generative adversarial networks", PROCEEDINGS OF THE JSME CONFERENCE ON ROBOTICS AND MECHATRONICS; HIROSHIMA, JAPAN; JUNE 5-8 2019, vol. 19, no. 2, 5 June 2019 (2019-06-05), pages 1P2-A14(1) - 1P2-A14(3), XP009527093 *

Also Published As

Publication number Publication date
JP2020196102A (en) 2020-12-10

Similar Documents

Publication Publication Date Title
WO2020246482A1 (en) Control device, system, learning device, and control method
Kyrarini et al. Robot learning of industrial assembly task via human demonstrations
Finn et al. Deep visual foresight for planning robot motion
Rana et al. Towards robust skill generalization: Unifying learning from demonstration and motion planning
Ravichandar et al. Learning Partially Contracting Dynamical Systems from Demonstrations.
Toussaint et al. Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference
Wang et al. Collision-free trajectory planning in human-robot interaction through hand movement prediction from vision
Smits et al. iTASC: a tool for multi-sensor integration in robot manipulation
Kumar et al. Visual motor control of a 7DOF redundant manipulator using redundancy preserving learning network
Sturm et al. Body schema learning for robotic manipulators from visual self-perception
CN115605326A (en) Method for controlling a robot and robot controller
JP7295421B2 (en) Control device and control method
Rozo et al. Orientation probabilistic movement primitives on riemannian manifolds
Paus et al. Predicting pushing action effects on spatial object relations by learning internal prediction models
Zarubin et al. Hierarchical Motion Planning in Topological Representations.
KR20220155921A (en) Method for controlling a robot device
JP2022155828A (en) Trajectory generation system, trajectory generation method and program
Fan et al. Learning resilient behaviors for navigation under uncertainty
Sturm et al. Unsupervised body scheme learning through self-perception
US20220375210A1 (en) Method for controlling a robotic device
WO2017134735A1 (en) Robot system, robot optimization system, and robot operation plan learning method
Ahmadzadeh et al. Generalized Cylinders for Learning, Reproduction, Generalization, and Refinement of Robot Skills.
Ahmad et al. Learning to adapt the parameters of behavior trees and motion generators (btmgs) to task variations
Gäbert et al. Generation of human-like arm motions using sampling-based motion planning
Petrič et al. Smooth transition between tasks on a kinematic control level: Application to self collision avoidance for two Kuka LWR robots

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20818975; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20818975; Country of ref document: EP; Kind code of ref document: A1)