WO2020246482A1 - Control device, system, learning device, and control method - Google Patents

Control device, system, learning device, and control method

Info

Publication number
WO2020246482A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
control device
moving body
model
Prior art date
Application number
PCT/JP2020/021831
Other languages
French (fr)
Japanese (ja)
Inventor
裕紀 森
亮太 鳥島
哲也 尾形
城志 高橋
大輔 岡野原
Original Assignee
株式会社Preferred Networks
Priority date
Filing date
Publication date
Application filed by 株式会社Preferred Networks
Publication of WO2020246482A1 publication Critical patent/WO2020246482A1/en

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00: Controls for manipulators
    • B25J 13/08: Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • B25J 19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/06: Safety devices
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02: Control of position or course in two dimensions
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • Embodiments of the present invention relate to control devices, systems, learning devices and control methods.
  • Marija Jegorova, Stephane Doncieux, and Timothy Hospedales, "Behavioural Repertoire via Generative Adversarial Policy Network", arXiv:1811.02945, 6 Mar 2019.
  • Oussama Khatib, "Real-time obstacle avoidance for manipulators and mobile robots", Proceedings of the 1985 IEEE International Conference on Robotics and Automation, Vol. 2, pp. 500-505, IEEE, 1985.
  • Sertac Karaman and Emilio Frazzoli, "Incremental sampling-based algorithms for optimal motion planning", Robotics: Science and Systems VI, Vol. 104, p. 2, 2010.
  • in addition to robots used in factories and warehouses, there is increasing demand for robots that work together with humans in living environments. These robots are expected to operate in environments where conditions such as lighting and obstacles change constantly, so a robot must be able to avoid obstacles in order to guarantee that it operates without damaging humans, objects in the environment, or the robot itself.
  • the problem to be solved by the invention is to find a trajectory that avoids obstacles more easily.
  • the control device includes an inference unit.
  • the inference unit uses a first model that receives, as input, data on a latent space representing latent features of at least one of the position and the posture of a moving body, and that outputs data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in the real space.
  • the inference unit inputs a plurality of first input data on the latent space to this first model and obtains a plurality of first output data output by the first model.
  • FIG. 1 is a diagram showing a hardware configuration example of a robot system including the control device of the present embodiment.
  • FIG. 2 is a diagram showing a configuration example of a robot that is a two-link arm robot.
  • FIG. 3 is a hardware block diagram of the control device.
  • FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device.
  • FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment.
  • FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
  • FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
  • FIG. 8 is a diagram showing an example of the mapping of a straight-line trajectory in the latent space to the joint angle space.
  • FIG. 9 is a diagram showing an example of the trajectory of the robot.
  • FIG. 10 is a diagram showing an example of the mapping of a straight-line trajectory in the latent representation to the joint angle space.
  • FIG. 11 is a diagram showing an example of a trajectory of a robot operated based on a trajectory in the joint angle space.
  • FIG. 12 is a diagram showing a configuration example of VAE.
  • FIG. 13 is a diagram for explaining an example of learning data.
  • in a trajectory plan that avoids obstacles, a method that requires no design by humans and has a low computational cost is desirable. In the present embodiment, therefore, the trajectory for avoiding obstacles is calculated using a generative model.
  • for example, GAN (Generative Adversarial Networks) can be used as a model including the generative model. Based on the manifold hypothesis, a GAN has the advantage that the training data can be compressed into a lower-dimensional latent representation (data represented on the latent space).
  • in the present embodiment, at least one of a position and a posture in which the robot avoids obstacles is captured in the latent space of the GAN, and the trajectory is specified on that latent space. This makes a trajectory plan that avoids obstacles easier to realize. Because obstacle avoidance is achieved with a simple design, not only engineers but also users without expert knowledge can, for example, handle the robot.
  • FIG. 1 is a diagram showing a hardware configuration example of the robot system 1 including the control device 100 of the present embodiment.
  • the robot system 1 includes a control device 100, a controller 200, a robot 300, and a sensor 400.
  • the robot 300 is an example of a moving body that moves while at least one of its position and its posture (its trajectory) is controlled by the control device 100.
  • the robot 300 includes, for example, a plurality of links, a plurality of joints, and a plurality of driving devices (motors and the like) for driving each of the joints.
  • in the following, the robot 300, a two-link arm robot having two joints and two links, will be described as an example.
  • FIG. 2 is a diagram showing a configuration example of a robot 300 which is a two-link arm robot.
  • the robot 300 includes a base member 321 and two joints 301 and 302, and two links 311 and 312.
  • the joints 301 and 302 rotate about axes perpendicular to the plane of the page in FIG. 2.
  • the joint 301 rotates about an axis fixed to the base member 321.
  • the links 311 and 312 move according to the rotation of the joints 301 and 302.
  • FIG. 2 shows how the links 311 and 312 move as the joints 301 and 302 each rotate counterclockwise.
  • the applicable robot is not limited to this, and any robot (moving body) may be used.
  • for example, it may be a robot having three or more joints and links, a mobile manipulator, or a mobile cart.
  • the robot may also be provided with a drive device for translating the entire robot in an arbitrary direction in the real space.
  • the moving body may be an object whose overall position changes in this way, or an object such as the two-link arm robot of FIG. 2, in which part of the body is fixed in position and at least one of the position and the posture of the remaining parts changes.
  • the sensor 400 detects information to be used for controlling the operation of the robot 300.
  • the sensor 400 is, for example, an imaging device (camera) that captures images of the surroundings of the robot 300, a depth sensor that detects depth information to objects around the robot 300, or both.
  • the sensor 400 is not limited to these, and may be, for example, a sensor capable of acquiring information (position information) regarding the position of an obstacle.
  • the controller 200 controls the drive of the robot 300 in response to an instruction from the control device 100.
  • the controller 200 controls a drive device (motor or the like) that drives the joints of the robot 300 so as to rotate in the rotation direction and rotation speed specified by the control device 100.
  • the control device 100 is connected to the controller 200, the robot 300, and the sensor 400, and controls the entire robot system 1.
  • the control device 100 controls the operation of the robot 300.
  • the control of the motion of the robot 300 includes the calculation of the trajectory using the generative model.
  • the control device 100 outputs an operation command for operating the robot 300 according to the calculated trajectory to the controller 200.
  • the control device 100 may have a function of learning a generative model. In this case, the control device 100 also functions as a learning device for learning the generative model.
  • FIG. 3 is a hardware block diagram of the control device 100.
  • the control device 100 is realized by a hardware configuration similar to that of a general computer (information processing device) as shown in FIG.
  • the control device 100 may be realized by one computer as shown in FIG. 3, or may be realized by a plurality of computers operating in cooperation with each other.
  • the control device 100 includes a memory 204, one or more hardware processors 206, a storage device 208, an operation device 210, a display device 212, and a communication device 214. Each part is connected by a bus.
  • the memory 204 includes, for example, a ROM 222 and a RAM 224.
  • the ROM 222 stores the program used for controlling the control device 100, various setting information, and the like in a non-rewritable manner.
  • the RAM 224 is a volatile storage medium such as SDRAM (Synchronous Dynamic Random Access Memory).
  • the RAM 224 serves as a work area for one or more hardware processors 206.
  • One or more hardware processors 206 are connected to memory 204 (ROM 222 and RAM 224) via a bus.
  • the one or more hardware processors 206 may be, for example, one or a plurality of CPUs (Central Processing Units) or one or a plurality of GPUs (Graphics Processing Units). Further, the one or more hardware processors 206 may be a semiconductor device or the like including a dedicated processing circuit for realizing a neural network.
  • the one or more hardware processors 206 execute various processes in cooperation with the various programs stored in advance in the ROM 222 or the storage device 208, using a predetermined area of the RAM 224 as a work area, and comprehensively control the operation of each part constituting the control device 100. Further, the one or more hardware processors 206 control the operation device 210, the display device 212, the communication device 214, and the like in cooperation with programs stored in advance in the ROM 222 or the storage device 208.
  • the storage device 208 is a rewritable storage device such as a semiconductor storage medium (flash memory or the like) or a magnetically or optically recordable storage medium.
  • the storage device 208 stores a program used for controlling the control device 100, various setting information, and the like.
  • the operation device 210 is an input device such as a mouse or a keyboard.
  • the operation device 210 receives the information input from the user and outputs the received information to one or more hardware processors 206.
  • the display device 212 displays information to the user.
  • the display device 212 receives information or the like from one or more hardware processors 206, and displays the received information.
  • when information is output to the communication device 214, the storage device 208, or the like, the control device 100 does not have to include the display device 212.
  • the communication device 214 communicates with an external device and transmits / receives information via a network or the like.
  • the program executed by the control device 100 of the present embodiment is recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk), and is provided as a computer program product.
  • the program executed by the control device 100 of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided or distributed via a network such as the Internet. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided by incorporating it into a ROM or the like in advance.
  • the program executed by the control device 100 according to the present embodiment can make the computer function as each part of the control device 100 described later.
  • in this computer, the hardware processor 206 can read the program from a computer-readable storage medium onto the main memory and execute it.
  • the hardware configuration shown in FIG. 1 is an example, and is not limited to this.
  • One device may be configured to include a part or all of the control device 100, the controller 200, the robot 300, and the sensor 400.
  • the robot 300 may be configured to also include the functions of the control device 100, the controller 200, and the sensor 400.
  • the control device 100 may be configured to have one or both functions of the controller 200 and the sensor 400.
  • although the control device 100 can also function as a learning device, the control device 100 and the learning device may instead be realized by physically different devices.
  • FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device 100.
  • the control device 100 includes an acquisition unit 101, a learning unit 102, an inference unit 103, a movement control unit 104, and a storage unit 121.
  • the acquisition unit 101 acquires various information used in various processes executed by the control device 100.
  • the acquisition unit 101 acquires learning data for learning the generative model.
  • the learning data can be acquired by any method, but the acquisition unit 101 acquires, for example, the learning data created in advance from an external device via a network or the like, or from a storage medium.
  • the learning unit 102 learns the generative model (first model) using the learning data. The learning data is, for example, data indicating at least one of a position and a posture of the robot 300 that does not contact an obstacle in the real space.
  • when a GAN is used, the learning unit 102 trains the two neural networks constituting the GAN: a generator and a discriminator.
  • the generator corresponds to the generative model (first model): it receives input data on the latent space indicating latent features of at least one of the position and the posture of the moving body, and outputs output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in the real space. Here, output data indicating at least one of a position and a posture includes output data indicating a position, output data indicating a posture, and output data indicating both. The details of the learning method will be described later.
  • the inference unit 103 executes inference using the learned generative model. For example, the inference unit 103 inputs a plurality of input data (first input data) forming a line in the latent space to the generative model, and obtains a plurality of output data (first output data) output by the generative model.
  • the movement control unit 104 controls the movement of the robot 300.
  • the movement control unit 104 controls the movement of the robot 300 by using the output data obtained by the inference unit 103 as trajectory data indicating a trajectory in which the robot 300 does not come into contact with an obstacle in the real space. More specifically, the movement control unit 104 moves the robot 300 by generating an operation command for operating the robot 300 according to the trajectory data and transmitting the operation command to the controller 200.
  • the storage unit 121 stores various information used in the control device 100.
  • the storage unit 121 stores parameters (weighting factors, biases, etc.) of the neural network (generator and discriminator) constituting the GAN, and learning data for learning the neural network constituting the GAN.
  • the storage unit 121 is realized by, for example, the storage device 208 of FIG.
  • Each of the above units is realized by, for example, one or more hardware processors 206.
  • each of the above parts may be realized by having one or a plurality of CPUs execute a program, that is, by software.
  • Each of the above parts may be realized by a hardware processor such as a dedicated IC (Integrated Circuit), that is, hardware.
  • Each of the above parts may be realized by using software and hardware in combination. When a plurality of processors are used, each processor may realize one of each part, or may realize two or more of each part.
  • FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment.
  • the GAN includes two neural networks, a generator 501 and a discriminator 502.
  • the generator 501 takes as input a latent variable z in the low-dimensional latent space and outputs fake data (high-dimensional data) imitating the training data.
  • the generator 501 is trained so that the distribution of the fake data it outputs approaches the distribution of the true training data.
  • the discriminator 502 determines whether its input is true training data or fake data, and is trained to improve its discrimination accuracy.
  • when the two-link robot shown in FIG. 2 is used, the low-dimensional (latent space) data (latent variables) and the high-dimensional (real space) data (training data and fake data) can be defined, for example, as follows.
  • first, let the angles (joint angles) of the joints 301 and 302 be θ0 and θ1, respectively, and let the coordinates of the joint 302 and of the tip of the link 312 be (x0, y0) and (x1, y1), respectively.
  • the length (link length) of the links 311 and 312 is, for example, 1.
  • the high-dimensional data is expressed as six-dimensional data (θ0, θ1, x0, y0, x1, y1) including the angles of the two joints 301 and 302, the position of the joint 302, and the position of the tip of the link 312.
  • of this six-dimensional position-and-posture information, (x0, y0, x1, y1) can in principle be generated by forward kinematics once the two-dimensional information (θ0, θ1) is given. Therefore, the low-dimensional data (latent variable) can be defined as two-dimensional data (z0, z1).
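As a concrete illustration of the forward kinematics mentioned above, the mapping from (θ0, θ1) to (x0, y0, x1, y1) can be written in a few lines. The following Python sketch assumes link lengths of 1 and the angle convention suggested by FIG. 13 (θ0 = θ1 = 0 points both links straight down); the patent does not fix these details, so they are illustrative assumptions.

```python
import numpy as np

def forward_kinematics(theta0, theta1, l0=1.0, l1=1.0):
    """Map joint angles (degrees) of the two-link arm to (x0, y0, x1, y1).

    Assumed convention: theta0 = theta1 = 0 points both links straight
    down, matching the reference pose (0, -1.0), (0, -2.0) of FIG. 13.
    """
    a0 = np.deg2rad(theta0)
    a1 = np.deg2rad(theta0 + theta1)  # absolute angle of the second link
    x0, y0 = l0 * np.sin(a0), -l0 * np.cos(a0)            # joint 302
    x1, y1 = x0 + l1 * np.sin(a1), y0 - l1 * np.cos(a1)   # tip of link 312
    return x0, y0, x1, y1

print(forward_kinematics(0.0, 0.0))  # (0.0, -1.0, 0.0, -2.0)
```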
  • similarly, when there are n driven parts such as joints (where n is an integer with n ≥ 3), n-dimensional data (z0, z1, ..., zn-1) can be used as the latent variable.
  • the low-dimensional (latent space) data and the high-dimensional (real space) data as described above are examples, and are not limited to these.
  • a latent variable with a number of dimensions larger than the degree of freedom of the joint may be used.
  • the GAN shown in FIG. 5 can receive a specified condition (Condition), out of a plurality of conditions, as an additional input to each of the generator 501 and the discriminator 502. As a result, the generator 501 and the discriminator 502 can output data (fake data or a discrimination result) that depends on the condition.
  • a GAN capable of inputting conditions in this way may be called a Conditional GAN.
  • a GAN that does not take a condition as input may also be used.
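The patent does not disclose concrete network architectures, so the following PyTorch sketch is only one plausible realization of the generator 501 and discriminator 502 of FIG. 5: small fully connected networks over the 2-D latent variable and the 6-D data, with an assumed 16-dimensional condition vector (the occupancy grid sketched further below) concatenated to the input of each.

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM, C_DIM = 2, 6, 16  # latent, data, and condition sizes (assumed)

class Generator(nn.Module):
    """Generator 501: (latent variable z, condition) -> fake 6-D pose data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + C_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, X_DIM))

    def forward(self, z, c):
        return self.net(torch.cat([z, c], dim=1))

class Discriminator(nn.Module):
    """Discriminator 502: (data, condition) -> probability the data is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(X_DIM + C_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, x, c):
        return self.net(torch.cat([x, c], dim=1))
```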
  • the condition shown in FIG. 5 indicates that an obstacle indicated by a black rectangle exists within the movable range of the robot 300.
  • the condition can be specified by any method. For example, as shown in FIG. 5, the condition may be information indicating, for each of a plurality of areas (16 in FIG. 5) into which the movable range of the robot 300 is divided, whether an obstacle exists in that area (for example, 1 is specified for an area where an obstacle exists and 0 for an area where it does not).
  • One or both of the image of the surroundings of the robot 300 and the depth information to the objects around the robot 300 may be used as the conditions for obstacles. In this case, the image information and the depth information detected by the sensor 400 (imaging device, depth sensor) can be used as conditions for obstacles.
  • information indicating the position of the obstacle may or may not be explicitly given as a condition regarding the obstacle.
  • the sensor 400 capable of acquiring the position information of the obstacle may be used, and only the position information of the obstacle may be used as a condition regarding the obstacle.
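As one concrete, hypothetical encoding of such a condition, the movable range can be divided into a 4 x 4 grid and each cell flagged 1 if it overlaps an obstacle and 0 otherwise; the cell count, coordinate ranges, and rectangle representation below are illustrative assumptions.

```python
import numpy as np

def occupancy_condition(obstacles, lim=2.0, n=4):
    """Encode obstacles as a flat n*n grid of {0, 1} flags.

    `obstacles` is a list of axis-aligned rectangles (xmin, ymin, xmax, ymax);
    the square [-lim, lim]^2 covers the reach of the two-link arm.
    """
    edges = np.linspace(-lim, lim, n + 1)
    grid = np.zeros((n, n), dtype=np.float32)
    for (xmin, ymin, xmax, ymax) in obstacles:
        for i in range(n):
            for j in range(n):
                # does cell [edges[i], edges[i+1]] x [edges[j], edges[j+1]]
                # overlap the obstacle rectangle?
                if (edges[i] < xmax and xmin < edges[i + 1]
                        and edges[j] < ymax and ymin < edges[j + 1]):
                    grid[i, j] = 1.0
    return grid.reshape(-1)  # 16-dimensional condition vector when n = 4

cond = occupancy_condition([(0.5, 0.5, 1.5, 1.5)])  # one obstacle, upper right
```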
  • the manifold hypothesis holds that high-dimensional expressions in the real world can be expressed on a lower-dimensional manifold.
  • GAN is expected to acquire low-dimensional latent expressions from high-dimensional expressions based on the manifold hypothesis.
  • when applied to a trajectory plan that avoids obstacles, as in this embodiment, training the conditional generative model (generator 501) of the GAN causes it to acquire a low-dimensional latent representation of at least one of the positions and postures in which the robot 300 does not contact obstacles.
  • from a specified latent representation (data indicating at least one of a position and a posture on the latent space), the trained generator 501 can generate data indicating at least one of a position and a posture in which the robot avoids obstacles in the real space.
  • the GAN is trained so that adjacency of data (latent variables) in the latent space corresponds to adjacency of the generated data. Therefore, when a trajectory specified on the low-dimensional latent space (a sequence of adjacent points indicating at least one of positions and postures) is mapped by the generator 501, a trajectory of at least one of positions and postures in which the robot 300 avoids obstacles in the real space is obtained.
  • FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
  • the acquisition unit 101 acquires the learning data (step S101).
  • for example, the acquisition unit 101 acquires learning data that has been obtained from an external device via a network or the like and stored in the storage unit 121.
  • the learning process is repeatedly executed a plurality of times.
  • the acquisition unit 101 may acquire a part of the plurality of learning data as learning data (batch) used for each learning.
  • the learning unit 102 generates fake data by the GAN generator 501 (step S102).
  • the learning unit 102 inputs the generated fake data, or the learning data (true training data) acquired in step S101, into the discriminator 502, and obtains the discrimination result output by the discriminator 502 (step S103).
  • the learning unit 102 updates the parameters of the generator 501 and the discriminator 502 using the discrimination result (step S104). For example, the learning unit 102 updates the parameters of the generator 501 so as to minimize a loss function whose value becomes small when the discriminator 502 mistakes fake data for true training data. Likewise, the learning unit 102 updates the parameters of the discriminator 502 so as to minimize a loss function whose value becomes small when the discrimination results of the discriminator 502 are correct.
  • the learning unit 102 may use any algorithm for learning, and for example, learning can be performed using Adam (Adaptive moment estimation).
  • the learning unit 102 determines whether or not to end learning (step S105). For example, the learning unit 102 determines whether to end learning based on whether all the learning data have been processed, whether the improvement of the loss function has become smaller than a threshold value, or whether the number of training iterations has reached an upper limit.
  • if the learning is not completed (step S105: No), the process returns to step S101 and is repeated with new learning data. When it is determined that the learning is completed (step S105: Yes), the learning process ends.
  • the learning unit 102 may use a method for stabilizing learning, for example, applying normalization (Spectral Normalization or the like) to each layer of the generator 501 and the discriminator 502.
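A minimal sketch of one training iteration (steps S102 to S104), assuming the Generator and Discriminator modules sketched earlier, binary cross-entropy losses, and Adam; where stabilization is wanted, each Linear layer can additionally be wrapped with torch.nn.utils.spectral_norm, as the spectral-normalization option above suggests.

```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()  # modules from the earlier sketch
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_x, c):
    """One iteration: update the discriminator, then the generator."""
    b = real_x.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Step S102: generate fake data from random latent variables.
    z = torch.randn(b, Z_DIM)
    fake_x = G(z, c)

    # Steps S103-S104 (discriminator): score real data 1 and fake data 0.
    loss_d = bce(D(real_x, c), ones) + bce(D(fake_x.detach(), c), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step S104 (generator): try to make the discriminator score fakes as 1.
    loss_g = bce(D(fake_x, c), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```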
  • by the above learning process, a generative model (generator 501) is obtained that, for input data on the latent space, outputs output data indicating at least one of a position and a posture in which the robot 300 does not contact an obstacle in the real space. The control process described next uses the generator 501 obtained in this way.
  • FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
  • first, the inference unit 103 calculates the start position (movement start position) and the end position (movement end position) of the robot 300 in the latent space (step S201). It is assumed that the start position and the end position of the robot 300 in the real space are given in advance.
  • for example, the inference unit 103 randomly generates a latent variable z in the latent space, inputs it to the generator 501, and determines whether the obtained data matches the start position given in the real space. Here, a match includes not only the case where the values match exactly but also the case where the difference between them is within a threshold. If they match, the inference unit 103 takes the latent variable input to the generator 501 as the start position in the latent space; if they do not match, it randomly generates a latent variable z again and repeats the process. The inference unit 103 can estimate the end position in the latent space in the same way.
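The random search just described might look as follows; this is a sketch under the assumptions of the earlier snippets (trained Generator G, condition vector c, Z_DIM = 2), with `target` the 6-D start or end position given in the real space and `eps` the matching threshold.

```python
import torch

def find_latent(G, c, target, eps=0.05, max_tries=100_000):
    """Sample z at random until G(z, c) lands within eps of `target`."""
    target = torch.as_tensor(target, dtype=torch.float32).view(1, -1)
    with torch.no_grad():
        for _ in range(max_tries):
            z = torch.randn(1, Z_DIM)
            if torch.norm(G(z, c) - target) < eps:
                return z  # estimated start/end position in the latent space
    raise RuntimeError("no latent point found within the tolerance")
```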
  • the inference unit 103 may calculate (estimate) the start position and end position of the robot 300 in the latent space by using a model (second model) different from the generator 501.
  • for example, the learning unit 102 trains a neural network model (second model) that takes data in the real space (such as fake data generated by the generator 501) as input and outputs data in the latent space; this model may be trained simultaneously with the generator 501 and the discriminator 502, or independently of them.
  • the inference unit 103 inputs the start position and the end position given in the real space to the neural network model trained in this way, and takes the output data as the start position and the end position of the robot 300 in the latent space.
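One hedged way to realize this second model is a small regression network trained on (fake data, latent variable) pairs drawn from the generator itself; the patent only fixes the input and output roles, so the training scheme below is an assumption.

```python
import torch
import torch.nn as nn

inverse = nn.Sequential(            # second model: 6-D pose -> 2-D latent
    nn.Linear(X_DIM, 128), nn.ReLU(),
    nn.Linear(128, Z_DIM))
opt_inv = torch.optim.Adam(inverse.parameters(), lr=1e-3)

def inverse_step(G, c, batch=256):
    """Fit the inverse map on pairs (G(z, c), z) from the trained generator."""
    z = torch.randn(batch, Z_DIM)
    with torch.no_grad():
        x = G(z, c)                 # fake data generated by the generator 501
    loss = nn.functional.mse_loss(inverse(x), z)
    opt_inv.zero_grad(); loss.backward(); opt_inv.step()
    return loss.item()
```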
  • next, the inference unit 103 determines a trajectory connecting the start position and the end position in the latent space, for example a straight line (step S202), inputs a plurality of input data on the latent space corresponding to the determined trajectory to the generator 501, and obtains a plurality of output data output by the generator 501 (step S203).
  • This output data corresponds to a trajectory in which the robot 300 moves in real space without contacting an obstacle.
  • the movement control unit 104 controls the movement of the robot 300 so as to move according to the calculated trajectory (step S204).
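Putting steps S201 to S204 together, a sketch of the control-side computation: interpolate a straight line between the latent start and end points (the straight-line form of step S202 is an assumption consistent with FIG. 8), push every point through the generator, and hand the resulting joint-angle sequence to the movement control unit. The controller call at the end is a hypothetical API.

```python
import torch

def plan_trajectory(G, c, z_start, z_end, n_points=50):
    """Steps S202-S203: latent-space line -> obstacle-avoiding pose sequence."""
    alphas = torch.linspace(0.0, 1.0, n_points).view(-1, 1)
    z_line = (1 - alphas) * z_start + alphas * z_end  # straight latent path
    with torch.no_grad():
        poses = G(z_line, c.expand(n_points, -1))     # 6-D model outputs
    return poses[:, :2]                               # (theta0, theta1) rows

# Step S204 (schematic): send the joint angles to the controller in order.
# for theta0, theta1 in plan_trajectory(G, cond, z_s, z_e):
#     controller.rotate_joints(theta0, theta1)  # hypothetical controller API
```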
  • as described above, the trajectory of the robot 300 can be calculated using the generative model (generator 501) obtained by training the GAN. This method does not require designing a complicated function as the potential field method does. In addition, since the trajectory is calculated in a latent space of lower dimension than the training data, the computational cost can be reduced.
  • in one experiment, the training data is created as follows.
  • the range of the joint angle θ0 of the joint 301 is -90° to +90°.
  • the range of the joint angle θ1 of the joint 302 is 0° to +150°.
  • the step size of each joint angle is 1°. From the joint angles θ0 and θ1, the above (x0, y0, x1, y1), consisting of the coordinates of the joint 302 and of the tip of the link 312, can be obtained by forward kinematics.
  • the six-dimensional data (θ0, θ1, x0, y0, x1, y1) obtained in this way is used as the training data.
  • the batch size (the number of training data used in each iteration) is 2056, the optimization method is Adam, and training is run for 100,000 iterations.
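Under these conditions the training set is simply an exhaustive grid over the two joint angles; below is a sketch reusing the forward_kinematics helper from the earlier example (for the with-obstacle condition, poses intersecting the obstacle would additionally be filtered out, which is an assumption).

```python
import numpy as np

def build_training_data():
    """Grid over theta0 in [-90, 90] and theta1 in [0, 150], step 1 degree."""
    rows = []
    for t0 in range(-90, 91):
        for t1 in range(0, 151):
            x0, y0, x1, y1 = forward_kinematics(t0, t1)  # defined earlier
            rows.append((t0, t1, x0, y0, x1, y1))        # one 6-D sample
    return np.asarray(rows, dtype=np.float32)

data = build_training_data()  # 181 * 151 = 27331 samples
```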
  • FIG. 8 shows an example of the mapping to the joint angle space of a straight-line trajectory on the latent space, input together with condition 1 (no obstacles) to the generator 501 trained under the above assumptions.
  • the joint angle space is the two-dimensional space represented by the joint angles θ0 and θ1 of the six-dimensional data.
  • the upper part of FIG. 8 shows an example of a straight-line trajectory connecting the start position 801 and the end position 802 specified on the latent space.
  • the lower part of FIG. 8 shows an example of a trajectory connecting the start position 811 and the end position 812 mapped on the joint angle space.
  • FIG. 9 is a diagram showing an example of a trajectory of a robot (simulator) operated based on the trajectory in the joint angle space of FIG.
  • FIG. 10 is a diagram showing an example of the mapping to the joint angle space of a straight-line trajectory on the latent representation, input together with condition 2 (with obstacles) to the generator 501.
  • FIG. 11 is a diagram showing an example of a trajectory of a robot (simulator) operated based on the trajectory in the joint angle space of FIG.
  • adjacency on the latent representation corresponds to adjacency of at least one of the robot's position and posture, so a smooth trajectory specified on the latent representation becomes a smooth trajectory in the joint angle space as well, and at least one of the robot's position and posture changes smoothly.
  • the robot can move while avoiding the area where the obstacle 1101 is located.
  • in the lower part of FIG. 8, the generated joint angles are similar to the joint angles of FIG. 10, which were generated when information with obstacles was input as the condition. Since a Conditional GAN uses the same neural network (generative model) for different conditions, one possible cause is that the output is affected by the other condition. When only two conditions are used, as in this example, the two conditions may influence each other so that similar output data is output. If the number of conditions is increased, a generative model trained so as not to be affected by specific conditions (that is, with high generalization performance) is expected to be obtained. In that case, the mapping to the joint angle space of a straight-line trajectory on the latent space input together with condition 1 (without obstacles) may become a mapping without the distortion shown in the lower part of FIG. 8.
  • the generative model is not limited to the generator included in a GAN. Any generative model that can obtain a low-dimensional latent representation from the training data may be used. For example, a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model may be used instead of GAN.
  • FIG. 12 is a diagram showing a configuration example of VAE.
  • six-dimensional data (θ0, θ1, x0, y0, x1, y1) is input to the encoder 1201, and the encoder 1201 outputs a latent variable z in the latent space.
  • a variable z' obtained by attaching a condition to the latent variable z is input to the decoder 1202, and the decoder 1202 generates and outputs new six-dimensional data.
  • in this case, the decoder 1202 is used as the generative model.
  • the start position and end position in the latent space can be obtained by inputting the start position and end position of the robot in the real space to the encoder 1201. That is, the encoder 1201 can be used as a model (second model) for estimating the start position and end position of the robot in the latent space.
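A minimal conditional-VAE sketch matching FIG. 12: the encoder 1201 maps the 6-D data to a 2-D latent distribution, the condition is concatenated to the sampled z to form z', and the decoder 1202 reconstructs 6-D data. The layer sizes and Gaussian reparameterization are assumptions; the patent fixes only the input and output roles.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=6, z_dim=2, c_dim=16, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())  # encoder 1201
        self.mu = nn.Linear(h, z_dim)
        self.logvar = nn.Linear(h, z_dim)
        self.dec = nn.Sequential(               # decoder 1202: z' -> 6-D data
            nn.Linear(z_dim + c_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x, c):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # sample z
        z_cond = torch.cat([z, c], dim=1)  # z' = z with the condition attached
        return self.dec(z_cond), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence to the standard normal prior."""
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```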
  • FIG. 13 is a diagram for explaining an example of learning data for learning a model used for trajectory planning of a two-link arm robot.
  • in FIG. 13, the circles represent, for example, the positions of the joints 301 and 302, or of the tip of the link 312, of the two-link arm robot of FIG. 2.
  • the coordinates (0, 0) correspond to the position of the joint 301, and the lengths of the links 311 and 312 are 1.
  • FIG. 13 shows the change in the position of each part of the robot when the angles of the joints 301 and 302 are each changed in constant increments, starting from the state in which the joint 302 and the tip of the link 312 are at coordinates (0, -1.0) and (0, -2.0), respectively.
  • Six-dimensional data corresponding to each position shown in FIG. 13 is used as learning data.
  • for each training iteration, a plurality of learning data given as a batch are used. For example, a fixed number of training data randomly selected from the entire training data set are used as the batch for each iteration.
  • with such random selection, however, the learning result may not be stable. For example, if the selected batches are biased, the latent space obtained after learning may also be biased.
  • the learning data may therefore be selected so as to avoid such problems and make the learning result more stable.
  • for example, the acquisition unit 101 may acquire one or more learning data from each of a plurality of data groups, each of which includes one or more learning data, and use them as the batch for each training iteration.
  • for learning data such as that shown in FIG. 13, the acquisition unit 101 may, for example, randomly select one or more learning data from each of a plurality of data groups into which the learning data are classified according to values (coordinate values) indicating at least one of position and posture. More specifically, for example, the space of the position coordinates (x, y) shown in FIG. 13 may be divided into a plurality of areas, and the learning data classified into a data group for each area.
  • the acquisition unit 101 acquires learning data (batch) to be used for each learning by selecting one or more learning data from each data group classified in this way.
  • further, the acquisition unit 101 may preferentially select learning data including positions closer to an obstacle. For example, when the learning data are classified into a plurality of data groups as described above, the acquisition unit 101 may acquire more learning data from the area containing the obstacle, or from the areas adjacent to it, than from the other areas. This makes it possible to learn at least one of the positions and postures for avoiding obstacles more efficiently.
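A sketch of this stratified batch selection, assuming the 6-D array produced by the earlier dataset example: samples are grouped by the grid cell containing the tip position (x1, y1), batches are spread across the occupied cells, and cells marked as near an obstacle are oversampled. The grid size and weighting are illustrative assumptions.

```python
import numpy as np

def stratified_batch(data, batch_size=2056, n=4, lim=2.0, hot_cells=(), boost=3):
    """Draw a batch spread across the occupied cells of an n*n grid over the
    tip position (x1, y1); cells in `hot_cells` (e.g. those containing or
    adjacent to an obstacle) receive `boost` times the sampling weight."""
    ix = np.clip(((data[:, 4] + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    iy = np.clip(((data[:, 5] + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    cell_of = ix * n + iy
    cells = np.unique(cell_of)
    weights = np.array([boost if c in hot_cells else 1 for c in cells], float)
    quotas = np.random.multinomial(batch_size, weights / weights.sum())
    picks = []
    for c, m in zip(cells, quotas):
        if m > 0:
            members = np.flatnonzero(cell_of == c)
            picks.append(np.random.choice(members, size=m,
                                          replace=m > len(members)))
    return data[np.concatenate(picks)]

batch = stratified_batch(data)  # `data` from the earlier grid example
```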
  • in the present specification, an expression such as "at least one of a, b and c" is an expression that includes not only the combinations a, b, c, a-b, a-c, b-c and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b and a-a-b-b-c-c. It also covers configurations that include elements other than a, b and c, such as the combination a-b-c-d. Similarly, an expression such as "at least one of a, b or c" covers the same combinations.
  • Reference signs: 1 robot system, 100 control device (learning device), 101 acquisition unit, 102 learning unit, 103 inference unit, 104 movement control unit, 121 storage unit, 200 controller, 204 memory, 206 hardware processor, 208 storage device, 210 operation device, 212 display device, 214 communication device, 222 ROM, 224 RAM, 300 robot, 400 sensor

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

The control device according to one embodiment of the present invention is provided with an inference unit. The inference unit feeds multiple pieces of first input data on a latent space to a first model and obtains multiple pieces of first output data outputted by the first model, where the first model receives, as input, data on the latent space indicating latent characteristics of the position and/or posture of a mobile object, and outputs data indicating a position and/or posture of the mobile object that does not contact an obstacle in the real space.

Description

制御装置、システム、学習装置および制御方法Control devices, systems, learning devices and control methods
 本発明の実施形態は、制御装置、システム、学習装置および制御方法に関する。 Embodiments of the present invention relate to control devices, systems, learning devices and control methods.
 工場および倉庫などで利用されるロボットに加え、生活環境で人間と協働作業するロボットの需要が高まっている。これらのロボットは、照明および障害物などの条件が常に変化する環境での動作が想定される。従って、人間、環境の障害物、および、ロボット自身が損傷しないようにロボットが動作することを保証するために、障害物を回避する能力をロボットが備えることが必要である。 In addition to robots used in factories and warehouses, there is an increasing demand for robots that collaborate with humans in the living environment. These robots are expected to operate in an environment where conditions such as lighting and obstacles are constantly changing. Therefore, it is necessary for the robot to have the ability to avoid obstacles in order to ensure that the robot operates without damaging humans, environmental obstacles, and the robot itself.
 発明が解決しようとする課題は、障害物を回避する軌道をより容易に求めることにある。 The problem to be solved by the invention is to more easily find a trajectory to avoid obstacles.
 実施形態にかかる制御装置は、推論部を備える。推論部は、移動体の位置および姿勢の少なくとも一方の潜在的な特徴を示す潜在空間上の入力データを入力し、実空間上で障害物に接触しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する第1モデルに対して、潜在空間上の複数の第1入力データを入力し、第1モデルが出力する複数の第1出力データを得る。 The control device according to the embodiment includes an inference unit. The inference unit inputs input data on the latent space showing the potential features of at least one of the position and orientation of the moving body, and indicates at least one of the positions and postures of the moving body that does not come into contact with obstacles in the real space. A plurality of first input data on the latent space are input to the first model that outputs the output data, and a plurality of first output data output by the first model is obtained.
図1は、本実施形態の制御装置を含むロボットシステムのハードウェア構成例を示す図である。FIG. 1 is a diagram showing a hardware configuration example of a robot system including the control device of the present embodiment. 図2は、2リンクアームロボットであるロボットの構成例を示す図である。FIG. 2 is a diagram showing a configuration example of a robot that is a two-link arm robot. 図3は、制御装置のハードウェアブロック図である。FIG. 3 is a hardware block diagram of the control device. 図4は、制御装置の機能構成の一例を示す機能ブロック図である。FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device. 図5は、本実施形態で用いるGANの構成例を示す図である。FIG. 5 is a diagram showing a configuration example of GAN used in the present embodiment. 図6は、本実施形態における学習処理の一例を示すフローチャートである。FIG. 6 is a flowchart showing an example of the learning process in the present embodiment. 図7は、本実施形態における制御処理の一例を示すフローチャートである。FIG. 7 is a flowchart showing an example of the control process in the present embodiment. 図8は、潜在空間上での直線軌道の関節角度空間への写像の一例を示す図である。FIG. 8 is a diagram showing an example of mapping a straight line trajectory to the joint angle space on the latent space. 図9は、ロボットの軌跡の例を示す図である。FIG. 9 is a diagram showing an example of the trajectory of the robot. 図10は、潜在表現上での直線軌道の関節角度空間への写像の一例を示す図である。FIG. 10 is a diagram showing an example of mapping a straight line trajectory to a joint angle space on a latent representation. 図11は、関節角度空間上の軌道を元に動作させたロボットの軌跡の例を示す図である。FIG. 11 is a diagram showing an example of a trajectory of a robot operated based on a trajectory in the joint angle space. 図12は、VAEの構成例を示す図である。FIG. 12 is a diagram showing a configuration example of VAE. 図13は、学習データの例を説明するための図である。FIG. 13 is a diagram for explaining an example of learning data.
 以下、図面を参照しながら実施形態について詳細に説明する。 Hereinafter, the embodiment will be described in detail with reference to the drawings.
 障害物を回避する軌道計画では、人間による設計が不要であり、計算コストがより小さい障害物回避の方法が望まれる。そこで、本実施形態では、生成モデルを用いて障害物を回避する軌道を計算する。例えばGAN(Generative Adversarial Networks)を、生成モデルを含むモデルとして用いることができる。GANは、多様体仮説に基づいて、学習データをより低次元の潜在表現(潜在空間上で表されるデータ)に落とし込める利点がある。本実施形態では、ロボットが障害物を回避する位置および姿勢の少なくとも一方をGANに含まれる潜在空間に獲得させ、潜在空間上で軌道を指定する。これにより、障害物を回避する軌道計画がより容易に実現可能となる。簡単な設計により障害物回避を実現することができるため、例えば技術者だけでなく、熟練した知識のないユーザでもロボットを扱うことが可能となる。 In the trajectory plan to avoid obstacles, it is not necessary to design by humans, and a method of avoiding obstacles with a smaller calculation cost is desired. Therefore, in the present embodiment, the trajectory for avoiding obstacles is calculated using the generative model. For example, GAN (Generative Adversarial Networks) can be used as a model including a generative model. GAN has an advantage that the training data can be dropped into a lower-dimensional latent representation (data represented on the latent space) based on the manifold hypothesis. In the present embodiment, at least one of the position and the posture in which the robot avoids the obstacle is acquired in the latent space included in the GAN, and the trajectory is specified in the latent space. This makes it easier to realize a trajectory plan that avoids obstacles. Obstacle avoidance can be realized by a simple design, so that not only engineers but also users without skilled knowledge can handle the robot, for example.
 図1は、本実施形態の制御装置100を含むロボットシステム1のハードウェア構成例を示す図である。図1に示すように、ロボットシステム1は、制御装置100と、コントローラ200と、ロボット300と、センサ400と、を備えている。 FIG. 1 is a diagram showing a hardware configuration example of the robot system 1 including the control device 100 of the present embodiment. As shown in FIG. 1, the robot system 1 includes a control device 100, a controller 200, a robot 300, and a sensor 400.
 ロボット300は、制御装置100によって位置および姿勢の少なくとも一方(軌道)が制御されて移動する移動体の例である。ロボット300は、例えば、複数のリンク、複数の関節、および、関節それぞれを駆動する複数の駆動装置(モータなど)を備える。以下では、2つの関節および2つのリンクを備える2リンクアームロボットであるロボット300を例に説明する。 The robot 300 is an example of a moving body in which at least one of the position and the posture (orbit) is controlled by the control device 100 and moves. The robot 300 includes, for example, a plurality of links, a plurality of joints, and a plurality of driving devices (motors and the like) for driving each of the joints. In the following, a robot 300, which is a two-link arm robot having two joints and two links, will be described as an example.
 図2は、2リンクアームロボットであるロボット300の構成例を示す図である。図2に示すように、ロボット300は、ベース部材321と、2つの関節301、302と、2つのリンク311、312と、を備えている。関節301、302は、図2の紙面と垂直な方向の軸回りに回転する。関節301は、ベース部材321に固定された軸回りに回転する。リンク311、312は、関節301、302の回転に応じて移動する。図2では、関節301、302がそれぞれ反時計回りに回転することによりリンク311、312が移動する様子が示されている。 FIG. 2 is a diagram showing a configuration example of a robot 300 which is a two-link arm robot. As shown in FIG. 2, the robot 300 includes a base member 321 and two joints 301 and 302, and two links 311 and 312. The joints 301 and 302 rotate about an axis in the direction perpendicular to the paper surface of FIG. The joint 301 rotates about an axis fixed to the base member 321. The links 311 and 312 move according to the rotation of the joints 301 and 302. FIG. 2 shows how the links 31 and 312 move as the joints 301 and 302 rotate counterclockwise, respectively.
 適用可能なロボット(移動体)はこれに限られず、どのようなロボット(移動体)であってもよい。例えば、3つ以上の関節およびリンクを備えるロボット、モバイルマニピュレータ、および、移動台車であってもよい。また、ロボット全体を実空間内の任意の方向に平行移動させるための駆動装置を備えるロボットであってもよい。移動体は、このように全体の位置が変化する物体でもよいし、図2のリンクアームロボットのように、一部の位置が固定され、他の部分の位置および姿勢の少なくとも一方が変化する物体でもよい。 The applicable robot (moving body) is not limited to this, and any robot (moving body) may be used. For example, it may be a robot having three or more joints and links, a mobile manipulator, and a mobile trolley. Further, the robot may be provided with a drive device for moving the entire robot in parallel in an arbitrary direction in the real space. The moving body may be an object whose overall position changes in this way, or an object such as the link arm robot of FIG. 2 in which a part of the position is fixed and at least one of the position and the posture of the other part changes. It may be.
 図1に戻り、センサ400は、ロボット300の動作の制御に用いるための情報を検知する。センサ400は、例えば、ロボット300の周囲の画像を撮像する撮像装置(カメラ)、および、ロボット300の周囲の物体までの深度情報を検知する深度センサ(デプスセンサ)の両方または一方である。センサ400はこれらに限られるものではなく、例えば、障害物の位置に関する情報(位置情報)を取得可能なセンサであってもよい。 Returning to FIG. 1, the sensor 400 detects information to be used for controlling the operation of the robot 300. The sensor 400 is, for example, both or one of an imaging device (camera) that captures an image of the surroundings of the robot 300 and a depth sensor (depth sensor) that detects depth information up to an object around the robot 300. The sensor 400 is not limited to these, and may be, for example, a sensor capable of acquiring information (position information) regarding the position of an obstacle.
 コントローラ200は、制御装置100からの指示に応じて、ロボット300の駆動を制御する。例えばコントローラ200は、制御装置100から指定された回転方向および回転速度で回転するように、ロボット300の関節を駆動する駆動装置(モータなど)を制御する。 The controller 200 controls the drive of the robot 300 in response to an instruction from the control device 100. For example, the controller 200 controls a drive device (motor or the like) that drives the joints of the robot 300 so as to rotate in the rotation direction and rotation speed specified by the control device 100.
 制御装置100は、コントローラ200、ロボット300、および、センサ400に接続され、ロボットシステム1の全体を制御する。例えば制御装置100は、ロボット300の動作を制御する。ロボット300の動作の制御には、生成モデルを用いた軌道の計算が含まれる。制御装置100は、計算した軌道に従ってロボット300を動作させるための動作指令を、コントローラ200に出力する。制御装置100は、生成モデルを学習する機能を備えてもよい。この場合、制御装置100は、生成モデルを学習する学習装置としても機能する。 The control device 100 is connected to the controller 200, the robot 300, and the sensor 400, and controls the entire robot system 1. For example, the control device 100 controls the operation of the robot 300. The control of the motion of the robot 300 includes the calculation of the trajectory using the generative model. The control device 100 outputs an operation command for operating the robot 300 according to the calculated trajectory to the controller 200. The control device 100 may have a function of learning a generative model. In this case, the control device 100 also functions as a learning device for learning the generative model.
 図3は、制御装置100のハードウェアブロック図である。制御装置100は、一例として、図3に示すような一般のコンピュータ(情報処理装置)と同様のハードウェア構成により実現される。制御装置100は、図3に示すような1つのコンピュータにより実現されてもよいし、協働して動作する複数のコンピュータにより実現されてもよい。 FIG. 3 is a hardware block diagram of the control device 100. As an example, the control device 100 is realized by a hardware configuration similar to that of a general computer (information processing device) as shown in FIG. The control device 100 may be realized by one computer as shown in FIG. 3, or may be realized by a plurality of computers operating in cooperation with each other.
 制御装置100は、メモリ204と、1または複数のハードウェアプロセッサ206と、記憶装置208と、操作装置210と、表示装置212と、通信装置214とを備える。各部は、バスにより接続される。 The control device 100 includes a memory 204, one or more hardware processors 206, a storage device 208, an operation device 210, a display device 212, and a communication device 214. Each part is connected by a bus.
 メモリ204は、例えば、ROM222と、RAM224とを含む。ROM222は、制御装置100の制御に用いられるプログラムおよび各種設定情報等を書き換え不可能に記憶する。RAM224は、SDRAM(Synchronous Dynamic Random Access Memory)等の揮発性の記憶媒体である。RAM224は、1または複数のハードウェアプロセッサ206の作業領域として機能する。 The memory 204 includes, for example, a ROM 222 and a RAM 224. The ROM 222 stores the program used for controlling the control device 100, various setting information, and the like in a non-rewritable manner. The RAM 224 is a volatile storage medium such as SDRAM (Synchronous Dynamic Random Access Memory). The RAM 224 serves as a work area for one or more hardware processors 206.
 1または複数のハードウェアプロセッサ206は、メモリ204(ROM222およびRAM224)にバスを介して接続される。1または複数のハードウェアプロセッサ206は、例えば、1または複数のCPU(Central Processing Unit)であってもよいし、1または複数のGPU(Graphics Processing Unit)であってもよい。また、1または複数のハードウェアプロセッサ206は、ニューラルネットワークを実現するための専用の処理回路を含む半導体装置等であってもよい。 One or more hardware processors 206 are connected to memory 204 (ROM 222 and RAM 224) via a bus. The one or more hardware processors 206 may be, for example, one or a plurality of CPUs (Central Processing Units) or one or a plurality of GPUs (Graphics Processing Units). Further, the one or more hardware processors 206 may be a semiconductor device or the like including a dedicated processing circuit for realizing a neural network.
 1または複数のハードウェアプロセッサ206は、RAM224の所定領域を作業領域としてROM222または記憶装置208に予め記憶された各種プログラムとの協働により各種処理を実行し、制御装置100を構成する各部の動作を統括的に制御する。また、1または複数のハードウェアプロセッサ206は、ROM222または記憶装置208に予め記憶されたプログラムとの協働により、操作装置210、表示装置212、および、通信装置214等を制御する。 The one or more hardware processors 206 execute various processes in cooperation with various programs stored in ROM 222 or the storage device 208 in advance using a predetermined area of the RAM 224 as a work area, and operate each part constituting the control device 100. Is controlled comprehensively. Further, one or more hardware processors 206 control the operation device 210, the display device 212, the communication device 214, and the like in cooperation with the program stored in the ROM 222 or the storage device 208 in advance.
 記憶装置208は、フラッシュメモリ等の半導体による記憶媒体、あるいは、磁気的または光学的に記録可能な記憶媒体等の書き換え可能な記録装置である。記憶装置208は、制御装置100の制御に用いられるプログラムおよび各種設定情報等を記憶する。 The storage device 208 is a rewritable recording device such as a semiconductor storage medium such as a flash memory or a magnetically or optically recordable storage medium. The storage device 208 stores a program used for controlling the control device 100, various setting information, and the like.
 操作装置210は、マウスおよびキーボード等の入力デバイスである。操作装置210は、ユーザから操作入力された情報を受け付け、受け付けた情報を1または複数のハードウェアプロセッサ206に出力する。 The operation device 210 is an input device such as a mouse and a keyboard. The operation device 210 receives the information input from the user and outputs the received information to one or more hardware processors 206.
 表示装置212は、情報をユーザに表示する。表示装置212は、1または複数のハードウェアプロセッサ206から情報等を受け取り、受け取った情報を表示する。なお、通信装置214または記憶装置208等に情報を出力する場合、制御装置100は、表示装置212を備えなくてもよい。 The display device 212 displays information to the user. The display device 212 receives information or the like from one or more hardware processors 206, and displays the received information. When outputting information to the communication device 214, the storage device 208, or the like, the control device 100 does not have to include the display device 212.
 通信装置214は、外部の機器と通信して、ネットワーク等を介して情報を送受信する。 The communication device 214 communicates with an external device and transmits / receives information via a network or the like.
 本実施形態の制御装置100で実行されるプログラムは、インストール可能な形式または実行可能な形式のファイルでCD-ROM、フレキシブルディスク(FD)、CD-R、DVD(Digital Versatile Disk)等のコンピュータで読み取り可能な記録媒体に記録されてコンピュータプログラムプロダクトとして提供される。 The program executed by the control device 100 of the present embodiment is a file in an installable format or an executable format on a computer such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk). It is recorded on a readable recording medium and provided as a computer program product.
 また、本実施形態の制御装置100で実行されるプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。また、本実施形態の制御装置100で実行されるプログラムをインターネット等のネットワーク経由で提供または配布するように構成してもよい。また、本実施形態の制御装置100で実行されるプログラムを、ROM等に予め組み込んで提供するように構成してもよい。 Further, the program executed by the control device 100 of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by downloading via the network. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided or distributed via a network such as the Internet. Further, the program executed by the control device 100 of the present embodiment may be configured to be provided by incorporating it into a ROM or the like in advance.
 本実施形態にかかる制御装置100で実行されるプログラムは、コンピュータを後述する制御装置100の各部として機能させうる。このコンピュータは、ハードウェアプロセッサ206がコンピュータ読取可能な記憶媒体からプログラムを主記憶装置上に読み出して実行することができる。 The program executed by the control device 100 according to the present embodiment can make the computer function as each part of the control device 100 described later. The computer can read and execute a program on the main memory from a computer-readable storage medium by the hardware processor 206.
 図1に示すハードウェア構成は一例であり、これに限られるものではない。制御装置100、コントローラ200、ロボット300、および、センサ400のうち一部または全部を、1つの装置が備えるように構成してもよい。例えば、ロボット300が、制御装置100、コントローラ200、および、センサ400の機能も備えるように構成してもよい。また、制御装置100が、コントローラ200およびセンサ400の一方または両方の機能も備えるように構成してもよい。また、図1では制御装置100が学習装置としても機能しうることを記載しているが、制御装置100と学習装置とを物理的に異なる装置により実現してもよい。 The hardware configuration shown in FIG. 1 is an example, and is not limited to this. One device may be configured to include a part or all of the control device 100, the controller 200, the robot 300, and the sensor 400. For example, the robot 300 may be configured to also include the functions of the control device 100, the controller 200, and the sensor 400. Further, the control device 100 may be configured to have one or both functions of the controller 200 and the sensor 400. Further, although it is described in FIG. 1 that the control device 100 can also function as a learning device, the control device 100 and the learning device may be realized by physically different devices.
 次に、制御装置100の機能構成について説明する。図4は、制御装置100の機能構成の一例を示す機能ブロック図である。図4に示すように、制御装置100は、取得部101と、学習部102と、推論部103と、移動制御部104と、記憶部121と、を備えている。 Next, the functional configuration of the control device 100 will be described. FIG. 4 is a functional block diagram showing an example of the functional configuration of the control device 100. As shown in FIG. 4, the control device 100 includes an acquisition unit 101, a learning unit 102, an inference unit 103, a movement control unit 104, and a storage unit 121.
 取得部101は、制御装置100が実行する各種処理で用いられる各種情報を取得する。例えば取得部101は、生成モデルを学習するための学習データを取得する。学習データの取得方法はどのような方法であってもよいが、取得部101は、例えば予め作成された学習データを、外部の装置からネットワークなどを介して、または、記憶媒体から取得する。 The acquisition unit 101 acquires various information used in various processes executed by the control device 100. For example, the acquisition unit 101 acquires learning data for learning the generative model. The learning data can be acquired by any method, but the acquisition unit 101 acquires, for example, the learning data created in advance from an external device via a network or the like, or from a storage medium.
 学習部102は、学習データを用いて生成モデル(第1モデル)を学習する。GANを用いる場合、学習部102は、GANを構成する生成器および識別器の2つのニューラルネットワークを学習する。 The learning unit 102 learns the generative model (first model) using the learning data. When GAN is used, the learning unit 102 learns two neural networks, a generator and a discriminator that constitute GAN.
 学習データは、例えば、実空間上で障害物に接触しないロボット300の位置および姿勢の少なくとも一方を示すデータである。このような学習データを用いて学習することにより、入力された潜在表現(潜在空間上のデータ)に対して、実空間上で障害物に接触(干渉)しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する生成器が得られる。なお、位置および姿勢の少なくとも一方を示す出力データは、位置を示す出力データ、姿勢を示す出力データ、および、位置および姿勢の両方を示す出力データ、を含む。生成器は、移動体の位置および姿勢の少なくとも一方の潜在的な特徴を示す潜在空間上の入力データを入力し、実空間上で障害物に接触しない移動体の位置および姿勢の少なくとも一方を示す出力データを出力する生成モデル(第1モデル)に相当する。学習方法の詳細は後述する。 The learning data is, for example, data indicating at least one of the position and posture of the robot 300 that does not come into contact with an obstacle in the real space. By learning using such learning data, at least one of the positions and postures of the moving body that does not come into contact with (interfere with) obstacles in the real space with respect to the input latent expression (data in the latent space). A generator is obtained that outputs the output data indicating. The output data indicating at least one of the position and the posture includes the output data indicating the position, the output data indicating the posture, and the output data indicating both the position and the posture. The generator inputs input data in latent space that indicates the potential features of at least one of the position and orientation of the moving object, and indicates at least one of the positions and orientations of the moving object that does not come into contact with obstacles in real space. It corresponds to the generative model (first model) that outputs the output data. The details of the learning method will be described later.
 推論部103は、学習された生成モデルを用いた推論を実行する。例えば推論部103は、生成モデルに対して、潜在空間上で線を構成する複数の入力データ(第1入力データ)を入力し、生成モデルが出力する複数の出力データ(第1出力データ)を得る。 The inference unit 103 executes inference using the learned generative model. For example, the inference unit 103 inputs a plurality of input data (first input data) forming a line in the latent space to the generative model, and outputs a plurality of output data (first output data) output by the generative model. obtain.
 The movement control unit 104 controls the movement of the robot 300. For example, the movement control unit 104 controls the movement of the robot 300 by using the output data obtained by the inference unit 103 as trajectory data indicating a trajectory along which the robot 300 does not contact an obstacle in the real space. More specifically, the movement control unit 104 moves the robot 300 by generating a motion command for operating the robot 300 according to the trajectory data and transmitting the motion command to the controller 200.
 The storage unit 121 stores various kinds of information used by the control device 100. For example, the storage unit 121 stores the parameters (weights, biases, and the like) of the neural networks (the generator and the discriminator) constituting the GAN, and the learning data for training those networks. The storage unit 121 is realized by, for example, the storage device 208 of FIG. 3.
 The above units (the acquisition unit 101, the learning unit 102, the inference unit 103, and the movement control unit 104) are realized by, for example, one or more hardware processors 206. For example, each unit may be realized by causing one or more CPUs to execute a program, that is, by software. Each unit may be realized by a hardware processor such as a dedicated IC (Integrated Circuit), that is, by hardware. Each unit may also be realized by a combination of software and hardware. When a plurality of processors are used, each processor may realize one of the units or two or more of them.
 Next, a configuration example of the GAN will be described. FIG. 5 is a diagram showing a configuration example of the GAN used in the present embodiment. As shown in FIG. 5, the GAN includes two neural networks: a generator 501 and a discriminator 502. The generator 501 outputs fake data (high-dimensional data) imitating the learning data in response to an input latent variable z in the low-dimensional latent space. The generator 501 is trained so that the distribution of the generated fake data approaches the distribution of the true learning data. The discriminator 502 discriminates whether input data is true learning data or fake data, and is trained to increase its discrimination accuracy.
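 For illustration, the following is a minimal PyTorch-style sketch of such a conditional generator and discriminator for the six-dimensional pose data and two-dimensional latent variable defined below. The layer widths, the ReLU activations, and the 16-dimensional condition vector (matching the 16-region map of FIG. 5) are assumptions made for the sketch, not details taken from the embodiment.

```python
import torch
import torch.nn as nn

LATENT_DIM = 2   # latent variable (z0, z1)
DATA_DIM = 6     # pose (theta0, theta1, x0, y0, x1, y1)
COND_DIM = 16    # binary obstacle map of FIG. 5 (4 x 4 regions), flattened

class Generator(nn.Module):
    """Maps a latent variable and an obstacle condition to a fake pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))

class Discriminator(nn.Module):
    """Outputs a logit: is this pose (under this condition) real data?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=1))
```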
 When a two-link robot as shown in FIG. 2 is used, the low-dimensional (latent-space) data (latent variables) and the high-dimensional (real-space) data (learning data and fake data) are defined, for example, as follows.
 First, let the angles (joint angles) of the joints 301 and 302 be θ0 and θ1, respectively. Let the coordinates of the joint 302 and of the tip of the link 312 be (x0, y0) and (x1, y1), respectively. The lengths of the links 311 and 312 (link lengths) are, for example, 1. The high-dimensional data is represented as six-dimensional data (θ0, θ1, x0, y0, x1, y1) including the angles of the two joints 301 and 302, the position of the joint 302, and the position of the tip of the link 312. Of this six-dimensional position-and-posture information, (x0, y0, x1, y1) can in principle be generated by forward kinematics once the two-dimensional information (θ0, θ1) is given. Therefore, the low-dimensional data (latent variable) can be defined as two-dimensional data (z0, z1). Similarly, when there are n driving parts such as joints (n being an integer with n ≥ 3), n-dimensional data (z0, z1, ..., z(n-1)) can be used as the latent variable.
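 As a concrete sketch of how the high-dimensional data can be computed, the following function derives the six-dimensional sample from the two joint angles by forward kinematics, assuming unit link lengths, with θ0 measured at the base joint 301 and θ1 measured relative to the link 311; these angle conventions are assumptions for illustration.

```python
import math

def forward_kinematics(theta0_deg, theta1_deg, l0=1.0, l1=1.0):
    """Return (theta0, theta1, x0, y0, x1, y1) for a planar two-link arm."""
    t0 = math.radians(theta0_deg)
    t1 = math.radians(theta1_deg)
    x0 = l0 * math.cos(t0)            # position of joint 302
    y0 = l0 * math.sin(t0)
    x1 = x0 + l1 * math.cos(t0 + t1)  # position of the tip of link 312
    y1 = y0 + l1 * math.sin(t0 + t1)
    return (theta0_deg, theta1_deg, x0, y0, x1, y1)
```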
 The low-dimensional (latent-space) data and the high-dimensional (real-space) data described above are merely examples and are not limiting. For example, a latent variable whose dimensionality is larger than the number of degrees of freedom of the joints may be used.
 In the GAN shown in FIG. 5, a specified condition (Condition) out of a plurality of conditions can be input to each of the generator 501 and the discriminator 502. This allows the generator 501 and the discriminator 502 to output data (fake data or a discrimination result) according to the condition. A GAN that accepts condition inputs in this way is sometimes called a Conditional GAN. A GAN without condition inputs may also be used.
 The condition shown in FIG. 5 indicates that an obstacle, drawn as a black rectangle, exists within the movable range of the robot 300. The condition may be specified by any method. As shown in FIG. 5, a condition may be used that indicates, for each of a plurality of regions (rectangles) into which the movable range of the robot 300 is divided (16 regions in FIG. 5), whether an obstacle exists in that region (for example, 1 if an obstacle exists and 0 otherwise). One or both of an image capturing the surroundings of the robot 300 and depth information to objects around the robot 300 may also be used as the condition relating to obstacles. In that case, the image information and depth information detected by the sensor 400 (an imaging device and a depth sensor) can be used as the condition relating to obstacles. When image information or depth information is used, information indicating the position of the obstacle (position information) may or may not additionally be given explicitly as part of the condition. Alternatively, using a sensor 400 capable of acquiring the position information of obstacles, only the obstacle position information may be used as the condition relating to obstacles.
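 As one possible encoding of the grid condition described above, the sketch below marks which cells of a 4 × 4 grid (the 16 regions of FIG. 5) overlap an axis-aligned obstacle box; the workspace extent and the box representation are assumptions.

```python
import numpy as np

def occupancy_condition(obstacles, x_range=(-2.0, 2.0), y_range=(-2.0, 2.0),
                        nx=4, ny=4):
    """Binary condition vector: 1 where a grid cell contains an obstacle.
    Each obstacle is an axis-aligned box (xmin, ymin, xmax, ymax)."""
    grid = np.zeros((ny, nx), dtype=np.float32)
    xs = np.linspace(*x_range, nx + 1)
    ys = np.linspace(*y_range, ny + 1)
    for (bxmin, bymin, bxmax, bymax) in obstacles:
        for i in range(ny):
            for j in range(nx):
                # Does cell [xs[j], xs[j+1]] x [ys[i], ys[i+1]] overlap the box?
                if bxmin < xs[j + 1] and bxmax > xs[j] and \
                   bymin < ys[i + 1] and bymax > ys[i]:
                    grid[i, j] = 1.0
    return grid.flatten()

# Example: a single obstacle box in the upper-right part of the workspace.
cond = occupancy_condition([(0.5, 0.5, 1.5, 1.5)])
```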
 According to the manifold hypothesis, representations in the world can be expressed as lower-dimensional manifolds. Based on this hypothesis, a GAN is expected to acquire a low-dimensional latent representation from a high-dimensional representation. When applied to trajectory planning that avoids obstacles, as in the present embodiment, training the conditional generative model of the GAN (the generator 501) yields a low-dimensional latent representation of at least one of the position and the posture of the robot 300 that does not contact obstacles. The trained generator 501 can then generate, from a specified latent representation (data indicating at least one of the position and the posture in the latent space), data indicating at least one of a position and a posture with which the robot avoids obstacles in the real space.
 It is also known that a GAN learns such that the adjacency relationships of data (latent variables) in the latent space correspond to the adjacency relationships of the generated data. Therefore, when a trajectory specified in the low-dimensional latent space (a plurality of adjacent positions and/or postures) is mapped by the generator 501, a trajectory of at least one of the position and the posture with which the robot 300 avoids obstacles in the real space is obtained.
 Further, when a condition including information such as obstacle positions is input to the generator 501 and the discriminator 502, the position and/or posture output for the same latent representation is warped accordingly: depending on the obstacle positions included in the condition, at least one of a position and a posture that avoids the obstacles is output.
 Next, the learning process performed by the control device 100 according to the present embodiment configured as described above will be described. FIG. 6 is a flowchart showing an example of the learning process in the present embodiment.
 First, the acquisition unit 101 acquires learning data (step S101). The acquisition unit 101 acquires, for example, learning data that has been obtained from an external device via a network or the like and stored in the storage unit 121. The learning process is usually executed repeatedly a plurality of times. The acquisition unit 101 may acquire a subset of the learning data as the learning data (batch) used for each iteration.
 Next, the learning unit 102 generates fake data with the generator 501 of the GAN (step S102). The learning unit 102 inputs the generated fake data, or the learning data (true learning data) acquired in step S101, to the discriminator 502, and obtains the discrimination result output by the discriminator 502 (step S103).
 The learning unit 102 updates the parameters of the generator 501 and the discriminator 502 using the discrimination result (step S104). For example, the learning unit 102 updates the parameters of the generator 501 so as to minimize a loss function whose value becomes smaller the more often the discriminator 502 misidentifies fake data as true learning data. The learning unit 102 also updates the parameters of the discriminator 502 so as to minimize a loss function whose value becomes smaller the more accurate the discrimination results of the discriminator 502 are. Any algorithm may be used for the learning; for example, learning can be performed using Adam (Adaptive moment estimation).
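 One way to realize steps S102 to S104 is the standard adversarial update with binary cross-entropy losses and Adam, sketched below using the Generator and Discriminator from the earlier sketch; the embodiment does not fix the exact loss form, so this formulation is an assumption.

```python
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real, cond):
    """One adversarial update (steps S102-S104) on a batch of real poses."""
    n = real.shape[0]
    z = torch.randn(n, LATENT_DIM)

    # Discriminator update: push real poses toward 1 and fakes toward 0.
    fake = G(z, cond).detach()
    loss_d = (F.binary_cross_entropy_with_logits(D(real, cond), torch.ones(n, 1))
              + F.binary_cross_entropy_with_logits(D(fake, cond), torch.zeros(n, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: make the discriminator label generated poses as real.
    loss_g = F.binary_cross_entropy_with_logits(D(G(z, cond), cond), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```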
 The learning unit 102 determines whether to end the learning (step S105). For example, the learning unit 102 determines that the learning is finished when all the learning data has been processed, when the improvement of the loss function has become smaller than a threshold, or when the number of learning iterations has reached an upper limit.
 If the learning is not finished (step S105: No), the process returns to step S101 and is repeated for new learning data. If it is determined that the learning is finished (step S105: Yes), the learning process ends.
 It is known that GANs are difficult to train because the gradients often vanish or diverge during learning. The learning unit 102 may therefore use a technique for stabilizing the training, for example, applying normalization (such as Spectral Normalization) to each layer of the generator 501 and the discriminator 502.
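 In PyTorch, for example, such normalization could be applied by wrapping each linear layer when the network is constructed; a sketch assuming the Discriminator defined earlier:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def add_spectral_norm(model: nn.Module) -> nn.Module:
    """Wrap every linear layer with spectral normalization, bounding each
    layer's Lipschitz constant to stabilize adversarial training."""
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, spectral_norm(module))
        else:
            add_spectral_norm(module)  # recurse into nested containers
    return model

D = add_spectral_norm(Discriminator())  # apply before creating its optimizer
```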
 Through the learning process described above, a generative model (generator 501) is obtained that, for input data in the latent space, outputs output data indicating at least one of a position and a posture in which the robot 300 does not contact an obstacle in the real space. The generator 501 obtained in this way is used when computing the trajectory along which the robot 300 moves.
 Next, the control process of the robot 300 by the control device 100 according to the present embodiment will be described. FIG. 7 is a flowchart showing an example of the control process in the present embodiment.
 First, the inference unit 103 computes the start position (movement start position) and the end position (movement end position) of the robot 300 in the latent space (step S201). It is assumed that the start position and the end position of the robot 300 in the real space are given in advance.
 For example, the inference unit 103 randomly generates a latent variable z in the latent space, inputs z to the generator 501, and determines whether the resulting data matches the start position given in the real space. A match may include not only the case where the values agree exactly but also the case where the difference between the values is within a threshold. If they match, the inference unit 103 takes the data input to the generator 501 as the start position in the latent space. If they do not match, a latent variable z is randomly generated again and the process is repeated. The inference unit 103 can estimate the end position in the latent space in the same manner.
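 A minimal sketch of this random search, reusing the trained Generator and a condition tensor built from the occupancy sketch above; `start_pose` and `end_pose` are assumed to be given 1 × 6 tensors of the real-space poses, and the tolerance and retry limit are assumptions:

```python
import torch

cond_t = torch.from_numpy(cond).unsqueeze(0)  # (1, COND_DIM) condition tensor

@torch.no_grad()
def find_latent(generator, cond, target_pose, tol=0.05, max_tries=100_000):
    """Randomly sample latent points until the generated pose falls within
    `tol` of the given real-space pose."""
    for _ in range(max_tries):
        z = torch.randn(1, LATENT_DIM)
        pose = generator(z, cond)
        if torch.norm(pose - target_pose) < tol:
            return z
    raise RuntimeError("no latent point found within tolerance")

z_start = find_latent(G, cond_t, start_pose)
z_end = find_latent(G, cond_t, end_pose)
```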
 The inference unit 103 may instead compute (estimate) the start and end positions of the robot 300 in the latent space using a model (second model) different from the generator 501. For example, the learning unit 102 trains a neural network model (second model) that receives data in the real space (such as fake data generated by the generator 501) and outputs data in the latent space, either simultaneously with or independently of the training of the generator 501 and the discriminator 502. The inference unit 103 inputs the start and end positions given in the real space to the neural network model trained in this way, and takes the output data as the start and end positions of the robot 300 in the latent space, respectively.
 The inference unit 103 determines a trajectory connecting the start position and the end position in the latent space (step S202). The trajectory may be any trajectory that connects the start position and the end position. When the two-dimensional latent space described above is used, the inference unit 103 may determine, for example, a line (straight line or curve) connecting the start position and the end position as the trajectory.
 The inference unit 103 inputs a plurality of input data in the latent space corresponding to the determined trajectory to the generator 501, and obtains a plurality of output data output by the generator 501 (step S203). This output data corresponds to a trajectory along which the robot 300 moves in the real space without contacting obstacles.
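 Steps S202 and S203 with a straight-line latent trajectory can be sketched as a linear interpolation followed by batch decoding; the number of interpolation points is an assumption:

```python
import torch

@torch.no_grad()
def decode_trajectory(generator, cond, z_start, z_end, n_points=50):
    """Linearly interpolate between two latent points and decode every
    intermediate point into a real-space pose."""
    alphas = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)  # (n, 1)
    zs = (1 - alphas) * z_start + alphas * z_end              # (n, LATENT_DIM)
    conds = cond.expand(n_points, -1)                         # repeat condition
    return generator(zs, conds)                               # (n, DATA_DIM)

trajectory = decode_trajectory(G, cond_t, z_start, z_end)
```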
 The movement control unit 104 controls the movement of the robot 300 so that it moves along the computed trajectory (step S204).
 As described above, according to the present embodiment, the trajectory of the robot 300 can be computed using the generative model (generator 501) obtained by GAN training. With this approach, designing a complicated function as in the potential method is unnecessary. In addition, since the trajectory is computed in a latent space of lower dimensionality than the learning data, the computation cost can be reduced.
 Next, specific examples of the movement control of the robot will be described with reference to FIGS. 8 to 11. In the following, an example is described in which the trajectory is computed using a simulator that simulates the robot 300, a two-link arm robot as shown in FIG. 2.
 First, the premises of the operation of the robot (simulator) will be described. The learning data is obtained as follows. The range of the joint angle θ0 of the joint 301 is -90° to +90°. The range of the joint angle θ1 of the joint 302 is 0° to +150°. The step size of each joint angle is 1°. From the joint angles θ0 and θ1, the coordinates (x0, y0, x1, y1), including those of the joint 302 and the tip of the link 312, can be obtained by forward kinematics. The six-dimensional data (θ0, θ1, x0, y0, x1, y1) obtained in this way is used as the learning data.
 As the condition, a map including obstacle information is given. The map divides the space in which the robot 300 exists into 8 × 4 = 32 regions and contains, for each region, binary obstacle information indicating the presence or absence of an obstacle (for example, 1 if an obstacle exists and 0 otherwise). Two conditions are used: the case without obstacles (no region contains an obstacle) and the case with obstacles (some region contains an obstacle). In the following, the case without obstacles may be referred to as condition 1 and the case with obstacles as condition 2. The batch size (the number of learning data per iteration) is 2056, the optimization method is Adam, and 100,000 training iterations are performed.
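 Under these premises, the learning data can be generated by enumerating the joint-angle grid with the forward-kinematics sketch given earlier; the optional collision filter `collides` is a hypothetical helper for preparing obstacle-conditioned data:

```python
import numpy as np

def build_dataset(collides=None):
    """Enumerate joint angles in 1-degree steps and keep non-colliding poses.
    `collides(pose) -> bool` is a hypothetical collision test for the
    current obstacle map; None keeps every pose."""
    data = []
    for t0 in range(-90, 91):      # theta0: -90 .. +90 degrees
        for t1 in range(0, 151):   # theta1:   0 .. +150 degrees
            pose = forward_kinematics(t0, t1)
            if collides is None or not collides(pose):
                data.append(pose)
    return np.asarray(data, dtype=np.float32)

dataset = build_dataset()  # condition 1 (no obstacles): 181 * 151 samples
```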
 FIG. 8 shows an example of the mapping, into the joint angle space, of a straight latent-space trajectory input together with condition 1 (no obstacles) to the generator 501 after training under the above premises. The joint angle space is the two-dimensional space represented by the joint angles θ0 and θ1 out of the six-dimensional data. The upper part of FIG. 8 shows an example of a straight trajectory connecting a start position 801 and an end position 802 specified in the latent space. The lower part of FIG. 8 shows an example of the trajectory connecting the mapped start position 811 and end position 812 in the joint angle space. FIG. 9 shows an example of the trajectory of the robot (simulator) operated based on the joint-angle-space trajectory of FIG. 8.
 FIG. 10 shows an example of the mapping, into the joint angle space, of a straight trajectory in the latent representation input to the generator 501 together with condition 2 (with obstacles). FIG. 11 shows an example of the trajectory of the robot (simulator) operated based on the joint-angle-space trajectory of FIG. 10.
 The adjacency relationships in the latent representation correspond to the adjacency relationships of at least one of the robot position and posture. A smooth trajectory specified in the latent representation therefore yields a smooth trajectory in the joint angle space, and at least one of the robot position and posture changes smoothly.
 As shown in FIG. 11, when obstacle information is input as the condition, joint angles that would collide with the obstacle 1101 are not generated; only joint angles that do not collide with the obstacle 1101 are generated. The robot can thus move while avoiding the region where the obstacle 1101 is located.
 As shown in FIG. 8, even when the no-obstacle information is input as the condition, the generated joint angles are similar to those of FIG. 10, where the obstacle information is input as the condition. Since a Conditional GAN uses the same neural network (generative model) for different conditions, influence from the other condition is considered one of the causes. When only two conditions are used, as in this example, the two conditions may influence each other, producing similar output data. If a larger number of conditions is used, a generative model trained so as not to be affected by any specific condition (with higher generalization performance) is expected to be obtained. For example, the mapping into the joint angle space of a straight latent-space trajectory input together with condition 1 (no obstacles) may then become a mapping free of the distortion seen in the lower part of FIG. 8.
(Modification 1)
 The generative model is not limited to the generator included in a GAN. Any generative model from which a low-dimensional latent representation of the learning data can be obtained may be used. For example, instead of a GAN, a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model may be used.
 FIG. 12 is a diagram showing a configuration example of a VAE. In the VAE, the six-dimensional data (θ0, θ1, x0, y0, x1, y1) is input to an encoder 1201, and the encoder 1201 outputs a latent variable z in the latent space. A variable z' obtained by attaching the condition to the latent variable z is input to a decoder 1202, which generates and outputs new six-dimensional data. In the case of a VAE, the decoder 1202 is used as the generative model.
 With a VAE, the start and end positions in the latent space can be obtained by inputting the start and end positions of the robot in the real space to the encoder 1201. That is, the encoder 1201 can be used as a model (second model) for estimating the start and end positions of the robot in the latent space.
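 A minimal conditional VAE in the same style as the earlier sketches could look as follows; the architecture is an assumption, and, as described above, the decoder serves as the generative model while the encoder serves as the second model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal conditional VAE for the 6-D pose data (dimensions as above)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(DATA_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT_DIM)
        self.logvar = nn.Linear(128, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + COND_DIM, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z, cond):
        return self.dec(torch.cat([z, cond], dim=1))

    def forward(self, x, cond):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decode(z, cond), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence (the standard VAE objective)."""
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```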
(Modification 2)
 FIG. 13 is a diagram for explaining an example of learning data for training a model used for trajectory planning of the two-link arm robot. Each circle represents, for example, the position of the joint 301 or 302 or of the tip of the link 312 of the two-link arm robot of FIG. 2. In FIG. 13, the coordinates (0, 0) are the position of the joint 301, and the lengths of the links 311 and 312 are 1. FIG. 13 shows the change in the position of each part of the robot when the angles of the joints 301 and 302 are varied in constant steps, starting from the state in which the joint 302 and the tip of the link 312 are at the coordinates (0, -1.0) and (0, -2.0), respectively. The six-dimensional data corresponding to each position shown in FIG. 13 is used as the learning data.
 Each training iteration uses a plurality of learning data given as a batch. For example, a fixed number of learning data randomly selected from the entire set of learning data is used as the batch for each iteration.
 With such a method, the learning result may be unstable. For example, if the learning data in the upper-left part of FIG. 13 happen to be selected disproportionately, the latent space obtained after training may also be biased.
 The learning data may be selected so as to avoid such a problem and make the learning result more stable. For example, the acquisition unit 101 may acquire one or more learning data from each of a plurality of data groups, each containing one or more learning data, and use them as the learning data (batch) for each iteration. In the case of learning data as in FIG. 13, the acquisition unit 101 may, for example, randomly select one or more learning data from each of a plurality of data groups into which the learning data are classified according to values (coordinate values) indicating at least one of the position and the posture. More specifically, for example, the space of the position coordinates (x, y) shown in FIG. 13 is divided into a plurality of mesh regions, and the learning data whose tip-of-link-312 coordinates fall in each region are classified into the data group corresponding to that region. The acquisition unit 101 acquires the learning data (batch) used for each iteration by selecting one or more learning data from each of the data groups classified in this way.
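 One way to implement such region-wise batch construction is to bin the learning data by the tip position (x1, y1) and draw from every non-empty bin; the mesh resolution and per-cell count below are assumptions:

```python
import numpy as np

def stratified_batch(dataset, per_cell=2, nx=8, ny=8,
                     x_range=(-2.0, 2.0), y_range=(-2.0, 2.0), rng=None):
    """Sample up to `per_cell` poses from every mesh cell of the (x1, y1)
    tip-position space so the batch covers the workspace evenly.
    `dataset` columns: (theta0, theta1, x0, y0, x1, y1)."""
    rng = rng or np.random.default_rng()
    xs = np.digitize(dataset[:, 4], np.linspace(*x_range, nx + 1)[1:-1])
    ys = np.digitize(dataset[:, 5], np.linspace(*y_range, ny + 1)[1:-1])
    batch = []
    for i in range(nx):
        for j in range(ny):
            idx = np.where((xs == i) & (ys == j))[0]
            if len(idx) > 0:
                take = rng.choice(idx, size=min(per_cell, len(idx)), replace=False)
                batch.append(dataset[take])
    return np.concatenate(batch, axis=0)
```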
 Further, the acquisition unit 101 may preferentially select learning data containing positions closer to an obstacle. For example, when the learning data are classified into a plurality of data groups as described above, the acquisition unit 101 may acquire more learning data from the region containing the obstacle, or from regions adjacent to it, than from other regions. This makes it possible to learn at least one of the positions and postures that avoid the obstacle more efficiently.
 According to this modification, training can be performed using learning data acquired so as not to be biased, so a more uniform latent space can be generated.
 In the present specification, an expression such as "at least one of a, b, and c" covers not only the combinations a, b, c, a-b, a-c, b-c, and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b, and a-a-b-b-c-c. It also covers configurations including elements other than a, b, and c, such as the combination a-b-c-d. Similarly, an expression such as "at least one of a, b, or c" covers not only the combinations a, b, c, a-b, a-c, b-c, and a-b-c, but also combinations containing multiple instances of the same element, such as a-a, a-b-b, and a-a-b-b-c-c, as well as configurations including elements other than a, b, and c, such as the combination a-b-c-d.
 Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are also included in the invention described in the claims and its equivalents.
1 Robot system
100 Control device (learning device)
101 Acquisition unit
102 Learning unit
103 Inference unit
104 Movement control unit
121 Storage unit
200 Controller
204 Memory
206 Hardware processor
208 Storage device
210 Operation device
212 Display device
214 Communication device
222 ROM
224 RAM
300 Robot
400 Sensor

Claims (13)

  1.  A control device comprising:
      an inference unit that inputs a plurality of first input data in a latent space to a first model, the first model receiving input data in the latent space indicating latent features of at least one of a position and a posture of a moving body and outputting output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in a real space, and that obtains a plurality of first output data output by the first model.
  2.  The control device according to claim 1, further comprising:
      a movement control unit that controls movement of the moving body based on the plurality of first output data.
  3.  The control device according to claim 1 or 2, wherein the plurality of first input data include first input data corresponding to a movement start position and a movement end position of the moving body in the real space.
  4.  The control device according to any one of claims 1 to 3, wherein the plurality of first input data constitute a line connecting a movement start position and a movement end position in the latent space.
  5.  The control device according to claim 3 or 4, wherein the inference unit estimates the movement start position in the latent space corresponding to the movement start position of the moving body in the real space, and the movement end position in the latent space corresponding to the movement end position of the moving body in the real space, using the first model or a second model that receives input data indicating at least one of the position and the posture of the moving body in the real space and outputs output data in the latent space.
  6.  The control device according to any one of claims 1 to 5, wherein
      the first model is trained to receive the input data together with a condition relating to the obstacle and to output the output data, and
      the inference unit inputs the plurality of first input data to the first model together with the condition relating to the obstacle and obtains the plurality of first output data.
  7.  The control device according to claim 6, wherein the condition relating to the obstacle includes any one of image information, depth information, and position information of the obstacle.
  8.  The control device according to any one of claims 1 to 7, wherein the first model is a GAN (Generative Adversarial Network), a VAE (Variational Autoencoder), an autoencoder, or a flow-based generative model.
  9.  A system comprising:
      a sensor that acquires a condition relating to an obstacle;
      the control device according to any one of claims 1 to 8; and
      the moving body.
  10.  A learning device comprising:
      an acquisition unit that acquires one or more learning data indicating at least one of a position and a posture of a moving body that does not contact an obstacle in a real space; and
      a learning unit that, using the acquired learning data, trains a first model that receives input data in a latent space indicating latent features of at least one of the position and the posture of the moving body and outputs output data indicating at least one of a position and a posture of the moving body that does not contact the obstacle in the real space.
  11.  The learning device according to claim 10, wherein the acquisition unit acquires one or more learning data from each of a plurality of data groups each including the one or more learning data.
  12.  The learning device according to claim 11, wherein each of the plurality of data groups includes one or more learning data classified according to a value indicating at least one of the position and the posture.
  13.  A control method comprising:
      inputting a plurality of first input data in a latent space to a first model, the first model receiving input data in the latent space indicating latent features of at least one of a position and a posture of a moving body and outputting output data indicating at least one of a position and a posture of the moving body that does not contact an obstacle in a real space, and obtaining a plurality of first output data output by the first model.
PCT/JP2020/021831 2019-06-04 2020-06-02 Control device, system, learning device, and control method WO2020246482A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-104449 2019-06-04
JP2019104449A JP2020196102A (en) 2019-06-04 2019-06-04 Control device, system, learning device and control method

Publications (1)

Publication Number Publication Date
WO2020246482A1 true WO2020246482A1 (en) 2020-12-10

Family

ID=73649427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021831 WO2020246482A1 (en) 2019-06-04 2020-06-02 Control device, system, learning device, and control method

Country Status (2)

Country Link
JP (1) JP2020196102A (en)
WO (1) WO2020246482A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022163513A1 (en) * 2021-01-27 2022-08-04 富士フイルム株式会社 Learned model generation method, machine learning system, program, and medical image processing device
JP6955733B1 (en) * 2021-02-17 2021-10-27 株式会社エクサウィザーズ Information processing equipment, information processing methods, and programs
JP2023062782A (en) * 2021-10-22 2023-05-09 川崎重工業株式会社 Robot data processing server and interference data providing method
KR102624237B1 (en) * 2023-08-03 2024-01-15 주식회사 아임토리 Domain Adaptation Device and Method for Robot Arm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04340603A (en) * 1991-05-17 1992-11-27 Mitsubishi Electric Corp Method for controlling manipulator considering obstacle avoidance
JP2005125475A (en) * 2003-10-24 2005-05-19 Sunao Kitamura Architecture capable of learning best course by single successful trial
WO2019004350A1 (en) * 2017-06-29 2019-01-03 株式会社 Preferred Networks Data discriminator training method, data discriminator training device, program and training method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04340603A (en) * 1991-05-17 1992-11-27 Mitsubishi Electric Corp Method for controlling manipulator considering obstacle avoidance
JP2005125475A (en) * 2003-10-24 2005-05-19 Sunao Kitamura Architecture capable of learning best course by single successful trial
WO2019004350A1 (en) * 2017-06-29 2019-01-03 株式会社 Preferred Networks Data discriminator training method, data discriminator training device, program and training method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ARIKI, YUKA ET AL.: "Kullback-Leibler dynamic Imitation learning using a shared latent space", PREPRINTS OF THE 34TH ACADEMIC LECTURE CONFERENCE OF THE ROBOTICS SOCIETY OF JAPAN DVD-ROM 2016, THE ROBOTICS SOCIETY OF JAPAN, vol. 34, 7 July 2019 (2019-07-07), XP009527102 *
SHINDOH, TOMONORI: "Pioneering application of GAN of deep learning generative model, Ricoh develops a new algorithm for imitative learning of robot", NIKKEI ROBOTICS, vol. 42, pages 5 - 14, XP009527099, ISSN: 2189-5783 *
TORISHIMA, RYOTA; MORI, HIROKI; TAKAHASHI, KUNIYUKI; OKANOHARA, DAISUKE; OGATA, TETSUYA: "1P2-A14 Conditional generative adversarial networks: Obstacle avoidance for robot arm by conditional generative adversarial networks", PROCEEDINGS OF THE JSME CONFERENCE ON ROBOTICS AND MECHATRONICS; HIROSHIMA, JAPAN; JUNE 5-8 2019, vol. 19, no. 2, 5 June 2019 (2019-06-05), pages 1P2-A14(1) - 1P2-A14(3), XP009527093 *

Also Published As

Publication number Publication date
JP2020196102A (en) 2020-12-10

Similar Documents

Publication Publication Date Title
WO2020246482A1 (en) Control device, system, learning device, and control method
Kyrarini et al. Robot learning of industrial assembly task via human demonstrations
Finn et al. Deep visual foresight for planning robot motion
Rana et al. Towards robust skill generalization: Unifying learning from demonstration and motion planning
Ravichandar et al. Learning Partially Contracting Dynamical Systems from Demonstrations.
Toussaint et al. Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference
Wang et al. Collision-free trajectory planning in human-robot interaction through hand movement prediction from vision
Smits et al. iTASC: a tool for multi-sensor integration in robot manipulation
Kumar et al. Visual motor control of a 7DOF redundant manipulator using redundancy preserving learning network
Sturm et al. Body schema learning for robotic manipulators from visual self-perception
CN115605326A (en) Method for controlling a robot and robot controller
JP7295421B2 (en) Control device and control method
Rozo et al. Orientation probabilistic movement primitives on riemannian manifolds
Paus et al. Predicting pushing action effects on spatial object relations by learning internal prediction models
Zarubin et al. Hierarchical Motion Planning in Topological Representations.
KR20220155921A (en) Method for controlling a robot device
JP2022155828A (en) Trajectory generation system, trajectory generation method and program
Fan et al. Learning resilient behaviors for navigation under uncertainty
Sturm et al. Unsupervised body scheme learning through self-perception
US20220375210A1 (en) Method for controlling a robotic device
WO2017134735A1 (en) Robot system, robot optimization system, and robot operation plan learning method
Ahmadzadeh et al. Generalized Cylinders for Learning, Reproduction, Generalization, and Refinement of Robot Skills.
Ahmad et al. Learning to adapt the parameters of behavior trees and motion generators (btmgs) to task variations
Gäbert et al. Generation of human-like arm motions using sampling-based motion planning
Petrič et al. Smooth transition between tasks on a kinematic control level: Application to self collision avoidance for two Kuka LWR robots

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20818975; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20818975; Country of ref document: EP; Kind code of ref document: A1)