WO2019146007A1 - Position control device and position control method - Google Patents

Position control device and position control method

Info

Publication number
WO2019146007A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
control
control amount
value
learning
Prior art date
Application number
PCT/JP2018/002053
Other languages
French (fr)
Japanese (ja)
Inventor
Hayato Yamanaka
Takashi Minamimoto
Original Assignee
Mitsubishi Electric Corporation
Application filed by Mitsubishi Electric Corporation
Priority to JP2018530627A priority Critical patent/JP6458912B1/en
Priority to PCT/JP2018/002053 priority patent/WO2019146007A1/en
Priority to TW107125131A priority patent/TW201932257A/en
Publication of WO2019146007A1 publication Critical patent/WO2019146007A1/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23P METAL-WORKING NOT OTHERWISE PROVIDED FOR; COMBINED OPERATIONS; UNIVERSAL MACHINE TOOLS
    • B23P19/00 Machines for simply fitting together or separating metal parts or objects, or metal and non-metal parts, whether or not involving some deformation; Tools or devices therefor so far as not provided for in other classes
    • B23P19/02 Machines for simply fitting together or separating metal parts or objects, or metal and non-metal parts, whether or not involving some deformation; Tools or devices therefor so far as not provided for in other classes, for connecting objects by press fit or for detaching same
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 Controls for manipulators
    • B25J13/08 Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/10 Programme-controlled manipulators characterised by positioning means for manipulator elements
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls

Definitions

  • the present invention relates to a position control device and a position control method.
  • When constructing a production system that performs assembly operations with a robot arm, it is common to teach the motion by hand, an operation called teaching. However, since the robot then only repeats the motion at the stored positions, it may be unable to cope when an error arises from manufacturing or mounting variations. If a position correction technology that absorbs such individual errors can be developed, productivity can be expected to improve, and the scenes in which robots play an active part will increase.
  • In Patent Document 1, there is a technology for performing position correction just before the connector insertion operation using a camera image. If a plurality of devices such as a force sensor and a stereo camera are used, positional errors in assembly (insertion, workpiece holding, etc.) can also be absorbed. However, to determine the position correction amount, the center coordinates of the gripped connector and of the connector to be inserted must be explicitly calculated from the image information, as described in that reference. This calculation depends on the shape of the connector and must be set up by the designer for each connector used. The calculation is relatively easy if three-dimensional information can be acquired from a range camera or the like, but acquiring it from two-dimensional image information requires developing an image-processing algorithm for each connector, which takes time.
  • The present invention has been made to solve the above problems, and aims to collect learning data while preventing excessive load on the objects even when the function for learning differs from the function for position control of the robot.
  • A position control device according to the present invention includes: a path determination unit that designates the control amount for insertion based on the image acquired from the imaging unit and the value of the force sensor, performs the alignment, and learns from the results; and a combining unit that outputs a cycle control amount adjustment value based on the cycle control amount set for each control cycle to reach the control amount and on a control amount adapted to the external force based on the value of the force sensor.
  • According to the present invention, even if the function for learning and the function for position control of the robot are different, learning data can be collected while preventing excessive load on objects.
  • FIG. 1 is a diagram in which a robot arm 100, a male connector 110, and a female connector 120 according to Embodiment 1 are arranged.
  • FIG. 2 is a functional configuration diagram of a position control device according to Embodiment 1.
  • FIG. 3 is a hardware configuration diagram of a position control device according to Embodiment 1.
  • FIG. 4 is a flowchart of position control of the position control device according to Embodiment 1.
  • FIG. 5 shows an example of an insertion start position photographed by the monocular camera 102 according to Embodiment 1, together with camera images near that position and the corresponding control amounts.
  • FIG. 6 is a diagram showing an example of a neural network according to Embodiment 1 and a learning rule of the neural network.
  • FIG. 7 is a flowchart using a plurality of networks in the neural network in Embodiment 1.
  • FIG. 8 is a functional configuration diagram of a position control device in Embodiment 2.
  • FIG. 9 is a hardware configuration diagram of a position control device in Embodiment 2.
  • FIG. 10 is a view showing a trial of fitting of the male connector 110 and the female connector 120 in Embodiment 2.
  • FIG. 11 is a flowchart of path learning of the position control device according to Embodiment 2.
  • FIG. 12 is a flowchart of path learning in the position control device in Embodiment 3.
  • FIG. 13 is a diagram showing an example of a neural network according to Embodiment 3 and a learning rule of the neural network.
  • FIG. 14 is a functional configuration diagram of a position control device in Embodiment 4.
  • FIG. 15 is a flowchart of path learning of the position control device in Embodiment 4.
  • FIG. 16 is a functional configuration diagram of a position control device in Embodiment 4.
  • Embodiment 1 Hereinafter, embodiments of the present invention will be described.
  • FIG. 1 is a view in which a robot arm 100, a male side connector 110, and a female side connector 120 according to the first embodiment are arranged.
  • the robot arm 100 is provided with a gripping portion 101 for gripping the male connector 110, and the monocular camera 102 is attached to the robot arm 100 so that the gripping portion can be seen.
  • The monocular camera 102 is installed at a position such that, when the grip portion 101 at the tip of the robot arm 100 grips the male connector 110, both the tip of the gripped male connector 110 and the female connector 120 on the insertion side can be seen.
  • FIG. 2 is a functional configuration diagram of the position control device in the first embodiment. In FIG. 2, the device is configured of an imaging unit 201 that captures an image, which is a function of the monocular camera 102 in FIG. 1; a control parameter generation unit 202 that generates a control amount for the position of the robot arm 100 using the captured image; a control unit 203 that controls the current/voltage values of the drive unit 204 of the robot arm 100 using the control amount; and a drive unit 204 that changes the position of the robot arm 100 based on the current and voltage values output from the control unit 203.
  • When the control parameter generation unit 202 acquires an image from the imaging unit 201, which is a function of the monocular camera 102, it determines the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) for the position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and outputs them to the control unit 203.
  • The control unit 203 determines and controls the current and voltage values for each device constituting the drive unit 204, based on the received position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and the received control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
  • The drive unit 204, operating with the current/voltage values for each device received from the control unit 203, moves the robot arm 100 to the position (X+ΔX, Y+ΔY, Z+ΔZ, Ax+ΔAx, Ay+ΔAy, Az+ΔAz).
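As a minimal illustration (function and variable names are hypothetical, not from the patent), the move performed by the drive unit 204 amounts to adding each control-amount component to the corresponding pose component:

```python
# Hypothetical sketch: the drive unit 204 moves the arm from
# (X, Y, Z, Ax, Ay, Az) to (X+dX, Y+dY, Z+dZ, Ax+dAx, Ay+dAy, Az+dAz).
def apply_control_amount(pose, delta):
    """Add each control-amount component to the corresponding pose component."""
    return tuple(p + d for p, d in zip(pose, delta))

pose = (100.0, 50.0, 30.0, 0.0, 0.0, 90.0)   # X, Y, Z, Ax, Ay, Az
delta = (-1.0, 0.5, -2.0, 0.0, 0.0, 1.5)     # dX .. dAz
new_pose = apply_control_amount(pose, delta)  # (99.0, 50.5, 28.0, 0.0, 0.0, 91.5)
```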
  • FIG. 3 is a hardware block diagram of the position control device in the first embodiment.
  • The monocular camera 102 is communicably connected, whether by wire or wirelessly, to the processor 302 and the memory 303 via the input/output interface 301.
  • the input / output interface 301, the processor 302, and the memory 303 configure the function of the control parameter generation unit 202 in FIG.
  • The input/output interface 301 is communicably connected, whether by wire or wirelessly, to the control circuit 304 corresponding to the control unit 203.
  • the control circuit 304 is also electrically connected to the motor 305.
  • the motor 305 corresponds to the drive unit 204 in FIG. 2 and is configured as a component for controlling the position of each device.
  • Although the motor 305 is used here as the hardware corresponding to the drive unit 204, any hardware capable of controlling the position may be used. Note that the monocular camera 102 and the input/output interface 301, and the input/output interface 301 and the control circuit 304, may each be provided as separate bodies.
  • FIG. 4 is a flowchart of position control of the position control device according to the first embodiment.
  • First, in step S101, the gripping unit 101 of the robot arm 100 grips the male connector 110.
  • The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 2, and the arm is operated based on a control program registered in advance on the control unit 203 side.
  • In step S102, the robot arm 100 is brought close to the insertion position of the female connector 120.
  • The approximate position and posture of the female connector 120 are registered in advance on the control unit 203 side of FIG. 2, and the male connector 110 is moved into position based on a control program registered in advance on the control unit 203 side.
  • In step S103, the control parameter generation unit 202 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted appear.
  • In step S104, the control parameter generation unit 202 acquires the image from the imaging unit 201 and determines the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
  • The control parameter generation unit 202 uses the processor 302 and the memory 303 of FIG. 3 as hardware and calculates the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) using a neural network. The calculation method using the neural network will be described later.
  • In step S105, the control unit 203 acquires the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202 and compares every component of the control amount with a predetermined threshold. If all components are equal to or less than the threshold, the process proceeds to step S107, and the control unit 203 controls the drive unit 204 to insert the male connector 110 into the female connector 120. If any component is larger than the threshold, then in step S106 the control unit 203 controls the drive unit 204 using the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202, and the process returns to step S103.
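The loop of steps S103 to S107 can be sketched as follows (a simplified sketch with hypothetical callback names; the real device captures camera images and drives motors):

```python
# Simplified sketch of the S103-S107 loop: re-image and re-correct until every
# component of the control amount is at or below the threshold, then insert.
# capture_image, infer_control, move_arm and insert are hypothetical callbacks.
def position_control_loop(capture_image, infer_control, move_arm, insert,
                          threshold=0.5, max_loops=20):
    for _ in range(max_loops):
        image = capture_image()                      # S103: capture both connectors
        delta = infer_control(image)                 # S104: neural network inference
        if all(abs(d) <= threshold for d in delta):  # S105: compare all components
            insert()                                 # S107: perform the insertion
            return True
        move_arm(delta)                              # S106: correct, then loop to S103
    return False
```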
  • Next, the method of calculating the control amount using the neural network in step S104 of FIG. 4 will be described.
  • To train the neural network, sets of images and the corresponding required movement amounts are collected in advance.
  • Specifically, with the male connector 110 and the female connector 120 in a fitted state at known positions, the male connector 110 is gripped by the grip portion 101 of the robot arm 100. Then, while the gripping unit 101 is moved in a known extraction direction to the insertion start position, the monocular camera 102 acquires a plurality of images.
  • FIG. 5 is an example of a diagram showing an insertion start position photographed by the monocular camera 102 according to the first embodiment, together with camera images near that position and the corresponding control amounts.
  • FIG. 6 is a diagram showing an example of a neural network in the first embodiment and a learning rule of the neural network.
  • the input layer receives an image (for example, luminance and color difference value of each pixel) obtained from the monocular camera 102, and the output layer outputs control amounts ( ⁇ X, ⁇ Y, ⁇ Z, ⁇ Ax, ⁇ Ay, ⁇ Az).
  • The parameters of the intermediate layer are optimized so that the output value of the output layer, obtained from the input image through the intermediate layer, approximates the control amount stored with that image.
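As a toy illustration of this optimization (assumptions: a linear model in place of the patent's multi-layer network, and scalar targets standing in for the six control-amount components), gradient descent on the squared error between the model output and the stored control amount looks like:

```python
# Minimize the squared error between the model output and the stored control
# amount by stochastic gradient descent. A linear model replaces the patent's
# neural network purely to keep the sketch short; the update principle is the same.
def train(samples, n_features, lr=0.1, epochs=200):
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - target                      # output minus stored amount
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Synthetic "image feature -> control amount" pairs following target = 2*x0 - x1.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
w, b = train(samples, 2)
```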
  • During learning, the male connector 110 is fixed in position with respect to the monocular camera 102, and only the position of the female connector 120 is changed. However, the male connector 110 is not always gripped at the correct position, and its position may be shifted by individual differences and the like. Therefore, in the learning process, control amounts and images are also acquired and learned for the insertion start position and nearby positions when the male connector 110 deviates from the correct position, so that the learning can cope with individual differences of both the male connector 110 and the female connector 120.
  • In this case, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated excluding the movement amount from the fitted-state position at the time of shooting to the insertion start position.
  • The movement amount from the insertion start position to the fitted-state position must be stored separately for use in step S107 of FIG. 4.
  • Note that if the coordinate system of the monocular camera 102 differs from the coordinate system of the entire robot arm 100, the control unit 203 needs to control the robot arm 100 after converting from the camera's coordinate system.
  • Since the monocular camera 102 is fixed to the robot arm 100, the coordinate system in which the female connector 120 is placed differs from the coordinate system of the monocular camera 102. If the monocular camera 102 is handled in the same coordinate system as the position of the female connector 120, the conversion from the coordinate system of the monocular camera 102 to the coordinate system of the robot arm 100 is unnecessary.
  • In step S101, the robot arm 100 grips the male connector 110 according to the operation registered in advance.
  • The arm is then moved to a position substantially above the female connector 120.
  • However, the position of the male connector 110 immediately before gripping is not always constant. A slight error may occur, for example, due to small deviations in the movement of the machine that sets the male connector 110 in place. Similarly, the female connector 120 may also have some positional error.
  • In step S103, it is important to acquire an image, captured by the imaging unit 201 of the monocular camera 102 attached to the robot arm 100, in which both the male connector 110 and the female connector 120 appear. Since the position of the monocular camera 102 with respect to the robot arm 100 is always fixed, relative positional information between the male connector 110 and the female connector 120 is reflected in this image.
  • In step S104, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated by the control parameter generation unit 202, which has a neural network as shown in FIG. 6 in which the relative positional information has been learned in advance.
  • In a single operation, the control amount output from the control parameter generation unit 202 may not bring the arm all the way to the insertion start position.
  • In that case, the loop of steps S103 to S106 is repeated a plurality of times: the control parameter generation unit 202 repeatedly calculates the control amount until it no longer exceeds the threshold shown in step S105, and the control unit 203 and the drive unit 204 control the position accordingly.
  • The threshold shown in S105 is determined by the required fitting accuracy of the male connector 110 and the female connector 120. For example, when the fit of the connector is loose and high accuracy is not required by the connector's characteristics, the threshold can be set large; in the opposite case, it is set smaller. In a manufacturing process, the tolerable manufacturing error is often defined, so that value can also be used.
  • Note that a plurality of insertion start positions may be set. If the insertion start position is set without a sufficient distance between the male connector 110 and the female connector 120, the connectors may abut each other before the insertion is started, with a risk that one of them is damaged. In that case, the insertion start position may be set so that, for example, the clearance between the male connector 110 and the female connector 120 is 5 mm the first time, 20 mm the next, and 10 mm the time after that, depending on the number of loops between step S103 and step S106 in FIG. 4.
  • Although the present embodiment has been described using connectors, the application of this technology is not limited to connector fitting.
  • For example, it can be applied to mounting an IC on a substrate, and a similar method is effective when inserting a component such as a capacitor, whose leads have large dimensional errors, into holes in a substrate.
  • The method is not necessarily limited to insertion into a substrate, and can be used for position control in general in which the control amount is obtained from the relationship between an image and a control amount.
  • Learning the relationship between the image and the control amount with a neural network in this way has the advantage of being able to absorb individual differences when performing alignment between objects.
  • As described above, the position control device of the present embodiment includes: the imaging unit 201 that captures an image in which two objects appear; the control parameter generation unit 202 that inputs the information of the captured image to the input layer of a neural network and outputs, as the output layer of the neural network, a position control amount for controlling the positional relationship between the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship between the two objects using the output control amount; and the drive unit 204 that moves the position of one of the two objects using that current or voltage.
  • FIG. 7 is a flowchart using a plurality of networks in the neural network in the first embodiment. It shows the detailed steps of step S104 in FIG. 4. The control parameter generation unit 202 of FIG. 2 contains a plurality of networks.
  • In step S701, the control parameter generation unit 202 selects which network to use based on the input image. If this is the first loop or the obtained control amount is 25 mm or more, neural network 1 is selected and the process proceeds to step S702. If the control amount obtained in the second or later loop is 5 mm or more and less than 25 mm, neural network 2 is selected and the process proceeds to step S703. If the control amount obtained in the second or later loop is less than 5 mm, neural network 3 is selected and the process proceeds to step S704. In steps S702 to S704, the control amount is calculated using the selected neural network.
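The branch in step S701 can be summarized by a small selection function (a sketch; the 5 mm and 25 mm thresholds are the ones given above, and the function name is hypothetical):

```python
# Sketch of the network-selection switch of step S701: choose a network from
# the loop count and the magnitude (in mm) of the previously obtained control amount.
def select_network(loop_count, last_control_mm):
    if loop_count == 1 or last_control_mm >= 25:
        return 1   # neural network 1: large corrections
    if 5 <= last_control_mm < 25:
        return 2   # neural network 2: intermediate corrections
    return 3       # neural network 3: corrections under 5 mm
```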
  • Each neural network is trained according to the distance, or control amount, between the male connector 110 and the female connector 120: neural network 3 in the figure is trained on data with errors within ±1 mm and ±1 degree, while neural network 2 is trained on data in the range of ±1 to ±10 mm and ±1 to ±5 degrees, so that the range of learned data changes stepwise from network to network.
  • The number of networks is not particularly limited. When this scheme is used, the determination function of step S701, which decides which network to use, must be prepared as a "network selection switch".
  • The network selection switch can also be configured as a neural network. In that case, the input layer receives the image, and the output layer outputs the network number. For training, pairs of the images used in all the networks and their network numbers are used.
  • As in the first embodiment, the application of this technology is not limited to connector fitting.
  • For example, it can be applied to mounting an IC on a substrate, and a similar method is effective when inserting a component such as a capacitor, whose leads have large dimensional errors, into holes in a substrate.
  • The example using a plurality of neural networks is also not limited to insertion into a substrate, and can be used for position control in general in which the control amount is obtained from the relationship between an image and a control amount.
  • Learning the relationship between the image and the control amount with neural networks has the advantage of absorbing individual differences when aligning objects, and the control amount can be calculated more accurately.
  • As described above, the position control device of the present embodiment includes the imaging unit 201 that captures an image in which two objects appear; the control parameter generation unit 202 that inputs the captured image information to the input layer of a neural network and outputs, as the output layer, a control amount for controlling the positional relationship between the two objects; the control unit 203 that controls a current or voltage for controlling that positional relationship using the output control amount; and the drive unit 204 that moves the position of one of the two objects using that current or voltage. Since the control parameter generation unit 202 selects among a plurality of neural networks, alignment can be performed accurately even if there are individual differences between the objects or errors in the positional relationship between the two objects.
  • In the first embodiment, with the male connector 110 and the female connector 120 in a fitted state at known positions, the male connector 110 was gripped by the grip portion 101 of the robot arm 100, and the monocular camera 102 acquired a plurality of images while the gripping unit 101 was moved in a known extraction direction to the insertion start position.
  • In the second embodiment, a case where the fitting position of the male connector 110 and the female connector 120 is unknown will be described.
  • As prior work on methods by which a robot learns by itself and acquires appropriate behavior, a method called reinforcement learning has been studied.
  • In reinforcement learning, the robot performs various motions by trial and error and optimizes its behavior while memorizing the behaviors that produced good results; however, a large number of trials is required for this optimization.
  • A framework called "on-policy" learning is commonly used in reinforcement learning.
  • Reducing the number of trials requires various devices specialized to the particular robot arm and its control signals, which is difficult, and such methods have not been put to practical use.
  • In this embodiment, a form is explained in which a robot such as that of the first embodiment performs various operations by trial and error and stores the behaviors that produced good results, while reducing the number of trials required to optimize the behavior.
  • The overall hardware configuration is the same as in FIG. 1 of the first embodiment, but differs in that a force sensor 801 (not shown in FIG. 1) for measuring the load applied to the gripping unit 101 is added to the robot arm 100.
  • FIG. 8 shows a functional block diagram of the position control device in the second embodiment.
  • In FIG. 8, a force sensor 801 and a path determination unit 802 are added, and the path determination unit 802 is configured of a Critic unit 803, an Actor unit 804, an evaluation unit 805, and a path setting unit 806.
  • FIG. 9 is a hardware block diagram of the position control device in the second embodiment.
  • the force sensor 801 is electrically or communicably connected to the input / output interface 301.
  • The input/output interface 301, the processor 302, and the memory 303 implement the function of the control parameter generation unit 202 in FIG. 8 and also the function of the path determination unit 802. Note that the force sensor 801 and the monocular camera 102 may be provided separately from the input/output interface 301, and the input/output interface 301 separately from the control circuit 304.
  • The force sensor 801 measures the load applied to the grip portion 101 of the robot arm 100, and can measure, for example, the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut each other.
  • The Critic unit 803 and the Actor unit 804 are the same as the Critic unit and the Actor unit in conventional reinforcement learning.
  • First, the conventional reinforcement learning method will be described.
  • Among reinforcement learning methods, a model called the Actor-Critic model is used (Reference: Reinforcement Learning, R. S. Sutton and A. G. Barto, published December 2000).
  • the Actor unit 804 and the Critic unit 803 acquire the state of the environment through the imaging unit 201 and the force sensor 801.
  • the Actor unit 804 is a function that receives the environmental condition I acquired using the sensor device and outputs the control amount A to the robot controller.
  • The Critic unit 803 is a mechanism for making the Actor unit 804 appropriately learn the output A with respect to the input I so that the fitting succeeds properly.
  • In conventional reinforcement learning, a quantity called the reward R is defined, and the Actor unit 804 acquires the action A that maximizes R.
  • X, Y, Z indicate position coordinates with the central portion of the robot as the origin
  • Ax, Ay, Az indicate the amounts of rotation about the X axis, Y axis, and Z axis, respectively.
  • In this example, the movement correction amount is the control amount from the current point to the fitting start position for the first attempt at fitting the male connector 110.
  • the observation of the environmental condition, that is, the trial result is obtained from the image from the imaging unit 201 and the value of the force sensor 801.
  • The Critic unit 803 learns a function called the state value function V(I).
  • Suppose that when action A(1) is taken in state I(1), the first fitting trial yields reward R(2) and the state transitions to I(2). The state value function is then updated by:
  • δ = R(2) + γV(I(2)) − V(I(1))
  • V(I(1)) ← V(I(1)) + αδ
  • Here, δ is the prediction error, α is the learning coefficient, γ is the discount rate, and σ denotes the standard deviation of the output.
  • In state I, the Actor unit 804 adds to A(I) a random number drawn from a distribution with mean 0 and variance σ². That is, regardless of the result of the first trial, the second movement correction amount is determined randomly.
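A minimal sketch of these updates (assumptions: a dictionary stands in for the state value function V, state keys are hypothetical labels, whereas the patent's states are images and force-sensor values):

```python
import random

# Critic: update the state value V(I) with the TD prediction error
# delta = R + gamma * V(I') - V(I); alpha is the learning coefficient and
# gamma the discount rate. Actor: perturb the action with zero-mean noise
# of variance sigma^2 to explore.
def critic_update(V, s, s_next, reward, alpha=0.1, gamma=0.9):
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # prediction error
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

def actor_action(base_action, sigma):
    # Random number with mean 0 and variance sigma^2 added to A(I).
    return base_action + random.gauss(0.0, sigma)
```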
  • Although the above update formula is given as an example, the Actor-Critic model has various update formulas, and any generally used model may be substituted.
  • With the above configuration, the Actor unit 804 learns the appropriate action in each state.
  • When learning is completed, operation proceeds as in the first embodiment.
  • the evaluation unit 805 generates a function that performs evaluation at each fitting trial.
  • FIG. 10 is a diagram showing a trial of fitting of the male connector 110 and the female connector 120 in the second embodiment. For example, it is assumed that an image as shown in FIG. 10A is obtained as a result of the trial. In this trial, the fitting position of the connector is largely misaligned and fails. At this time, how close to success is measured and quantified to obtain an evaluation value indicating the degree of success.
  • As a method of quantification, for example, as shown in FIG. 10B, the surface area (number of pixels) of the insertion-side connector in the image can be calculated.
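For instance, with a binary mask marking the insertion-side connector-surface pixels (a hypothetical representation; the patent only specifies counting the surface area in pixels), the evaluation value is a pixel count:

```python
# Sketch of the FIG. 10(b) evaluation: count the pixels belonging to the
# insertion-side connector surface in a binary mask.
def evaluation_value(mask):
    """mask: 2-D list of 0/1, where 1 marks a connector-surface pixel."""
    return sum(sum(row) for row in mask)

mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
score = evaluation_value(mask)  # 6 surface pixels
```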
  • The processing of the path setting unit 806 is divided into two steps.
  • First, the evaluation result E produced by the evaluation unit 805 and the motion A that the robot actually performed are learned.
  • That is, the path setting unit 806 prepares a function with A as input and E as output and approximates it.
  • For the approximation, an RBF (Radial Basis Function) network, for example, is used. The RBF network is known to be able to easily approximate various unknown functions. For an input x, the output is given by f(x) = Σ_j w_j exp(−‖x − μ_j‖² / (2σ_j²)), where σ_j is the standard deviation and μ_j is the center of the j-th basis function.
  • Second, the minimum value of the function approximated by the RBF network is determined by a general optimization method such as the steepest descent method or PSO (Particle Swarm Optimization).
  • This minimizing point is input to the Actor unit 804 as the next recommended value.
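A compact sketch of this step (assumptions: a one-dimensional correction amount, Gaussian bases, weights passed in directly rather than fitted by solving the interpolation system, and a grid search standing in for steepest descent / PSO):

```python
import math

# Approximate the evaluation value E(A) with a 1-D RBF network and return the
# minimizing correction amount as the next recommended value.
def rbf_predict(x, centers, weights, sigma=1.0):
    return sum(w * math.exp(-((x - c) ** 2) / (2 * sigma ** 2))
               for c, w in zip(centers, weights))

def next_recommended(centers, weights, lo, hi, steps=200, sigma=1.0):
    # Coarse grid search over [lo, hi]; a real implementation would use
    # steepest descent or PSO as the text describes.
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=lambda x: rbf_predict(x, centers, weights, sigma))
```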
  • In other words, the surface areas (pixel counts) in the two-dimensional directions with respect to the movement correction amounts of the failed trials are arranged as a time series indexed by trial number and used as evaluation values, and the optimum solution is determined from them.
  • More simply, the movement correction amount may be determined by moving at a constant rate in the direction that decreases the pixel count in the two-dimensional directions.
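The simpler rule can be sketched as follows (hypothetical representation: a signed pixel-count gradient per axis and a fixed step size):

```python
# Move at a constant rate in the direction that decreases the pixel count
# along each two-dimensional direction; a zero gradient means no move.
def simple_correction(pixel_gradient, step=0.5):
    """pixel_gradient: change in pixel count per unit move along each axis."""
    return tuple(-step if g > 0 else (step if g < 0 else 0.0)
                 for g in pixel_gradient)
```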
  • FIG. 11 is a flowchart of path learning of the position control device according to the second embodiment.
  • First, in step S1101, the gripping unit 101 of the robot arm 100 grips the male connector 110.
  • The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 8, and the arm is operated based on a control program registered in advance on the control unit 203 side.
  • In step S1102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 8, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
  • In step S1103, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted appear. Further, the path determination unit 802 instructs the control unit 203 to move to a plurality of positions near the current position; the drive unit 204 moves the robot arm 100 based on the instructed movement values, and at each of these positions the monocular camera 102 captures an image in which both the male connector 110 and the female connector 120 to be inserted appear.
  • In step S1104, the Actor unit 804 of the path determination unit 802 gives a control amount for fitting to the control unit 203, causes the drive unit 204 to move the robot arm 100, and tries the fitting of the male connector 110 and the female connector 120 to be inserted.
  • In step S1105, when the connectors come into contact with each other while the robot arm 100 is being moved by the drive unit 204, the value of the force sensor 801 and the image from the monocular camera 102 are recorded for each unit of movement amount, and the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store them.
  • In step S1106, the evaluation unit 805 and the Critic unit 803 confirm whether the fitting has succeeded. Usually, the fitting does not succeed at this point. Therefore, in step S1108, the evaluation unit 805 evaluates the degree of success by the method described with reference to FIG. 10 and provides the path setting unit 806 with an evaluation value indicating the degree of success of the alignment. Then, in step S1109, the route setting unit 806 performs learning using the above-described method and gives the next recommended value to the Actor unit 804.
  • In step S1110, the Actor unit 804 adds the value obtained according to the reward amount output from the Critic unit 803 and the next recommended value output from the route setting unit 806 to obtain the movement correction amount. The Actor unit 804 may also set an addition ratio between these two values, and this ratio may be changed, for example, according to the progress of the trials.
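The combination of step S1110 can be sketched as a weighted sum. The linear schedule below, which shifts weight from the route setting unit's recommendation toward the Critic-derived value as trials accumulate, is a hypothetical example and is not specified by the text:

```python
def movement_correction(critic_value, recommended, trial, total_trials=50):
    # the addition ratio alpha grows linearly with the trial count, so
    # early trials follow the recommended value from the route setting
    # unit and later trials follow the reward-derived value
    alpha = min(1.0, trial / total_trials)
    return alpha * critic_value + (1.0 - alpha) * recommended

# halfway through the schedule the two contributions are weighted equally
step = movement_correction(critic_value=2.0, recommended=0.0, trial=25)
```

Each of the six control-amount components (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) would be blended independently in the same way.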
  • In step S1111, the Actor unit 804 gives the movement correction amount to the control unit 203 to move the gripping unit 101 of the robot arm 100. Thereafter, the process returns to step S1103, an image is captured at the position moved by the movement correction amount, and the fitting operation is tried again; this is repeated until the fitting succeeds. When the fitting succeeds, in step S1107 the learning of the Actor unit 804 and the Critic unit 803 is performed on the data from steps S1102 to S1106 of the successful trial. Finally, the path determination unit 802 supplies the learned data of the neural network to the control parameter generation unit 202, thereby enabling the operation described in the first embodiment. Here, the learning of the Actor unit 804 and the Critic unit 803 is performed on the data of the successful trial, but the Actor unit 804 and the Critic unit 803 may instead be trained using the data of all trials from the start of the fitting trials to the success. In that case, although the first embodiment described forming a plurality of neural networks according to the control amount, if the position at which the fitting succeeds is known, it is possible to simultaneously form a plurality of neural networks suited to the magnitude of the control amount by using the distance to the successful fitting position.
  • The application of this technology is not limited to the fitting of connectors.
  • The present invention can also be applied to mounting an IC on a substrate, and the same method produces the same effect when inserting a capacitor or similar component whose leads have large dimensional errors into holes of the substrate.
  • Furthermore, the present invention is not necessarily limited to insertion into a substrate; it can be used for position control in general in which a control amount is obtained from the relationship between an image and the control amount.
  • Learning the relationship between the image and the control amount with a neural network has the advantage of absorbing individual differences between objects when performing alignment, so the control amount can be calculated more accurately.
  • As described above, in this embodiment the Actor unit 804 obtains the movement correction amount for each trial based on the value that the Critic unit 803 obtains according to the reward amount and the recommended value that the route setting unit 806 obtains based on the evaluation value. While the normal Actor-Critic model requires a large number of trial-and-error iterations until alignment succeeds, the invention makes it possible to significantly reduce the number of alignment trials.
  • In this embodiment, the number of alignment trials is reduced by evaluating the image from the imaging unit 201 at the time of an alignment failure, but the value of the force sensor 801 during an alignment trial can also be used to reduce the number of trials. For example, in alignment involving the fitting of connectors or the insertion of two objects, when a trial fails, the value of the force sensor 801 exceeding a certain threshold generally indicates that the positions of the two objects are not completely fitted or inserted, and the Actor unit 804 determines whether this is the case. Two situations are then conceivable: a. the parts are still in the process of fitting or inserting when the threshold is reached; b. the parts have been fitted or inserted partway, and the value of the force sensor 801 shows a certain constant value.
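The force-based discrimination just described can be sketched as below; the scalar threshold, the trace format, and the names of the two failure cases are assumptions introduced for illustration:

```python
def classify_failed_trial(force_trace, threshold):
    """Interpret a failed alignment trial from its force-sensor trace.

    A peak above the threshold means the parts were not completely
    fitted when contact occurred. Case a: the force spiked while the
    arm was still moving (the final reading relaxes). Case b: the
    parts wedged partway, so the force settles at a constant value.
    """
    peak = max(abs(f) for f in force_trace)
    if peak <= threshold:
        return "no_contact"
    final = abs(force_trace[-1])
    return "wedged" if final > threshold else "in_process"
```

A trial classified as "in_process" suggests the correction direction was roughly right, while "wedged" suggests backing off before the next trial.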
  • FIG. 12 shows a flowchart in path learning of the position control device in the third embodiment.
  • The variable i is the number of learning iterations of the robot arm 100, the variable k is the number of learning iterations after the male connector 110 and the female connector 120 are disengaged, and the variable j is the number of loops in the flowchart of FIG. 12.
  • In step S1202, the path setting unit 806 gives a movement amount to the control unit 203 via the Actor unit 804 so as to return 1 mm from the movement amount given to perform the fitting, and the robot arm 100 is moved by the drive unit 204.
  • Then, 1 is added to the variable i. Here, an instruction to return 1 mm from the movement amount is given, but it is not necessarily limited to 1 mm; a unit amount such as 0.5 mm or 2 mm may be used.
  • In step S1204, the route setting unit 806 randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i), gives the control amount to the control unit 203 via the Actor unit 804, and the robot arm 100 is moved by the drive unit 204. The maximum magnitude of this control amount can be set arbitrarily within the range in which movement is possible.
  • In step S1205, at the position after the movement in step S1204, the Actor unit 804 collects the value of the force sensor 801 corresponding to the movement amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). The Critic unit 803 and the Actor unit 804 then record, as one piece of learning data, the movement amount multiplied by −1, that is (−ΔX, −ΔY, −ΔZ, −ΔAx, −ΔAy, −ΔAz), together with the sensor value of the force sensor 801 that measures the force applied while the male connector 110 is held.
  • In step S1207, the route setting unit 806 determines whether the number of collected data has reached the specified number J. If the number of data is insufficient, 1 is added to the variable j in step S1208, and the process returns to step S1204 to change the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) by random numbers and acquire data; steps S1204 to S1207 are repeated until the specified number J of data are accumulated. When the specified number of data are accumulated, the route setting unit 806 sets the variable j to 1 in step S1209 and then confirms in step S1210 whether the male connector 110 and the female connector 120 are disengaged.
  • In step S1211, the route setting unit 806 gives a control amount to the control unit 203 via the Actor unit 804 so as to return the coordinates of the robot arm 100 to the coordinates O(i) before the control amount was given, and moves the robot arm 100. Thereafter, until the fitting between the male connector 110 and the female connector 120 is released, the loop from step S1202 to step S1210 is repeated: the arm is returned by 1 mm (or another unit amount) from the control amount given to perform the fitting, a control amount centered on that position is given, and data of the force sensor 801 are collected. When the male connector 110 and the female connector 120 are disengaged, the process proceeds to step S1212.
  • In step S1212, the route setting unit 806 sets the variable i to I (I is an integer larger than the value of i at the time it is determined that the male connector 110 and the female connector 120 are disengaged), gives a control amount to the control unit 203 via the Actor unit 804 so as to return, for example, 10 mm (another value may also be used) from the movement amount given to perform the fitting, and the robot arm 100 is moved by the drive unit 204.
  • In step S1213, the path setting unit 806 stores the position of the coordinates of the robot arm 100 moved in step S1212 as the central position O(i+k).
  • In step S1214, the route setting unit 806 again randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz), this time centered on the central position O(i+k), gives the control amount to the control unit 203 via the Actor unit 804, and the robot arm 100 is moved by the drive unit 204.
  • In step S1215, the Critic unit 803 and the Actor unit 804 acquire the image captured by the imaging unit 201 of the monocular camera 102 at the position of the robot arm 100 after the movement by the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). In step S1216, the Critic unit 803 and the Actor unit 804 record the image together with the movement amount multiplied by −1, that is (−ΔX, −ΔY, −ΔZ, −ΔAx, −ΔAy, −ΔAz), as one piece of learning data.
  • In step S1217, the route setting unit 806 determines whether the number of collected data has reached the specified number J. If the number of data is insufficient, 1 is added to the variable j, and the process returns to step S1214 to change the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) by random numbers and acquire data; steps S1214 to S1217 are repeated until the specified number J of data are accumulated.
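The sampling loops of steps S1204 to S1207 and S1214 to S1217 share one pattern, sketched below: perturb the arm by a random control amount around a center pose, observe a sensor, and record the sign-inverted movement as the label. The six-element pose, the placeholder sensor model, and the bounds are all illustrative assumptions:

```python
import random

def collect_learning_data(center, num_samples, max_delta, observe):
    """Gather (observation, label) pairs around a center pose.

    A random control amount within +/-max_delta is applied per axis;
    the label is the movement multiplied by -1, i.e. the correction
    that would return the arm toward the center pose.
    """
    data = []
    for _ in range(num_samples):
        delta = [random.uniform(-m, m) for m in max_delta]
        pose = [c + d for c, d in zip(center, delta)]
        observation = observe(pose)        # force value or camera image
        label = [-d for d in delta]
        data.append((observation, label))
    return data

# placeholder sensor: reaction force grows with distance from the center
center_pose = [0.0] * 6                    # (X, Y, Z, Ax, Ay, Az) at O(i)
fake_force = lambda p: sum(abs(v) for v in p)
dataset = collect_learning_data(center_pose, num_samples=100,
                                max_delta=[1.0, 1.0, 1.0, 0.1, 0.1, 0.1],
                                observe=fake_force)
```

The same routine serves both loops by swapping the `observe` callback between a force-sensor read and an image capture.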
  • Note that the maximum value of the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) in S1204 and the random value of the control amount in S1214 can take different values.
  • Learning of the Actor unit 804 and the Critic unit 803 is performed using the learning data acquired by the above method.
  • In this embodiment, the explanation has assumed that, because the robot arm 100 is moved only slightly to the periphery from the movement for fitting the male connector 110 and the female connector 120, the differences that appear in the image of the monocular camera 102 may amount to too few pixels for sufficient learning.
  • Depending on the conditions, learning may be performed with the image of the monocular camera 102 alone, and both the image of the monocular camera 102 and the value of the force sensor 801 may be used even when the male connector 110 and the female connector 120 are fitted.
  • In addition, separate neural networks may be used for the state in which the male connector 110 and the female connector 120 are fitted and the state in which they are not fitted. This allows learning with higher accuracy; even when learning is performed using only images, with the input layer formed from the image alone, accurate learning is possible because the composition of the image differs between the fitted and non-fitted cases.
  • As described above, by having the path setting unit 806 instruct the control amount and having the Actor unit 804 acquire the value of the force sensor 801 at the moved position as the input layer, with the movement to that position as the output layer, learning data can be collected efficiently.
  • FIG. 14 shows a functional configuration diagram of the position control device in the fourth embodiment.
  • The difference from FIG. 8 is that a control parameter adjustment unit 1401 is added; the control parameter adjustment unit 1401 is composed of a trajectory generation unit 1402, a coordinate conversion unit 1403, a gravity correction unit 1404, a compliant motion control unit 1405, and a combining unit 1406.
  • The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100 and can measure, for example, the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut.
  • At the initial stage of learning, the operation output may apply excessive force to the surrounding environment, which may damage the robot arm 100 and its surroundings such as the male connector 110 and the female connector 120.
  • Therefore, by disposing the compliant motion control unit 1405 at the subsequent stage of the control parameter generation unit 202 and operating according to the external force acquired by the force sensor 801, the robot arm 100 is prevented from applying excessive force to the surrounding environment such as the male connector 110 and the female connector 120.
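A toy admittance-style rule illustrating the role of the compliant motion control unit 1405; the per-axis decomposition, threshold, and gain are assumptions for illustration, not the patented control law:

```python
def compliant_adjust(command, external_force, force_threshold, gain):
    # yield along any axis whose measured reaction force exceeds the
    # threshold, so the commanded step cannot press harder into contact
    adjusted = []
    for c, f in zip(command, external_force):
        if abs(f) > force_threshold:
            c -= gain * f
        adjusted.append(c)
    return adjusted

# axis 0 is free (force below threshold), axis 1 is in hard contact
safe_step = compliant_adjust(command=[1.0, 1.0],
                             external_force=[0.0, 5.0],
                             force_threshold=2.0, gain=0.1)
```

The combining unit 1406 would merge such a force-adapted amount with the trajectory generator's periodic control amount.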
  • Given the control period of the robot arm 100 and its maximum velocity, maximum acceleration, and maximum jerk, the trajectory generation unit 1402 calculates the periodic control amount for each control cycle so as not to exceed these limits.
  • Non-patent literature: KROGER, Torsten; PADIAL, Jose. Simple and robust visual servo control of robot arms using an on-line trajectory generator. IEEE International Conference on Robotics and Automation (ICRA).
  • The following constants are given as constants corresponding to the specifications of the robot arm 100, and x_i, v_i, a_i, and j_i are variables representing the following.
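The per-cycle limiting can be sketched in one dimension as below. This simplified profile clamps jerk, acceleration, and velocity each cycle and snaps to the target on arrival; unlike the full on-line trajectory generator of the cited literature, it plans no braking phase. All numeric limits are placeholders for the robot's specifications:

```python
def next_cycle_step(x, v, a, target, dt, v_max, a_max, j_max):
    """Advance one control cycle toward `target` under jerk,
    acceleration, and velocity limits (simplified sketch)."""
    direction = 1.0 if target >= x else -1.0
    desired_a = direction * a_max
    jerk = max(-j_max, min(j_max, (desired_a - a) / dt))
    a = max(-a_max, min(a_max, a + jerk * dt))
    v = max(-v_max, min(v_max, v + a * dt))
    x_next = x + v * dt
    # snap to the target if this cycle would pass it
    if (direction > 0 and x_next >= target) or (direction < 0 and x_next <= target):
        return target, 0.0, 0.0
    return x_next, v, a

# simulate a 10 mm move with placeholder limits and a 1 ms control period
x, v, a = 0.0, 0.0, 0.0
history_v = []
for _ in range(5000):
    x, v, a = next_cycle_step(x, v, a, target=10.0, dt=0.001,
                              v_max=100.0, a_max=1000.0, j_max=100000.0)
    history_v.append(v)
    if x == 10.0:
        break
```

Because each cycle's output is bounded by the jerk, acceleration, and velocity limits, no single periodic control amount can command a violent motion, which is the safety property this embodiment relies on.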
  • In step S1502, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 14, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.

Abstract

The present invention is provided with: a path determination unit 802 which, for alignment of two objects involving insertion, instructs a control amount for the insertion on the basis of an image acquired from an imaging unit 201 and a value from a force sensor 801, and learns from the result of the alignment; and a control parameter adjustment unit 1401 which outputs a periodic control amount adjustment value on the basis of a periodic control amount, set for each control period so as to reach the control amount, and a control amount adapted to the external force based on the value of the force sensor 801. The robot arm 100 is operated by a control amount in which trajectory generation control and compliant motion control obtained by using the force sensor 801 are combined. Whereas a typical reinforcement learning model requires trial and error until learning ends and can damage the environment, the present invention makes it possible to make trials safely even in the initial stage of learning.

Description

Position control device and position control method
 The present invention relates to a position control device and a position control method.
 When constructing a production system that performs assembly operations with a robot arm, it is common to perform teaching work by human hand, so-called teaching. However, in this teaching the robot repeats the operation only at the stored positions, so it may not be able to cope when errors arise from manufacturing or mounting. Therefore, if a position correction technology that absorbs such individual errors can be developed, improvement in productivity can be expected, and the range of scenes in which robots play an active role will also expand.
 Even in the current art, there is a technology that uses camera images to perform position correction up to just before a connector insertion operation (Patent Document 1). Also, if a plurality of devices such as a force sensor and a stereo camera are used, position errors related to assembly (insertion, workpiece holding, etc.) can be absorbed. However, in order to determine the position correction amount, quantities such as the center coordinates of the gripped connector and the center coordinates of the connector to be inserted must be explicitly calculated from the image information, as in that reference. This calculation depends on the shape of the connector and must be set by the designer for each connector used. The calculation is relatively easy if three-dimensional information can be acquired from a depth camera or the like, but acquiring it from two-dimensional image information requires developing an image processing algorithm for each connector, which incurs a large design cost.
 In addition, there exist methods called deep learning and deep reinforcement learning by which a robot learns by itself and acquires appropriate actions. However, in order to acquire appropriate behavior through such learning, it is usually necessary to collect a large amount of appropriate learning data. Moreover, when data are collected using techniques such as reinforcement learning, the same scene must be experienced over and over, requiring an enormous number of trials, and performance cannot be guaranteed for unexperienced scenes. It is therefore necessary to collect learning data of various scenes exhaustively, which takes a great deal of effort.
 For example, there is also a method of obtaining an optimum path from a single successful trial as in Patent Document 2, but such a method cannot collect data usable for deep learning or deep reinforcement learning, and learning must still be performed a considerable number of times.
Patent Document 1: WO 98/017444. Patent Document 2: JP 2005-125475 A.
 When performing alignment involving insertion of two objects, the function that indicates positions for learning and the function that operates servomotors or the like to control the position of the robot usually exist independently. Since the load applied to the objects is therefore not taken into account, there is a problem that an excessive load may be applied to the objects depending on the displacement amount given for learning.
 The present invention has been made to solve the above problem, and aims to collect learning data while preventing an excessive load from being applied to objects even when the function for learning and the function for controlling the position of the robot are separate.
 The position control device according to the present invention, for alignment of two objects involving insertion, includes: a path determination unit that instructs a control amount for the insertion based on the image acquired from an imaging unit and the value of a force sensor, and learns from the result of the alignment; and a combining unit that outputs a periodic control amount adjustment value based on a periodic control amount, set for each control cycle so as to reach the control amount, and a control amount adapted to the external force based on the value of the force sensor.
 According to the present invention, learning data can be collected while preventing an excessive load from being applied to objects even when the function for learning and the function for controlling the position of the robot are separate.
FIG. 1 is a diagram in which the robot arm 100, the male connector 110, and the female connector 120 according to Embodiment 1 are arranged.
FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
FIG. 4 is a flowchart of the position control of the position control device in Embodiment 1.
FIG. 5 is an example of camera images and control amounts at and around the insertion start position captured by the monocular camera 102 in Embodiment 1.
FIG. 6 is a diagram showing an example of the neural network in Embodiment 1 and its learning rule.
FIG. 7 is a flowchart using a plurality of networks in the neural network in Embodiment 1.
FIG. 8 is a functional configuration diagram of the position control device in Embodiment 2.
FIG. 9 is a hardware configuration diagram of the position control device in Embodiment 2.
FIG. 10 is a diagram showing a trial of fitting the male connector 110 and the female connector 120 in Embodiment 2.
FIG. 11 is a flowchart of the path learning of the position control device in Embodiment 2.
FIG. 12 is a flowchart of the path learning of the position control device in Embodiment 3.
FIG. 13 is a diagram showing an example of the neural network in Embodiment 3 and its learning rule.
FIG. 14 is a functional configuration diagram of the position control device in Embodiment 4.
FIG. 15 is a flowchart of the path learning of the position control device in Embodiment 4.
Embodiment 1.
 Hereinafter, embodiments of the present invention will be described.
 In Embodiment 1, a robot arm that learns the insertion position of each connector and performs assembly on a production line, and its position control method, will be described.
 The configuration will be described. FIG. 1 is a diagram in which the robot arm 100, the male connector 110, and the female connector 120 according to Embodiment 1 are arranged. The robot arm 100 is provided with a gripping unit 101 for gripping the male connector 110, and a monocular camera 102 is attached to the robot arm 100 at a position from which the gripping unit can be seen. The monocular camera 102 is installed so that, when the gripping unit 101 at the tip of the robot arm 100 grips the male connector 110, both the tip of the gripped male connector 110 and the female connector 120 on the insertion side are visible.
 FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
 In FIG. 2, the device is composed of: an imaging unit 201, a function of the monocular camera 102 in FIG. 1, which captures images; a control parameter generation unit 202, which generates a control amount for the position of the robot arm 100 using the captured image; a control unit 203, which controls current and voltage values for the drive unit 204 of the robot arm 100 using the position control amount; and a drive unit 204, which changes the position of the robot arm 100 based on the current and voltage values output from the control unit 203.
 When the control parameter generation unit 202 acquires an image from the imaging unit 201 (a function of the monocular camera 102), it determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) with respect to the position (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and outputs the control amount to the control unit 203 (X, Y, Z are the position of the robot arm; Ax, Ay, Az are the attitude angles of the robot arm 100).
 The control unit 203 determines and controls the current and voltage values for each device constituting the drive unit 204 based on the received control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) for the position (X, Y, Z, Ax, Ay, Az) of the robot arm 100.
 The drive unit 204 operates with the current and voltage values for each device received from the control unit 203, whereby the robot arm 100 moves to the position (X+ΔX, Y+ΔY, Z+ΔZ, Ax+ΔAx, Ay+ΔAy, Az+ΔAz).
 FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
 The monocular camera 102 is communicably connected, wired or wirelessly, to the processor 302 and the memory 303 via the input/output interface 301. The input/output interface 301, the processor 302, and the memory 303 constitute the function of the control parameter generation unit 202 in FIG. 2. The input/output interface 301 is also communicably connected, wired or wirelessly, to the control circuit 304 corresponding to the control unit 203, and the control circuit 304 is in turn electrically connected to the motor 305. The motor 305 corresponds to the drive unit 204 in FIG. 2 and is configured as a component for controlling the position of each device. In this embodiment, the motor 305 is used as the hardware corresponding to the drive unit 204, but any hardware capable of controlling position may be used. The monocular camera 102 and the input/output interface 301, and the input/output interface 301 and the control circuit 304, may each be configured as separate bodies.
 Next, the operation will be described.
 FIG. 4 is a flowchart of the position control of the position control device in Embodiment 1.
 First, in step S101, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 2, and the arm is operated based on a control program registered in advance on the control unit 203 side.
 Next, in step S102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 2, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side.
 Next, in step S103, the control parameter generation unit 202 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 serving as the insertion destination appear.
 Next, in step S104, the control parameter generation unit 202 acquires the image from the imaging unit 201 and determines the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). For this determination, the control parameter generation unit 202 uses the processor 302 and the memory 303 of FIG. 3 as hardware and calculates the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) using a neural network. The calculation method of the control amount using a neural network will be described later.
Next, in step S105, the control unit 203 acquires the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202 and compares every component of the control amounts with a predetermined threshold. If all components of the control amounts are equal to or less than the threshold, the process proceeds to step S107, and the control unit 203 controls the drive unit 204 to insert the male connector 110 into the female connector 120.
If any component of the control amounts is larger than the threshold, then in step S106 the control unit 203 controls the drive unit 204 using the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202, and the process returns to step S103.
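The loop of steps S103 to S107 can be sketched as follows. All function names here (`capture_image`, `compute_control`, `move_arm`, `insert`) and the threshold value are hypothetical stand-ins for the imaging unit 201, the control parameter generation unit 202, and the drive unit 204; this is an illustration, not the actual controller.

```python
# Sketch of the S103-S107 loop: recompute control amounts until every
# component falls at or below the threshold, then perform the insertion.
# All callables here are hypothetical placeholders for the actual units.

def fitting_loop(capture_image, compute_control, move_arm, insert,
                 threshold=0.5, max_iters=20):
    """Return True once the control amounts are within the threshold."""
    for _ in range(max_iters):
        image = capture_image()                  # step S103 (imaging unit 201)
        control = compute_control(image)         # step S104 (neural network)
        if all(abs(c) <= threshold for c in control):
            insert()                             # step S107
            return True
        move_arm(control)                        # step S106 (drive unit 204)
    return False                                 # gave up without converging

# Toy stand-ins: the "arm" drifts toward the target by the commanded amount.
state = {"pos": [10.0, -4.0, 6.0, 0.0, 0.0, 0.0], "inserted": False}
ok = fitting_loop(
    capture_image=lambda: list(state["pos"]),        # "image" = residual pose
    compute_control=lambda img: [-x for x in img],   # idealized network output
    move_arm=lambda c: state.update(
        pos=[p + dc for p, dc in zip(state["pos"], c)]),
    insert=lambda: state.update(inserted=True),
)
```

With the idealized network above, the loop converges in two iterations; in practice the number of iterations depends on how well the network has been trained.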
Next, the method of calculating the control amounts using the neural network in step S104 of FIG. 4 will be described.
Before the control amounts can be calculated with the neural network, sets of images and the corresponding required movement amounts are collected in advance, so that the neural network can calculate, from an input image, the movement amount needed for successful fitting. For example, with the male connector 110 and the female connector 120 in a fitted state whose position is known, the gripping unit 101 of the robot arm 100 grips the male connector 110. Then, the gripping unit 101 is moved along the known extraction direction to the insertion start position while the monocular camera 102 captures a plurality of images. In addition, with the insertion start position defined as the control amount (0, 0, 0, 0, 0, 0), images are acquired not only for the movement amount from the fitted state to the insertion start position, but also for movement amounts around it, that is, for control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) and the images corresponding to them.
FIG. 5 is an example of a diagram showing the insertion start position photographed by the monocular camera 102 in the first embodiment, together with camera images in its vicinity and the corresponding control amounts.
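The data collection described above can be sketched as follows. The perturbation grid, its ranges, and the `capture_at` function are hypothetical: `capture_at(pose)` stands in for moving the gripping unit 101 to a pose near the insertion start position and photographing with the monocular camera 102.

```python
# Sketch of collecting (image, control-amount) training pairs around the
# insertion start position. The insertion start position itself is labeled
# (0, 0, 0, 0, 0, 0); a perturbed pose is labeled with the control amount
# that would move the arm back to the insertion start position.
import itertools

def collect_training_set(capture_at, offsets_mm=(-2.0, 0.0, 2.0)):
    """Enumerate translational perturbations on a small grid (hypothetical
    ranges); a real collection would also perturb the rotations."""
    dataset = []
    for dx, dy, dz in itertools.product(offsets_mm, repeat=3):
        pose = (dx, dy, dz, 0.0, 0.0, 0.0)    # perturbed pose near the start
        image = capture_at(pose)
        control = tuple(-p for p in pose)     # amount needed to return
        dataset.append((image, control))
    return dataset

data = collect_training_set(capture_at=lambda pose: ("img", pose))
```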
Then, using a plurality of sets each consisting of a movement amount from the fitted state to the insertion start position and images of the insertion start position and surrounding positions captured by the monocular camera 102, the network is trained based on a general neural network learning rule (for example, the stochastic gradient method).
Various forms of neural networks exist, such as CNNs and RNNs; the present invention does not depend on the form, and any form can be used.
FIG. 6 is a diagram showing an example of the neural network in the first embodiment and a learning rule of the neural network.
The input layer receives the image obtained from the monocular camera 102 (for example, the luminance and color-difference values of each pixel), and the output layer outputs the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
In the learning process of the neural network, the parameters of the intermediate layers are optimized so that the output values of the output layer, obtained from the input image through the intermediate layers, approximate the control amounts stored with the image set. The stochastic gradient method is one such approximation method.
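As an illustration of this learning rule, the following trains a single linear layer by the stochastic gradient method so that its outputs approximate stored control amounts. Everything here (feature size, learning rate, toy data) is hypothetical; the actual network of FIG. 6 has intermediate layers and takes pixel values as input.

```python
# Minimal stochastic-gradient regression: the "network" maps an input
# feature vector to six control amounts, and its weights are adjusted so
# that the outputs approximate the control amounts stored with the image set.
import random

def sgd_train(pairs, n_in, n_out=6, lr=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    w = [[0.0] * n_in for _ in range(n_out)]   # weights of one linear layer
    for _ in range(steps):
        x, target = rng.choice(pairs)          # one random sample per step
        pred = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        for o in range(n_out):                 # gradient of the squared error
            err = pred[o] - target[o]
            for i in range(n_in):
                w[o][i] -= lr * err * x[i]
    return w

# Toy data: the true mapping is control = (-x0, -x1, 0, 0, 0, 0).
pairs = [((1.0, 0.0), (-1.0, 0.0, 0.0, 0.0, 0.0, 0.0)),
         ((0.0, 1.0), (0.0, -1.0, 0.0, 0.0, 0.0, 0.0)),
         ((1.0, 1.0), (-1.0, -1.0, 0.0, 0.0, 0.0, 0.0))]
w = sgd_train(pairs, n_in=2)
pred = [sum(wi * xi for wi, xi in zip(row, (1.0, 1.0))) for row in w]
```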
Therefore, as shown in FIG. 5, more accurate learning can be performed by acquiring and learning not only the movement amount from the fitted state to the insertion start position but also the surrounding movement amounts and the images corresponding to them.
FIG. 5 shows the case where the position of the male connector 110 is fixed with respect to the monocular camera 102 and only the position of the female connector 120 changes. In practice, however, the gripping unit 101 of the robot arm 100 does not always grip the male connector 110 at the exact position, and the position of the male connector 110 may be shifted due to individual differences and the like. Therefore, by acquiring and learning, in the course of this training, sets of a plurality of control amounts and images at the insertion start position and nearby positions for cases where the male connector 110 deviates from the exact position, learning that can cope with the individual differences of both the male connector 110 and the female connector 120 is performed.
Note, however, that since the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated excluding the movement amount from the fitted-state position at the time of imaging to the insertion start position, the movement amount from the insertion start position to the fitted-state position must be stored separately for use in step S107 of FIG. 4. In addition, since the above coordinates are obtained in the coordinate system of the monocular camera, the control unit 203 needs to convert them before controlling the robot arm 100 when the coordinate system of the monocular camera differs from the coordinate system of the robot arm 100 as a whole.
In this example, the conversion is needed because the monocular camera is fixed to the robot arm 100, so that the coordinate system in which the female connector 120 is placed differs from the coordinate system of the monocular camera 102. Accordingly, if the monocular camera 102 shared the coordinate system in which the female connector 120 is placed, the conversion from the coordinate system of the monocular camera 102 to the coordinate system of the robot arm 100 would be unnecessary.
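The coordinate conversion mentioned above can be sketched as a rigid rotation; the single-axis rotation and the angle used below are hypothetical calibration values, and a full calibration would use a complete 3x3 rotation matrix between the camera frame and the robot frame.

```python
# Sketch of converting a displacement expressed in the monocular camera's
# coordinate system into the robot arm's coordinate system. A pure
# displacement needs only the rotation part of the calibration, so only
# a rotation is applied here.
import math

def camera_to_robot(delta_cam, yaw_rad):
    """Rotate a (dx, dy, dz) displacement about the Z axis by yaw_rad."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    dx, dy, dz = delta_cam
    return (c * dx - s * dy, s * dx + c * dy, dz)

# A camera-frame move of +1 along X, seen by a camera yawed 90 degrees
# relative to the robot frame, becomes +1 along Y in the robot frame.
moved = camera_to_robot((1.0, 0.0, 0.0), math.pi / 2)
```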
Next, details of the operation of FIG. 4 and an operation example will be described.
In step S101, the robot arm 100 grips the male connector 110 according to the operation registered in advance, and in step S102, the robot arm 100 is moved to a position approximately above the female connector 120.
At this time, the position of the male connector 110 immediately before it is gripped is not always constant. A slight error may always be present due to, for example, a subtle operational deviation of the machine that sets the position of the male connector 110. Similarly, the female connector 120 may also have some error.
Therefore, it is important that, in step S103, an image in which both the male connector 110 and the female connector 120 are shown is acquired, as in FIG. 5, by the imaging unit 201 of the monocular camera 102 attached to the robot arm 100. Since the position of the monocular camera 102 with respect to the robot arm 100 is always fixed, the relative positional information between the male connector 110 and the female connector 120 is reflected in this image.
In step S104, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated by the control parameter generation unit 202, which has a neural network, as shown in FIG. 6, that has learned this relative positional information in advance. However, depending on how well the learning has succeeded, the control amounts output by the control parameter generation unit 202 may not bring the arm all the way to the insertion start position. In such a case, the loop of steps S103 to S106 is repeated a plurality of times: the control parameter generation unit 202 repeatedly calculates control amounts until they fall to or below the threshold shown in step S105, while the control unit 203 and the drive unit 204 control the position of the robot arm 100 accordingly.
The threshold shown in S105 is determined by the required accuracy of the male connector 110 and the female connector 120 to be fitted. For example, when the fit of the connectors is loose and the characteristics of the connectors do not require high accuracy, the threshold can be set large. In the opposite case, the threshold is set small. In general, in a manufacturing process, a permissible manufacturing error is often specified, so this value can also be used.
Further, assuming the case where, depending on how well the learning has succeeded, the control amounts output by the control parameter generation unit 202 cannot bring the arm to the insertion start position, a plurality of insertion start positions may be set. If the insertion start position is set without a sufficient distance between the male connector 110 and the female connector 120, there is a risk that the male connector 110 and the female connector 120 come into contact before insertion starts and one of them is damaged. In that case, the insertion start position may be set according to the number of loops between step S103 and step S106 in FIG. 4, for example with a clearance between the male connector 110 and the female connector 120 of 5 mm at first, then 20 mm, then 10 mm.
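The stepped insertion start positions described above can be sketched as a simple schedule keyed by the loop count. The clearance values follow the 5 mm / 20 mm / 10 mm example in the text; the function name and the behavior after the third retry are hypothetical.

```python
# Sketch of choosing the insertion start clearance from the number of
# S103-S106 loop iterations already performed: start close (5 mm), back
# off if that fails (20 mm), then approach again (10 mm).
def insertion_clearance_mm(loop_count):
    schedule = [5.0, 20.0, 10.0]
    if loop_count < len(schedule):
        return schedule[loop_count]
    return schedule[-1]    # keep the last clearance for any further retries
```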
Although the present embodiment has been described using connectors, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and a similar method is also effective when inserting a capacitor or other component with large lead dimensional errors into a hole of a substrate.
Moreover, the technique is not necessarily limited to insertion into a substrate: it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. In the present invention, learning the relationship between images and control amounts using a neural network has the advantage that the individual differences of the objects can be absorbed when aligning one object with another.
Therefore, the first embodiment includes: the imaging unit 201 that captures an image in which two objects are present; the control parameter generation unit 202 that inputs the information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, control amounts of position for controlling the positional relationship of the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship of the two objects using the output control amounts of position; and the drive unit 204 that moves the position of one of the two objects using the current or voltage for controlling the positional relationship of the two objects. This has the effect that alignment can be performed with only a monocular camera even if there are individual differences among the objects or errors in the positional relationship of the two objects.
An example using a single neural network has been described, but it may become necessary to use a plurality of them. This is because, when the input is an image and the output is numerical values as in this case, the approximation accuracy of the numerical values is limited, and an error of about several percent may occur depending on the situation. Depending on the distance from the position near the insertion start position reached in step S102 of FIG. 4 to the insertion start position, the determination in step S105 may always be No and the operation may never complete. In such a case, a plurality of networks are used as shown in FIG. 7.
FIG. 7 is a flowchart using a plurality of networks in the neural network of the first embodiment; it shows the detailed steps of step S104 in FIG. 4. The plurality of networks are included in the control parameter generation unit 202 of FIG. 2.
In step S701, the control parameter generation unit 202 selects which network to use based on the input image.
If the loop count is 1, or if the obtained control amount is 25 mm or more, the neural network 1 is selected and the process proceeds to step S702. If the loop count is 2 or more and the obtained control amount is 5 mm or more and less than 25 mm, the neural network 2 is selected and the process proceeds to step S703. If the loop count is 2 or more and the obtained control amount is less than 5 mm, the neural network 3 is selected and the process proceeds to step S704. In steps S702 to S704, the control amounts are calculated using the selected neural network.
For example, each neural network is trained according to the distance between the male connector 110 and the female connector 120, or according to the control amount: the neural network 3 in the figure is trained on data with errors within ±1 mm and ±1 degree, the neural network 2 on data in the range of ±1 to ±10 mm and ±1 to ±5 degrees, and so on, changing the range of the training data stepwise. Here, it is more efficient not to overlap the ranges of the images used for the respective neural networks.
Although FIG. 7 shows an example with three networks, the number of networks is not particularly limited. When such a scheme is used, the determination function of step S701 that decides which network to use needs to be provided as a "network selection switch".
This network selection switch can itself be configured as a neural network. In this case, the input to the input layer is an image, and the output of the output layer is a network number. As image data, pairs of the images used by all the networks and their network numbers are used.
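The selection rule of step S701 can be sketched as follows. The thresholds follow the 5 mm / 25 mm boundaries in the text; reducing the six-component control amount to a single scalar magnitude, and the function name, are simplifying assumptions.

```python
# Sketch of the "network selection switch" of step S701: pick network 1,
# 2, or 3 from the loop count and the magnitude (in mm) of the previously
# obtained control amount.
def select_network(loop_count, prev_control_mm):
    if loop_count == 1 or prev_control_mm >= 25.0:
        return 1    # coarse network, trained on large offsets
    if prev_control_mm >= 5.0:
        return 2    # mid-range network (5 mm or more, less than 25 mm)
    return 3        # fine network, trained on the smallest offsets
```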
Although the example using a plurality of neural networks has also been described using connectors, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and a similar method is also effective when inserting a capacitor or other component with large lead dimensional errors into a hole of a substrate.
Moreover, the example using a plurality of neural networks is likewise not limited to insertion into a substrate: it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. In the present invention, learning the relationship between images and control amounts using neural networks has the advantage that the individual differences of the objects can be absorbed when aligning one object with another, and the control amounts can be calculated with higher accuracy.
Therefore, this configuration includes: the imaging unit 201 that captures an image in which two objects are present; the control parameter generation unit 202 that inputs the information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, control amounts of position for controlling the positional relationship of the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship of the two objects using the output control amounts of position; and the drive unit 204 that moves the position of one of the two objects using the current or voltage for controlling the positional relationship of the two objects. Since the control parameter generation unit 202 is configured to select one of a plurality of neural networks, alignment can be performed with higher accuracy even if there are individual differences among the objects or errors in the positional relationship of the two objects.
Second Embodiment
In the first embodiment, with the male connector 110 and the female connector 120 in a fitted state whose position is known, the gripping unit 101 of the robot arm 100 grips the male connector 110, and the gripping unit 101 is moved along the known extraction direction to the insertion start position while the monocular camera 102 captures a plurality of images. In the second embodiment, the case where the fitting position of the male connector 110 and the female connector 120 is unknown will be described.
As prior work on methods by which a robot learns by itself and acquires appropriate behavior, a technique called reinforcement learning has been studied. In this technique, the robot performs various motions by trial and error and optimizes its behavior while memorizing the actions that produced good results, but a large number of trials is required to optimize the behavior.
As a technique for reducing this number of trials, a framework called on-policy learning is commonly used in reinforcement learning. However, applying this framework to the teaching of a robot arm is difficult because various contrivances specific to the robot arm and its control signals are required, and it has not reached practical use.
The second embodiment describes a configuration that can reduce the large number of trials otherwise required when, as in the first embodiment, a robot performs various motions by trial and error and optimizes its behavior while memorizing the actions that produced good results.
The system configuration will now be described. Parts not specifically described are the same as in the first embodiment.
The overall hardware configuration is the same as in FIG. 1 of the first embodiment, except that a force sensor 801 (not shown in FIG. 1) for measuring the load applied to the gripping unit 101 is added to the robot arm 100.
FIG. 8 shows a functional configuration diagram of the position control device in the second embodiment. The differences from FIG. 2 are that the force sensor 801 and a path determination unit 802 are added, and that the path determination unit 802 is composed of a Critic unit 803, an Actor unit 804, an evaluation unit 805, and a path setting unit 806.
FIG. 9 is a hardware configuration diagram of the position control device in the second embodiment. The only difference from FIG. 3 is that the force sensor 801 is electrically or communicably connected to the input/output interface 301. The input/output interface 301, the processor 302, and the memory 303 constitute the function of the control parameter generation unit 202 of FIG. 8 and also constitute the function of the path determination unit 802. Accordingly, the connections between the force sensor 801 and imaging unit 201 and the input/output interface 301, and between the input/output interface 301 and the control circuit 304, may be configured as separate bodies.
Next, details of FIG. 8 will be described.
The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100; for example, it can measure the value of the force when the male connector 110 and the female connector 120 of FIG. 1 come into contact.
The Critic unit 803 and the Actor unit 804 are the same as the Critic and Actor units in conventional reinforcement learning.
The conventional reinforcement learning technique is described here. This embodiment uses, among reinforcement learning methods, a model called the Actor-Critic model (reference: Reinforcement Learning: R. S. Sutton and A. G. Barto, published December 2000). The Actor unit 804 and the Critic unit 803 acquire the state of the environment through the imaging unit 201 and the force sensor 801. The Actor unit 804 is a function that takes the environmental state I acquired using the sensor devices as input and outputs the control amount A to the robot controller. The Critic unit 803 is a mechanism for making the Actor unit 804 appropriately learn the output A for the input I so that the fitting succeeds properly.
The scheme of the conventional reinforcement learning technique is described below.
In reinforcement learning, a quantity called the reward R is defined, and the Actor unit 804 is made to acquire an action A that maximizes R. As an example, if the task to be learned is the fitting of the male connector 110 and the female connector 120 as shown in the first embodiment, the reward is defined as R = 1 when the fitting succeeds and R = 0 otherwise. The action A here represents the movement correction amount from the current position (X, Y, Z, Ax, Ay, Az), that is, A = (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). Here, X, Y, and Z are position coordinates whose origin is the central portion of the robot, and Ax, Ay, and Az are the amounts of rotation about the X, Y, and Z axes, respectively. The movement correction amount is a control amount from the fitting start position from which the fitting of the male connector 110 is first attempted from the current point. The environmental state, that is, the observation of a trial result, is obtained from the image from the imaging unit 201 and the value of the force sensor 801.
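The reward and action definitions above can be sketched as follows; `fitting_succeeded` is a hypothetical predicate standing in for the observation obtained from the imaging unit 201 and the force sensor 801.

```python
# Sketch of the reward signal: R = 1 only when the fitting succeeds.
def reward(fitting_succeeded):
    return 1.0 if fitting_succeeded else 0.0

# The action A is a movement correction amount applied to the current pose.
def apply_action(pose, action):
    """pose = (X, Y, Z, Ax, Ay, Az); action = (dX, dY, dZ, dAx, dAy, dAz)."""
    return tuple(p + a for p, a in zip(pose, action))
```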
In reinforcement learning, the Critic unit 803 learns a function called the state value function V(I). Suppose that at time t = 1 (for example, at the start of a fitting trial) the action A(1) is taken in state I(1), and that at time t = 2 (for example, after the end of the first fitting trial and before the start of the second) the environment has changed to I(2) and the reward amount R(2) (the result of the first fitting trial) is obtained. Various update rules are conceivable, but the following is given as one example.
The update rule for V(I) is defined as follows:

δ = R(2) + γV(I(2)) - V(I(1))

V(I(1)) ← V(I(1)) + αδ

Here, δ is the prediction error, α is the learning coefficient (a positive real number from 0 to 1), and γ is the discount rate (a positive real number from 0 to 1).
The Actor unit 804 takes I as input and outputs A(I), and A(I) is updated as follows.
When δ > 0:

A(I(1)) ← A(I(1)) + α(A(1) - A(I(1)))

When δ ≤ 0:

A(I(1)) ← A(I(1))

Here, σ denotes the standard deviation of the output: in state I, the Actor adds to A(I) a random number drawn from a distribution with mean 0 and variance σ². That is, regardless of the result of a trial, the second movement correction amount is determined randomly.
Although the above update rules are used as an example, the Actor-Critic model has various update rules, and any generally used model other than the above may be used.
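A minimal sketch of an Actor-Critic step of this kind follows, assuming a TD-error critic update and an actor that moves its stored action toward the executed action only when the prediction error is positive (the exact update equations appear in the patent's figures and may differ); the state is reduced to a hashable key and the action to a scalar for brevity.

```python
# Sketch of one Actor-Critic learning step: compute the prediction error
# delta, update the state value V, and (only when delta > 0) move the
# actor's stored action toward the action actually executed. Exploration
# adds zero-mean noise with standard deviation sigma to the stored action.
import random

class ActorCritic:
    def __init__(self, alpha=0.5, gamma=0.9, sigma=1.0, seed=0):
        self.V = {}          # state value function V(I)
        self.A = {}          # actor output A(I), one scalar for brevity
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma
        self.rng = random.Random(seed)

    def act(self, state):
        mean = self.A.get(state, 0.0)
        return mean + self.rng.gauss(0.0, self.sigma)  # exploratory action

    def learn(self, s1, a1, r2, s2):
        delta = r2 + self.gamma * self.V.get(s2, 0.0) - self.V.get(s1, 0.0)
        self.V[s1] = self.V.get(s1, 0.0) + self.alpha * delta
        if delta > 0:        # reinforce: shift A(I) toward the executed a1
            old = self.A.get(s1, 0.0)
            self.A[s1] = old + self.alpha * (a1 - old)

ac = ActorCritic()
ac.learn("start", a1=2.0, r2=1.0, s2="done")   # successful trial: delta = 1
```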
With the above configuration, the Actor unit 804 learns the appropriate action for each state, but it operates as in the first embodiment only once learning is complete. During learning, a recommended action for learning is calculated and passed from the path setting unit 806, so the control unit 203 receives the movement signal from the path setting unit 806 as it is and controls the drive unit 204 accordingly.
That is, in the conventional Actor-Critic model, R = 1 is defined when the fitting succeeds and R = 0 otherwise, so learning takes place only once a fitting has succeeded; until a fitting succeeds, the movement correction amounts used for the trials are given randomly, and the movement correction amount for the next trial is not determined according to the degree of failure of the current trial. The result is the same not only with the conventional Actor-Critic model but also with other reinforcement learning models such as Q-learning, because they too evaluate only the success or failure of the fitting itself. This embodiment of the present invention describes a process that evaluates this degree of failure and determines the movement correction amount for the next trial.
The evaluation unit 805 generates a function that evaluates each fitting trial.
FIG. 10 is a diagram showing a fitting trial of the male connector 110 and the female connector 120 in the second embodiment.
For example, assume that an image such as FIG. 10(A) is obtained as the result of a trial. This trial failed because the fitting positions of the connectors are largely misaligned. At this point, how close the trial came to success is measured and quantified to obtain an evaluation value indicating the degree of success. One quantification method, shown in FIG. 10(B), is to count the surface area (number of pixels) of the insertion-target connector visible in the image. With this method, if only the fitting surface of the female connector 120 is painted, or covered with a sticker, in a color different from the rest of the background, data acquisition and computation from the image become easier when an insertion failure of the male connector 110 into the female connector 120 is detected by the force sensor 801 of the robot arm 100. The description so far assumes a single camera, but multiple cameras may be arranged side by side and the results obtained from their respective images may be combined. The same evaluation can also be performed using, instead of the connector surface area, the number of pixels in each of two dimensions (for example, the X and Y directions).
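As a hedged illustration of the evaluation described for FIG. 10(B), the following sketch counts how many pixels of the specially colored fitting surface remain visible; the marker color, array layout, and function name are assumptions, not taken from the patent. More visible marker surface means the male connector covers the fitting surface less, i.e. a worse trial.

```python
import numpy as np

# Assumed marker color painted (or stickered) on the female connector's
# fitting surface, chosen to differ from the background.
MARKER = (255, 0, 0)

def evaluation_value(image: np.ndarray) -> int:
    """Return the number of marker-colored pixels in an HxWx3 RGB image."""
    mask = np.all(image == MARKER, axis=-1)
    return int(mask.sum())

# Tiny synthetic image: one uncovered row of the fitting surface remains.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[0, :, :] = MARKER  # top row left uncovered by the male connector
print(evaluation_value(img))  # 4
```

A lower count indicates the male connector covers more of the fitting surface, so the count can serve directly as the evaluation value E to be minimized.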
The route setting unit 806 performs its processing in two steps.
In the first step, it learns the relationship between the evaluation results produced by the evaluation unit 805 and the movements the robot actually made. Let A be the movement correction amount of the robot and E be the evaluation value, computed by the evaluation unit 805, indicating the degree of success. The route setting unit 806 prepares and approximates a function that takes A as input and outputs E. One example of such a function is an RBF (Radial Basis Function) network; RBFs are known to be able to easily approximate a wide variety of unknown functions.
For example, for the k-th input

Figure JPOXMLDOC01-appb-M000005

the output f(x) is defined as follows:

Figure JPOXMLDOC01-appb-M000006

Figure JPOXMLDOC01-appb-M000007

Here, σ denotes the standard deviation and μ the center of each RBF.
The data learned by the RBF network is not a single sample but all data from the start of the trials up to the latest one; for example, at the N-th trial, N data points are available. The weights W = (w_1, ..., w_J) above must be determined by learning. Various methods can be considered for this determination; RBF interpolation, shown below, is one example.
Given

Figure JPOXMLDOC01-appb-M000008

Figure JPOXMLDOC01-appb-M000009

learning is completed by

Figure JPOXMLDOC01-appb-M000010
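The formulas above survive only as image placeholders (Figure JPOXMLDOC01-appb-M000005 through M000010). As a hedged reconstruction, a standard Gaussian RBF interpolation consistent with the surrounding text (J basis functions, N trials, weights W = (w_1, ..., w_J), evaluation values E) would take the following form; the exact symbols of the original formulas may differ:

```latex
% Hedged reconstruction of the placeholder formulas; symbols are assumptions.
f(x) = \sum_{j=1}^{J} w_j\,\phi_j(x), \qquad
\phi_j(x) = \exp\!\left(-\frac{\lVert x - \mu_j \rVert^2}{2\sigma^2}\right)
% RBF interpolation: stack the basis values over all N trial inputs x^k,
\Phi = \bigl(\phi_j(x^{k})\bigr)_{k=1,\ldots,N;\; j=1,\ldots,J}, \qquad
E = (E_1, \ldots, E_N)^{\mathsf{T}}
% and solve for the weights so that the network passes through every sample:
W = \Phi^{-1} E
```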
After the approximation by RBF interpolation is complete, the minimum of the resulting RBF network is found by a general optimization method such as steepest descent or PSO (Particle Swarm Optimization). This minimum is input to the Actor unit 804 as the next recommended value.
In short, in the example above, the surface area or the number of pixels in each of two dimensions, measured for the movement correction amount of each failed trial, is arranged in time series over the trials as evaluation values, and the optimum solution is obtained from that series. More simply, the movement correction amount may be obtained by moving at a fixed rate in the direction that decreases the number of pixels in the two-dimensional directions.
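The first step of the route setting unit 806 can be sketched as follows. This is a minimal illustration under stated assumptions: Gaussian basis functions with centers at the data points, a shared σ, and a coarse grid search over the range of tried corrections standing in for steepest descent or PSO; none of these concrete choices come from the patent.

```python
import numpy as np

def rbf(a, centers, sigma=1.0):
    """Gaussian RBF features phi_j(a) for one input vector a."""
    d2 = np.sum((centers - a) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(A, E, sigma=1.0):
    """RBF interpolation: centers at the data points, weights W = Phi^-1 E."""
    Phi = np.array([rbf(a, A, sigma) for a in A])  # N x N basis matrix
    W = np.linalg.solve(Phi, E)                    # exact interpolation
    return lambda a: rbf(a, A, sigma) @ W

# Example: 1-D movement corrections and their evaluation values (lower = better).
A = np.array([[-2.0], [-1.0], [0.5], [1.5]])  # past movement corrections
E = np.array([4.0, 1.0, 0.25, 2.25])          # evaluation values per trial

f = fit_rbf(A, E, sigma=1.0)

# Grid search over the tried range stands in for steepest descent / PSO.
grid = np.linspace(A.min(), A.max(), 351).reshape(-1, 1)
best = grid[np.argmin([f(a) for a in grid])]
print(best)  # next recommended movement correction within the tried range
```

The recommended value `best` is then what would be handed to the Actor unit 804 for the next trial.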
Next, the operation flow is shown in FIG. 11.
FIG. 11 is a flowchart of path learning by the position control device according to the second embodiment.
First, in step S1101, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 8, and the operation is performed based on a control program registered in advance on the control unit 203 side.
Next, in step S1102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 8, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
Next, in step S1103, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image showing both the male connector 110 held by the gripping unit 101 and the female connector 120 that is the insertion target. Furthermore, the path determination unit 802 instructs the control unit 203 and the monocular camera 102 to capture images near the current position; at each of the positions reached by the drive unit 204 based on the plurality of movement values given to the control unit 203, the monocular camera captures an image showing both the male connector 110 and the female connector 120.
Next, in step S1104, the Actor unit 804 of the path determination unit 802 gives the control unit 203 a control amount for fitting, moves the robot arm 100 via the drive unit 204, and attempts to fit the male connector 110 into the female connector 120 that is the insertion target.
Next, in step S1105, if the connectors come into contact with each other while the drive unit 204 is moving the robot arm 100, the value of the force sensor 801 and the image from the monocular camera 102 are stored by the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 for each unit amount of movement.
Then, in step S1106, the evaluation unit 805 and the Critic unit 803 check whether the fitting succeeded.
Usually, the fitting does not succeed at this point. In that case, in step S1108, the evaluation unit 805 evaluates the degree of success by the method described with reference to FIG. 10 and gives the route setting unit 806 an evaluation value indicating the degree of success of the alignment.
Then, in step S1109, the route setting unit 806 performs learning using the method described above and gives the next recommended value to the Actor unit 804; the Critic unit 803 outputs the value it computed according to the reward amount, and the Actor unit 804 receives it. In step S1110, the Actor unit 804 obtains the movement correction amount by adding the value computed by the Critic unit 803 according to the reward amount and the next recommended value output by the route setting unit 806. Needless to say, if using only the next recommended value output by the route setting unit 806 is sufficiently effective in this step, the value computed by the Critic unit 803 according to the reward amount need not be added. The Actor unit 804 may also set an addition ratio between the value computed by the Critic unit 803 according to the reward amount and the next recommended value output by the route setting unit 806, and vary the movement correction amount according to that ratio.
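The combination in step S1110 can be sketched as a simple blend; the function name and the use of a single scalar ratio alpha are assumptions for illustration, since the patent only states that an addition ratio may be set.

```python
# Hedged sketch of step S1110: the Actor unit combines the Critic-derived
# value and the route setting unit's recommended value with a ratio alpha.

def movement_correction(critic_value, recommended_value, alpha=0.5):
    """Blend the two 6-DOF corrections (dX, dY, dZ, dAx, dAy, dAz)."""
    return [alpha * c + (1.0 - alpha) * r
            for c, r in zip(critic_value, recommended_value)]

critic = [0.2, -0.1, 0.0, 0.0, 0.0, 0.05]
recommended = [0.4, 0.1, -0.2, 0.0, 0.0, -0.05]

# alpha = 0 uses only the recommended value, which the text notes can suffice.
print(movement_correction(critic, recommended, alpha=0.0))
```

With alpha = 0 the Critic contribution is dropped entirely, matching the remark that the recommended value alone may be sufficiently effective.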
Thereafter, in step S1111, the Actor unit 804 gives the movement correction amount to the control unit 203 and moves the gripping unit 101 of the robot arm 100.
Then the process returns to step S1103, an image is captured at the position reached by the movement correction amount, and the fitting operation is performed again. This is repeated until it succeeds.
If the fitting succeeds, in step S1107 the Actor unit 804 and the Critic unit 803 are trained on the data from steps S1102 to S1106 of the successful trial. Finally, the path determination unit 802 gives the data of this learned neural network to the control parameter generation unit 202, which enables the operation described in the first embodiment.
In step S1107 above, the Actor unit 804 and the Critic unit 803 are trained on the successful trial; however, they may instead be trained using the data of all trials, from the start of the fitting attempts up to the success. The first embodiment describes forming a plurality of neural networks according to the control amount; once the position of successful fitting is known, the distance to that position can be used to form, at the same time, a plurality of neural networks each appropriate to a magnitude of the control amount.
Although the reinforcement learning module has been described based on the Actor-Critic model, other reinforcement learning models such as Q-Learning may be used.
Although an RBF network was given as the function approximation, other function approximation methods (linear, quadratic, and so on) may be used.
Although coloring the connector surface differently was given as the evaluation method, the amount of misalignment between the connectors obtained by other image processing techniques may be used instead.
As described in the first embodiment and in this embodiment, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a board, and the same method is effective in particular when inserting a component such as a capacitor, whose lead dimensions have large errors, into holes in a board.
Nor is it limited to insertion into a board: it can be used for position control in general, where the control amount is obtained from the relationship between an image and the control amount. In the present invention, learning the relationship between the image and the control amount with a neural network has the merit of absorbing the individual differences of the objects being aligned, so the control amount can be calculated more accurately.
Therefore, in this embodiment, when the Actor-Critic model is used to learn the control amount, the Actor unit 804 obtains the movement correction amount for the next trial by adding the value computed by the Critic unit 803 according to the reward amount and the recommended value computed by the route setting unit 806 from the evaluation value. Whereas the ordinary Actor-Critic model requires a very large number of trials and errors before alignment succeeds, the present invention makes it possible to greatly reduce the number of alignment trials.
This embodiment has described reducing the number of alignment trials by evaluating the image from the imaging unit 201 at the time of an alignment failure, but the number of trials can also be reduced by using the value of the force sensor 801 during an alignment trial. For example, in alignment involving connector fitting or the insertion of one object into another, it is common for the Actor unit 804 to judge failure by checking whether the two objects are at the completed fitting or insertion position when the value of the force sensor 801 exceeds a certain threshold. In that case, two situations are conceivable: (a) fitting or insertion was still in progress when the threshold was reached; and (b) fitting or insertion is complete, but the value of the force sensor 801 during the fitting or insertion reached a certain magnitude.
In case (a), both the value of the force sensor 801 and the image can be learned; the details can be implemented using the method described in the third embodiment.
Case (b) can also be implemented using the method described in the third embodiment as a way of learning from the value of the force sensor 801 alone. Alternatively, in the definition of the reward R in the Actor-Critic model, let F be the maximum load applied during fitting or insertion and let A be a positive constant; defining R = (1 - A/F) on success and R = 0 on failure achieves a similar effect.
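The alternative reward definition can be written down directly; A and F are as in the text, and the choice of A = 1.0 below is only an illustrative assumption (A should be chosen so that A < F for realistic loads).

```python
# Hedged sketch of the reward R = (1 - A/F) on success, R = 0 on failure,
# where F is the maximum load measured by the force sensor during the attempt.

def reward(success: bool, max_load: float, A: float = 1.0) -> float:
    """Gentler (lower-load) successful insertions receive a higher reward."""
    if not success:
        return 0.0
    return 1.0 - A / max_load

print(reward(True, max_load=10.0))  # low-load success: reward near 1
print(reward(True, max_load=2.0))   # high relative load: smaller reward
print(reward(False, max_load=2.0))  # failure: reward 0
```

This grades successes by how much force they required, rather than treating all successes identically as R = 1.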
Third Embodiment
This embodiment describes a method of efficiently collecting data in the learning process performed after alignment has succeeded in the second embodiment. Unless otherwise noted, everything is the same as in the second embodiment: the functional configuration diagram of the position control device in the third embodiment is FIG. 8, and the hardware configuration diagram is FIG. 9.
In terms of operation, a method of collecting learning data more efficiently during the operation of step S1107 of FIG. 11 in the second embodiment is described below.
FIG. 12 shows a flowchart of path learning by the position control device in the third embodiment.
First, in step S1201, when the male connector 110 and the female connector 120 have been successfully fitted in step S1107 of FIG. 11, the route setting unit 806 initializes the variables to i = 0, j = 1, k = 1. The variable i is the number of subsequent learning iterations of the robot arm 100, the variable k is the number of learning iterations after the male connector 110 and the female connector 120 become disengaged, and the variable j is the loop counter of the flowchart of FIG. 12.
Next, in step S1202, the route setting unit 806 gives the control unit 203, via the Actor unit 804, a movement amount that backs off 1 mm from the movement amount given for fitting in step S1104 of FIG. 11, and the drive unit 204 moves the robot arm 100 accordingly. Then 1 is added to the variable i. Although an instruction to back off 1 mm is given here, the amount need not be 1 mm; a unit amount such as 0.5 mm or 2 mm may be used.
Next, in step S1203, the route setting unit 806 stores the coordinates at that time as O(i) (here i = 1).
In step S1204, the route setting unit 806 randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i), gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100. The maximum magnitude of this control amount can be set arbitrarily within the movable range.
Next, in step S1205, at the position reached in step S1204, the Actor unit 804 collects the value of the force sensor 801 corresponding to the movement amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). In step S1206, the Critic unit 803 and the Actor unit 804 record, as learning data, the movement amount multiplied by -1 (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz) together with the value of the force sensor 801 that measures the force applied to hold the male connector 110.
Next, in step S1207, the route setting unit 806 determines whether the number of collected data points has reached the specified number J. If it has not, 1 is added to the variable j in step S1208 and the process returns to step S1204, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by random numbers to acquire more data; steps S1204 to S1207 are repeated until J data points have been accumulated.
When the specified number of data points has been accumulated, the route setting unit 806 resets the variable j to 1 in step S1209, and then checks in step S1210 whether the male connector 110 and the female connector 120 have become disengaged.
If they have not become disengaged, the process returns to step S1202 via step S1211.
In step S1211, the route setting unit 806 gives the control unit 203, via the Actor unit 804, a control amount that returns the coordinates of the robot arm 100 to the coordinates O(i) before the control amount was given, and the drive unit 204 moves the robot arm 100.
Thereafter, the loop from step S1202 to step S1210 repeats the process of backing off 1 mm, or a unit amount, from the control amount given for fitting and the process of collecting force sensor 801 data by giving control amounts centered on the backed-off position, until the male connector 110 and the female connector 120 become disengaged. When they become disengaged, the process proceeds to step S1212.
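The collection loop of steps S1202 to S1210 can be sketched as follows. The `move` and `read_force_sensor` interfaces, the perturbation range, and the use of a simple reversal to return to O(i) are all assumptions made for illustration; the patent specifies only the units involved and the back-off/perturb/record structure.

```python
import random

def collect_insertion_data(move, read_force_sensor, depth_steps, J,
                           back_off=1.0, max_delta=0.5):
    """Back off `back_off` mm at a time from the fitted pose; at each depth,
    apply J random 6-DOF perturbations and record (-perturbation, force)."""
    data = []
    for i in range(depth_steps):            # step S1202: back off one unit
        move((0.0, 0.0, back_off, 0.0, 0.0, 0.0))
        for j in range(J):                  # steps S1204-S1208
            delta = tuple(random.uniform(-max_delta, max_delta)
                          for _ in range(6))
            move(delta)                     # random control amount around O(i)
            force = read_force_sensor()     # step S1205
            # step S1206: the label is the movement multiplied by -1
            data.append((tuple(-d for d in delta), force))
            move(tuple(-d for d in delta))  # step S1211: return to O(i)
    return data

# Stand-in robot: no real hardware, just a fixed fake sensor reading.
data = collect_insertion_data(move=lambda d: None,
                              read_force_sensor=lambda: (0.0,) * 6,
                              depth_steps=3, J=5)
print(len(data))  # depth_steps * J samples
```

Each recorded pair is exactly the (label, input) pair used later: the negated perturbation is the corrective movement back toward the fitting path, and the force reading is the state from which it should be predicted.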
In step S1212, the route setting unit 806 sets the variable i to I (an integer larger than the value of i at the time the male connector 110 and the female connector 120 were determined to be disengaged), and gives the control unit 203, via the Actor unit 804, a control amount that backs off, for example, 10 mm (another value may be used here as well) from the movement amount given for fitting; the drive unit 204 moves the robot arm 100 accordingly.
Next, in step S1213, the route setting unit 806 stores the coordinate position of the robot arm 100 moved in step S1212 as the center position O(i+k).
Next, in step S1214, the route setting unit 806 again randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on the center position O(i+k), gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100.
In step S1215, the Critic unit 803 and the Actor unit 804 acquire the image captured by the imaging unit 201 of the monocular camera 102 at the position of the robot arm 100 after it has been moved by the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
In step S1216, the Critic unit 803 and the Actor unit 804 record the movement amount multiplied by -1 (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz) and the image as one learning data point.
In step S1217, the route setting unit 806 determines whether the number of collected data points has reached the specified number J. If it has not, 1 is added to the variable j in step S1218 and the process returns to step S1214, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by random numbers to acquire more data; steps S1214 to S1217 are repeated until J data points have been accumulated.
Note that the maximum random values of the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) in step S1204 and of the control amount in step S1214 can differ.
The learning data acquired by the above method is used to train the Actor unit 804 and the Critic unit 803.
FIG. 13 is a diagram showing an example of the neural network according to the third embodiment and of the learning rule of the neural network.
The first and second embodiments did not describe a learning method using the data of the force sensor 801. In the first and second embodiments the input layer consists only of an image, whereas in the third embodiment the value of the force sensor 801 may be fed to the input layer in place of the image. The force sensor 801 value may consist of either three components (a force and moments in two directions) or six components (forces in three directions and moments in three directions). The output layer outputs the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). When the male connector 110 and the female connector 120 are disengaged, the image and the value of the force sensor 801 are input to the input layer at the same time.
In the learning process of the neural network, the parameters of the intermediate layer are optimized so that the output value of the output layer, obtained from the input image and force sensor 801 value through the intermediate layer, approximates the control amount stored as a set with that image and force sensor 801 value.
Finally, the path determination unit 802 gives the data of this learned neural network to the control parameter generation unit 202, which enables the operation described in the first embodiment.
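The FIG. 13 setup can be sketched with a small fully connected network; the layer sizes, the synthetic data, and the plain gradient-descent training loop are assumptions for illustration, not the patent's actual network. It maps a six-component force sensor value (input layer) through one intermediate layer to the six-component control amount (output layer), minimizing the squared error against the recorded (sensor, control) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the recorded learning data of the third embodiment.
F = rng.normal(size=(64, 6))              # force sensor values (input layer)
C = F @ rng.normal(size=(6, 6)) * 0.1     # control amounts (output layer)

W1 = rng.normal(scale=0.1, size=(6, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 6)); b2 = np.zeros(6)

def forward(x):
    h = np.tanh(x @ W1 + b1)              # intermediate layer
    return h @ W2 + b2, h

lr = 0.05
for _ in range(500):                      # gradient descent on squared error
    out, h = forward(F)
    err = out - C
    gW2 = h.T @ err / len(F); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)      # backprop through tanh
    gW1 = F.T @ gh / len(F); gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

print(np.mean((forward(F)[0] - C) ** 2))  # training error after optimization
```

For the engaged/disengaged distinction described in the text, separate networks of this shape would simply be trained on the force-only and image-plus-force data sets respectively.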
In this embodiment, the robot arm 100 is moved slightly around the fitting path while backing off little by little from the movement for fitting the male connector 110 and the female connector 120, and the explanation has assumed that, depending on the pixel resolution of the monocular camera 102 image, sufficient learning from images alone is not possible until the connectors disengage.
However, if the image of the monocular camera 102 is of sufficiently high definition that even images taken with the robot arm 100 moved only slightly can be learned from, learning may be performed with the monocular camera 102 image alone, and both the monocular camera 102 image and the force sensor 801 value may be used even while the male connector 110 and the female connector 120 are engaged.
Furthermore, the first and second embodiments describe cases in which a plurality of neural networks are used. In this embodiment as well, separate neural networks may be used, for example, for the state in which the male connector 110 and the female connector 120 are engaged and for the state in which they are not. As explained above, more accurate learning is possible if the input layer is formed from the force sensor 801 value alone while the connectors are engaged and from the image alone once they disengage; even when learning from images alone, distinguishing the engaged and disengaged cases allows more accurate learning because the compositions of the images differ.
As described in the first and second embodiments, the application of this technique is not limited to the fitting of connectors in this embodiment either. For example, it can also be applied to mounting an IC on a board, and the same method is effective in particular when inserting a component such as a capacitor, whose lead dimensions have large errors, into holes in a board.
Nor is it limited to insertion into a board: it can be used for position control in general, where the control amount is obtained from the relationship between an image and the control amount. In the present invention, learning the relationship between the image and the control amount with a neural network has the merit of absorbing the individual differences of the objects being aligned, so the control amount can be calculated more accurately.
Therefore, in this embodiment, for alignment that involves inserting one of two objects into the other, learning data can be collected efficiently because the device includes the route setting unit 806, which, in order to learn the control amount, issues control amounts that move the object along the extraction path from the inserted state and around that path while it is being extracted, and the Actor unit 804, which acquires the moved positions and the force sensor 801 values so that the moved position can be learned as the output layer and the force sensor 801 value at that position as the input layer.
Fourth Embodiment
This embodiment describes a method of keeping control safe even during the learning process of the second embodiment (particularly in the early stages of learning). The hardware configuration diagram of the position control device in the fourth embodiment is FIG. 9, the same as in the second embodiment.
FIG. 14 shows a functional configuration diagram of the position control device according to the fourth embodiment. The difference from FIG. 8 is that a control parameter adjustment unit 1401 is added; the control parameter adjustment unit 1401 consists of a trajectory generation unit 1402, a coordinate conversion unit 1403, a gravity correction unit 1404, a compliant motion control unit 1405, and a combining unit 1406.
The configuration of the path determination unit 802 follows the second embodiment: an Actor-Critic model is used as the reinforcement learning module, and the evaluation unit 805 and the path setting unit 806 serve as the modules for evaluating the degree of success; however, other reinforcement learning models such as Q-Learning or DDPG may be used. Moreover, as far as the essential point of this embodiment is concerned, learning data can be collected while preventing an excessive load from being applied to the objects even without the evaluation unit 805 or the path setting unit 806, and even if the function for learning and the function for controlling the position of the robot are separate.
Next, the details of FIG. 14 will be described.
In the second embodiment, the control amount generated by the control parameter generation unit 202 was output to the control unit 203 to determine and control the current and voltage values for the devices constituting the drive unit 204. With this method, however, the control amount may become inappropriate, particularly in the early learning process, and the drive unit 204 may stop with an error or damage the surrounding environment such as the robot arm, the male connector 110, and the female connector 120. Also, if the male connector 110 and the female connector 120 are weaker than expected, the surrounding environment may be damaged even when the control amount is set sufficiently small during learning. This can occur because the side that sets the control amount and the side that performs control based on it are independent. In this embodiment, therefore, a mechanism is introduced so that an excessive load is not applied to the surrounding environment even during the learning process.
The trajectory generation unit 1402 acquires the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) generated by the control parameter generation unit 202 as a target position, and outputs periodic control amounts (ΔX', ΔY', ΔZ', ΔAx', ΔAy', ΔAz'), adjusted so that velocity and acceleration become smooth, in accordance with the control cycle of the robot arm 100, that is, the control cycle of the control unit 203. While the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) as the target position is defined as a control amount reached over a plurality of control cycles, the periodic control amount (ΔX', ΔY', ΔZ', ΔAx', ΔAy', ΔAz') is a control amount set for each single cycle in order to reach the target; it takes the load on the surrounding environment into account, and it can also respond to an unexpected load detected by the force sensor 801. The method of adjusting the periodic control amount will be described later.
The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100; for example, it can measure the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut. In the second embodiment, the motions output during trials at the initial stage of learning may apply excessive force to the surroundings and damage the robot arm 100 or the male connector 110 and the female connector 120. In the fourth embodiment, therefore, the compliant motion control unit 1405 is placed after the control parameter generation unit 202 and makes the arm comply with the external force acquired by the force sensor 801, thereby preventing excessive force from being applied to the robot arm 100, the male connector 110, the female connector 120, and the rest of the surrounding environment. This makes it possible to carry out the trials necessary for learning safely.
The force sensor 801 may output either three values (a force and moments in two directions) or six values (forces in three directions and moments in three directions). In the six-value case, the values of the force sensor 801 can be expressed as (Fx, Fy, Fz, Tx, Ty, Tz). Since these values are obtained in the coordinate system of the force sensor, the coordinate conversion unit 1403 converts the values of the force sensor 801 into the coordinate system of the whole robot arm 100 when the two coordinate systems differ.
The values measured by the force sensor 801 are affected by gravity. The gravity correction unit 1404 removes the influence of gravity from the values measured by the force sensor 801.
The compliant motion control unit 1405 acquires the values of the force sensor 801 corrected by the coordinate conversion unit 1403 and the gravity correction unit 1404 and, following physical laws, outputs a control amount adapted to the external force detected by the force sensor 801. The method of adjusting the control amount adapted to the external force will be described later.
The combining unit 1406 combines the control amount output by the trajectory generation unit 1402 and the control amount output by the compliant motion control unit 1405, and outputs the result to the control unit 203. The combination is a simple addition of the two control amounts; alternatively, an addition ratio may be set and a weighted addition performed according to that ratio.
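As a concrete illustration, the combination performed by the combining unit 1406 can be sketched as follows (a minimal sketch; the function name and the weights `w_traj` and `w_comp`, which stand in for the addition ratio described above, are illustrative and not from the patent):

```python
import numpy as np

def combine_commands(traj_cmd, compliant_cmd, w_traj=1.0, w_comp=1.0):
    """Combine the per-cycle command from the trajectory generation unit with
    the command from the compliant motion control unit. With the default
    weights this is the plain addition described above; other weights give
    the weighted addition according to an addition ratio."""
    return w_traj * np.asarray(traj_cmd, dtype=float) \
         + w_comp * np.asarray(compliant_cmd, dtype=float)

# Example: 6-DOF periodic command (dX', dY', dZ', dAx', dAy', dAz')
traj = [0.5, 0.0, -0.2, 0.0, 0.0, 0.1]
comp = [-0.1, 0.05, 0.0, 0.0, 0.0, 0.0]
adjusted = combine_commands(traj, comp)  # -> [0.4, 0.05, -0.2, 0.0, 0.0, 0.1]
```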
Given the control period of the robot arm 100 and its maximum velocity, maximum acceleration, and maximum jerk, the trajectory generation unit 1402 calculates the periodic control amount per control cycle so as not to exceed any of these limits.
For example, as in the non-patent literature (KROGER, Torsten; PADIAL, Jose. Simple and robust visual servo control of robot arms using an on-line trajectory generator. In: Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012. pp. 4862-4869.), there is a method of calculating the periodic control amount so as to satisfy all of the following conditions.
The following constants are given constants corresponding to the specifications of the robot arm 100.
Tcycle: control period
Vmax: maximum velocity
Amax: maximum acceleration
Jmax: maximum jerk
x_{i+1} = x_i + v_i·Tcycle + (1/2)·α_i·Tcycle^2 + (1/6)·j_i·Tcycle^3
v_{i+1} = v_i + α_i·Tcycle + (1/2)·j_i·Tcycle^2
α_{i+1} = α_i + j_i·Tcycle
|v_i| ≤ Vmax,  |α_i| ≤ Amax,  |j_i| ≤ Jmax

Here, x_i, v_i, α_i, and j_i are variables representing the following:
x_i: current position at step i
v_i: current velocity at step i
α_i: current acceleration at step i
j_i: current jerk at step i
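The per-cycle update can be sketched for one axis as follows. This is a deliberately simplified sketch, not the full on-line trajectory generator of the cited Kröger and Padial paper (which also plans braking phases so the motion stops exactly at the target); it only illustrates how the position is advanced one control period at a time while the jerk, acceleration, and velocity limits are enforced:

```python
def trajectory_step(x, v, a, target, Tcycle, Vmax, Amax, Jmax):
    """Advance position x by one control period toward `target`, clamping
    jerk, acceleration, and velocity to the robot arm's limits."""
    # choose the jerk that drives the motion toward the target
    j = Jmax if target > x else -Jmax
    a = max(-Amax, min(Amax, a + j * Tcycle))   # |a| <= Amax
    v = max(-Vmax, min(Vmax, v + a * Tcycle))   # |v| <= Vmax
    x_next = x + v * Tcycle
    if (target - x) * (target - x_next) < 0:    # do not step past the target
        x_next = target
    return x_next, v, a
```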
The method of adjusting the control amount adapted to the external force in the compliant motion control unit 1405 described above will now be explained. In compliant motion control, given coefficients representing the stability and stiffness of the environment, a compliant motion is calculated from the external force information. For example, when the external force is f(t), the resulting compliant displacement Δx(t) can be calculated by solving the following differential equation.
m·Δx''(t) + d·Δx'(t) + k·Δx(t) = f(t)

Here, m, d, and k are coefficients representing the stability and stiffness of the environment:
m: mass (inertia) coefficient
d: damping coefficient
k: spring constant
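Numerically, the differential equation above can be integrated once per control period; the sketch below uses semi-implicit Euler integration (the function and variable names are illustrative, not from the patent):

```python
def compliant_step(dx, dv, f_ext, m, d, k, Tcycle):
    """One integration step of m*dx'' + d*dx' + k*dx = f(t).
    Returns the updated compliant displacement dx and velocity dv."""
    acc = (f_ext - d * dv - k * dx) / m   # solve the ODE for the acceleration
    dv = dv + acc * Tcycle                # semi-implicit Euler: velocity first
    dx = dx + dv * Tcycle                 # then position with the new velocity
    return dx, dv

# Under a constant external force the displacement settles at f/k,
# i.e. the arm yields like a spring instead of fighting the contact.
dx, dv = 0.0, 0.0
for _ in range(20000):
    dx, dv = compliant_step(dx, dv, f_ext=5.0, m=1.0, d=20.0, k=100.0, Tcycle=0.001)
# dx is now close to 5.0 / 100.0 = 0.05
```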
Next, the operation flow is shown in FIG. 15.
FIG. 15 is a flowchart of path learning by the position control device according to the fourth embodiment.
First, in step S1501, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 14, and the operation is performed based on a control program registered in advance on the control unit 203 side.
Next, in step S1502, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 14, and the position of the male connector 110 is operated based on a control program registered in advance on the control unit 203 side. The steps so far are the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
Next, in step S1503, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted into appear. The evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store the image from the monocular camera 102.
Further, in step S1504, the path determination unit 802 instructs the force sensor 801 to acquire the external force, and the force sensor 801 acquires the external force at the current position. At the same time, the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store the value of the force sensor 801.
Next, in step S1505, the Actor unit 804 of the path determination unit 802 calculates a control amount for performing the fitting and supplies it to the trajectory generation unit 1402.
Next, in step S1506, the trajectory generation unit 1402 calculates a new control amount adjusted so that velocity and acceleration become smooth. Specifically, it calculates the periodic control amount, that is, the target position x_i for each control cycle, satisfying the given constants Tcycle, Vmax, Amax, and Jmax corresponding to the specifications of the robot arm 100 described above.
In step S1507, the coordinate conversion unit 1403 converts the value of the force sensor 801 acquired in step S1504 into the coordinate system of the whole robot arm 100.
Next, in step S1508, the gravity correction unit 1404 removes the influence of gravity from the value of the force sensor 801 coordinate-converted in step S1507, and supplies it to the compliant motion control unit 1405.
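Steps S1507 and S1508 can be sketched together for the force components (a simplified sketch: the rotation matrix, the mass value, and the sign convention of gravity acting along the base frame's -Z axis are assumptions, and the moment components are omitted for brevity):

```python
import numpy as np

def gravity_corrected_force(f_sensor, R_base_sensor, tool_mass, g=9.81):
    """S1507: rotate the sensor-frame force into the robot base frame.
    S1508: remove the weight of the gripped object, assumed to pull
    along the base frame's -Z axis."""
    f_base = R_base_sensor @ np.asarray(f_sensor, dtype=float)
    f_base[2] += tool_mass * g   # cancel the -Z gravity contribution
    return f_base

# With the sensor aligned to the base frame, a 1 kg gripped part at rest
# should read as zero external force after correction.
f = gravity_corrected_force([0.0, 0.0, -9.81], np.eye(3), tool_mass=1.0)
```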
Next, in step S1509, the compliant motion control unit 1405 calculates a control amount adapted to the external force from the gravity-corrected value of the force sensor 801 obtained in step S1508, and supplies it to the combining unit 1406. The control amount adapted to the external force is calculated as described above, for example so that the value of the force sensor 801 becomes smaller.
Next, in step S1510, the combining unit 1406 combines the periodic control amount calculated in step S1506 and the compliant motion control amount calculated in step S1509 by addition or weighted addition, and supplies the result to the control unit 203 as the periodic control amount adjustment value.
Next, in step S1511, the drive unit 204 moves the robot arm 100 and connector insertion is attempted. The control parameter adjustment unit 1401 checks whether the periodic control amount adjustment value has reached the control amount generated by the control parameter generation unit 202; if it has not, the process returns to step S1504. The operations from step S1504 to step S1511 can thus be repeated every control cycle.
Even when the male connector 110 and the female connector 120 abut before the control amount generated by the control parameter generation unit 202 is reached, the rise in the value of the force sensor 801 is detected every control cycle and fed back through the periodic control amount adjustment value, so the possibility of destroying the surrounding environment can be kept small even at the initial stage of learning.
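The inner loop of steps S1504 to S1511 can be summarized in a toy one-axis sketch (everything here is illustrative: `read_force` stands in for the corrected force sensor reading, the capped step for the trajectory generation unit, and the force-proportional offset for the compliant motion control unit):

```python
def run_cycles(target, read_force, n_cycles, step_max=0.005, k_comp=0.002):
    """Repeat S1504-S1511: read the external force, compute a capped
    periodic step toward the target, add a compliant offset that backs
    away from the sensed force, and apply the combined adjustment."""
    x = 0.0
    for _ in range(n_cycles):
        f = read_force(x)                      # S1504: force at current pose
        periodic = min(target - x, step_max)   # S1506: per-cycle step (stub)
        compliant = -k_comp * f                # S1509: yield to the contact
        x += periodic + compliant              # S1510/S1511: combined command
        if abs(target - x) < 1e-6:             # target reached
            break
    return x

# Free motion reaches the target; a virtual obstacle at x = 0.05 that pushes
# back makes the arm settle short of it instead of forcing its way through.
free = run_cycles(0.1, lambda x: 0.0, 100)
blocked = run_cycles(0.1, lambda x: max(0.0, x - 0.05) * 100.0, 300)
```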
Next, in step S1512, the evaluation unit 805 and the Critic unit 803 check whether the fitting has succeeded; at the same time, the neural network parameters of the Actor unit 804 and the Critic unit 803 are updated based on the values of the monocular camera 102 and the force sensor 801 stored in steps S1503 and S1504 and the control amount calculated in step S1505.
Then, if the fitting has not succeeded in step S1513, the position of the robot arm 100 is not moved, and the process returns to step S1503 for the next trial.
Although not shown in FIG. 15, by performing, before returning to step S1503, the evaluation of steps S1108 and S1109 of FIG. 11 of the second embodiment and the evaluation method of FIG. 10 used in the second embodiment, the same effects as in the second embodiment can be obtained.
If the fitting has succeeded in step S1513, the fitting task itself ends. To continue training the Actor unit 804 and the Critic unit 803 further, the accuracy of learning can be increased by returning to step S1501 and retrying from the gripping of the male connector 110.
Although the description is based on the Actor-Critic model as the reinforcement learning module, other reinforcement learning models such as DDPG may be used.
As described in the first embodiment and this embodiment, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and the same method is effective even when inserting a component such as a capacitor, whose lead dimensions have a large error, into a hole in the substrate.
Furthermore, the invention is not necessarily limited to insertion into a substrate; it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. By using a neural network to learn the relationship between the image and the control amount, the invention has the advantage of absorbing the individual differences of the objects when aligning one object with another, so the control amount can be calculated more accurately.
Therefore, in this embodiment, for a case including alignment that involves insertion of two objects, the device comprises: a path determination unit 802 that indicates the control amount for insertion based on the image acquired from the imaging unit 201 and the value of the force sensor 801, and learns from the result of the alignment; and a control parameter adjustment unit 1401 that outputs the periodic control amount adjustment value based on the periodic control amount set for each control cycle in order to reach the control amount and on the control amount adapted to the external force based on the value of the force sensor 801. The robot arm 100 is thus operated by a control amount obtained by adding trajectory generation control and compliant motion control using the force sensor 801. Whereas an ordinary reinforcement learning model requires trial and error until learning converges and may damage the environment, the invention makes it possible to perform trials safely even at the initial stage of learning.
Although this embodiment has been described focusing on the function of the control parameter adjustment unit 1401, the function of the control parameter adjustment unit 1401 can also be added to the contents described in the second and third embodiments; the second and third embodiments can then be operated safely while the learning speed is improved.
100: robot arm
101: gripping unit
102: monocular camera
110: male connector
120: female connector
201: imaging unit
202: control parameter generation unit
203: control unit
204: drive unit
301: input/output interface
302: processor
303: memory
304: control circuit
305: motor
801: force sensor
802: path determination unit
803: Critic unit
804: Actor unit
805: evaluation unit
806: path setting unit
1401: control parameter adjustment unit
1402: trajectory generation unit
1403: coordinate conversion unit
1404: gravity correction unit
1405: compliant motion control unit
1406: combining unit

Claims (9)

1. A position control device comprising:
a path determination unit that, in a case including alignment involving insertion of two objects, indicates a control amount for insertion based on an image acquired from an imaging unit and a value of a force sensor, and learns from a result of the alignment; and
a control parameter adjustment unit that outputs a periodic control amount adjustment value based on a periodic control amount set for each control period in order to reach the control amount and on a control amount adapted to an external force based on the value of the force sensor corresponding to the one control period.
2. The position control device according to claim 1, wherein the periodic control amount is set for each period taking into account at least one of a maximum velocity, a maximum acceleration, and a maximum jerk in order to reach the control amount, and the control amount adapted to the external force is determined according to a value obtained by removing a gravity component from the value of the force sensor.
3. The position control device according to claim 1 or 2, further comprising: a control unit that controls a current or a voltage for controlling a positional relationship between the two objects using the periodic control amount adjustment value indicated in claim 1; and a drive unit that moves one of the two objects in the positional relationship using the current or the voltage, wherein the force sensor acquires a force applied when the positional relationship between the two objects is maintained.
4. The position control device according to any one of claims 1 to 3, wherein the path determination unit according to claim 1 comprises:
a path setting unit that, when extracting from an insertion state, indicates a movement amount so as to move on a path from the insertion state and around it; and
an Actor unit that acquires the value of the moved position and the value of the force sensor in order to perform learning with the moved position data as an output layer and the value of the force sensor at the moved position as an input layer.
5. The position control device according to claim 4, further comprising a monocular camera that captures and acquires an image in which the two objects are present,
wherein the Actor unit acquires an image captured by the monocular camera at the moved position.
6. The position control device according to claim 4 or 5, wherein the Actor unit performs learning from the input layer and the output layer using an Actor-Critic model.
7. The position control device according to claim 6, wherein the Actor unit learns a plurality of neural networks, one of the plurality of neural networks being trained with data of positions in which the positional relationship of the two objects is an inserted state, and the other being trained with data of positions in which the positional relationship of the two objects is not an inserted state.
8. The position control device according to claim 7, wherein the Actor unit uses the value of the force sensor for the data of positions in which the positional relationship of the two objects is an inserted state, and uses image data for the data of positions in which the positional relationship of the two objects is not an inserted state.
9. A position control method for two objects, comprising:
in a case including alignment involving insertion of the two objects, outputting a control amount for insertion based on an acquired image and a value of a force sensor;
outputting a periodic control amount adjustment value based on a periodic control amount that can be reached in one control period with respect to the control amount and on a control amount adapted to an external force based on the value of the force sensor corresponding to the one control period; and
learning from a result of the alignment.
PCT/JP2018/002053 2018-01-24 2018-01-24 Position control device and position control method WO2019146007A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2018530627A JP6458912B1 (en) 2018-01-24 2018-01-24 Position control device and position control method
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method
TW107125131A TW201932257A (en) 2018-01-24 2018-07-20 Position control device and position control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method

Publications (1)

Publication Number: WO2019146007A1

Family

Family ID: 65228992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method

Country Status (3)

Country Link
JP (1) JP6458912B1 (en)
TW (1) TW201932257A (en)
WO (1) WO2019146007A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021066794A1 (en) * 2019-09-30 2021-04-08 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
WO2021170163A1 (en) * 2020-02-28 2021-09-02 Rittal Gmbh & Co. Kg Arrangement for fitting and wiring electronic components in switchgear engineering, and corresponding method
WO2022030334A1 (en) * 2020-08-03 2022-02-10 キヤノン株式会社 Control device, lithography device, and method for manufacturing article
CN115990891A (en) * 2023-03-23 2023-04-21 湖南大学 Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
JP7239399B2 (en) * 2019-06-19 2023-03-14 ファナック株式会社 Adjustment support device
US20220281120A1 (en) 2019-08-02 2022-09-08 Dextrous Robotics, Inc. Robotic manipulators
CN111230469B (en) * 2020-03-11 2021-05-04 苏州科诺机器人有限责任公司 Full-automatic water joint assembling mechanism and assembling method
TWI766252B (en) * 2020-03-18 2022-06-01 揚明光學股份有限公司 Optical lens manufacturing system and optical lens manufacturing method using the same
JP2022122670A (en) * 2021-02-10 2022-08-23 オムロン株式会社 Robot model learning device, robot model machine learning method, robot model machine learning program, robot control device, robot control method, and robot control program
CN113140104B (en) * 2021-04-14 2022-06-21 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium
US11845184B2 (en) 2022-04-18 2023-12-19 Dextrous Robotics, Inc. System and/or method for grasping objects

Citations (8)

Publication number Priority date Publication date Assignee Title
WO1998017444A1 (en) * 1996-10-24 1998-04-30 Fanuc Ltd Force control robot system with visual sensor for inserting work
JP2011230245A (en) * 2010-04-28 2011-11-17 Yaskawa Electric Corp Robot system
JP2014054715A (en) * 2012-09-13 2014-03-27 Fanuc Ltd Article retrieving apparatus that determines retaining position/posture of robot based on selecting conditions
JP2015217486A (en) * 2014-05-19 2015-12-07 富士通株式会社 Determining apparatus, determining method, and determining program
JP2016221642A (en) * 2015-06-02 2016-12-28 セイコーエプソン株式会社 Robot, robot control device, robot control method and robot system
JP2016221660A (en) * 2015-06-03 2016-12-28 富士通株式会社 Determination method, determination program and determination device
WO2017018113A1 (en) * 2015-07-29 2017-02-02 株式会社オートネットワーク技術研究所 Object handling simulation device, object handling simulation system, method for simulating object handling, manufacturing method for object, and object handling simulation program
JP2017030135A (en) * 2015-07-31 2017-02-09 Fanuc Corporation Machine learning apparatus, robot system, and machine learning method for learning workpiece take-out motion

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5904635B2 (en) * 2012-03-02 2016-04-13 Seiko Epson Corporation Control apparatus, control method, and robot apparatus
JP6248694B2 (en) * 2014-02-25 2017-12-20 Seiko Epson Corporation Robot, robot system, and control device
JP6376296B1 (en) * 2017-02-09 2018-08-22 Mitsubishi Electric Corporation Position control device and position control method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KROEGER, TORSTEN ET AL.: "Simple and Robust Visual Servo Control of Robot Arms Using an On-Line Trajectory Generator", 2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, 18 May 2012 (2012-05-18), pages 4862 - 4869, XP032450906 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021066794A1 (en) * 2019-09-30 2021-04-08 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
CN114630734A (en) * 2019-09-30 2022-06-14 西门子股份公司 Visual servoing with dedicated hardware acceleration to support machine learning
US11883947B2 (en) 2019-09-30 2024-01-30 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
WO2021170163A1 (en) * 2020-02-28 2021-09-02 Rittal Gmbh & Co. Kg Arrangement for fitting and wiring electronic components in switchgear engineering, and corresponding method
WO2022030334A1 (en) * 2020-08-03 2022-02-10 Canon Inc. Control device, lithography device, and method for manufacturing article
JP7466403B2 (en) 2020-08-03 2024-04-12 Canon Inc. Control apparatus, lithography apparatus, control method, and article manufacturing method
CN115990891A (en) * 2023-03-23 2023-04-21 Hunan University Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration

Also Published As

Publication number Publication date
JP6458912B1 (en) 2019-01-30
TW201932257A (en) 2019-08-16
JPWO2019146007A1 (en) 2020-02-06

Similar Documents

Publication Publication Date Title
WO2019146007A1 (en) Position control device and position control method
JP6376296B1 (en) Position control device and position control method
JP6587761B2 (en) Position control device and position control method
CN109807882B (en) Gripping system, learning device, and gripping method
JP6522488B2 (en) Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
WO2024027647A1 (en) Robot control method and system and computer program product
CN112757284A (en) Robot control apparatus, method and storage medium
US10926416B2 (en) Robotic manipulation using an independently actuated vision system, an adversarial control scheme, and a multi-tasking deep learning architecture
CN114952821A (en) Robot motion control method, robot and system
CN113878588B (en) Robot compliant assembly method based on tactile feedback and oriented to buckle type connection
JP2008009999A (en) Plane extraction method, and device, program, and storage medium therefor, and imaging device
CN113927602B (en) Robot precision assembly control method and system based on visual and tactile fusion
CN113954076B (en) Robot precision assembling method based on cross-modal prediction assembling scene
CN110942083A (en) Imaging device and imaging system
US11372475B2 (en) Information processing apparatus, information processing method, and floor modeling system
Ramachandruni et al. Vision-based control of UR5 robot to track a moving object under occlusion using Adaptive Kalman Filter
CN113011526B (en) Robot skill learning method and system based on reinforcement learning and unsupervised learning
WO2022091366A1 (en) Information processing system, information processing device, information processing method, and recording medium
US20240054393A1 (en) Learning Device, Learning Method, Recording Medium Storing Learning Program, Control Program, Control Device, Control Method, and Recording Medium Storing Control Program
JP2023156751A (en) Information processing device, information processing method, program, and learned model
JP2024034668A (en) Wire insertion system, wire insertion method, and wire insertion program

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018530627

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18902988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18902988

Country of ref document: EP

Kind code of ref document: A1