WO2019146007A1 - Position control device and position control method - Google Patents

Position control device and position control method

Info

Publication number
WO2019146007A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
control
control amount
value
learning
Prior art date
Application number
PCT/JP2018/002053
Other languages
French (fr)
Japanese (ja)
Inventor
Hayato Yamanaka
Takashi Minamimoto
Original Assignee
Mitsubishi Electric Corporation
Application filed by Mitsubishi Electric Corporation
Priority to JP2018530627A priority Critical patent/JP6458912B1/en
Priority to PCT/JP2018/002053 priority patent/WO2019146007A1/en
Priority to TW107125131A priority patent/TW201932257A/en
Publication of WO2019146007A1 publication Critical patent/WO2019146007A1/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23P METAL-WORKING NOT OTHERWISE PROVIDED FOR; COMBINED OPERATIONS; UNIVERSAL MACHINE TOOLS
    • B23P19/00 Machines for simply fitting together or separating metal parts or objects, or metal and non-metal parts, whether or not involving some deformation; Tools or devices therefor so far as not provided for in other classes
    • B23P19/02 Machines for simply fitting together or separating metal parts or objects, or metal and non-metal parts, whether or not involving some deformation; Tools or devices therefor so far as not provided for in other classes, for connecting objects by press fit or for detaching same
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 Controls for manipulators
    • B25J13/08 Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/10 Programme-controlled manipulators characterised by positioning means for manipulator elements
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls

Definitions

  • the present invention relates to a position control device and a position control method.
  • When constructing a production system that performs assembly operations with a robot arm, it is common to teach the motion by hand, an operation called teaching. However, since the robot then only repeats the motion at the stored positions, it may be unable to cope when an error arises from manufacturing or mounting variations. If a position correction technology that absorbs such individual errors can be developed, productivity can be expected to improve, and the scenes in which robots play an active part will increase.
  • In Patent Document 1, there is a technology for performing position correction just before the connector insertion operation using a camera image. If a plurality of devices such as a force sensor and a stereo camera are used, positional errors in assembly (insertion, workpiece holding, etc.) can also be absorbed. However, to determine the position correction amount, the center coordinates of the gripped connector and of the connector to be inserted must be explicitly calculated from the image information, as described in that reference. This calculation depends on the shape of the connector and must be set up by the designer for each connector used. The calculation is relatively easy if three-dimensional information can be acquired from a range camera or the like, but acquiring it from two-dimensional image information requires developing an image-processing algorithm for each connector, which takes time.
  • The present invention has been made to solve the above problems, and aims to collect learning data while preventing excessive load on the objects even when the function for learning differs from the function for position control of the robot.
  • A position control device according to the present invention includes: a path determination unit that designates the control amount for insertion based on the image acquired from the imaging unit and the value of the force sensor, performs the alignment, and learns from the results; and a combining unit that outputs a cycle control amount adjustment value based on the cycle control amount set for each control cycle to reach the control amount and on a control amount adapted to the external force based on the value of the force sensor.
  • According to the present invention, even if the function for learning and the function for position control of the robot are different, learning data can be collected while preventing excessive load on objects.
  • FIG. 1 is a diagram in which a robot arm 100, a male connector 110, and a female connector 120 according to Embodiment 1 are arranged.
  • FIG. 2 is a functional configuration diagram of a position control device according to Embodiment 1.
  • FIG. 3 is a hardware configuration diagram of a position control device according to Embodiment 1.
  • FIG. 4 is a flowchart of position control of the position control device according to Embodiment 1.
  • FIG. 5 shows an example of an insertion start position photographed by the monocular camera 102 according to Embodiment 1, together with camera images near that position and the corresponding control amounts.
  • FIG. 6 is a diagram showing an example of a neural network according to Embodiment 1 and a learning rule of the neural network.
  • FIG. 7 is a flowchart using a plurality of networks in the neural network in Embodiment 1.
  • FIG. 8 is a functional configuration diagram of a position control device in Embodiment 2.
  • FIG. 9 is a hardware configuration diagram of a position control device in Embodiment 2.
  • FIG. 10 is a view showing a trial of fitting of the male connector 110 and the female connector 120 in Embodiment 2.
  • FIG. 11 is a flowchart of path learning of the position control device according to Embodiment 2.
  • FIG. 12 is a flowchart of path learning in the position control device in Embodiment 3.
  • FIG. 13 is a diagram showing an example of a neural network according to Embodiment 3 and a learning rule of the neural network.
  • FIG. 14 is a functional configuration diagram of a position control device in Embodiment 4.
  • FIG. 15 is a flowchart of path learning of the position control device in Embodiment 4.
  • FIG. 16 is a functional configuration diagram of a position control device in Embodiment 4.
  • Embodiment 1 Hereinafter, embodiments of the present invention will be described.
  • FIG. 1 is a view in which a robot arm 100, a male side connector 110, and a female side connector 120 according to the first embodiment are arranged.
  • the robot arm 100 is provided with a gripping portion 101 for gripping the male connector 110, and the monocular camera 102 is attached to the robot arm 100 so that the gripping portion can be seen.
  • The monocular camera 102 is installed at a position such that, when the grip portion 101 at the tip of the robot arm 100 grips the male connector 110, both the tip of the gripped male connector 110 and the female connector 120 on the insertion side can be seen.
  • FIG. 2 is a functional configuration diagram of the position control device in the first embodiment. In FIG. 2, the device is configured of an imaging unit 201 that captures an image, which is a function of the monocular camera 102 in FIG. 1; a control parameter generation unit 202 that generates a control amount for the position of the robot arm 100 using the captured image; a control unit 203 that controls the current/voltage values of the drive unit 204 of the robot arm 100 using the control amount; and a drive unit 204 that changes the position of the robot arm 100 based on the current and voltage values output from the control unit 203.
  • When the control parameter generation unit 202 acquires an image from the imaging unit 201, which is a function of the monocular camera 102, it determines the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) for the position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and outputs them to the control unit 203.
  • The control unit 203 determines and controls the current and voltage values for each device constituting the drive unit 204, based on the received position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and the received control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
  • The drive unit 204, operating with the current/voltage values for each device received from the control unit 203, moves the robot arm 100 to the position (X+ΔX, Y+ΔY, Z+ΔZ, Ax+ΔAx, Ay+ΔAy, Az+ΔAz).
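As a minimal illustration (function and variable names are hypothetical, not from the patent), the move performed by the drive unit 204 amounts to adding each control-amount component to the corresponding pose component:

```python
# Hypothetical sketch: the drive unit 204 moves the arm from
# (X, Y, Z, Ax, Ay, Az) to (X+dX, Y+dY, Z+dZ, Ax+dAx, Ay+dAy, Az+dAz).
def apply_control_amount(pose, delta):
    """Add each control-amount component to the corresponding pose component."""
    return tuple(p + d for p, d in zip(pose, delta))

pose = (100.0, 50.0, 30.0, 0.0, 0.0, 90.0)   # X, Y, Z, Ax, Ay, Az
delta = (-1.0, 0.5, -2.0, 0.0, 0.0, 1.5)     # dX .. dAz
new_pose = apply_control_amount(pose, delta)  # (99.0, 50.5, 28.0, 0.0, 0.0, 91.5)
```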
  • FIG. 3 is a hardware block diagram of the position control device in the first embodiment.
  • The monocular camera 102 is communicably connected, whether by wire or wirelessly, to the processor 302 and the memory 303 via the input/output interface 301.
  • the input / output interface 301, the processor 302, and the memory 303 configure the function of the control parameter generation unit 202 in FIG.
  • The input/output interface 301 is communicably connected, whether by wire or wirelessly, to the control circuit 304 corresponding to the control unit 203.
  • the control circuit 304 is also electrically connected to the motor 305.
  • the motor 305 corresponds to the drive unit 204 in FIG. 2 and is configured as a component for controlling the position of each device.
  • Although the motor 305 is used here as the hardware corresponding to the drive unit 204, any hardware capable of controlling the position may be used. Note that the monocular camera 102 and the input/output interface 301, and the input/output interface 301 and the control circuit 304, may each be provided as separate bodies.
  • FIG. 4 is a flowchart of position control of the position control device according to the first embodiment.
  • First, in step S101, the gripping unit 101 of the robot arm 100 grips the male connector 110.
  • The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 2, and the arm is operated based on a control program registered in advance on the control unit 203 side.
  • In step S102, the robot arm 100 is brought close to the insertion position of the female connector 120.
  • The approximate position and posture of the female connector 120 are registered in advance on the control unit 203 side of FIG. 2, and the male connector 110 is moved into position based on a control program registered in advance on the control unit 203 side.
  • In step S103, the control parameter generation unit 202 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted appear.
  • In step S104, the control parameter generation unit 202 acquires the image from the imaging unit 201 and determines the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
  • The control parameter generation unit 202 uses the processor 302 and the memory 303 of FIG. 3 as hardware and calculates the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) using a neural network. The calculation method using the neural network will be described later.
  • In step S105, the control unit 203 acquires the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202 and compares every component of the control amount with a predetermined threshold. If all components are equal to or less than the threshold, the process proceeds to step S107, and the control unit 203 controls the drive unit 204 to insert the male connector 110 into the female connector 120. If any component is larger than the threshold, then in step S106 the control unit 203 controls the drive unit 204 using the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202, and the process returns to step S103.
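The loop of steps S103 to S107 can be sketched as follows (a simplified sketch with hypothetical callback names; the real device captures camera images and drives motors):

```python
# Simplified sketch of the S103-S107 loop: re-image and re-correct until every
# component of the control amount is at or below the threshold, then insert.
# capture_image, infer_control, move_arm and insert are hypothetical callbacks.
def position_control_loop(capture_image, infer_control, move_arm, insert,
                          threshold=0.5, max_loops=20):
    for _ in range(max_loops):
        image = capture_image()                      # S103: capture both connectors
        delta = infer_control(image)                 # S104: neural network inference
        if all(abs(d) <= threshold for d in delta):  # S105: compare all components
            insert()                                 # S107: perform the insertion
            return True
        move_arm(delta)                              # S106: correct, then loop to S103
    return False
```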
  • Next, the method of calculating the control amount using the neural network in step S104 of FIG. 4 will be described.
  • To train the neural network, sets of images and the corresponding required movement amounts are collected in advance.
  • Specifically, with the male connector 110 and the female connector 120 in a fitted state at known positions, the male connector 110 is gripped by the grip portion 101 of the robot arm 100. Then, while the gripping unit 101 is moved in a known extraction direction to the insertion start position, the monocular camera 102 acquires a plurality of images.
  • FIG. 5 is an example of a diagram showing an insertion start position photographed by the monocular camera 102 according to the first embodiment, together with camera images near that position and the corresponding control amounts.
  • FIG. 6 is a diagram showing an example of a neural network in the first embodiment and a learning rule of the neural network.
  • the input layer receives an image (for example, luminance and color difference value of each pixel) obtained from the monocular camera 102, and the output layer outputs control amounts ( ⁇ X, ⁇ Y, ⁇ Z, ⁇ Ax, ⁇ Ay, ⁇ Az).
  • The parameters of the intermediate layer are optimized so that the output value of the output layer, obtained from the input image through the intermediate layer, approximates the control amount stored with that image.
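As a toy illustration of this optimization (assumptions: a linear model in place of the patent's multi-layer network, and scalar targets standing in for the six control-amount components), gradient descent on the squared error between the model output and the stored control amount looks like:

```python
# Minimize the squared error between the model output and the stored control
# amount by stochastic gradient descent. A linear model replaces the patent's
# neural network purely to keep the sketch short; the update principle is the same.
def train(samples, n_features, lr=0.1, epochs=200):
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - target                      # output minus stored amount
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Synthetic "image feature -> control amount" pairs following target = 2*x0 - x1.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
w, b = train(samples, 2)
```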
  • During learning, the male connector 110 is fixed in position with respect to the monocular camera 102, and only the position of the female connector 120 is changed. However, the male connector 110 is not always gripped at the correct position, and its position may be shifted by individual differences and the like. Therefore, in the learning process, control amounts and images are also acquired and learned for the insertion start position and nearby positions when the male connector 110 deviates from the correct position, so that the learning can cope with individual differences of both the male connector 110 and the female connector 120.
  • In this case, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated excluding the movement amount from the fitted-state position at the time of shooting to the insertion start position.
  • The movement amount from the insertion start position to the fitted-state position must be stored separately for use in step S107 of FIG. 4.
  • Note that if the coordinate system of the monocular camera 102 differs from the coordinate system of the entire robot arm 100, the control unit 203 needs to control the robot arm 100 after converting from the camera's coordinate system.
  • Since the monocular camera 102 is fixed to the robot arm 100, the coordinate system in which the female connector 120 is placed differs from the coordinate system of the monocular camera 102. If the monocular camera 102 is handled in the same coordinate system as the position of the female connector 120, the conversion from the coordinate system of the monocular camera 102 to the coordinate system of the robot arm 100 is unnecessary.
  • In step S101, the robot arm 100 grips the male connector 110 according to the operation registered in advance.
  • The arm is then moved to a position substantially above the female connector 120.
  • However, the position of the male connector 110 immediately before gripping is not always constant. A slight error may occur, for example, due to small deviations in the movement of the machine that sets the male connector 110 in place. Similarly, the female connector 120 may also have some positional error.
  • In step S103, it is important to acquire an image, captured by the imaging unit 201 of the monocular camera 102 attached to the robot arm 100, in which both the male connector 110 and the female connector 120 appear. Since the position of the monocular camera 102 with respect to the robot arm 100 is always fixed, relative positional information between the male connector 110 and the female connector 120 is reflected in this image.
  • In step S104, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated by the control parameter generation unit 202, which has a neural network as shown in FIG. 6 in which the relative positional information has been learned in advance.
  • In a single operation, the control amount output from the control parameter generation unit 202 may not bring the arm all the way to the insertion start position.
  • In that case, the loop of steps S103 to S106 is repeated a plurality of times: the control parameter generation unit 202 repeatedly calculates the control amount until it no longer exceeds the threshold shown in step S105, and the control unit 203 and the drive unit 204 control the position accordingly.
  • The threshold shown in S105 is determined by the required fitting accuracy of the male connector 110 and the female connector 120. For example, when the fit of the connector is loose and high accuracy is not required by the connector's characteristics, the threshold can be set large; in the opposite case, it is set smaller. In a manufacturing process, the tolerable manufacturing error is often defined, so that value can also be used.
  • Note that a plurality of insertion start positions may be set. If the insertion start position is set without a sufficient distance between the male connector 110 and the female connector 120, the connectors may abut each other before the insertion is started, with a risk that one of them is damaged. In that case, the insertion start position may be set so that, for example, the clearance between the male connector 110 and the female connector 120 is 5 mm the first time, 20 mm the next, and 10 mm the time after that, depending on the number of loops between step S103 and step S106 in FIG. 4.
  • Although the present embodiment has been described using connectors, the application of this technology is not limited to connector fitting.
  • For example, it can be applied to mounting an IC on a substrate, and a similar method is effective when inserting a component such as a capacitor, whose leads have large dimensional errors, into holes in a substrate.
  • The method is not necessarily limited to insertion into a substrate, and can be used for position control in general in which the control amount is obtained from the relationship between an image and a control amount.
  • Learning the relationship between the image and the control amount with a neural network in this way has the advantage of being able to absorb individual differences when performing alignment between objects.
  • As described above, the position control device of the present embodiment includes: the imaging unit 201 that captures an image in which two objects appear; the control parameter generation unit 202 that inputs the information of the captured image to the input layer of a neural network and outputs, as the output layer of the neural network, a position control amount for controlling the positional relationship between the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship between the two objects using the output control amount; and the drive unit 204 that moves the position of one of the two objects using that current or voltage.
  • FIG. 7 is a flowchart using a plurality of networks in the neural network in the first embodiment. It shows the detailed steps of step S104 in FIG. 4. The control parameter generation unit 202 of FIG. 2 contains a plurality of networks.
  • In step S701, the control parameter generation unit 202 selects which network to use based on the input image. If this is the first loop or the obtained control amount is 25 mm or more, neural network 1 is selected and the process proceeds to step S702. If the control amount obtained in the second or later loop is 5 mm or more and less than 25 mm, neural network 2 is selected and the process proceeds to step S703. If the control amount obtained in the second or later loop is less than 5 mm, neural network 3 is selected and the process proceeds to step S704. In steps S702 to S704, the control amount is calculated using the selected neural network.
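The branch in step S701 can be summarized by a small selection function (a sketch; the 5 mm and 25 mm thresholds are the ones given above, and the function name is hypothetical):

```python
# Sketch of the network-selection switch of step S701: choose a network from
# the loop count and the magnitude (in mm) of the previously obtained control amount.
def select_network(loop_count, last_control_mm):
    if loop_count == 1 or last_control_mm >= 25:
        return 1   # neural network 1: large corrections
    if 5 <= last_control_mm < 25:
        return 2   # neural network 2: intermediate corrections
    return 3       # neural network 3: corrections under 5 mm
```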
  • Each neural network is trained according to the distance, or control amount, between the male connector 110 and the female connector 120: neural network 3 in the figure is trained on data with errors within ±1 mm and ±1 degree, while neural network 2 is trained on data in the range of ±1 to ±10 mm and ±1 to ±5 degrees, so that the range of learned data changes stepwise from network to network.
  • The number of networks is not particularly limited. When this scheme is used, the determination function of step S701, which decides which network to use, must be prepared as a "network selection switch".
  • The network selection switch can also be configured as a neural network. In that case, the input layer receives the image, and the output layer outputs the network number. For training, pairs of the images used in all the networks and their network numbers are used.
  • As in the first embodiment, the application of this technology is not limited to connector fitting.
  • For example, it can be applied to mounting an IC on a substrate, and a similar method is effective when inserting a component such as a capacitor, whose leads have large dimensional errors, into holes in a substrate.
  • The example using a plurality of neural networks is also not limited to insertion into a substrate, and can be used for position control in general in which the control amount is obtained from the relationship between an image and a control amount.
  • Learning the relationship between the image and the control amount with neural networks has the advantage of absorbing individual differences when aligning objects, and the control amount can be calculated more accurately.
  • As described above, the position control device of the present embodiment includes the imaging unit 201 that captures an image in which two objects appear; the control parameter generation unit 202 that inputs the captured image information to the input layer of a neural network and outputs, as the output layer, a control amount for controlling the positional relationship between the two objects; the control unit 203 that controls a current or voltage for controlling that positional relationship using the output control amount; and the drive unit 204 that moves the position of one of the two objects using that current or voltage. Since the control parameter generation unit 202 selects among a plurality of neural networks, alignment can be performed accurately even if there are individual differences between the objects or errors in the positional relationship between the two objects.
  • In the first embodiment, with the male connector 110 and the female connector 120 in a fitted state at known positions, the male connector 110 was gripped by the grip portion 101 of the robot arm 100, and the monocular camera 102 acquired a plurality of images while the gripping unit 101 was moved in a known extraction direction to the insertion start position.
  • In the second embodiment, a case where the fitting position of the male connector 110 and the female connector 120 is unknown will be described.
  • As prior work on methods by which a robot learns by itself and acquires appropriate behavior, a method called reinforcement learning has been studied.
  • In reinforcement learning, the robot performs various motions by trial and error and optimizes its behavior while memorizing the behaviors that produced good results; however, a large number of trials is required for this optimization.
  • A framework called "on-policy" learning is commonly used in reinforcement learning.
  • Reducing the number of trials requires various devices specialized to the particular robot arm and its control signals, which is difficult, and such methods have not been put to practical use.
  • In this embodiment, a form is explained in which a robot such as that of the first embodiment performs various operations by trial and error and stores the behaviors that produced good results, while reducing the number of trials required to optimize the behavior.
  • The overall hardware configuration is the same as in FIG. 1 of the first embodiment, but differs in that a force sensor 801 (not shown in FIG. 1) for measuring the load applied to the gripping unit 101 is added to the robot arm 100.
  • FIG. 8 shows a functional block diagram of the position control device in the second embodiment.
  • In FIG. 8, a force sensor 801 and a path determination unit 802 are added, and the path determination unit 802 is configured of a Critic unit 803, an Actor unit 804, an evaluation unit 805, and a path setting unit 806.
  • FIG. 9 is a hardware block diagram of the position control device in the second embodiment.
  • the force sensor 801 is electrically or communicably connected to the input / output interface 301.
  • The input/output interface 301, the processor 302, and the memory 303 implement the function of the control parameter generation unit 202 in FIG. 8 and also the function of the path determination unit 802. Note that the force sensor 801 and the monocular camera 102 may be provided separately from the input/output interface 301, and the input/output interface 301 separately from the control circuit 304.
  • The force sensor 801 measures the load applied to the grip portion 101 of the robot arm 100, and can measure, for example, the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut each other.
  • The Critic unit 803 and the Actor unit 804 are the same as the Critic unit and the Actor unit in conventional reinforcement learning.
  • First, the conventional reinforcement learning method will be described.
  • Among reinforcement learning methods, a model called the Actor-Critic model is used (Reference: Reinforcement Learning, R. S. Sutton and A. G. Barto, published December 2000).
  • the Actor unit 804 and the Critic unit 803 acquire the state of the environment through the imaging unit 201 and the force sensor 801.
  • the Actor unit 804 is a function that receives the environmental condition I acquired using the sensor device and outputs the control amount A to the robot controller.
  • The Critic unit 803 is a mechanism for making the Actor unit 804 appropriately learn the output A with respect to the input I so that the fitting succeeds properly.
  • In conventional reinforcement learning, a quantity called the reward R is defined, and the Actor unit 804 acquires the action A that maximizes R.
  • X, Y, Z indicate position coordinates with the central portion of the robot as the origin
  • Ax, Ay, Az indicate the amounts of rotation about the X axis, Y axis, and Z axis, respectively.
  • In this example, the movement correction amount is the control amount from the current point to the fitting start position for the first attempt at fitting the male connector 110.
  • the observation of the environmental condition, that is, the trial result is obtained from the image from the imaging unit 201 and the value of the force sensor 801.
  • The Critic unit 803 learns a function called the state value function V(I).
  • Suppose that when action A(1) is taken in state I(1), the first fitting trial yields reward R(2) and the state transitions to I(2). The state value function is then updated by:
  • δ = R(2) + γV(I(2)) − V(I(1))
  • V(I(1)) ← V(I(1)) + αδ
  • Here, δ is the prediction error, α is the learning coefficient, γ is the discount rate, and σ denotes the standard deviation of the output.
  • In state I, the Actor unit 804 adds to A(I) a random number drawn from a distribution with mean 0 and variance σ². That is, regardless of the result of the first trial, the second movement correction amount is determined randomly.
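A minimal sketch of these updates (assumptions: a dictionary stands in for the state value function V, state keys are hypothetical labels, whereas the patent's states are images and force-sensor values):

```python
import random

# Critic: update the state value V(I) with the TD prediction error
# delta = R + gamma * V(I') - V(I); alpha is the learning coefficient and
# gamma the discount rate. Actor: perturb the action with zero-mean noise
# of variance sigma^2 to explore.
def critic_update(V, s, s_next, reward, alpha=0.1, gamma=0.9):
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # prediction error
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

def actor_action(base_action, sigma):
    # Random number with mean 0 and variance sigma^2 added to A(I).
    return base_action + random.gauss(0.0, sigma)
```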
  • Although the above update formula is given as an example, the Actor-Critic model has various update formulas, and any generally used model may be substituted.
  • With the above configuration, the Actor unit 804 learns the appropriate action in each state.
  • When learning is completed, operation proceeds as in the first embodiment.
  • the evaluation unit 805 generates a function that performs evaluation at each fitting trial.
  • FIG. 10 is a diagram showing a trial of fitting of the male connector 110 and the female connector 120 in the second embodiment. For example, it is assumed that an image as shown in FIG. 10A is obtained as a result of the trial. In this trial, the fitting position of the connector is largely misaligned and fails. At this time, how close to success is measured and quantified to obtain an evaluation value indicating the degree of success.
  • As a method of quantification, for example, as shown in FIG. 10B, the surface area (number of pixels) of the insertion-side connector in the image can be calculated.
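For instance, with a binary mask marking the insertion-side connector-surface pixels (a hypothetical representation; the patent only specifies counting the surface area in pixels), the evaluation value is a pixel count:

```python
# Sketch of the FIG. 10(b) evaluation: count the pixels belonging to the
# insertion-side connector surface in a binary mask.
def evaluation_value(mask):
    """mask: 2-D list of 0/1, where 1 marks a connector-surface pixel."""
    return sum(sum(row) for row in mask)

mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
score = evaluation_value(mask)  # 6 surface pixels
```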
  • The processing of the path setting unit 806 is divided into two steps.
  • First, the evaluation result E produced by the evaluation unit 805 and the motion A that the robot actually performed are learned.
  • That is, the path setting unit 806 prepares a function with A as input and E as output and approximates it.
  • For the approximation, an RBF (Radial Basis Function) network, for example, is used. The RBF network is known to be able to easily approximate various unknown functions. For an input x, the output is given by f(x) = Σ_j w_j exp(−‖x − μ_j‖² / (2σ_j²)), where σ_j is the standard deviation and μ_j is the center of the j-th basis function.
  • Second, the minimum value of the function approximated by the RBF network is determined by a general optimization method such as the steepest descent method or PSO (Particle Swarm Optimization).
  • This minimizing point is input to the Actor unit 804 as the next recommended value.
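A compact sketch of this step (assumptions: a one-dimensional correction amount, Gaussian bases, weights passed in directly rather than fitted by solving the interpolation system, and a grid search standing in for steepest descent / PSO):

```python
import math

# Approximate the evaluation value E(A) with a 1-D RBF network and return the
# minimizing correction amount as the next recommended value.
def rbf_predict(x, centers, weights, sigma=1.0):
    return sum(w * math.exp(-((x - c) ** 2) / (2 * sigma ** 2))
               for c, w in zip(centers, weights))

def next_recommended(centers, weights, lo, hi, steps=200, sigma=1.0):
    # Coarse grid search over [lo, hi]; a real implementation would use
    # steepest descent or PSO as the text describes.
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=lambda x: rbf_predict(x, centers, weights, sigma))
```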
  • In other words, the surface areas (pixel counts) in the two-dimensional directions with respect to the movement correction amounts of the failed trials are arranged as a time series indexed by trial number and used as evaluation values, and the optimum solution is determined from them.
  • More simply, the movement correction amount may be determined by moving at a constant rate in the direction that decreases the pixel count in the two-dimensional directions.
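The simpler rule can be sketched as follows (hypothetical representation: a signed pixel-count gradient per axis and a fixed step size):

```python
# Move at a constant rate in the direction that decreases the pixel count
# along each two-dimensional direction; a zero gradient means no move.
def simple_correction(pixel_gradient, step=0.5):
    """pixel_gradient: change in pixel count per unit move along each axis."""
    return tuple(-step if g > 0 else (step if g < 0 else 0.0)
                 for g in pixel_gradient)
```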
  • FIG. 11 is a flowchart of path learning of the position control device according to the second embodiment.
  • First, in step S1101, the gripping unit 101 of the robot arm 100 grips the male connector 110.
  • The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 8, and the arm is operated based on a control program registered in advance on the control unit 203 side.
  • In step S1102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 8, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
  • In step S1103, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted appear. Further, the path determination unit 802 instructs the control unit 203 to move to a plurality of positions near the current position; the drive unit 204 moves the robot arm 100 based on the instructed movement values, and at each of these positions the monocular camera 102 captures an image in which both the male connector 110 and the female connector 120 to be inserted appear.
  • In step S1104, the Actor unit 804 of the path determination unit 802 gives a control amount for fitting to the control unit 203, causes the drive unit 204 to move the robot arm 100, and tries the fitting of the male connector 110 and the female connector 120 to be inserted.
  • In step S1105, when the connectors come into contact with each other while the robot arm 100 is being moved by the drive unit 204, the value of the force sensor 801 and the image from the monocular camera 102 are recorded for each unit of movement amount, and the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store them.
  • In step S1106, the evaluation unit 805 and the Critic unit 803 confirm whether the fitting has succeeded. Usually, the fitting does not succeed at this point. Therefore, in step S1108, the evaluation unit 805 evaluates the degree of success by the method described with reference to FIG. 10 and provides the path setting unit 806 with an evaluation value indicating the degree of success of the alignment. Then, in step S1109, the route setting unit 806 performs learning using the above-described method and gives the next recommended value to the Actor unit 804.
  • In step S1110, the Actor unit 804 adds the value obtained according to the reward amount output from the Critic unit 803 and the next recommended value output from the route setting unit 806 to obtain the movement correction amount. The Actor unit 804 may also set an addition ratio between these two values, and this ratio may be changed, for example, according to the progress of the trials.
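The combination of step S1110 can be sketched as a weighted sum. The linear schedule below, which shifts weight from the route setting unit's recommendation toward the Critic-derived value as trials accumulate, is a hypothetical example and is not specified by the text:

```python
def movement_correction(critic_value, recommended, trial, total_trials=50):
    # the addition ratio alpha grows linearly with the trial count, so
    # early trials follow the recommended value from the route setting
    # unit and later trials follow the reward-derived value
    alpha = min(1.0, trial / total_trials)
    return alpha * critic_value + (1.0 - alpha) * recommended

# halfway through the schedule the two contributions are weighted equally
step = movement_correction(critic_value=2.0, recommended=0.0, trial=25)
```

Each of the six control-amount components (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) would be blended independently in the same way.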
  • In step S1111, the Actor unit 804 gives the movement correction amount to the control unit 203 to move the gripping unit 101 of the robot arm 100. Thereafter, the process returns to step S1103, an image is captured at the position moved by the movement correction amount, and the fitting operation is tried again; this is repeated until the fitting succeeds. When the fitting succeeds, in step S1107 the learning of the Actor unit 804 and the Critic unit 803 is performed on the data from steps S1102 to S1106 of the successful trial. Finally, the path determination unit 802 supplies the learned data of the neural network to the control parameter generation unit 202, thereby enabling the operation described in the first embodiment. Here, the learning of the Actor unit 804 and the Critic unit 803 is performed on the data of the successful trial, but the Actor unit 804 and the Critic unit 803 may instead be trained using the data of all trials from the start of the fitting trials to the success. In that case, although the first embodiment described forming a plurality of neural networks according to the control amount, if the position at which the fitting succeeds is known, it is possible to simultaneously form a plurality of neural networks suited to the magnitude of the control amount by using the distance to the successful fitting position.
  • The application of this technology is not limited to the fitting of connectors.
  • The present invention can also be applied to mounting an IC on a substrate, and the same method produces the same effect when inserting a capacitor or similar component whose leads have large dimensional errors into holes of the substrate.
  • Furthermore, the present invention is not necessarily limited to insertion into a substrate; it can be used for position control in general in which a control amount is obtained from the relationship between an image and the control amount.
  • Learning the relationship between the image and the control amount with a neural network has the advantage of absorbing individual differences between objects when performing alignment, so the control amount can be calculated more accurately.
  • As described above, in this embodiment the Actor unit 804 obtains the movement correction amount for each trial based on the value that the Critic unit 803 obtains according to the reward amount and the recommended value that the route setting unit 806 obtains based on the evaluation value. While the normal Actor-Critic model requires a large number of trial-and-error iterations until alignment succeeds, the invention makes it possible to significantly reduce the number of alignment trials.
  • In this embodiment, the number of alignment trials is reduced by evaluating the image from the imaging unit 201 at the time of an alignment failure, but the value of the force sensor 801 during an alignment trial can also be used to reduce the number of trials. For example, in alignment involving the fitting of connectors or the insertion of two objects, when a trial fails, the value of the force sensor 801 exceeding a certain threshold generally indicates that the positions of the two objects are not completely fitted or inserted, and the Actor unit 804 determines whether this is the case. Two situations are then conceivable: a. the parts are still in the process of fitting or inserting when the threshold is reached; b. the parts have been fitted or inserted partway, and the value of the force sensor 801 shows a certain constant value.
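The force-based discrimination just described can be sketched as below; the scalar threshold, the trace format, and the names of the two failure cases are assumptions introduced for illustration:

```python
def classify_failed_trial(force_trace, threshold):
    """Interpret a failed alignment trial from its force-sensor trace.

    A peak above the threshold means the parts were not completely
    fitted when contact occurred. Case a: the force spiked while the
    arm was still moving (the final reading relaxes). Case b: the
    parts wedged partway, so the force settles at a constant value.
    """
    peak = max(abs(f) for f in force_trace)
    if peak <= threshold:
        return "no_contact"
    final = abs(force_trace[-1])
    return "wedged" if final > threshold else "in_process"
```

A trial classified as "in_process" suggests the correction direction was roughly right, while "wedged" suggests backing off before the next trial.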
  • FIG. 12 shows a flowchart in path learning of the position control device in the third embodiment.
  • The variable i is the number of learning iterations of the robot arm 100, the variable k is the number of learning iterations after the male connector 110 and the female connector 120 are disengaged, and the variable j is the number of loops in the flowchart of FIG. 12.
  • In step S1202, the path setting unit 806 gives a movement amount to the control unit 203 via the Actor unit 804 so as to return 1 mm from the movement amount given to perform the fitting, and the robot arm 100 is moved by the drive unit 204.
  • Then, 1 is added to the variable i. Here, an instruction to return 1 mm from the movement amount is given, but it is not necessarily limited to 1 mm; a unit amount such as 0.5 mm or 2 mm may be used.
  • In step S1204, the route setting unit 806 randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i), gives the control amount to the control unit 203 via the Actor unit 804, and the robot arm 100 is moved by the drive unit 204. The maximum magnitude of this control amount can be set arbitrarily within the range in which movement is possible.
  • In step S1205, at the position after the movement in step S1204, the Actor unit 804 collects the value of the force sensor 801 corresponding to the movement amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). The Critic unit 803 and the Actor unit 804 then record, as one piece of learning data, the movement amount multiplied by −1, that is (−ΔX, −ΔY, −ΔZ, −ΔAx, −ΔAy, −ΔAz), together with the sensor value of the force sensor 801 that measures the force applied while the male connector 110 is held.
  • In step S1207, the route setting unit 806 determines whether the number of collected data has reached the specified number J. If the number of data is insufficient, 1 is added to the variable j in step S1208, and the process returns to step S1204 to change the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) by random numbers and acquire data; steps S1204 to S1207 are repeated until the specified number J of data are accumulated. When the specified number of data are accumulated, the route setting unit 806 sets the variable j to 1 in step S1209 and then confirms in step S1210 whether the male connector 110 and the female connector 120 are disengaged.
  • In step S1211, the route setting unit 806 gives a control amount to the control unit 203 via the Actor unit 804 so as to return the coordinates of the robot arm 100 to the coordinates O(i) before the control amount was given, and moves the robot arm 100. Thereafter, until the fitting between the male connector 110 and the female connector 120 is released, the loop from step S1202 to step S1210 is repeated: the arm is returned by 1 mm (or another unit amount) from the control amount given to perform the fitting, a control amount centered on that position is given, and data of the force sensor 801 are collected. When the male connector 110 and the female connector 120 are disengaged, the process proceeds to step S1212.
  • In step S1212, the route setting unit 806 sets the variable i to I (I is an integer larger than the value of i at the time it is determined that the male connector 110 and the female connector 120 are disengaged), gives a control amount to the control unit 203 via the Actor unit 804 so as to return, for example, 10 mm (another value may also be used) from the movement amount given to perform the fitting, and the robot arm 100 is moved by the drive unit 204.
  • In step S1213, the path setting unit 806 stores the position of the coordinates of the robot arm 100 moved in step S1212 as the central position O(i+k).
  • In step S1214, the route setting unit 806 again randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz), this time centered on the central position O(i+k), gives the control amount to the control unit 203 via the Actor unit 804, and the robot arm 100 is moved by the drive unit 204.
  • In step S1215, the Critic unit 803 and the Actor unit 804 acquire the image captured by the imaging unit 201 of the monocular camera 102 at the position of the robot arm 100 after the movement by the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). In step S1216, the Critic unit 803 and the Actor unit 804 record the image together with the movement amount multiplied by −1, that is (−ΔX, −ΔY, −ΔZ, −ΔAx, −ΔAy, −ΔAz), as one piece of learning data.
  • In step S1217, the route setting unit 806 determines whether the number of collected data has reached the specified number J. If the number of data is insufficient, 1 is added to the variable j, and the process returns to step S1214 to change the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) by random numbers and acquire data; steps S1214 to S1217 are repeated until the specified number J of data are accumulated.
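The sampling loops of steps S1204 to S1207 and S1214 to S1217 share one pattern, sketched below: perturb the arm by a random control amount around a center pose, observe a sensor, and record the sign-inverted movement as the label. The six-element pose, the placeholder sensor model, and the bounds are all illustrative assumptions:

```python
import random

def collect_learning_data(center, num_samples, max_delta, observe):
    """Gather (observation, label) pairs around a center pose.

    A random control amount within +/-max_delta is applied per axis;
    the label is the movement multiplied by -1, i.e. the correction
    that would return the arm toward the center pose.
    """
    data = []
    for _ in range(num_samples):
        delta = [random.uniform(-m, m) for m in max_delta]
        pose = [c + d for c, d in zip(center, delta)]
        observation = observe(pose)        # force value or camera image
        label = [-d for d in delta]
        data.append((observation, label))
    return data

# placeholder sensor: reaction force grows with distance from the center
center_pose = [0.0] * 6                    # (X, Y, Z, Ax, Ay, Az) at O(i)
fake_force = lambda p: sum(abs(v) for v in p)
dataset = collect_learning_data(center_pose, num_samples=100,
                                max_delta=[1.0, 1.0, 1.0, 0.1, 0.1, 0.1],
                                observe=fake_force)
```

The same routine serves both loops by swapping the `observe` callback between a force-sensor read and an image capture.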
  • Note that the maximum value of the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) in S1204 and the random value of the control amount in S1214 can take different values.
  • Learning of the Actor unit 804 and the Critic unit 803 is performed using the learning data acquired by the above method.
  • In this embodiment, the explanation has assumed that, because the robot arm 100 is moved only slightly to the periphery from the movement for fitting the male connector 110 and the female connector 120, the differences that appear in the image of the monocular camera 102 may amount to too few pixels for sufficient learning.
  • Depending on the conditions, learning may be performed with the image of the monocular camera 102 alone, and both the image of the monocular camera 102 and the value of the force sensor 801 may be used even when the male connector 110 and the female connector 120 are fitted.
  • In addition, separate neural networks may be used for the state in which the male connector 110 and the female connector 120 are fitted and the state in which they are not fitted. This allows learning with higher accuracy; even when learning is performed using only images, with the input layer formed from the image alone, accurate learning is possible because the composition of the image differs between the fitted and non-fitted cases.
  • As described above, by having the path setting unit 806 instruct the control amount and having the Actor unit 804 acquire the value of the force sensor 801 at the moved position as the input layer, with the movement to that position as the output layer, learning data can be collected efficiently.
  • FIG. 14 shows a functional configuration diagram of the position control device in the fourth embodiment.
  • The difference from FIG. 8 is that a control parameter adjustment unit 1401 is added; the control parameter adjustment unit 1401 is composed of a trajectory generation unit 1402, a coordinate conversion unit 1403, a gravity correction unit 1404, a compliant motion control unit 1405, and a combining unit 1406.
  • The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100 and can measure, for example, the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut.
  • At the initial stage of learning, the operation output may apply excessive force to the surrounding environment, which may damage the robot arm 100 and its surroundings such as the male connector 110 and the female connector 120.
  • Therefore, by disposing the compliant motion control unit 1405 at the subsequent stage of the control parameter generation unit 202 and operating according to the external force acquired by the force sensor 801, the robot arm 100 is prevented from applying excessive force to the surrounding environment such as the male connector 110 and the female connector 120.
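A toy admittance-style rule illustrating the role of the compliant motion control unit 1405; the per-axis decomposition, threshold, and gain are assumptions for illustration, not the patented control law:

```python
def compliant_adjust(command, external_force, force_threshold, gain):
    # yield along any axis whose measured reaction force exceeds the
    # threshold, so the commanded step cannot press harder into contact
    adjusted = []
    for c, f in zip(command, external_force):
        if abs(f) > force_threshold:
            c -= gain * f
        adjusted.append(c)
    return adjusted

# axis 0 is free (force below threshold), axis 1 is in hard contact
safe_step = compliant_adjust(command=[1.0, 1.0],
                             external_force=[0.0, 5.0],
                             force_threshold=2.0, gain=0.1)
```

The combining unit 1406 would merge such a force-adapted amount with the trajectory generator's periodic control amount.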
  • Given the control period of the robot arm 100 and its maximum velocity, maximum acceleration, and maximum jerk, the trajectory generation unit 1402 calculates the periodic control amount for each control cycle so as not to exceed these limits.
  • Non-patent literature: KROGER, Torsten; PADIAL, Jose. Simple and robust visual servo control of robot arms using an on-line trajectory generator. IEEE International Conference on Robotics and Automation (ICRA).
  • The following constants are given as constants corresponding to the specifications of the robot arm 100, and x_i, v_i, a_i, and j_i are variables representing the following.
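The per-cycle limiting can be sketched in one dimension as below. This simplified profile clamps jerk, acceleration, and velocity each cycle and snaps to the target on arrival; unlike the full on-line trajectory generator of the cited literature, it plans no braking phase. All numeric limits are placeholders for the robot's specifications:

```python
def next_cycle_step(x, v, a, target, dt, v_max, a_max, j_max):
    """Advance one control cycle toward `target` under jerk,
    acceleration, and velocity limits (simplified sketch)."""
    direction = 1.0 if target >= x else -1.0
    desired_a = direction * a_max
    jerk = max(-j_max, min(j_max, (desired_a - a) / dt))
    a = max(-a_max, min(a_max, a + jerk * dt))
    v = max(-v_max, min(v_max, v + a * dt))
    x_next = x + v * dt
    # snap to the target if this cycle would pass it
    if (direction > 0 and x_next >= target) or (direction < 0 and x_next <= target):
        return target, 0.0, 0.0
    return x_next, v, a

# simulate a 10 mm move with placeholder limits and a 1 ms control period
x, v, a = 0.0, 0.0, 0.0
history_v = []
for _ in range(5000):
    x, v, a = next_cycle_step(x, v, a, target=10.0, dt=0.001,
                              v_max=100.0, a_max=1000.0, j_max=100000.0)
    history_v.append(v)
    if x == 10.0:
        break
```

Because each cycle's output is bounded by the jerk, acceleration, and velocity limits, no single periodic control amount can command a violent motion, which is the safety property this embodiment relies on.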
  • In step S1502, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 14, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.

Abstract

The present invention is provided with: a path determination unit 802 which, for alignment of two objects involving insertion, instructs a control amount for the insertion on the basis of an image acquired from an imaging unit 201 and a value from a force sensor 801, and learns from the result of the alignment; and a control parameter adjustment unit 1401 which outputs a periodic control amount adjustment value on the basis of a periodic control amount, set for each control period so as to reach the control amount, and a control amount adapted to the external force based on the value of the force sensor 801. The robot arm 100 is operated by a control amount in which trajectory generation control and compliant motion control obtained by using the force sensor 801 are combined. Whereas a typical reinforcement learning model requires trial and error until learning ends and can damage the environment, the present invention makes it possible to make trials safely even in the initial stage of learning.

Description

Position control device and position control method
 The present invention relates to a position control device and a position control method.
 When constructing a production system that performs assembly operations with a robot arm, it is common to perform teaching work by human hand, so-called teaching. However, in this teaching the robot repeats the operation only at the stored positions, so it may not be able to cope when errors arise from manufacturing or mounting. Therefore, if a position correction technology that absorbs such individual errors can be developed, improvement in productivity can be expected, and the range of scenes in which robots play an active role will also expand.
 Even in the current art, there is a technology that uses camera images to perform position correction up to just before a connector insertion operation (Patent Document 1). Also, if a plurality of devices such as a force sensor and a stereo camera are used, position errors related to assembly (insertion, workpiece holding, etc.) can be absorbed. However, in order to determine the position correction amount, quantities such as the center coordinates of the gripped connector and the center coordinates of the connector to be inserted must be explicitly calculated from the image information, as in that reference. This calculation depends on the shape of the connector and must be set by the designer for each connector used. The calculation is relatively easy if three-dimensional information can be acquired from a depth camera or the like, but acquiring it from two-dimensional image information requires developing an image processing algorithm for each connector, which incurs a large design cost.
 In addition, there exist methods called deep learning and deep reinforcement learning by which a robot learns by itself and acquires appropriate actions. However, in order to acquire appropriate behavior through such learning, it is usually necessary to collect a large amount of appropriate learning data. Moreover, when data are collected using techniques such as reinforcement learning, the same scene must be experienced over and over, requiring an enormous number of trials, and performance cannot be guaranteed for unexperienced scenes. It is therefore necessary to collect learning data of various scenes exhaustively, which takes a great deal of effort.
 For example, there is also a method of obtaining an optimum path from a single successful trial as in Patent Document 2, but such a method cannot collect data usable for deep learning or deep reinforcement learning, and learning must still be performed a considerable number of times.
Patent Document 1: WO 98/017444. Patent Document 2: JP 2005-125475 A.
 When performing alignment involving insertion of two objects, the function that indicates positions for learning and the function that operates servomotors or the like to control the position of the robot usually exist independently. Since the load applied to the objects is therefore not taken into account, there is a problem that an excessive load may be applied to the objects depending on the displacement amount given for learning.
 The present invention has been made to solve the above problem, and aims to collect learning data while preventing an excessive load from being applied to objects even when the function for learning and the function for controlling the position of the robot are separate.
 The position control device according to the present invention, for alignment of two objects involving insertion, includes: a path determination unit that instructs a control amount for the insertion based on the image acquired from an imaging unit and the value of a force sensor, and learns from the result of the alignment; and a combining unit that outputs a periodic control amount adjustment value based on a periodic control amount, set for each control cycle so as to reach the control amount, and a control amount adapted to the external force based on the value of the force sensor.
 According to the present invention, learning data can be collected while preventing an excessive load from being applied to objects even when the function for learning and the function for controlling the position of the robot are separate.
FIG. 1 is a diagram in which the robot arm 100, the male connector 110, and the female connector 120 according to Embodiment 1 are arranged.
FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
FIG. 4 is a flowchart of the position control of the position control device in Embodiment 1.
FIG. 5 is an example of camera images and control amounts at and around the insertion start position captured by the monocular camera 102 in Embodiment 1.
FIG. 6 is a diagram showing an example of the neural network in Embodiment 1 and its learning rule.
FIG. 7 is a flowchart using a plurality of networks in the neural network in Embodiment 1.
FIG. 8 is a functional configuration diagram of the position control device in Embodiment 2.
FIG. 9 is a hardware configuration diagram of the position control device in Embodiment 2.
FIG. 10 is a diagram showing a trial of fitting the male connector 110 and the female connector 120 in Embodiment 2.
FIG. 11 is a flowchart of the path learning of the position control device in Embodiment 2.
FIG. 12 is a flowchart of the path learning of the position control device in Embodiment 3.
FIG. 13 is a diagram showing an example of the neural network in Embodiment 3 and its learning rule.
FIG. 14 is a functional configuration diagram of the position control device in Embodiment 4.
FIG. 15 is a flowchart of the path learning of the position control device in Embodiment 4.
Embodiment 1.
 Hereinafter, embodiments of the present invention will be described.
 In Embodiment 1, a robot arm that learns the insertion position of each connector and performs assembly on a production line, and its position control method, will be described.
 The configuration will be described. FIG. 1 is a diagram in which the robot arm 100, the male connector 110, and the female connector 120 according to Embodiment 1 are arranged. The robot arm 100 is provided with a gripping unit 101 for gripping the male connector 110, and a monocular camera 102 is attached to the robot arm 100 at a position from which the gripping unit can be seen. The monocular camera 102 is installed so that, when the gripping unit 101 at the tip of the robot arm 100 grips the male connector 110, both the tip of the gripped male connector 110 and the female connector 120 on the insertion side are visible.
 FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
 In FIG. 2, the device is composed of: an imaging unit 201, a function of the monocular camera 102 in FIG. 1, which captures images; a control parameter generation unit 202, which generates a control amount for the position of the robot arm 100 using the captured image; a control unit 203, which controls current and voltage values for the drive unit 204 of the robot arm 100 using the position control amount; and a drive unit 204, which changes the position of the robot arm 100 based on the current and voltage values output from the control unit 203.
 When the control parameter generation unit 202 acquires an image from the imaging unit 201 (a function of the monocular camera 102), it determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) with respect to the position (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and outputs the control amount to the control unit 203 (X, Y, Z are the position of the robot arm; Ax, Ay, Az are the attitude angles of the robot arm 100).
 The control unit 203 determines and controls the current and voltage values for each device constituting the drive unit 204 based on the received control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) for the position (X, Y, Z, Ax, Ay, Az) of the robot arm 100.
 The drive unit 204 operates with the current and voltage values for each device received from the control unit 203, whereby the robot arm 100 moves to the position (X+ΔX, Y+ΔY, Z+ΔZ, Ax+ΔAx, Ay+ΔAy, Az+ΔAz).
 FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
 The monocular camera 102 is communicably connected, wired or wirelessly, to the processor 302 and the memory 303 via the input/output interface 301. The input/output interface 301, the processor 302, and the memory 303 constitute the function of the control parameter generation unit 202 in FIG. 2. The input/output interface 301 is also communicably connected, wired or wirelessly, to the control circuit 304 corresponding to the control unit 203, and the control circuit 304 is in turn electrically connected to the motor 305. The motor 305 corresponds to the drive unit 204 in FIG. 2 and is configured as a component for controlling the position of each device. In this embodiment, the motor 305 is used as the hardware corresponding to the drive unit 204, but any hardware capable of controlling position may be used. The monocular camera 102 and the input/output interface 301, and the input/output interface 301 and the control circuit 304, may each be configured as separate bodies.
 Next, the operation will be described.
 FIG. 4 is a flowchart of the position control of the position control device in Embodiment 1.
 First, in step S101, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 2, and the arm is operated based on a control program registered in advance on the control unit 203 side.
 Next, in step S102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 2, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side.
 Next, in step S103, the control parameter generation unit 202 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 serving as the insertion destination appear.
 Next, in step S104, the control parameter generation unit 202 acquires the image from the imaging unit 201 and determines the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). For this determination, the control parameter generation unit 202 uses the processor 302 and the memory 303 of FIG. 3 as hardware and calculates the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) using a neural network. The calculation method of the control amount using a neural network will be described later.
Next, in step S105, the control unit 203 acquires the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202 and compares every component of the control amounts with a predetermined threshold. If all components of the control amounts are equal to or less than the threshold, the process proceeds to step S107, and the control unit 203 controls the drive unit 204 to insert the male connector 110 into the female connector 120.
If any component of the control amounts is larger than the threshold, then in step S106 the control unit 203 controls the drive unit 204 using the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202, and the process returns to step S103.
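The loop of steps S103 to S107 can be sketched as follows. All function names here (`capture_image`, `compute_control`, `move_arm`, `insert`) and the threshold value are hypothetical stand-ins for the imaging unit 201, the control parameter generation unit 202, and the drive unit 204; this is an illustration, not the actual controller.

```python
# Sketch of the S103-S107 loop: recompute control amounts until every
# component falls at or below the threshold, then perform the insertion.
# All callables here are hypothetical placeholders for the actual units.

def fitting_loop(capture_image, compute_control, move_arm, insert,
                 threshold=0.5, max_iters=20):
    """Return True once the control amounts are within the threshold."""
    for _ in range(max_iters):
        image = capture_image()                  # step S103 (imaging unit 201)
        control = compute_control(image)         # step S104 (neural network)
        if all(abs(c) <= threshold for c in control):
            insert()                             # step S107
            return True
        move_arm(control)                        # step S106 (drive unit 204)
    return False                                 # gave up without converging

# Toy stand-ins: the "arm" drifts toward the target by the commanded amount.
state = {"pos": [10.0, -4.0, 6.0, 0.0, 0.0, 0.0], "inserted": False}
ok = fitting_loop(
    capture_image=lambda: list(state["pos"]),        # "image" = residual pose
    compute_control=lambda img: [-x for x in img],   # idealized network output
    move_arm=lambda c: state.update(
        pos=[p + dc for p, dc in zip(state["pos"], c)]),
    insert=lambda: state.update(inserted=True),
)
```

With the idealized network above, the loop converges in two iterations; in practice the number of iterations depends on how well the network has been trained.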
Next, the method of calculating the control amounts using the neural network in step S104 of FIG. 4 will be described.
Before the control amounts can be calculated with the neural network, sets of images and the corresponding required movement amounts are collected in advance, so that the neural network can calculate, from an input image, the movement amount needed for successful fitting. For example, with the male connector 110 and the female connector 120 in a fitted state whose position is known, the gripping unit 101 of the robot arm 100 grips the male connector 110. Then, the gripping unit 101 is moved along the known extraction direction to the insertion start position while the monocular camera 102 captures a plurality of images. In addition, with the insertion start position defined as the control amount (0, 0, 0, 0, 0, 0), images are acquired not only for the movement amount from the fitted state to the insertion start position, but also for movement amounts around it, that is, for control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) and the images corresponding to them.
FIG. 5 is an example of a diagram showing the insertion start position photographed by the monocular camera 102 in the first embodiment, together with camera images in its vicinity and the corresponding control amounts.
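The data collection described above can be sketched as follows. The perturbation grid, its ranges, and the `capture_at` function are hypothetical: `capture_at(pose)` stands in for moving the gripping unit 101 to a pose near the insertion start position and photographing with the monocular camera 102.

```python
# Sketch of collecting (image, control-amount) training pairs around the
# insertion start position. The insertion start position itself is labeled
# (0, 0, 0, 0, 0, 0); a perturbed pose is labeled with the control amount
# that would move the arm back to the insertion start position.
import itertools

def collect_training_set(capture_at, offsets_mm=(-2.0, 0.0, 2.0)):
    """Enumerate translational perturbations on a small grid (hypothetical
    ranges); a real collection would also perturb the rotations."""
    dataset = []
    for dx, dy, dz in itertools.product(offsets_mm, repeat=3):
        pose = (dx, dy, dz, 0.0, 0.0, 0.0)    # perturbed pose near the start
        image = capture_at(pose)
        control = tuple(-p for p in pose)     # amount needed to return
        dataset.append((image, control))
    return dataset

data = collect_training_set(capture_at=lambda pose: ("img", pose))
```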
Then, using a plurality of sets each consisting of a movement amount from the fitted state to the insertion start position and images of the insertion start position and surrounding positions captured by the monocular camera 102, the network is trained based on a general neural network learning rule (for example, the stochastic gradient method).
Various forms of neural networks exist, such as CNNs and RNNs; the present invention does not depend on the form, and any form can be used.
FIG. 6 is a diagram showing an example of the neural network in the first embodiment and a learning rule of the neural network.
The input layer receives the image obtained from the monocular camera 102 (for example, the luminance and color-difference values of each pixel), and the output layer outputs the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
In the learning process of the neural network, the parameters of the intermediate layers are optimized so that the output values of the output layer, obtained from the input image through the intermediate layers, approximate the control amounts stored with the image set. The stochastic gradient method is one such approximation method.
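As an illustration of this learning rule, the following trains a single linear layer by the stochastic gradient method so that its outputs approximate stored control amounts. Everything here (feature size, learning rate, toy data) is hypothetical; the actual network of FIG. 6 has intermediate layers and takes pixel values as input.

```python
# Minimal stochastic-gradient regression: the "network" maps an input
# feature vector to six control amounts, and its weights are adjusted so
# that the outputs approximate the control amounts stored with the image set.
import random

def sgd_train(pairs, n_in, n_out=6, lr=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    w = [[0.0] * n_in for _ in range(n_out)]   # weights of one linear layer
    for _ in range(steps):
        x, target = rng.choice(pairs)          # one random sample per step
        pred = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        for o in range(n_out):                 # gradient of the squared error
            err = pred[o] - target[o]
            for i in range(n_in):
                w[o][i] -= lr * err * x[i]
    return w

# Toy data: the true mapping is control = (-x0, -x1, 0, 0, 0, 0).
pairs = [((1.0, 0.0), (-1.0, 0.0, 0.0, 0.0, 0.0, 0.0)),
         ((0.0, 1.0), (0.0, -1.0, 0.0, 0.0, 0.0, 0.0)),
         ((1.0, 1.0), (-1.0, -1.0, 0.0, 0.0, 0.0, 0.0))]
w = sgd_train(pairs, n_in=2)
pred = [sum(wi * xi for wi, xi in zip(row, (1.0, 1.0))) for row in w]
```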
Therefore, as shown in FIG. 5, more accurate learning can be performed by acquiring and learning not only the movement amount from the fitted state to the insertion start position but also the surrounding movement amounts and the images corresponding to them.
FIG. 5 shows the case where the position of the male connector 110 is fixed with respect to the monocular camera 102 and only the position of the female connector 120 changes. In practice, however, the gripping unit 101 of the robot arm 100 does not always grip the male connector 110 at the exact position, and the position of the male connector 110 may be shifted due to individual differences and the like. Therefore, by acquiring and learning, in the course of this training, sets of a plurality of control amounts and images at the insertion start position and nearby positions for cases where the male connector 110 deviates from the exact position, learning that can cope with the individual differences of both the male connector 110 and the female connector 120 is performed.
Note, however, that since the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated excluding the movement amount from the fitted-state position at the time of imaging to the insertion start position, the movement amount from the insertion start position to the fitted-state position must be stored separately for use in step S107 of FIG. 4. In addition, since the above coordinates are obtained in the coordinate system of the monocular camera, the control unit 203 needs to convert them before controlling the robot arm 100 when the coordinate system of the monocular camera differs from the coordinate system of the robot arm 100 as a whole.
In this example, the conversion is needed because the monocular camera is fixed to the robot arm 100, so that the coordinate system in which the female connector 120 is placed differs from the coordinate system of the monocular camera 102. Accordingly, if the monocular camera 102 shared the coordinate system in which the female connector 120 is placed, the conversion from the coordinate system of the monocular camera 102 to the coordinate system of the robot arm 100 would be unnecessary.
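The coordinate conversion mentioned above can be sketched as a rigid rotation; the single-axis rotation and the angle used below are hypothetical calibration values, and a full calibration would use a complete 3x3 rotation matrix between the camera frame and the robot frame.

```python
# Sketch of converting a displacement expressed in the monocular camera's
# coordinate system into the robot arm's coordinate system. A pure
# displacement needs only the rotation part of the calibration, so only
# a rotation is applied here.
import math

def camera_to_robot(delta_cam, yaw_rad):
    """Rotate a (dx, dy, dz) displacement about the Z axis by yaw_rad."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    dx, dy, dz = delta_cam
    return (c * dx - s * dy, s * dx + c * dy, dz)

# A camera-frame move of +1 along X, seen by a camera yawed 90 degrees
# relative to the robot frame, becomes +1 along Y in the robot frame.
moved = camera_to_robot((1.0, 0.0, 0.0), math.pi / 2)
```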
Next, details of the operation of FIG. 4 and an operation example will be described.
In step S101, the robot arm 100 grips the male connector 110 according to the operation registered in advance, and in step S102, the robot arm 100 is moved to a position approximately above the female connector 120.
At this time, the position of the male connector 110 immediately before it is gripped is not always constant. A slight error may always be present due to, for example, a subtle operational deviation of the machine that sets the position of the male connector 110. Similarly, the female connector 120 may also have some error.
Therefore, it is important that, in step S103, an image in which both the male connector 110 and the female connector 120 are shown is acquired, as in FIG. 5, by the imaging unit 201 of the monocular camera 102 attached to the robot arm 100. Since the position of the monocular camera 102 with respect to the robot arm 100 is always fixed, the relative positional information between the male connector 110 and the female connector 120 is reflected in this image.
In step S104, the control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) are calculated by the control parameter generation unit 202, which has a neural network, as shown in FIG. 6, that has learned this relative positional information in advance. However, depending on how well the learning has succeeded, the control amounts output by the control parameter generation unit 202 may not bring the arm all the way to the insertion start position. In such a case, the loop of steps S103 to S106 is repeated a plurality of times: the control parameter generation unit 202 repeatedly calculates control amounts until they fall to or below the threshold shown in step S105, while the control unit 203 and the drive unit 204 control the position of the robot arm 100 accordingly.
The threshold shown in S105 is determined by the required accuracy of the male connector 110 and the female connector 120 to be fitted. For example, when the fit of the connectors is loose and the characteristics of the connectors do not require high accuracy, the threshold can be set large. In the opposite case, the threshold is set small. In general, in a manufacturing process, a permissible manufacturing error is often specified, so this value can also be used.
Further, assuming the case where, depending on how well the learning has succeeded, the control amounts output by the control parameter generation unit 202 cannot bring the arm to the insertion start position, a plurality of insertion start positions may be set. If the insertion start position is set without a sufficient distance between the male connector 110 and the female connector 120, there is a risk that the male connector 110 and the female connector 120 come into contact before insertion starts and one of them is damaged. In that case, the insertion start position may be set according to the number of loops between step S103 and step S106 in FIG. 4, for example with a clearance between the male connector 110 and the female connector 120 of 5 mm at first, then 20 mm, then 10 mm.
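The stepped insertion start positions described above can be sketched as a simple schedule keyed by the loop count. The clearance values follow the 5 mm / 20 mm / 10 mm example in the text; the function name and the behavior after the third retry are hypothetical.

```python
# Sketch of choosing the insertion start clearance from the number of
# S103-S106 loop iterations already performed: start close (5 mm), back
# off if that fails (20 mm), then approach again (10 mm).
def insertion_clearance_mm(loop_count):
    schedule = [5.0, 20.0, 10.0]
    if loop_count < len(schedule):
        return schedule[loop_count]
    return schedule[-1]    # keep the last clearance for any further retries
```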
Although the present embodiment has been described using connectors, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and a similar method is also effective when inserting a capacitor or other component with large lead dimensional errors into a hole of a substrate.
Moreover, the technique is not necessarily limited to insertion into a substrate: it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. In the present invention, learning the relationship between images and control amounts using a neural network has the advantage that the individual differences of the objects can be absorbed when aligning one object with another.
Therefore, the first embodiment includes: the imaging unit 201 that captures an image in which two objects are present; the control parameter generation unit 202 that inputs the information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, control amounts of position for controlling the positional relationship of the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship of the two objects using the output control amounts of position; and the drive unit 204 that moves the position of one of the two objects using the current or voltage for controlling the positional relationship of the two objects. This has the effect that alignment can be performed with only a monocular camera even if there are individual differences among the objects or errors in the positional relationship of the two objects.
An example using a single neural network has been described, but it may become necessary to use a plurality of them. This is because, when the input is an image and the output is numerical values as in this case, the approximation accuracy of the numerical values is limited, and an error of about several percent may occur depending on the situation. Depending on the distance from the position near the insertion start position reached in step S102 of FIG. 4 to the insertion start position, the determination in step S105 may always be No and the operation may never complete. In such a case, a plurality of networks are used as shown in FIG. 7.
FIG. 7 is a flowchart using a plurality of networks in the neural network of the first embodiment; it shows the detailed steps of step S104 in FIG. 4. The plurality of networks are included in the control parameter generation unit 202 of FIG. 2.
In step S701, the control parameter generation unit 202 selects which network to use based on the input image.
If the loop count is 1, or if the obtained control amount is 25 mm or more, the neural network 1 is selected and the process proceeds to step S702. If the loop count is 2 or more and the obtained control amount is 5 mm or more and less than 25 mm, the neural network 2 is selected and the process proceeds to step S703. If the loop count is 2 or more and the obtained control amount is less than 5 mm, the neural network 3 is selected and the process proceeds to step S704. In steps S702 to S704, the control amounts are calculated using the selected neural network.
For example, each neural network is trained according to the distance between the male connector 110 and the female connector 120, or according to the control amount: the neural network 3 in the figure is trained on data with errors within ±1 mm and ±1 degree, the neural network 2 on data in the range of ±1 to ±10 mm and ±1 to ±5 degrees, and so on, changing the range of the training data stepwise. Here, it is more efficient not to overlap the ranges of the images used for the respective neural networks.
Although FIG. 7 shows an example with three networks, the number of networks is not particularly limited. When such a scheme is used, the determination function of step S701 that decides which network to use needs to be provided as a "network selection switch".
This network selection switch can itself be configured as a neural network. In this case, the input to the input layer is an image, and the output of the output layer is a network number. As image data, pairs of the images used by all the networks and their network numbers are used.
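The selection rule of step S701 can be sketched as follows. The thresholds follow the 5 mm / 25 mm boundaries in the text; reducing the six-component control amount to a single scalar magnitude, and the function name, are simplifying assumptions.

```python
# Sketch of the "network selection switch" of step S701: pick network 1,
# 2, or 3 from the loop count and the magnitude (in mm) of the previously
# obtained control amount.
def select_network(loop_count, prev_control_mm):
    if loop_count == 1 or prev_control_mm >= 25.0:
        return 1    # coarse network, trained on large offsets
    if prev_control_mm >= 5.0:
        return 2    # mid-range network (5 mm or more, less than 25 mm)
    return 3        # fine network, trained on the smallest offsets
```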
Although the example using a plurality of neural networks has also been described using connectors, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and a similar method is also effective when inserting a capacitor or other component with large lead dimensional errors into a hole of a substrate.
Moreover, the example using a plurality of neural networks is likewise not limited to insertion into a substrate: it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. In the present invention, learning the relationship between images and control amounts using neural networks has the advantage that the individual differences of the objects can be absorbed when aligning one object with another, and the control amounts can be calculated with higher accuracy.
Therefore, this configuration includes: the imaging unit 201 that captures an image in which two objects are present; the control parameter generation unit 202 that inputs the information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, control amounts of position for controlling the positional relationship of the two objects; the control unit 203 that controls a current or voltage for controlling the positional relationship of the two objects using the output control amounts of position; and the drive unit 204 that moves the position of one of the two objects using the current or voltage for controlling the positional relationship of the two objects. Since the control parameter generation unit 202 is configured to select one of a plurality of neural networks, alignment can be performed with higher accuracy even if there are individual differences among the objects or errors in the positional relationship of the two objects.
Second Embodiment
In the first embodiment, with the male connector 110 and the female connector 120 in a fitted state whose position is known, the gripping unit 101 of the robot arm 100 grips the male connector 110, and the gripping unit 101 is moved along the known extraction direction to the insertion start position while the monocular camera 102 captures a plurality of images. In the second embodiment, the case where the fitting position of the male connector 110 and the female connector 120 is unknown will be described.
As prior work on methods by which a robot learns by itself and acquires appropriate behavior, a technique called reinforcement learning has been studied. In this technique, the robot performs various motions by trial and error and optimizes its behavior while memorizing the actions that produced good results, but a large number of trials is required to optimize the behavior.
As a technique for reducing this number of trials, a framework called on-policy learning is commonly used in reinforcement learning. However, applying this framework to the teaching of a robot arm is difficult because various contrivances specific to the robot arm and its control signals are required, and it has not reached practical use.
The second embodiment describes a configuration that can reduce the large number of trials otherwise required when, as in the first embodiment, a robot performs various motions by trial and error and optimizes its behavior while memorizing the actions that produced good results.
The system configuration will now be described. Parts not specifically described are the same as in the first embodiment.
The overall hardware configuration is the same as in FIG. 1 of the first embodiment, except that a force sensor 801 (not shown in FIG. 1) for measuring the load applied to the gripping unit 101 is added to the robot arm 100.
FIG. 8 shows a functional configuration diagram of the position control device in the second embodiment. The differences from FIG. 2 are that the force sensor 801 and a path determination unit 802 are added, and that the path determination unit 802 is composed of a Critic unit 803, an Actor unit 804, an evaluation unit 805, and a path setting unit 806.
FIG. 9 is a hardware configuration diagram of the position control device in the second embodiment. The only difference from FIG. 3 is that the force sensor 801 is electrically or communicably connected to the input/output interface 301. The input/output interface 301, the processor 302, and the memory 303 constitute the function of the control parameter generation unit 202 of FIG. 8 and also constitute the function of the path determination unit 802. Accordingly, the connections between the force sensor 801 and imaging unit 201 and the input/output interface 301, and between the input/output interface 301 and the control circuit 304, may be configured as separate bodies.
Next, details of FIG. 8 will be described.
The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100; for example, it can measure the value of the force when the male connector 110 and the female connector 120 of FIG. 1 come into contact.
The Critic unit 803 and the Actor unit 804 are the same as the Critic and Actor units in conventional reinforcement learning.
The conventional reinforcement learning technique is described here. This embodiment uses, among reinforcement learning methods, a model called the Actor-Critic model (reference: Reinforcement Learning: R. S. Sutton and A. G. Barto, published December 2000). The Actor unit 804 and the Critic unit 803 acquire the state of the environment through the imaging unit 201 and the force sensor 801. The Actor unit 804 is a function that takes the environmental state I acquired using the sensor devices as input and outputs the control amount A to the robot controller. The Critic unit 803 is a mechanism for making the Actor unit 804 appropriately learn the output A for the input I so that the fitting succeeds properly.
The scheme of the conventional reinforcement learning technique is described below.
In reinforcement learning, a quantity called the reward R is defined, and the Actor unit 804 is made to acquire an action A that maximizes R. As an example, if the task to be learned is the fitting of the male connector 110 and the female connector 120 as shown in the first embodiment, the reward is defined as R = 1 when the fitting succeeds and R = 0 otherwise. The action A here represents the movement correction amount from the current position (X, Y, Z, Ax, Ay, Az), that is, A = (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). Here, X, Y, and Z are position coordinates whose origin is the central portion of the robot, and Ax, Ay, and Az are the amounts of rotation about the X, Y, and Z axes, respectively. The movement correction amount is a control amount from the fitting start position from which the fitting of the male connector 110 is first attempted from the current point. The environmental state, that is, the observation of a trial result, is obtained from the image from the imaging unit 201 and the value of the force sensor 801.
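The reward and action definitions above can be sketched as follows; `fitting_succeeded` is a hypothetical predicate standing in for the observation obtained from the imaging unit 201 and the force sensor 801.

```python
# Sketch of the reward signal: R = 1 only when the fitting succeeds.
def reward(fitting_succeeded):
    return 1.0 if fitting_succeeded else 0.0

# The action A is a movement correction amount applied to the current pose.
def apply_action(pose, action):
    """pose = (X, Y, Z, Ax, Ay, Az); action = (dX, dY, dZ, dAx, dAy, dAz)."""
    return tuple(p + a for p, a in zip(pose, action))
```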
In reinforcement learning, the Critic unit 803 learns a function called the state value function V(I). Suppose that at time t = 1 (for example, at the start of a fitting trial) the action A(1) is taken in state I(1), and that at time t = 2 (for example, after the end of the first fitting trial and before the start of the second) the environment has changed to I(2) and the reward amount R(2) (the result of the first fitting trial) is obtained. Various update rules are conceivable, but the following is given as one example.
The update rule for V(I) is defined as follows:

δ = R(2) + γV(I(2)) - V(I(1))

V(I(1)) ← V(I(1)) + αδ

Here, δ is the prediction error, α is the learning coefficient (a positive real number from 0 to 1), and γ is the discount rate (a positive real number from 0 to 1).
The Actor unit 804 takes I as input and outputs A(I), and A(I) is updated as follows.
When δ > 0:

A(I(1)) ← A(I(1)) + α(A(1) - A(I(1)))

When δ ≤ 0:

A(I(1)) ← A(I(1))

Here, σ denotes the standard deviation of the output: in state I, the Actor adds to A(I) a random number drawn from a distribution with mean 0 and variance σ². That is, regardless of the result of a trial, the second movement correction amount is determined randomly.
Although the above update rules are used as an example, the Actor-Critic model has various update rules, and any generally used model other than the above may be used.
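A minimal sketch of an Actor-Critic step of this kind follows, assuming a TD-error critic update and an actor that moves its stored action toward the executed action only when the prediction error is positive (the exact update equations appear in the patent's figures and may differ); the state is reduced to a hashable key and the action to a scalar for brevity.

```python
# Sketch of one Actor-Critic learning step: compute the prediction error
# delta, update the state value V, and (only when delta > 0) move the
# actor's stored action toward the action actually executed. Exploration
# adds zero-mean noise with standard deviation sigma to the stored action.
import random

class ActorCritic:
    def __init__(self, alpha=0.5, gamma=0.9, sigma=1.0, seed=0):
        self.V = {}          # state value function V(I)
        self.A = {}          # actor output A(I), one scalar for brevity
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma
        self.rng = random.Random(seed)

    def act(self, state):
        mean = self.A.get(state, 0.0)
        return mean + self.rng.gauss(0.0, self.sigma)  # exploratory action

    def learn(self, s1, a1, r2, s2):
        delta = r2 + self.gamma * self.V.get(s2, 0.0) - self.V.get(s1, 0.0)
        self.V[s1] = self.V.get(s1, 0.0) + self.alpha * delta
        if delta > 0:        # reinforce: shift A(I) toward the executed a1
            old = self.A.get(s1, 0.0)
            self.A[s1] = old + self.alpha * (a1 - old)

ac = ActorCritic()
ac.learn("start", a1=2.0, r2=1.0, s2="done")   # successful trial: delta = 1
```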
With the above configuration, the Actor unit 804 learns the appropriate action for each state, but it operates as in the first embodiment only once learning is complete. During learning, a recommended action for learning is calculated and passed from the path setting unit 806, so the control unit 203 receives the movement signal from the path setting unit 806 as it is and controls the drive unit 204 accordingly.
That is, in the conventional Actor-Critic model, R = 1 is defined when the fitting succeeds and R = 0 otherwise, so learning takes place only once a fitting has succeeded; until a fitting succeeds, the movement correction amounts used for the trials are given randomly, and the movement correction amount for the next trial is not determined according to the degree of failure of the current trial. The result is the same not only with the conventional Actor-Critic model but also with other reinforcement learning models such as Q-learning, because they too evaluate only the success or failure of the fitting itself. This embodiment of the present invention describes a process that evaluates this degree of failure and determines the movement correction amount for the next trial.
The evaluation unit 805 generates a function that evaluates each fitting trial.
FIG. 10 is a diagram showing a fitting trial of the male connector 110 and the female connector 120 in the second embodiment.
For example, assume that an image such as FIG. 10(A) is obtained as the result of a trial. This trial failed because the fitting positions of the connectors are largely misaligned. At this point, how close the trial came to success is measured and quantified to obtain an evaluation value indicating the degree of success. One quantification method, shown in FIG. 10(B), is to count the surface area (number of pixels) of the insertion-target connector visible in the image. With this method, if only the fitting surface of the female connector 120 is painted, or covered with a sticker, in a color different from the rest of the background, data acquisition and computation from the image become easier when an insertion failure of the male connector 110 into the female connector 120 is detected by the force sensor 801 of the robot arm 100. The description so far assumes a single camera, but multiple cameras may be arranged side by side and the results obtained from their respective images may be combined. The same evaluation can also be performed using, instead of the connector surface area, the number of pixels in each of two dimensions (for example, the X and Y directions).
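As a hedged illustration of the evaluation described for FIG. 10(B), the following sketch counts how many pixels of the specially colored fitting surface remain visible; the marker color, array layout, and function name are assumptions, not taken from the patent. More visible marker surface means the male connector covers the fitting surface less, i.e. a worse trial.

```python
import numpy as np

# Assumed marker color painted (or stickered) on the female connector's
# fitting surface, chosen to differ from the background.
MARKER = (255, 0, 0)

def evaluation_value(image: np.ndarray) -> int:
    """Return the number of marker-colored pixels in an HxWx3 RGB image."""
    mask = np.all(image == MARKER, axis=-1)
    return int(mask.sum())

# Tiny synthetic image: one uncovered row of the fitting surface remains.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[0, :, :] = MARKER  # top row left uncovered by the male connector
print(evaluation_value(img))  # 4
```

A lower count indicates the male connector covers more of the fitting surface, so the count can serve directly as the evaluation value E to be minimized.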
The route setting unit 806 performs its processing in two steps.
In the first step, it learns the relationship between the evaluation results produced by the evaluation unit 805 and the movements the robot actually made. Let A be the movement correction amount of the robot and E be the evaluation value, computed by the evaluation unit 805, indicating the degree of success. The route setting unit 806 prepares and approximates a function that takes A as input and outputs E. One example of such a function is an RBF (Radial Basis Function) network; RBFs are known to be able to easily approximate a wide variety of unknown functions.
For example, for the k-th input

Figure JPOXMLDOC01-appb-M000005

the output f(x) is defined as follows:

Figure JPOXMLDOC01-appb-M000006

Figure JPOXMLDOC01-appb-M000007

Here, σ denotes the standard deviation and μ the center of each RBF.
The data learned by the RBF network is not a single sample but all data from the start of the trials up to the latest one; for example, at the N-th trial, N data points are available. The weights W = (w_1, ..., w_J) above must be determined by learning. Various methods can be considered for this determination; RBF interpolation, shown below, is one example.
Given

Figure JPOXMLDOC01-appb-M000008

Figure JPOXMLDOC01-appb-M000009

learning is completed by

Figure JPOXMLDOC01-appb-M000010
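The formulas above survive only as image placeholders (Figure JPOXMLDOC01-appb-M000005 through M000010). As a hedged reconstruction, a standard Gaussian RBF interpolation consistent with the surrounding text (J basis functions, N trials, weights W = (w_1, ..., w_J), evaluation values E) would take the following form; the exact symbols of the original formulas may differ:

```latex
% Hedged reconstruction of the placeholder formulas; symbols are assumptions.
f(x) = \sum_{j=1}^{J} w_j\,\phi_j(x), \qquad
\phi_j(x) = \exp\!\left(-\frac{\lVert x - \mu_j \rVert^2}{2\sigma^2}\right)
% RBF interpolation: stack the basis values over all N trial inputs x^k,
\Phi = \bigl(\phi_j(x^{k})\bigr)_{k=1,\ldots,N;\; j=1,\ldots,J}, \qquad
E = (E_1, \ldots, E_N)^{\mathsf{T}}
% and solve for the weights so that the network passes through every sample:
W = \Phi^{-1} E
```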
After the approximation by RBF interpolation is complete, the minimum of the resulting RBF network is found by a general optimization method such as steepest descent or PSO (Particle Swarm Optimization). This minimum is input to the Actor unit 804 as the next recommended value.
In short, in the example above, the surface area or the number of pixels in each of two dimensions, measured for the movement correction amount of each failed trial, is arranged in time series over the trials as evaluation values, and the optimum solution is obtained from that series. More simply, the movement correction amount may be obtained by moving at a fixed rate in the direction that decreases the number of pixels in the two-dimensional directions.
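The first step of the route setting unit 806 can be sketched as follows. This is a minimal illustration under stated assumptions: Gaussian basis functions with centers at the data points, a shared σ, and a coarse grid search over the range of tried corrections standing in for steepest descent or PSO; none of these concrete choices come from the patent.

```python
import numpy as np

def rbf(a, centers, sigma=1.0):
    """Gaussian RBF features phi_j(a) for one input vector a."""
    d2 = np.sum((centers - a) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(A, E, sigma=1.0):
    """RBF interpolation: centers at the data points, weights W = Phi^-1 E."""
    Phi = np.array([rbf(a, A, sigma) for a in A])  # N x N basis matrix
    W = np.linalg.solve(Phi, E)                    # exact interpolation
    return lambda a: rbf(a, A, sigma) @ W

# Example: 1-D movement corrections and their evaluation values (lower = better).
A = np.array([[-2.0], [-1.0], [0.5], [1.5]])  # past movement corrections
E = np.array([4.0, 1.0, 0.25, 2.25])          # evaluation values per trial

f = fit_rbf(A, E, sigma=1.0)

# Grid search over the tried range stands in for steepest descent / PSO.
grid = np.linspace(A.min(), A.max(), 351).reshape(-1, 1)
best = grid[np.argmin([f(a) for a in grid])]
print(best)  # next recommended movement correction within the tried range
```

The recommended value `best` is then what would be handed to the Actor unit 804 for the next trial.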
Next, the operation flow is shown in FIG. 11.
FIG. 11 is a flowchart of path learning by the position control device according to the second embodiment.
First, in step S1101, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 8, and the operation is performed based on a control program registered in advance on the control unit 203 side.
Next, in step S1102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 8, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point, the processing is the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
Next, in step S1103, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image showing both the male connector 110 held by the gripping unit 101 and the female connector 120 that is the insertion target. Furthermore, the path determination unit 802 instructs the control unit 203 and the monocular camera 102 to capture images near the current position; at each of the positions reached by the drive unit 204 based on the plurality of movement values given to the control unit 203, the monocular camera captures an image showing both the male connector 110 and the female connector 120.
Next, in step S1104, the Actor unit 804 of the path determination unit 802 gives the control unit 203 a control amount for fitting, moves the robot arm 100 via the drive unit 204, and attempts to fit the male connector 110 into the female connector 120 that is the insertion target.
Next, in step S1105, if the connectors come into contact with each other while the drive unit 204 is moving the robot arm 100, the value of the force sensor 801 and the image from the monocular camera 102 are stored by the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 for each unit amount of movement.
Then, in step S1106, the evaluation unit 805 and the Critic unit 803 check whether the fitting succeeded.
Usually, the fitting does not succeed at this point. In that case, in step S1108, the evaluation unit 805 evaluates the degree of success by the method described with reference to FIG. 10 and gives the route setting unit 806 an evaluation value indicating the degree of success of the alignment.
Then, in step S1109, the route setting unit 806 performs learning using the method described above and gives the next recommended value to the Actor unit 804; the Critic unit 803 outputs the value it computed according to the reward amount, and the Actor unit 804 receives it. In step S1110, the Actor unit 804 obtains the movement correction amount by adding the value computed by the Critic unit 803 according to the reward amount and the next recommended value output by the route setting unit 806. Needless to say, if using only the next recommended value output by the route setting unit 806 is sufficiently effective in this step, the value computed by the Critic unit 803 according to the reward amount need not be added. The Actor unit 804 may also set an addition ratio between the value computed by the Critic unit 803 according to the reward amount and the next recommended value output by the route setting unit 806, and vary the movement correction amount according to that ratio.
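The combination in step S1110 can be sketched as a simple blend; the function name and the use of a single scalar ratio alpha are assumptions for illustration, since the patent only states that an addition ratio may be set.

```python
# Hedged sketch of step S1110: the Actor unit combines the Critic-derived
# value and the route setting unit's recommended value with a ratio alpha.

def movement_correction(critic_value, recommended_value, alpha=0.5):
    """Blend the two 6-DOF corrections (dX, dY, dZ, dAx, dAy, dAz)."""
    return [alpha * c + (1.0 - alpha) * r
            for c, r in zip(critic_value, recommended_value)]

critic = [0.2, -0.1, 0.0, 0.0, 0.0, 0.05]
recommended = [0.4, 0.1, -0.2, 0.0, 0.0, -0.05]

# alpha = 0 uses only the recommended value, which the text notes can suffice.
print(movement_correction(critic, recommended, alpha=0.0))
```

With alpha = 0 the Critic contribution is dropped entirely, matching the remark that the recommended value alone may be sufficiently effective.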
Thereafter, in step S1111, the Actor unit 804 gives the movement correction amount to the control unit 203 and moves the gripping unit 101 of the robot arm 100.
Then the process returns to step S1103, an image is captured at the position reached by the movement correction amount, and the fitting operation is performed again. This is repeated until it succeeds.
If the fitting succeeds, in step S1107 the Actor unit 804 and the Critic unit 803 are trained on the data from steps S1102 to S1106 of the successful trial. Finally, the path determination unit 802 gives the data of this learned neural network to the control parameter generation unit 202, which enables the operation described in the first embodiment.
In step S1107 above, the Actor unit 804 and the Critic unit 803 are trained on the successful trial; however, they may instead be trained using the data of all trials, from the start of the fitting attempts up to the success. The first embodiment describes forming a plurality of neural networks according to the control amount; once the position of successful fitting is known, the distance to that position can be used to form, at the same time, a plurality of neural networks each appropriate to a magnitude of the control amount.
Although the reinforcement learning module has been described based on the Actor-Critic model, other reinforcement learning models such as Q-Learning may be used.
Although an RBF network was given as the function approximation, other function approximation methods (linear, quadratic, and so on) may be used.
Although coloring the connector surface differently was given as the evaluation method, the amount of misalignment between the connectors obtained by other image processing techniques may be used instead.
As described in the first embodiment and in this embodiment, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a board, and the same method is effective in particular when inserting a component such as a capacitor, whose lead dimensions have large errors, into holes in a board.
Nor is it limited to insertion into a board: it can be used for position control in general, where the control amount is obtained from the relationship between an image and the control amount. In the present invention, learning the relationship between the image and the control amount with a neural network has the merit of absorbing the individual differences of the objects being aligned, so the control amount can be calculated more accurately.
Therefore, in this embodiment, when the Actor-Critic model is used to learn the control amount, the Actor unit 804 obtains the movement correction amount for the next trial by adding the value computed by the Critic unit 803 according to the reward amount and the recommended value computed by the route setting unit 806 from the evaluation value. Whereas the ordinary Actor-Critic model requires a very large number of trials and errors before alignment succeeds, the present invention makes it possible to greatly reduce the number of alignment trials.
This embodiment has described reducing the number of alignment trials by evaluating the image from the imaging unit 201 at the time of an alignment failure, but the number of trials can also be reduced by using the value of the force sensor 801 during an alignment trial. For example, in alignment involving connector fitting or the insertion of one object into another, it is common for the Actor unit 804 to judge failure by checking whether the two objects are at the completed fitting or insertion position when the value of the force sensor 801 exceeds a certain threshold. In that case, two situations are conceivable: (a) fitting or insertion was still in progress when the threshold was reached; and (b) fitting or insertion is complete, but the value of the force sensor 801 during the fitting or insertion reached a certain magnitude.
In case (a), both the value of the force sensor 801 and the image can be learned; the details can be implemented using the method described in the third embodiment.
Case (b) can also be implemented using the method described in the third embodiment as a way of learning from the value of the force sensor 801 alone. Alternatively, in the definition of the reward R in the Actor-Critic model, let F be the maximum load applied during fitting or insertion and let A be a positive constant; defining R = (1 - A/F) on success and R = 0 on failure achieves a similar effect.
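The alternative reward definition can be written down directly; A and F are as in the text, and the choice of A = 1.0 below is only an illustrative assumption (A should be chosen so that A < F for realistic loads).

```python
# Hedged sketch of the reward R = (1 - A/F) on success, R = 0 on failure,
# where F is the maximum load measured by the force sensor during the attempt.

def reward(success: bool, max_load: float, A: float = 1.0) -> float:
    """Gentler (lower-load) successful insertions receive a higher reward."""
    if not success:
        return 0.0
    return 1.0 - A / max_load

print(reward(True, max_load=10.0))  # low-load success: reward near 1
print(reward(True, max_load=2.0))   # high relative load: smaller reward
print(reward(False, max_load=2.0))  # failure: reward 0
```

This grades successes by how much force they required, rather than treating all successes identically as R = 1.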
Third Embodiment
This embodiment describes a method of efficiently collecting data in the learning process performed after alignment has succeeded in the second embodiment. Unless otherwise noted, everything is the same as in the second embodiment: the functional configuration diagram of the position control device in the third embodiment is FIG. 8, and the hardware configuration diagram is FIG. 9.
In terms of operation, a method of collecting learning data more efficiently during the operation of step S1107 of FIG. 11 in the second embodiment is described below.
FIG. 12 shows a flowchart of path learning by the position control device in the third embodiment.
First, in step S1201, when the male connector 110 and the female connector 120 have been successfully fitted in step S1107 of FIG. 11, the route setting unit 806 initializes the variables to i = 0, j = 1, k = 1. The variable i is the number of subsequent learning iterations of the robot arm 100, the variable k is the number of learning iterations after the male connector 110 and the female connector 120 become disengaged, and the variable j is the loop counter of the flowchart of FIG. 12.
Next, in step S1202, the route setting unit 806 gives the control unit 203, via the Actor unit 804, a movement amount that backs off 1 mm from the movement amount given for fitting in step S1104 of FIG. 11, and the drive unit 204 moves the robot arm 100 accordingly. Then 1 is added to the variable i. Although an instruction to back off 1 mm is given here, the amount need not be 1 mm; a unit amount such as 0.5 mm or 2 mm may be used.
Next, in step S1203, the route setting unit 806 stores the coordinates at that time as O(i) (here i = 1).
In step S1204, the route setting unit 806 randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i), gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100. The maximum magnitude of this control amount can be set arbitrarily within the movable range.
Next, in step S1205, at the position reached in step S1204, the Actor unit 804 collects the value of the force sensor 801 corresponding to the movement amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). In step S1206, the Critic unit 803 and the Actor unit 804 record, as learning data, the movement amount multiplied by -1 (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz) together with the value of the force sensor 801 that measures the force applied to hold the male connector 110.
Next, in step S1207, the route setting unit 806 determines whether the number of collected data points has reached the specified number J. If it has not, 1 is added to the variable j in step S1208 and the process returns to step S1204, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by random numbers to acquire more data; steps S1204 to S1207 are repeated until J data points have been accumulated.
When the specified number of data points has been accumulated, the route setting unit 806 resets the variable j to 1 in step S1209, and then checks in step S1210 whether the male connector 110 and the female connector 120 have become disengaged.
If they have not become disengaged, the process returns to step S1202 via step S1211.
In step S1211, the route setting unit 806 gives the control unit 203, via the Actor unit 804, a control amount that returns the coordinates of the robot arm 100 to the coordinates O(i) before the control amount was given, and the drive unit 204 moves the robot arm 100.
Thereafter, the loop from step S1202 to step S1210 repeats the process of backing off 1 mm, or a unit amount, from the control amount given for fitting and the process of collecting force sensor 801 data by giving control amounts centered on the backed-off position, until the male connector 110 and the female connector 120 become disengaged. When they become disengaged, the process proceeds to step S1212.
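The collection loop of steps S1202 to S1210 can be sketched as follows. The `move` and `read_force_sensor` interfaces, the perturbation range, and the use of a simple reversal to return to O(i) are all assumptions made for illustration; the patent specifies only the units involved and the back-off/perturb/record structure.

```python
import random

def collect_insertion_data(move, read_force_sensor, depth_steps, J,
                           back_off=1.0, max_delta=0.5):
    """Back off `back_off` mm at a time from the fitted pose; at each depth,
    apply J random 6-DOF perturbations and record (-perturbation, force)."""
    data = []
    for i in range(depth_steps):            # step S1202: back off one unit
        move((0.0, 0.0, back_off, 0.0, 0.0, 0.0))
        for j in range(J):                  # steps S1204-S1208
            delta = tuple(random.uniform(-max_delta, max_delta)
                          for _ in range(6))
            move(delta)                     # random control amount around O(i)
            force = read_force_sensor()     # step S1205
            # step S1206: the label is the movement multiplied by -1
            data.append((tuple(-d for d in delta), force))
            move(tuple(-d for d in delta))  # step S1211: return to O(i)
    return data

# Stand-in robot: no real hardware, just a fixed fake sensor reading.
data = collect_insertion_data(move=lambda d: None,
                              read_force_sensor=lambda: (0.0,) * 6,
                              depth_steps=3, J=5)
print(len(data))  # depth_steps * J samples
```

Each recorded pair is exactly the (label, input) pair used later: the negated perturbation is the corrective movement back toward the fitting path, and the force reading is the state from which it should be predicted.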
In step S1212, the route setting unit 806 sets the variable i to I (an integer larger than the value of i at the time the male connector 110 and the female connector 120 were determined to be disengaged), and gives the control unit 203, via the Actor unit 804, a control amount that backs off, for example, 10 mm (another value may be used here as well) from the movement amount given for fitting; the drive unit 204 moves the robot arm 100 accordingly.
Next, in step S1213, the route setting unit 806 stores the coordinate position of the robot arm 100 moved in step S1212 as the center position O(i+k).
Next, in step S1214, the route setting unit 806 again randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on the center position O(i+k), gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100.
In step S1215, the Critic unit 803 and the Actor unit 804 acquire the image captured by the imaging unit 201 of the monocular camera 102 at the position of the robot arm 100 after it has been moved by the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
In step S1216, the Critic unit 803 and the Actor unit 804 record the movement amount multiplied by -1 (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz) and the image as one learning data point.
In step S1217, the route setting unit 806 determines whether the number of collected data points has reached the specified number J. If it has not, 1 is added to the variable j in step S1218 and the process returns to step S1214, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by random numbers to acquire more data; steps S1214 to S1217 are repeated until J data points have been accumulated.
Note that the maximum random values of the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) in step S1204 and of the control amount in step S1214 can differ.
The learning data acquired by the above method is used to train the Actor unit 804 and the Critic unit 803.
FIG. 13 is a diagram showing an example of the neural network according to the third embodiment and of the learning rule of the neural network.
The first and second embodiments did not describe a learning method using the data of the force sensor 801. In the first and second embodiments the input layer consists only of an image, whereas in the third embodiment the value of the force sensor 801 may be fed to the input layer in place of the image. The force sensor 801 value may consist of either three components (a force and moments in two directions) or six components (forces in three directions and moments in three directions). The output layer outputs the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). When the male connector 110 and the female connector 120 are disengaged, the image and the value of the force sensor 801 are input to the input layer at the same time.
In the learning process of the neural network, the parameters of the intermediate layer are optimized so that the output value of the output layer, obtained from the input image and force sensor 801 value through the intermediate layer, approximates the control amount stored as a set with that image and force sensor 801 value.
Finally, the path determination unit 802 gives the data of this learned neural network to the control parameter generation unit 202, which enables the operation described in the first embodiment.
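The FIG. 13 setup can be sketched with a small fully connected network; the layer sizes, the synthetic data, and the plain gradient-descent training loop are assumptions for illustration, not the patent's actual network. It maps a six-component force sensor value (input layer) through one intermediate layer to the six-component control amount (output layer), minimizing the squared error against the recorded (sensor, control) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the recorded learning data of the third embodiment.
F = rng.normal(size=(64, 6))              # force sensor values (input layer)
C = F @ rng.normal(size=(6, 6)) * 0.1     # control amounts (output layer)

W1 = rng.normal(scale=0.1, size=(6, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 6)); b2 = np.zeros(6)

def forward(x):
    h = np.tanh(x @ W1 + b1)              # intermediate layer
    return h @ W2 + b2, h

lr = 0.05
for _ in range(500):                      # gradient descent on squared error
    out, h = forward(F)
    err = out - C
    gW2 = h.T @ err / len(F); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)      # backprop through tanh
    gW1 = F.T @ gh / len(F); gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

print(np.mean((forward(F)[0] - C) ** 2))  # training error after optimization
```

For the engaged/disengaged distinction described in the text, separate networks of this shape would simply be trained on the force-only and image-plus-force data sets respectively.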
In this embodiment, the robot arm 100 is moved slightly around the fitting path while backing off little by little from the movement for fitting the male connector 110 and the female connector 120, and the explanation has assumed that, depending on the pixel resolution of the monocular camera 102 image, sufficient learning from images alone is not possible until the connectors disengage.
However, if the image of the monocular camera 102 is of sufficiently high definition that even images taken with the robot arm 100 moved only slightly can be learned from, learning may be performed with the monocular camera 102 image alone, and both the monocular camera 102 image and the force sensor 801 value may be used even while the male connector 110 and the female connector 120 are engaged.
Furthermore, the first and second embodiments describe cases in which a plurality of neural networks are used. In this embodiment as well, separate neural networks may be used, for example, for the state in which the male connector 110 and the female connector 120 are engaged and for the state in which they are not. As explained above, more accurate learning is possible if the input layer is formed from the force sensor 801 value alone while the connectors are engaged and from the image alone once they disengage; even when learning from images alone, distinguishing the engaged and disengaged cases allows more accurate learning because the compositions of the images differ.
As described in the first and second embodiments, the application of this technique is not limited to the fitting of connectors in this embodiment either. For example, it can also be applied to mounting an IC on a board, and the same method is effective in particular when inserting a component such as a capacitor, whose lead dimensions have large errors, into holes in a board.
Nor is it limited to insertion into a board: it can be used for position control in general, where the control amount is obtained from the relationship between an image and the control amount. In the present invention, learning the relationship between the image and the control amount with a neural network has the merit of absorbing the individual differences of the objects being aligned, so the control amount can be calculated more accurately.
Therefore, in this embodiment, for alignment that involves inserting one of two objects into the other, learning data can be collected efficiently because the device includes the route setting unit 806, which, in order to learn the control amount, issues control amounts that move the object along the extraction path from the inserted state and around that path while it is being extracted, and the Actor unit 804, which acquires the moved positions and the force sensor 801 values so that the moved position can be learned as the output layer and the force sensor 801 value at that position as the input layer.
Fourth Embodiment
This embodiment describes a method of keeping control safe even during the learning process of the second embodiment (particularly in the early stages of learning). The hardware configuration diagram of the position control device in the fourth embodiment is FIG. 9, the same as in the second embodiment.
FIG. 14 shows a functional configuration diagram of the position control device according to the fourth embodiment. The difference from FIG. 8 is that a control parameter adjustment unit 1401 is added; the control parameter adjustment unit 1401 consists of a trajectory generation unit 1402, a coordinate conversion unit 1403, a gravity correction unit 1404, a compliant motion control unit 1405, and a combining unit 1406.
The configuration of the path determination unit 802 follows the second embodiment: an Actor-Critic model is used as the reinforcement learning module, and the evaluation unit 805 and the path setting unit 806 serve as the modules for evaluating the degree of success; however, other reinforcement learning models such as Q-Learning or DDPG may be used. Moreover, as far as the essential point of this embodiment is concerned, learning data can be collected while preventing an excessive load from being applied to the objects even without the evaluation unit 805 or the path setting unit 806, and even if the function for learning and the function for controlling the position of the robot are separate.
Next, the details of FIG. 14 will be described.
In the second embodiment, the control amount generated by the control parameter generation unit 202 was output to the control unit 203 to determine and control the current and voltage values for the devices constituting the drive unit 204. With this method, however, the control amount may become inappropriate, particularly in the early learning process, and the drive unit 204 may stop with an error or damage the surrounding environment such as the robot arm, the male connector 110, and the female connector 120. Also, if the male connector 110 and the female connector 120 are weaker than expected, the surrounding environment may be damaged even when the control amount is set sufficiently small during learning. This can occur because the side that sets the control amount and the side that performs control based on it are independent. In this embodiment, therefore, a mechanism is introduced so that an excessive load is not applied to the surrounding environment even during the learning process.
The trajectory generation unit 1402 acquires the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) generated by the control parameter generation unit 202 as a target position, and outputs periodic control amounts (ΔX', ΔY', ΔZ', ΔAx', ΔAy', ΔAz'), adjusted so that velocity and acceleration become smooth, in accordance with the control cycle of the robot arm 100, that is, the control cycle of the control unit 203. While the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) as the target position is defined as a control amount reached over a plurality of control cycles, the periodic control amount (ΔX', ΔY', ΔZ', ΔAx', ΔAy', ΔAz') is a control amount set for each single cycle in order to reach the target; it takes the load on the surrounding environment into account, and it can also respond to an unexpected load detected by the force sensor 801. The method of adjusting the periodic control amount will be described later.
The force sensor 801 measures the load applied to the gripping unit 101 of the robot arm 100; for example, it can measure the value of the force when the male connector 110 and the female connector 120 in FIG. 1 abut. In the second embodiment, the motions output during trials at the initial stage of learning may apply excessive force to the surroundings and damage the robot arm 100 or the male connector 110 and the female connector 120. In the fourth embodiment, therefore, the compliant motion control unit 1405 is placed after the control parameter generation unit 202 and makes the arm comply with the external force acquired by the force sensor 801, thereby preventing excessive force from being applied to the robot arm 100, the male connector 110, the female connector 120, and the rest of the surrounding environment. This makes it possible to carry out the trials necessary for learning safely.
The force sensor 801 may output either three values (a force and moments in two directions) or six values (forces in three directions and moments in three directions). In the six-value case, the values of the force sensor 801 can be expressed as (Fx, Fy, Fz, Tx, Ty, Tz). Since these values are obtained in the coordinate system of the force sensor, the coordinate conversion unit 1403 converts the values of the force sensor 801 into the coordinate system of the whole robot arm 100 when the two coordinate systems differ.
The values measured by the force sensor 801 are affected by gravity. The gravity correction unit 1404 removes the influence of gravity from the values measured by the force sensor 801.
The compliant motion control unit 1405 acquires the values of the force sensor 801 corrected by the coordinate conversion unit 1403 and the gravity correction unit 1404 and, following physical laws, outputs a control amount adapted to the external force detected by the force sensor 801. The method of adjusting the control amount adapted to the external force will be described later.
The combining unit 1406 combines the control amount output by the trajectory generation unit 1402 and the control amount output by the compliant motion control unit 1405, and outputs the result to the control unit 203. The combination is a simple addition of the two control amounts; alternatively, an addition ratio may be set and a weighted addition performed according to that ratio.
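As a concrete illustration, the combination performed by the combining unit 1406 can be sketched as follows (a minimal sketch; the function name and the weights `w_traj` and `w_comp`, which stand in for the addition ratio described above, are illustrative and not from the patent):

```python
import numpy as np

def combine_commands(traj_cmd, compliant_cmd, w_traj=1.0, w_comp=1.0):
    """Combine the per-cycle command from the trajectory generation unit with
    the command from the compliant motion control unit. With the default
    weights this is the plain addition described above; other weights give
    the weighted addition according to an addition ratio."""
    return w_traj * np.asarray(traj_cmd, dtype=float) \
         + w_comp * np.asarray(compliant_cmd, dtype=float)

# Example: 6-DOF periodic command (dX', dY', dZ', dAx', dAy', dAz')
traj = [0.5, 0.0, -0.2, 0.0, 0.0, 0.1]
comp = [-0.1, 0.05, 0.0, 0.0, 0.0, 0.0]
adjusted = combine_commands(traj, comp)  # -> [0.4, 0.05, -0.2, 0.0, 0.0, 0.1]
```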
Given the control period of the robot arm 100 and its maximum velocity, maximum acceleration, and maximum jerk, the trajectory generation unit 1402 calculates the periodic control amount per control cycle so as not to exceed any of these limits.
For example, as in the non-patent literature (KROGER, Torsten; PADIAL, Jose. Simple and robust visual servo control of robot arms using an on-line trajectory generator. In: Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012. pp. 4862-4869.), there is a method of calculating the periodic control amount so as to satisfy all of the following conditions.
The following constants are given constants corresponding to the specifications of the robot arm 100.
Tcycle: control period
Vmax: maximum velocity
Amax: maximum acceleration
Jmax: maximum jerk
x_{i+1} = x_i + v_i·Tcycle + (1/2)·α_i·Tcycle^2 + (1/6)·j_i·Tcycle^3
v_{i+1} = v_i + α_i·Tcycle + (1/2)·j_i·Tcycle^2
α_{i+1} = α_i + j_i·Tcycle
|v_i| ≤ Vmax,  |α_i| ≤ Amax,  |j_i| ≤ Jmax

Here, x_i, v_i, α_i, and j_i are variables representing the following:
x_i: current position at step i
v_i: current velocity at step i
α_i: current acceleration at step i
j_i: current jerk at step i
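The per-cycle update can be sketched for one axis as follows. This is a deliberately simplified sketch, not the full on-line trajectory generator of the cited Kröger and Padial paper (which also plans braking phases so the motion stops exactly at the target); it only illustrates how the position is advanced one control period at a time while the jerk, acceleration, and velocity limits are enforced:

```python
def trajectory_step(x, v, a, target, Tcycle, Vmax, Amax, Jmax):
    """Advance position x by one control period toward `target`, clamping
    jerk, acceleration, and velocity to the robot arm's limits."""
    # choose the jerk that drives the motion toward the target
    j = Jmax if target > x else -Jmax
    a = max(-Amax, min(Amax, a + j * Tcycle))   # |a| <= Amax
    v = max(-Vmax, min(Vmax, v + a * Tcycle))   # |v| <= Vmax
    x_next = x + v * Tcycle
    if (target - x) * (target - x_next) < 0:    # do not step past the target
        x_next = target
    return x_next, v, a
```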
The method of adjusting the control amount adapted to the external force in the compliant motion control unit 1405 described above will now be explained. In compliant motion control, given coefficients representing the stability and stiffness of the environment, a compliant motion is calculated from the external force information. For example, when the external force is f(t), the resulting compliant displacement Δx(t) can be calculated by solving the following differential equation.
m·Δx''(t) + d·Δx'(t) + k·Δx(t) = f(t)

Here, m, d, and k are coefficients representing the stability and stiffness of the environment:
m: mass (inertia) coefficient
d: damping coefficient
k: spring constant
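Numerically, the differential equation above can be integrated once per control period; the sketch below uses semi-implicit Euler integration (the function and variable names are illustrative, not from the patent):

```python
def compliant_step(dx, dv, f_ext, m, d, k, Tcycle):
    """One integration step of m*dx'' + d*dx' + k*dx = f(t).
    Returns the updated compliant displacement dx and velocity dv."""
    acc = (f_ext - d * dv - k * dx) / m   # solve the ODE for the acceleration
    dv = dv + acc * Tcycle                # semi-implicit Euler: velocity first
    dx = dx + dv * Tcycle                 # then position with the new velocity
    return dx, dv

# Under a constant external force the displacement settles at f/k,
# i.e. the arm yields like a spring instead of fighting the contact.
dx, dv = 0.0, 0.0
for _ in range(20000):
    dx, dv = compliant_step(dx, dv, f_ext=5.0, m=1.0, d=20.0, k=100.0, Tcycle=0.001)
# dx is now close to 5.0 / 100.0 = 0.05
```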
Next, the operation flow is shown in FIG. 15.
FIG. 15 is a flowchart of path learning by the position control device according to the fourth embodiment.
First, in step S1501, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 14, and the operation is performed based on a control program registered in advance on the control unit 203 side.
Next, in step S1502, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 14, and the position of the male connector 110 is operated based on a control program registered in advance on the control unit 203 side. The steps so far are the same as steps S101 to S102 of the flowchart of FIG. 4 in the first embodiment.
Next, in step S1503, the path determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image in which both the male connector 110 gripped by the gripping unit 101 and the female connector 120 to be inserted into appear. The evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store the image from the monocular camera 102.
Further, in step S1504, the path determination unit 802 instructs the force sensor 801 to acquire the external force, and the force sensor 801 acquires the external force at the current position. At the same time, the evaluation unit 805 and the Critic unit 803 of the path determination unit 802 store the value of the force sensor 801.
Next, in step S1505, the Actor unit 804 of the path determination unit 802 calculates a control amount for performing the fitting and supplies it to the trajectory generation unit 1402.
Next, in step S1506, the trajectory generation unit 1402 calculates a new control amount adjusted so that velocity and acceleration become smooth. Specifically, it calculates the periodic control amount, that is, the target position x_i for each control cycle, satisfying the given constants Tcycle, Vmax, Amax, and Jmax corresponding to the specifications of the robot arm 100 described above.
In step S1507, the coordinate conversion unit 1403 converts the value of the force sensor 801 acquired in step S1504 into the coordinate system of the whole robot arm 100.
Next, in step S1508, the gravity correction unit 1404 removes the influence of gravity from the value of the force sensor 801 coordinate-converted in step S1507, and supplies it to the compliant motion control unit 1405.
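Steps S1507 and S1508 can be sketched together for the force components (a simplified sketch: the rotation matrix, the mass value, and the sign convention of gravity acting along the base frame's -Z axis are assumptions, and the moment components are omitted for brevity):

```python
import numpy as np

def gravity_corrected_force(f_sensor, R_base_sensor, tool_mass, g=9.81):
    """S1507: rotate the sensor-frame force into the robot base frame.
    S1508: remove the weight of the gripped object, assumed to pull
    along the base frame's -Z axis."""
    f_base = R_base_sensor @ np.asarray(f_sensor, dtype=float)
    f_base[2] += tool_mass * g   # cancel the -Z gravity contribution
    return f_base

# With the sensor aligned to the base frame, a 1 kg gripped part at rest
# should read as zero external force after correction.
f = gravity_corrected_force([0.0, 0.0, -9.81], np.eye(3), tool_mass=1.0)
```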
Next, in step S1509, the compliant motion control unit 1405 calculates a control amount adapted to the external force from the gravity-corrected value of the force sensor 801 obtained in step S1508, and supplies it to the combining unit 1406. The control amount adapted to the external force is calculated as described above, for example so that the value of the force sensor 801 becomes smaller.
Next, in step S1510, the combining unit 1406 combines the periodic control amount calculated in step S1506 and the compliant motion control amount calculated in step S1509 by addition or weighted addition, and supplies the result to the control unit 203 as the periodic control amount adjustment value.
Next, in step S1511, the drive unit 204 moves the robot arm 100 and connector insertion is attempted. The control parameter adjustment unit 1401 checks whether the periodic control amount adjustment value has reached the control amount generated by the control parameter generation unit 202; if it has not, the process returns to step S1504. The operations from step S1504 to step S1511 can thus be repeated every control cycle.
Even when the male connector 110 and the female connector 120 abut before the control amount generated by the control parameter generation unit 202 is reached, the rise in the value of the force sensor 801 is detected every control cycle and fed back through the periodic control amount adjustment value, so the possibility of destroying the surrounding environment can be kept small even at the initial stage of learning.
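The inner loop of steps S1504 to S1511 can be summarized in a toy one-axis sketch (everything here is illustrative: `read_force` stands in for the corrected force sensor reading, the capped step for the trajectory generation unit, and the force-proportional offset for the compliant motion control unit):

```python
def run_cycles(target, read_force, n_cycles, step_max=0.005, k_comp=0.002):
    """Repeat S1504-S1511: read the external force, compute a capped
    periodic step toward the target, add a compliant offset that backs
    away from the sensed force, and apply the combined adjustment."""
    x = 0.0
    for _ in range(n_cycles):
        f = read_force(x)                      # S1504: force at current pose
        periodic = min(target - x, step_max)   # S1506: per-cycle step (stub)
        compliant = -k_comp * f                # S1509: yield to the contact
        x += periodic + compliant              # S1510/S1511: combined command
        if abs(target - x) < 1e-6:             # target reached
            break
    return x

# Free motion reaches the target; a virtual obstacle at x = 0.05 that pushes
# back makes the arm settle short of it instead of forcing its way through.
free = run_cycles(0.1, lambda x: 0.0, 100)
blocked = run_cycles(0.1, lambda x: max(0.0, x - 0.05) * 100.0, 300)
```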
Next, in step S1512, the evaluation unit 805 and the Critic unit 803 check whether the fitting has succeeded; at the same time, the neural network parameters of the Actor unit 804 and the Critic unit 803 are updated based on the values of the monocular camera 102 and the force sensor 801 stored in steps S1503 and S1504 and the control amount calculated in step S1505.
Then, if the fitting has not succeeded in step S1513, the position of the robot arm 100 is not moved, and the process returns to step S1503 for the next trial.
Although not shown in FIG. 15, by performing, before returning to step S1503, the evaluation of steps S1108 and S1109 of FIG. 11 of the second embodiment and the evaluation method of FIG. 10 used in the second embodiment, the same effects as in the second embodiment can be obtained.
If the fitting has succeeded in step S1513, the fitting task itself ends. To continue training the Actor unit 804 and the Critic unit 803 further, the accuracy of learning can be increased by returning to step S1501 and retrying from the gripping of the male connector 110.
Although the description is based on the Actor-Critic model as the reinforcement learning module, other reinforcement learning models such as DDPG may be used.
As described in the first embodiment and this embodiment, the application of this technique is not limited to the fitting of connectors. For example, it can also be applied to mounting an IC on a substrate, and the same method is effective even when inserting a component such as a capacitor, whose lead dimensions have a large error, into a hole in the substrate.
Furthermore, the invention is not necessarily limited to insertion into a substrate; it can be used for position control in general in which a control amount is obtained from the relationship between an image and a control amount. By using a neural network to learn the relationship between the image and the control amount, the invention has the advantage of absorbing the individual differences of the objects when aligning one object with another, so the control amount can be calculated more accurately.
Therefore, in this embodiment, for a case including alignment that involves insertion of two objects, the device comprises: a path determination unit 802 that indicates the control amount for insertion based on the image acquired from the imaging unit 201 and the value of the force sensor 801, and learns from the result of the alignment; and a control parameter adjustment unit 1401 that outputs the periodic control amount adjustment value based on the periodic control amount set for each control cycle in order to reach the control amount and on the control amount adapted to the external force based on the value of the force sensor 801. The robot arm 100 is thus operated by a control amount obtained by adding trajectory generation control and compliant motion control using the force sensor 801. Whereas an ordinary reinforcement learning model requires trial and error until learning converges and may damage the environment, the invention makes it possible to perform trials safely even at the initial stage of learning.
Although this embodiment has been described focusing on the function of the control parameter adjustment unit 1401, the function of the control parameter adjustment unit 1401 can also be added to the contents described in the second and third embodiments; the second and third embodiments can then be operated safely while the learning speed is improved.
100: robot arm
101: gripping unit
102: monocular camera
110: male connector
120: female connector
201: imaging unit
202: control parameter generation unit
203: control unit
204: drive unit
301: input/output interface
302: processor
303: memory
304: control circuit
305: motor
801: force sensor
802: path determination unit
803: Critic unit
804: Actor unit
805: evaluation unit
806: path setting unit
1401: control parameter adjustment unit
1402: trajectory generation unit
1403: coordinate conversion unit
1404: gravity correction unit
1405: compliant motion control unit
1406: combining unit

Claims (9)

1. A position control device comprising:
a path determination unit that, in a case including alignment involving insertion of two objects, indicates a control amount for insertion based on an image acquired from an imaging unit and a value of a force sensor, and learns from a result of the alignment; and
a control parameter adjustment unit that outputs a periodic control amount adjustment value based on a periodic control amount set for each control period in order to reach the control amount and on a control amount adapted to an external force based on the value of the force sensor corresponding to the one control period.
2. The position control device according to claim 1, wherein the periodic control amount is set for each period taking into account at least one of a maximum velocity, a maximum acceleration, and a maximum jerk in order to reach the control amount, and the control amount adapted to the external force is determined according to a value obtained by removing a gravity component from the value of the force sensor.
3. The position control device according to claim 1 or 2, further comprising: a control unit that controls a current or a voltage for controlling a positional relationship between the two objects using the periodic control amount adjustment value indicated in claim 1; and a drive unit that moves one of the two objects in the positional relationship using the current or the voltage, wherein the force sensor acquires a force applied when the positional relationship between the two objects is maintained.
4. The position control device according to any one of claims 1 to 3, wherein the path determination unit according to claim 1 comprises:
a path setting unit that, when extracting from an insertion state, indicates a movement amount so as to move on a path from the insertion state and around it; and
an Actor unit that acquires the value of the moved position and the value of the force sensor in order to perform learning with the moved position data as an output layer and the value of the force sensor at the moved position as an input layer.
5. The position control device according to claim 4, further comprising a monocular camera that captures and acquires an image in which the two objects are present,
wherein the Actor unit acquires an image captured by the monocular camera at the moved position.
6. The position control device according to claim 4 or 5, wherein the Actor unit performs learning from the input layer and the output layer using an Actor-Critic model.
7. The position control device according to claim 6, wherein the Actor unit learns a plurality of neural networks, one of the plurality of neural networks being trained with data of positions in which the positional relationship of the two objects is an inserted state, and the other being trained with data of positions in which the positional relationship of the two objects is not an inserted state.
8. The position control device according to claim 7, wherein the Actor unit uses the value of the force sensor for the data of positions in which the positional relationship of the two objects is an inserted state, and uses image data for the data of positions in which the positional relationship of the two objects is not an inserted state.
9. A position control method for two objects, comprising:
in a case including alignment involving insertion of the two objects, outputting a control amount for insertion based on an acquired image and a value of a force sensor;
outputting a periodic control amount adjustment value based on a periodic control amount that can be reached in one control period with respect to the control amount and on a control amount adapted to an external force based on the value of the force sensor corresponding to the one control period; and
learning from a result of the alignment.
PCT/JP2018/002053 2018-01-24 2018-01-24 Position control device and position control method WO2019146007A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2018530627A JP6458912B1 (en) 2018-01-24 2018-01-24 Position control device and position control method
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method
TW107125131A TW201932257A (en) 2018-01-24 2018-07-20 Position control device and position control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method

Publications (1)

Publication Number: WO2019146007A1

Family

Family ID: 65228992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/002053 WO2019146007A1 (en) 2018-01-24 2018-01-24 Position control device and position control method

Country Status (3)

Country Link
JP (1) JP6458912B1 (en)
TW (1) TW201932257A (en)
WO (1) WO2019146007A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021066794A1 (en) * 2019-09-30 2021-04-08 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
WO2021170163A1 (en) * 2020-02-28 2021-09-02 Rittal Gmbh & Co. Kg Arrangement for fitting and wiring electronic components in switchgear engineering, and corresponding method
WO2022030334A1 (en) * 2020-08-03 2022-02-10 キヤノン株式会社 Control device, lithography device, and method for manufacturing article
CN115990891A (en) * 2023-03-23 2023-04-21 湖南大学 Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
JP7239399B2 (en) * 2019-06-19 2023-03-14 ファナック株式会社 Adjustment support device
US20220281120A1 (en) 2019-08-02 2022-09-08 Dextrous Robotics, Inc. Robotic manipulators
CN111230469B (en) * 2020-03-11 2021-05-04 苏州科诺机器人有限责任公司 Full-automatic water joint assembling mechanism and assembling method
TWI766252B (en) * 2020-03-18 2022-06-01 揚明光學股份有限公司 Optical lens manufacturing system and optical lens manufacturing method using the same
JP2022122670A (en) * 2021-02-10 2022-08-23 オムロン株式会社 Robot model learning device, robot model machine learning method, robot model machine learning program, robot control device, robot control method, and robot control program
CN113140104B (en) * 2021-04-14 2022-06-21 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium
US11845184B2 (en) 2022-04-18 2023-12-19 Dextrous Robotics, Inc. System and/or method for grasping objects

Citations (8)

Publication number Priority date Publication date Assignee Title
WO1998017444A1 (en) * 1996-10-24 1998-04-30 Fanuc Ltd Force control robot system with visual sensor for inserting work
JP2011230245A (en) * 2010-04-28 2011-11-17 Yaskawa Electric Corp Robot system
JP2014054715A (en) * 2012-09-13 2014-03-27 Fanuc Ltd Article retrieving apparatus that determines retaining position/posture of robot based on selecting conditions
JP2015217486A (en) * 2014-05-19 2015-12-07 富士通株式会社 Determining apparatus, determining method, and determining program
JP2016221642A (en) * 2015-06-02 2016-12-28 セイコーエプソン株式会社 Robot, robot control device, robot control method and robot system
JP2016221660A (en) * 2015-06-03 2016-12-28 富士通株式会社 Determination method, determination program and determination device
WO2017018113A1 (en) * 2015-07-29 2017-02-02 株式会社オートネットワーク技術研究所 Object handling simulation device, object handling simulation system, method for simulating object handling, manufacturing method for object, and object handling simulation program
JP2017030135A (en) * 2015-07-31 2017-02-09 Fanuc Corporation Machine learning apparatus, robot system, and machine learning method for learning workpiece take-out motion

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5904635B2 (en) * 2012-03-02 2016-04-13 Seiko Epson Corporation Control apparatus, control method, and robot apparatus
JP6248694B2 (en) * 2014-02-25 2017-12-20 Seiko Epson Corporation Robot, robot system, and control device
JP6376296B1 (en) * 2017-02-09 2018-08-22 Mitsubishi Electric Corporation Position control device and position control method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KROEGER, TORSTEN ET AL.: "Simple and Robust Visual Servo Control of Robot Arms Using an On-Line Trajectory Generator", 2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, 18 May 2012 (2012-05-18), pages 4862 - 4869, XP032450906 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021066794A1 (en) * 2019-09-30 2021-04-08 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
CN114630734A (en) * 2019-09-30 2022-06-14 西门子股份公司 Visual servoing with dedicated hardware acceleration to support machine learning
US11883947B2 (en) 2019-09-30 2024-01-30 Siemens Aktiengesellschaft Machine learning enabled visual servoing with dedicated hardware acceleration
WO2021170163A1 (en) * 2020-02-28 2021-09-02 Rittal Gmbh & Co. Kg Arrangement for fitting and wiring electronic components in switchgear engineering, and corresponding method
WO2022030334A1 (en) * 2020-08-03 2022-02-10 Canon Inc. Control device, lithography device, and method for manufacturing article
JP7466403B2 (en) 2020-08-03 2024-04-12 Canon Inc. Control apparatus, lithography apparatus, control method, and article manufacturing method
CN115990891A (en) * 2023-03-23 2023-04-21 Hunan University Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration

Also Published As

Publication number Publication date
JP6458912B1 (en) 2019-01-30
TW201932257A (en) 2019-08-16
JPWO2019146007A1 (en) 2020-02-06

Similar Documents

Publication Publication Date Title
WO2019146007A1 (en) Position control device and position control method
JP6376296B1 (en) Position control device and position control method
JP6587761B2 (en) Position control device and position control method
CN109807882B (en) Gripping system, learning device, and gripping method
JP6522488B2 (en) Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
WO2024027647A1 (en) Robot control method and system and computer program product
CN112757284A (en) Robot control apparatus, method and storage medium
US10926416B2 (en) Robotic manipulation using an independently actuated vision system, an adversarial control scheme, and a multi-tasking deep learning architecture
CN114952821A (en) Robot motion control method, robot and system
CN113878588B (en) Robot compliant assembly method based on tactile feedback and oriented to buckle type connection
JP2008009999A (en) Plane extraction method, and device, program, and storage medium therefor, and imaging device
CN113927602B (en) Robot precision assembly control method and system based on visual and tactile fusion
CN113954076B (en) Robot precision assembling method based on cross-modal prediction assembling scene
CN110942083A (en) Imaging device and imaging system
US11372475B2 (en) Information processing apparatus, information processing method, and floor modeling system
Ramachandruni et al. Vision-based control of UR5 robot to track a moving object under occlusion using Adaptive Kalman Filter
CN113011526B (en) Robot skill learning method and system based on reinforcement learning and unsupervised learning
WO2022091366A1 (en) Information processing system, information processing device, information processing method, and recording medium
US20240054393A1 (en) Learning Device, Learning Method, Recording Medium Storing Learning Program, Control Program, Control Device, Control Method, and Recording Medium Storing Control Program
JP2023156751A (en) Information processing device, information processing method, program, and learned model
JP2024034668A (en) Wire insertion system, wire insertion method, and wire insertion program

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018530627

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18902988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18902988

Country of ref document: EP

Kind code of ref document: A1