WO2022014128A1

WO2022014128A1 - Assembly work order planning device and assembly work order planning method

Info

Publication number: WO2022014128A1
Application number: PCT/JP2021/017731
Authority: WO
Inventors: 利浩森澤
Original assignee: 株式会社日立製作所
Priority date: 2020-07-16
Filing date: 2021-05-10
Publication date: 2022-01-20
Also published as: JP7474653B2; JP2022018654A

Abstract

The present invention plans an assembly order of components constituting a product and a work order of work subjects that may include robots and workers. This assembly work order planning device is characterized by comprising: an information acquisition unit that acquires assembly state transition information including information indicating a process of assembling a plurality of components into a sub-assembly product and then into a final product; an assembly work definition unit that defines, on the basis of the assembly state transition information, work that can be performed by the work subject with respect to an assembly state including the two components before the assembly or the sub-assembly product; a constraint condition setting unit that sets a constraint condition related to whether or not the work can be performed; a learning unit that performs reinforcement learning on a method of selecting the work for the assembly state according to the constraint condition; and an assembly work order generation unit that generates an assembly work order of the product on the basis of the result of the reinforcement learning.

Description

Assembly work sequence planning device and assembly work sequence planning method

The present invention relates to an assembly work sequence planning device and an assembly work sequence planning method. The present invention claims priority from Japanese Patent Application No. 2020-121910, filed on July 16, 2020, and for designated countries where incorporation by reference is permitted, the content described in that application is incorporated into this application by reference.

In the assembly of a product consisting of multiple parts, an efficient work method and order are planned in advance. At an assembly work site, workers and robots may work individually or together. In a work site where workers and robots coexist, it is important to improve both the efficiency of assembly work sequence planning and production efficiency. In particular, work performed by a robot must be completely defined in advance, and as robots become more widespread and less expensive, the robot configurations of lines and cells are diversifying. Automation of assembly work order planning is therefore desired.

Regarding methods of planning the assembly work order, Patent Document 1, for example, describes a technique in which part attributes, part arrangement, and adjacency relationships with other parts are extracted for each part from a three-dimensional CAD (Computer Aided Design) model of a product, a directed graph between the parts and an assembly graph are generated from the adjacency information, and the assembly order is generated as the reverse of the disassembly order of the parts.

Further, Patent Document 2, for example, describes a technique of taking 3D CAD data as input, planning the assembly work with a task planner, modeling the assembly work with a Petri net, and searching for an optimum route to generate a robot program offline.

Further, Patent Document 3, for example, describes a technique in which a simulation of a work operation by a robot is executed, a control command is determined based on the execution result, and the control command is learned using, as training data, result labels for the cases where the determination result is good and where it is bad.

Patent Document 1: Japanese Patent No. 6199210
Patent Document 2: Japanese Patent No. 3705672
Patent Document 3: Japanese Patent No. 6457421

The technique described in Patent Document 1 generates an assembly order by using a directed graph and an assembly graph based on the priority relationships and adjacency relationships of the connections between parts extracted from the three-dimensional CAD data, so a huge number of assembly sequences are generated depending on the number of parts in the product. In addition, the priority relationships of the connections and the like must be set manually, which makes it difficult to determine the assembly order efficiently. Furthermore, the work sequence of the work subject cannot be generated.

The technique described in Patent Document 2 includes an assembly work planning method in which the assembly order is layered into a product state level, a target movement level, and a hand movement level and is modeled by a Petri net. The assembly order is determined at the stage when the Petri net is constructed, but because the Petri net is not constructed (modeled) automatically, the technique cannot generate an assembly work order in which robots and workers are the work subjects.

The technique described in Patent Document 3 uses reinforcement learning to learn control commands for machines including robots, and the learning proceeds by executing simulations of the work operation. Because the assembly work sequence is already fixed when the series of work operations is defined, the technique does not generate an assembly work sequence.

The present invention has been made in view of the above points, and an object of the present invention is to make it possible to plan the assembly order of the parts constituting a product and the work order of work subjects including robots and workers.

The present application includes a plurality of means for solving at least a part of the above problems, and examples thereof are as follows.

In order to solve the above problems, an assembly work sequence planning device according to one aspect of the present invention includes: an information acquisition unit that acquires assembly state transition information including information indicating the process by which a plurality of parts are assembled into subassemblies and then into a final product; an assembly work definition unit that defines, based on the assembly state transition information, work that can be performed by a work subject on an assembly state consisting of the two parts or subassemblies before assembly; a constraint condition setting unit that sets a constraint condition regarding whether or not the work can be executed; a learning unit that performs reinforcement learning of how to select the work for the assembly state according to the constraint condition; and an assembly work order generation unit that generates an assembly work order of the product based on the result of the reinforcement learning.

According to the present invention, it is possible to plan the assembly order of each component constituting the product and the work order of the work subject including the robot and the worker.

Issues, configurations, and effects other than those described above will be clarified by the explanation of the following embodiments.

FIG. 1 is a diagram showing a configuration example of an assembly work sequence planning device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of an assembly work environment.
FIG. 3 is a diagram showing an example of a product composed of a plurality of parts.
FIG. 4 is a diagram showing an AND / OR tree representing the assembly state transition of the product shown in FIG. 3.
FIG. 5 is a diagram showing a list of transitions to each assembly state corresponding to FIG. 4.
FIG. 6 is a diagram showing an example of an assembly work order in which the work subject is set as one robot based on the assembly state transition of the product, and an assembly state for each work.
FIG. 7 is a diagram showing an example of an assembly work order in which the work subject is set as one robot based on the assembly state transition and the assembly work environment, and an assembly state for each work.
FIG. 8 is a diagram showing an example of an assembly work order in which the work subject is set as two robots based on the assembly state transition and the assembly work environment, and the assembly state for each work.
FIG. 9 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device according to the first embodiment.
FIG. 10 is a diagram showing a configuration example of an assembly work sequence planning device according to a second embodiment of the present invention.
FIG. 11 is a diagram showing an example of a product composed of a plurality of parts.
FIG. 12 is a diagram showing a list of transitions to each assembly state corresponding to the assembly operation of FIG.
FIG. 13 is a diagram showing an example of an assembly work order in which the work subject is set as one robot and one worker based on the assembly state transition of the product and the assembly work environment, and the assembly state for each work.
FIG. 14 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device according to the second embodiment.
FIG. 15 is a diagram for explaining the outline of A3C.
FIG. 16 is a diagram for explaining the processing contents of the training of the action selection function and the state value function.

Hereinafter, a plurality of embodiments of the present invention will be described with reference to the drawings. In all the drawings for explaining the embodiments, the same members are in principle designated by the same reference numerals, and repeated description thereof is omitted. In the following embodiments, it goes without saying that the components (including element steps and the like) are not necessarily essential unless explicitly stated or clearly considered essential in principle. Likewise, expressions such as "consisting of A", "having A", and "including A" do not exclude other elements unless it is explicitly stated that only that element is meant. Similarly, in the following embodiments, references to the shape, positional relationship, and the like of the components include shapes and the like that are substantially close or similar to them.

<First Embodiment>
FIG. 1 shows a configuration example of the assembly work sequence planning device 10 according to the first embodiment of the present invention.

The assembly work order planning device 10 plans the assembly work order followed when work subjects including robots and workers assemble parts to complete a product.

The assembly work order planning device 10 includes each functional block of a calculation unit 11, a storage unit 12, an input unit 13, an output unit 14, and a communication unit 15.

The assembly work sequence planning device 10 is constituted by a general computer, such as a personal computer, equipped with a processor such as a CPU (Central Processing Unit), a memory such as a DRAM (Dynamic Random Access Memory), storage such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), input devices such as a keyboard, a mouse, and a touch panel, an output device such as a display, and a communication module such as an NIC (Network Interface Card).

The calculation unit 11 is realized by the processor of the computer. The calculation unit 11 has the functional blocks of an information acquisition unit 111, an assembly work definition unit 112, a constraint condition setting unit 113, an action selection / value function configuration unit 114, a reward setting unit 115, a learning unit 116, and an assembly work order generation unit 117. These functional blocks are realized by the processor of the computer executing a predetermined program loaded into memory. However, some or all of these functional blocks may be realized as hardware by an integrated circuit or the like.

The information acquisition unit 111 connects, via the communication unit 15, to the CAD (Computer Aided Design) system 20 connected to the network 1, which includes the Internet and mobile phone communication networks, acquires the assembly work environment / product information 121 and the assembly state transition information 122, and stores them in the storage unit 12.

Here, the assembly work environment / product information 121 includes information indicating the shape and position of objects existing in the assembly work environment (for example, a robot, a stage, a tray on which parts are placed, a transfer device, a hand or tool attached to a robot arm, and a frame structure), which are modeled in advance as CAD data by the CAD system 20. The assembly work environment / product information 121 also includes information on the plurality of parts constituting the product.

The assembly state transition information 122 includes information indicating the process by which a plurality of parts are assembled into subassemblies and then into the final product, together with contact constraint conditions. Here, a contact constraint condition represents a restriction on the direction in which a part can move that arises from contact between parts. The transition relationships between assembly states in the assembly process can be obtained from the contact constraint conditions.

The assembly work definition unit 112 refers to the assembly state transition information 122, defines assembly work for an assembly state consisting of the two parts or subassemblies before assembly, and stores the defined assembly work in the storage unit 12 as the assembly work information 123. The assembly work definition unit 112 can also set the state of the assembly work environment (robot hand, tray, stage, etc.) before the work, and can define work that changes the state of the assembly work environment, such as that of a stage.

The constraint condition setting unit 113 sets constraint conditions and stores them in the storage unit 12 as the constraint condition information 124. Setting a constraint condition means associating an assembly state with the work that cannot be performed in that state. When assembly work is defined from the assembly state transitions and a work is selected for an assembly state during reinforcement learning, the work is judged to be unselectable under the constraints if that assembly state is not one assumed as a precondition of the work. The same applies to the selection of work for states of the assembly work environment.

Constraints may also be set on the order of assembly work. For example, if a work must be performed before a certain component is assembled, the work must be selected while that component is still present, that is, before it has been consumed by assembly. Conversely, if a work must be performed after a certain component is assembled, the work must be selected when that component is no longer present.

The action selection / value function configuration unit 114 constructs the functions used in reinforcement learning, which selects work for each assembly state and thereby sets the assembly work order: an action selection function that expresses the relationship between a state and an action, and a value function that expresses the relationship between a state and its state value. Constructing the action selection function and the value function requires the items of the state and the action, which are acquired from the assembly state transition information 122 and the assembly work information 123. In the present embodiment, the assembly state corresponds to the state, and the work corresponds to the action, in the terminology of reinforcement learning.

There are many algorithms for realizing reinforcement learning, such as tabular Q-learning, function approximation methods, and methods that modify or extend them. This embodiment adopts A3C (Asynchronous Advantage Actor-Critic), an on-policy method that is an example of deep reinforcement learning using a deep neural network as the function approximator. In A3C, a neural network is constructed that takes a state as input and outputs the action selection and the value function. The number of intermediate layers of the neural network and the number of nodes in each layer are set in advance.

Another deep reinforcement learning method is DQN (Deep Q-Network), an off-policy method in which the action value function is constructed as a deep neural network. In tabular Q-learning, a Q table (array) that associates states with action values is constructed. Any of these methods may be adopted, but performing reinforcement learning requires a mechanism that obtains, from the state, the action selection and a value for evaluating that action selection. Reinforcement learning will be described later with reference to FIGS. 15 and 16.

The reward setting unit 115 sets a negative reward for the assembled state when a work that cannot be performed because it conflicts with the constraint conditions is selected. In addition, when the product is completed and the assembly work is completed, a positive reward is set.

The learning unit 116 performs reinforcement learning by repeating episodes. An episode is a series of actions that starts from the initial state of assembly and ends either in assembly failure, when the selection of work for an assembly state fails, or in assembly success, when the work selections succeed and the assembly work is completed.

Specifically, the episode is repeated so that the action selection for each state is learned, and reinforcement learning is completed when a successful assembly result is obtained. Completing the reinforcement learning yields an action selection function that selects a good action for each state. In the processing of one episode, an action is selected for the initial state and the next state is obtained; another action is then selected to obtain the state after that. In this way, action selection is repeated step by step to advance the state. If an action selection is wrong or the resulting state is not allowed, a negative reward is obtained and the episode ends. Conversely, if the assembly succeeds, the episode ends with a positive reward. Hereinafter, learning may also be referred to as training.

The assembly work order generation unit 117 sets the assembly work order based on the result of reinforcement learning by the learning unit 116. Specifically, based on the action selection function obtained as a result of reinforcement learning, a series of actions selected in the process from the initial state of assembly to the state of successful assembly in which the assembly work is completed are generated as the assembly work order. It also generates the assembly status and assembly work environment status at each work stage.

The storage unit 12 is realized by the memory and storage of the computer. The storage unit 12 stores the assembly work environment / product information 121, the assembly state transition information 122, the assembly work information 123, and the constraint condition information 124. Information other than these may be stored in the storage unit 12.

The input unit 13 is realized by an input device of the computer. The input unit 13 receives various operations from the operator (user). The output unit 14 is realized by an output device of the computer. The output unit 14 displays, for example, an operation input screen. The communication unit 15 is realized by the communication module of the computer. The communication unit 15 connects to the CAD system 20 via the network 1 and receives predetermined information from the CAD system 20.

The CAD system 20 supplies the assembly work environment / product information 121 and the assembly state transition information 122 in response to the request from the assembly work sequence planning device 10.

Next, FIG. 2 shows an example of an assembly work environment.

In the assembly work environment, two robots R1 and R2, a stage 312 provided on a pedestal 311, trays 321 and 322, and hand installation tables 331 and 332 are provided.

The hand of the robot R1 is replaceable, and the hand 303 is attached in the figure. Similarly, the robot R2 has a replaceable hand, and the hand 304 is attached in the figure. In the present embodiment, the hand means an end effector of a robot (manipulator), and is not necessarily limited to a hand having a gripping structure. The hand also includes, for example, a tool such as a screwdriver.

The trays 321 and 322 contain parts that are moved to and placed on the stage 312 in the next work and are subject to assembly work. On the hand installation tables 331 and 332, replacement hands 333 and 334 that can be exchanged with the currently mounted hands 303 and 304 are placed.

In the assembly work environment shown in the figure, two robots R1 and R2 can operate at the same time to perform the work of assembling the parts placed on the stage 312.

Next, FIG. 3 shows an example of a product 40 for planning an assembly work order with the assembly work order planning device 10. The product 40 has a structure completed by arranging plate parts B and C on the base part A and fastening and fixing each of them with screw parts D and E.

FIG. 4 shows an AND / OR tree representing the assembly state transition of the product 40 shown in FIG. 3.

The AND / OR tree has a tree structure in which each assembly state (a single part, a subassembly of two or more parts, or the product) is represented by a node drawn as an ellipse. A transition from one assembly state to the next is represented by an edge connecting one node to another.

There are 13 assembly states in total: 5 single-part states (base part A, plate parts B and C, and screw parts D and E), 2 two-part subassemblies (AB and AC), 3 three-part subassemblies (ABC, ABD, and ACE), 2 four-part subassemblies (ABCD and ABCE), and 1 finished product consisting of all five parts (ABCDE).

For example, node A represents the assembly state in which base part A exists as a single part; performing the work of assembling plate part B to base part A causes a transition from node A to node AB, which represents the assembly state of subassembly AB. Similarly, node AB represents the assembly state of the subassembly in which plate part B is assembled to base part A; performing the work of assembling plate part C to subassembly AB causes a transition from node AB to node ABC, which represents the assembly state of subassembly ABC.

Note that the transitions between nodes in the AND / OR tree reflect the contact constraint conditions between the parts. For example, screw part D cannot be assembled unless plate part B is arranged on base part A. Therefore, the work of assembling screw part D can be selected only after a transition to a node representing an assembly state in which plate part B is arranged on base part A, namely node AB (subassembly AB), node ABC (subassembly ABC), or node ABCE (subassembly ABCE).

FIG. 5 shows a list of the transitions to each assembly state in the AND / OR tree shown in FIG. 4.

In the product 40, either plate part B or plate part C may be assembled to base part A first. Likewise, either screw part D or screw part E may be assembled to subassembly ABC first. Taking all permitted orders into account, there are six assembly sequences leading from the single parts to the finished product, and there are 12 transitions to assembly states, No. 1 to No. 12 in FIG. 5.
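The counts given above (13 assembly states, 12 transitions, 6 assembly sequences) can be checked with a short enumeration. The following is a minimal sketch, not the device's actual data format: the `REQUIRES` table and all function names are hypothetical, and the sketch assumes that the only contact constraints are "screw D needs plate B in place" and "screw E needs plate C in place".

```python
from itertools import permutations

# Contact constraints as read from the text (assumed to be the only ones):
# screw D needs plate B in place, screw E needs plate C in place.
# Base part A is the starting node.
REQUIRES = {"B": set(), "C": set(), "D": {"B"}, "E": {"C"}}
SINGLE_PARTS = ["A", "B", "C", "D", "E"]


def label(parts):
    """Canonical node label, e.g. {'A', 'D', 'B'} -> 'ABD'."""
    return "".join(sorted(parts))


def enumerate_transitions():
    """Walk breadth-first from node A, adding one part per transition."""
    transitions = []
    seen = {frozenset("A")}
    frontier = [frozenset("A")]
    while frontier:
        state = frontier.pop(0)
        for part, needed in REQUIRES.items():
            if part in state or not needed <= state:
                continue  # already assembled, or contact constraint not met
            nxt = state | {part}
            transitions.append((label(state), part, label(nxt)))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return transitions, seen


def count_sequences():
    """Count orderings of B, C, D, E that respect the contact constraints."""
    total = 0
    for order in permutations("BCDE"):
        placed = {"A"}
        for part in order:
            if not REQUIRES[part] <= placed:
                break
            placed.add(part)
        else:
            total += 1
    return total


transitions, assembled = enumerate_transitions()
# Nodes are the 5 single parts plus every reachable multi-part state.
nodes = set(SINGLE_PARTS) | {label(s) for s in assembled}
print(len(nodes), len(transitions), count_sequences())
```

Under these assumptions the script prints `13 12 6`, which agrees with the state, transition, and sequence counts stated in the text.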

Next, a method of setting the assembly work order based on the assembly state transition (AND / OR tree) (FIG. 4) of the product 40 will be described.

Although there are two robots R1 and R2 in the assembly work environment of FIG. 2, the case where the work subject is only one robot (one of the robots R1 and R2) will be described first. The case where the work subject is a plurality of (for example, two) robots will be described later.

As a premise, it is assumed that only one assembly state is changed by one work by one robot.

The work defined based on the assembly state transition in FIG. 4 is the following work W1 to W5.
Work W1: Assembly of base component A.
Work W2: Assembly of plate part B.
Work W3: Assembly of plate part C.
Work W4: Assembly of screw part D.
Work W5: Assembly of screw part E.

Constraints and rewards for assembly work are derived directly from the assembly state transitions. For example, work W4 (assembly of screw part D) requires that subassembly AB, subassembly ABC, or subassembly ABCE exist as the assembly state before it is executed. Therefore, a negative reward is set if work W4 is selected when none of subassembly AB, subassembly ABC, or subassembly ABCE exists as the assembly state.

Further, for example, assuming that base part A is placed on the stage 312 in advance in the initial state of the assembly work, a negative reward is set when work W1 is selected first in the assembly work.
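The way constraints and rewards follow from the assembly state transitions can be sketched as a single state-transition function. This is an illustrative sketch only: the names `PRECONDITIONS`, `ADDED_PART`, and `step` are hypothetical, the reward magnitudes of -1 and +1 are assumptions (the text only specifies their signs), and the precondition sets for W2, W3, and W5 are inferred from the AND / OR tree in the same way the text derives the set for W4.

```python
# Assembly states that must exist before each work can be executed.
# W4 is stated in the text; W2, W3 and W5 are inferred from the tree (assumption).
PRECONDITIONS = {
    "W2": {"A", "AC", "ACE"},
    "W3": {"A", "AB", "ABD"},
    "W4": {"AB", "ABC", "ABCE"},
    "W5": {"AC", "ABC", "ABCD"},
}
ADDED_PART = {"W2": "B", "W3": "C", "W4": "D", "W5": "E"}
INITIAL = frozenset({"A", "B", "C", "D", "E"})  # every part exists on its own
PENALTY, BONUS = -1.0, 1.0  # illustrative reward magnitudes (assumption)


def step(existing, work):
    """Apply one work; return (next_state, reward, episode_finished)."""
    # W1 has no entry: base part A is assumed to be on the stage already,
    # so selecting W1 is always treated as a constraint violation here.
    bases = existing & PRECONDITIONS.get(work, set())
    if not bases:
        return existing, PENALTY, True
    base = next(iter(bases))
    part = ADDED_PART[work]
    merged = "".join(sorted(base + part))
    # The two inputs disappear and the merged subassembly appears.
    nxt = (existing - {base, part}) | {merged}
    if merged == "ABCDE":
        return nxt, BONUS, True
    return nxt, 0.0, False
```

For instance, `step(INITIAL, "W4")` ends the episode with the negative reward because none of AB, ABC, or ABCE exists yet, while `step(INITIAL, "W2")` replaces A and B with AB and lets the episode continue.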

To include the work of placing base part A on the stage 312 in the plan, two states may be provided for base part A, one in which it is placed on the tray 321 (or 322) and one in which it is placed on the stage 312, and work W1 may be defined as the transition from the state of being placed on the tray 321 (or 322) to the state of being placed on the stage 312.

FIG. 6 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40, and the assembly state for each work.

The assembly work order is set by reinforcement learning in which the work subject is only one robot and five types of work W1 to W5 are selected based on the assembly state of the product 40.

The assembly work order shown in the figure runs from work step 0 to work step 4, where step 0 is the initial state and step 4 is the completed state. A state value of 0 listed for a part or the like means that the corresponding part or the like does not exist, and a state value of 1 means that it exists. The initial state of the assembly work is defined as the state in which each part exists as a single part and no subassembly or product exists. The completed state of the assembly work is defined as the state in which the product ABCDE exists.

Work step 0 is the initial state. In the next work step 1, the robot executes work W2 (assembly of plate part B). As a result, base part A and plate part B disappear, and subassembly AB appears.

In the next work step 2, the robot executes work W3 (assembly of plate part C). As a result, plate part C and subassembly AB disappear, and subassembly ABC appears. In the next work step 3, the robot executes work W4 (assembly of screw part D). As a result, screw part D and subassembly ABC disappear, and subassembly ABCD appears. In the next work step 4, the robot executes work W5 (assembly of screw part E). As a result, screw part E and subassembly ABCD disappear, and the product ABCDE appears. The completed state is thus obtained and the assembly work is completed.
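How a work order like this can emerge from reinforcement learning can be sketched with tabular Q-learning, which the description lists as an alternative to the A3C method adopted in the embodiment. The sketch below is therefore not the embodiment's method. All names, the hyperparameters, the reward magnitudes, and the uniform-random exploration are assumptions, and the state is simplified to the label of the node containing base part A (parts not in the label are taken to exist as single parts).

```python
import random

# Preconditions inferred from the AND / OR tree (W4 is stated in the text).
PRECONDITIONS = {
    "W2": {"A", "AC", "ACE"},
    "W3": {"A", "AB", "ABD"},
    "W4": {"AB", "ABC", "ABCE"},
    "W5": {"AC", "ABC", "ABCD"},
}
ADDED_PART = {"W2": "B", "W3": "C", "W4": "D", "W5": "E"}
WORKS = ["W1", "W2", "W3", "W4", "W5"]


def step(assembly, work):
    """Return (next_state, reward, done); infeasible work ends the episode."""
    if assembly not in PRECONDITIONS.get(work, ()):
        return assembly, -1.0, True          # constraint violated
    nxt = "".join(sorted(assembly + ADDED_PART[work]))
    if nxt == "ABCDE":
        return nxt, 1.0, True                # product completed
    return nxt, 0.0, False


def train(episodes=5000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with uniform random exploration (off-policy)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = "A", False
        while not done:
            work = rng.choice(WORKS)
            nxt, reward, done = step(state, work)
            best_next = 0.0 if done else max(q.get((nxt, w), 0.0) for w in WORKS)
            old = q.get((state, work), 0.0)
            q[(state, work)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def generate_order(q, max_steps=10):
    """Greedy rollout: pick the highest-valued work in each state."""
    state, order = "A", []
    for _ in range(max_steps):
        work = max(WORKS, key=lambda w: q.get((state, w), 0.0))
        state, _, done = step(state, work)
        order.append(work)
        if done:
            break
    return order, state


q_table = train()
order, final_state = generate_order(q_table)
print(order, final_state)
```

The greedy rollout plays the role of the assembly work order generation step: it reads a four-step order ending in ABCDE out of the learned table. Which of the six valid sequences is produced depends on the learned values, since the sketch does not prefer one valid order over another.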

Note that either work W2 (assembly of plate part B) or work W3 (assembly of plate part C) may be executed first; their order cannot be determined from the assembly state transitions alone.

Work W2 can be selected only when the assembly state is base part A alone, subassembly AC, or subassembly ACE. If a constraint condition is set that plate part B must be assembled before plate part C, the only assembly state that can be the precondition of work W2 is base part A alone.
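The effect of such an ordering constraint on the precondition states of a work can be illustrated as follows. This is a sketch under assumptions: the ordering constraint is encoded, hypothetically, as plate C "requiring" plate B, and the names `CONTACT`, `ORDERED`, and `base_states_for` do not come from the source.

```python
# Contact constraints read from the text (assumed to be the only ones).
CONTACT = {"B": set(), "C": set(), "D": {"B"}, "E": {"C"}}
# Extra ordering constraint "B before C", encoded as C requiring B (assumption).
ORDERED = {**CONTACT, "C": {"B"}}


def base_states_for(part, requires):
    """Labels of reachable states onto which `part` can still be assembled."""
    reachable, frontier = {frozenset("A")}, [frozenset("A")]
    while frontier:
        state = frontier.pop()
        for p, needed in requires.items():
            if p not in state and needed <= state:
                nxt = state | {p}
                if nxt not in reachable:
                    reachable.add(nxt)
                    frontier.append(nxt)
    return sorted(
        "".join(sorted(s))
        for s in reachable
        if part not in s and requires[part] <= s
    )


print(base_states_for("B", CONTACT))  # states that allow W2 without the ordering rule
print(base_states_for("B", ORDERED))  # states that allow W2 with "B before C"
```

Without the ordering rule the sketch yields A, AC, and ACE as the states in which W2 can be selected; with the rule, states containing C but not B become unreachable and only A remains, matching the description above.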

Next, an example of setting the work based on the state of the assembly work environment in addition to the assembly state transition of the product 40 will be described. As the state of the assembly work environment, the work of exchanging the hands of the robot is introduced.

When assembling the screw parts D and E, the driver hand shall be attached to the robot, and when assembling the plate parts B and C, the grip hand shall be attached to the robot.

In this case, "grip hand" and "driver hand" for the robot hand may be added as states of the assembly work environment, and the following works W6 to W8 may be defined in addition to the above-mentioned works W1 to W5.

Work W6: Attachment of the grip hand.
Work W7: Attachment of the driver hand.
Work W8: Removal of the hand.

And, as a precondition, it is set that work W2 (assembly of plate part B) and W3 (assembly of plate part C) can be performed only when the grip hand is attached. Similarly, it is set that the work W4 (assembly of the screw component D) and W5 (assembly of the screw component E) can be performed only when the driver hand is attached.

Works W6 (attachment of the grip hand) and W7 (attachment of the driver hand) are set so that they can be performed only when no hand is attached to the robot. Work W8 (removal of the hand) is set so that it can be performed only when the grip hand or the driver hand is attached to the robot.

The initial state of the assembly work is defined as the state in which each part exists as a single part and neither the grip hand nor the driver hand is attached to the robot. The completed state of the assembly work is defined as the state in which the product ABCDE is obtained and no hand is attached to the robot.

A hand other than the driver hand and the grip hand may be prepared in the assembly work environment. Further, one robot may have a plurality of arms so that a driver hand and a grip hand can be attached at the same time.

FIG. 7 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40, the assembly work environment (the state of the robot hand), and the assembly state for each work.

The assembly work order is set by reinforcement learning in which the work subject is only one robot and eight types of work from work W1 to W8 are selected based on the assembly state and assembly work environment of the product 40.

The assembly work order shown in the figure runs from work step 0 to work step 8, where step 0 is the initial state and step 8 is the completed state. The state values listed for base part A and the like have the same meaning as in FIG. 6.

For example, in the initial state of work step 0, only single parts exist, and neither the grip hand nor the driver hand is attached to the robot.

In the next work step 1, the robot executes work W6 (attachment of the grip hand). As a result, the robot can execute works W2 (assembly of plate part B) and W3 (assembly of plate part C). In the next work step 2, the robot executes work W2. As a result, base part A and plate part B disappear, and subassembly AB appears. In the next work step 3, the robot executes work W3. As a result, plate part C and subassembly AB disappear, and subassembly ABC appears.

In the next work step 4, the robot executes work W8 (removal of the hand). As a result, the work W6 (mounting the grip hand) and W7 (mounting the driver hand) by the robot can be executed. In the next work step 5, the robot executes the work W7. As a result, the work W4 (assembly of the screw component D) and W5 (assembly of the screw component E) by the robot can be executed.

In the next work step 6, the robot executes work W4. As a result, the screw component D and the component ABC disappear, and the component ABCD appears. In the next work step 7, the robot executes the work W5. As a result, the screw component E and the assembly ABCD disappear, and the product ABCDE appears. In the next work step 8, the robot executes work W8 (removal of the hand). As a result, the completed state is obtained and the assembly work is completed.
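The single-robot sequence walked through above can be replayed as a small state machine. The following Python sketch is illustrative only: the precondition and effect table is inferred from the walkthrough rather than taken from the work definitions, work W1 is omitted because it does not appear in this sequence, and the names `INITIAL`, `WORKS`, `apply_work`, `replay`, and `FIG7_SEQUENCE` are hypothetical.

```python
# State variables are 1 (present / attached) or 0 (absent / detached).
INITIAL = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1,
           "AB": 0, "ABC": 0, "ABCD": 0, "ABCDE": 0,
           "grip": 0, "driver": 0}

# work -> (required state values, state values after the work).
# Assumption: inferred from the walkthrough, not copied from the source tables.
WORKS = {
    "W2": ({"grip": 1, "A": 1, "B": 1}, {"A": 0, "B": 0, "AB": 1}),
    "W3": ({"grip": 1, "C": 1, "AB": 1}, {"C": 0, "AB": 0, "ABC": 1}),
    "W4": ({"driver": 1, "D": 1, "ABC": 1}, {"D": 0, "ABC": 0, "ABCD": 1}),
    "W5": ({"driver": 1, "E": 1, "ABCD": 1}, {"E": 0, "ABCD": 0, "ABCDE": 1}),
    "W6": ({"grip": 0, "driver": 0}, {"grip": 1}),
    "W7": ({"grip": 0, "driver": 0}, {"driver": 1}),
}


def apply_work(state, work):
    """Return the next state, or raise ValueError if the work is not allowed."""
    if work == "W8":  # removal of whichever hand is attached
        if state["grip"] + state["driver"] == 0:
            raise ValueError("W8 needs an attached hand")
        return {**state, "grip": 0, "driver": 0}
    required, effect = WORKS[work]
    if any(state[key] != value for key, value in required.items()):
        raise ValueError(f"{work} is not allowed in this state")
    return {**state, **effect}


def replay(sequence, state=INITIAL):
    """Apply each work in turn; return the states of work steps 0..len(sequence)."""
    states = [dict(state)]
    for work in sequence:
        states.append(apply_work(states[-1], work))
    return states


FIG7_SEQUENCE = ["W6", "W2", "W3", "W8", "W7", "W4", "W5", "W8"]
final = replay(FIG7_SEQUENCE)[-1]
print(final["ABCDE"], final["grip"], final["driver"])  # 1 0 0
```

Raising an error on a disallowed work mirrors the way an episode would end with a negative reward when a wrong work is selected.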

Next, a case where the work subject is a plurality of robots (as an example, the two robots R1 and R2 in FIG. 2) will be described.

In this case, a "grip hand" and a "driver hand" for each of the hands of the two robots are added as the state of the assembly work environment. The actions that the two robots can select are the tasks W1 to W8 defined as described above. However, since there may be a case where one of the two robots does not work (cannot work), the following work W0 is additionally defined.
Work W0: Standby.

There are no restrictions on the state that is the premise of work W0 or on the state that can be transitioned to by work W0.

The same work may be selected for the two robots at the same time. However, if the work has a restriction on its pre-work state and that pre-work state disappears as a result of the work selected for one robot, a negative reward is set when the same work is selected for the other robot.
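A minimal sketch of this reward rule follows, in Python. It assumes a hypothetical table `CONSUMED_BY` listing the parts or sub-assemblies that each assembly work makes disappear, and it borrows the example reward values of -1 and 0 given later in the description. The function name `conflict_reward` is also an illustration, not part of the described device.

```python
NEGATIVE_REWARD = -1.0
NO_REWARD = 0.0

# Hypothetical table: state variables that disappear when the work is executed.
# Works without an entry (W0 standby, W6 to W8 hand changes) consume nothing.
CONSUMED_BY = {
    "W2": {"A", "B"},
    "W3": {"C", "AB"},
    "W4": {"D", "ABC"},
    "W5": {"E", "ABCD"},
}


def conflict_reward(work_r1, work_r2):
    """Penalize both robots selecting the same work when that work consumes its pre-work state."""
    if work_r1 == work_r2 and CONSUMED_BY.get(work_r1):
        return NEGATIVE_REWARD
    return NO_REWARD


print(conflict_reward("W2", "W2"))  # -1.0
print(conflict_reward("W6", "W6"))  # 0.0
```

Both robots attaching a hand (W6 with W6) or both standing by (W0 with W0) is not penalized here, because each robot has its own hand state and neither work removes a shared part.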

The initial state of the assembly work is defined as the state in which each part exists as a single unit and the hand is not attached to each of the two robots. The completed state of the assembly work is defined as the state in which the product ABCDE is obtained and the hands are not attached to each of the two robots.

FIG. 8 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40 and the assembly work environment (hand state), and the assembly state for each work.

The assembly work order is set by reinforcement learning in which the work subjects that carry out the assembly work of the product 40 are two robots, and nine types of work from work W0 to W8 are selected based on the assembly state of the product 40 and the assembly work environment.

In the figure, the two robots are the robots R1 and R2, the state of the hand of the robot R1 is the R1 grip and the R1 driver, and the state of the hand of the robot R2 is the R2 grip and the R2 driver.

The assembly work order shown in the figure is from work steps 0 to 5, with 0 being the initial state and 5 being the completed state. The state values described corresponding to the component A and the like are the same as in the case of FIG.

For example, in the initial state of work step 0, there are only single parts, and there are no sub-assemblies or products. Further, neither the grip hand nor the driver hand is attached to either of the two robots R1 and R2.

In the next work step 1, the robot R1 executes the work W6 (attachment of the grip hand), and the robot R2 executes the work W7 (attachment of the driver hand). As a result, the work W2 (assembly of the plate component B) and W3 (assembly of the plate component C) by the robot R1 can be executed, and the work W4 (assembly of the screw component D) and W5 (assembly of the screw component E) by the robot R2 can be executed.

In the next work step 2, the robot R1 executes the work W2. As a result, the base component A and the plate component B disappear, and the sub-assembly AB appears. Meanwhile, the robot R2 executes work W0 (standby). In the next work step 3, the robot R1 executes the work W3, and the robot R2 executes the work W4 (assembly of the screw component D). As a result, the plate component C, the screw component D, and the sub-assembly AB disappear, and the sub-assembly ABCD appears.

In the next work step 4, the robot R1 executes the work W8 (removal of the hand), and the robot R2 executes the work W5 (assembly of the screw component E). As a result, the screw component E and the sub-assembly ABCD disappear, and the product ABCDE appears. In the next work step 5, the robot R1 executes the work W0 (standby), and the robot R2 executes the work W8 (removal of the hand). As a result, the completed state is obtained and the assembly work is completed.

It is also possible to set the work subject as three or more robots in the assembly work order. Further, as the assembly work environment, the states of the trays 321 and 322 and the stage 312 may be added in addition to the hand. Further, the work of a transport device that transports parts from a predetermined position to another position may be added to the definition of the work.

<Assembly work sequence planning process by the assembly work sequence planning device 10>
Next, FIG. 9 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device 10.

As a premise, it is assumed that the CAD system 20 has already modeled the product shape model and the shape model of the assembly work environment including the robot, and that the assembly work environment / product information 121 and the assembly state transition information (AND / OR tree) 122 can be supplied to the assembly work sequence planning device 10.

The assembly work sequence planning process is started, for example, in response to a predetermined operation from the user.

First, the information acquisition unit 111 acquires the assembly work environment / product information 121 from the CAD system 20 and stores it in the storage unit 12 (step S1). Next, the information acquisition unit 111 acquires the assembly state transition information 122 from the CAD system 20 and stores it in the storage unit 12 (step S2).

Next, the assembly work definition unit 112 sets the pre-work state of the assembly work environment (robot hand, tray, stage, etc.) (step S3). Then, based on the assembly state transition information 122, the assembly work definition unit 112 defines an assembly work for each assembly state consisting of two parts or sub-assemblies before assembly, and stores the defined assembly work in the storage unit 12 as assembly work information 123 (step S4).

Next, the constraint condition setting unit 113 sets the constraint condition and stores it in the storage unit 12 as the constraint condition information 124 (step S5).

Next, the action selection / value function component 114 defines the action selection function and the value function used for reinforcement learning (step S6). Next, the reward setting unit 115 sets a reward for the work selected for each assembly state (step S7).

Next, the learning unit 116 executes reinforcement learning by repeating episodes (step S8). An episode is a series of actions that starts from the initial state of assembly and ends either when a work selection for an assembly state fails and the assembly fails, or when the work selections succeed, the assembly work is completed, and the assembly succeeds.

Specifically, the episode is repeated, the action selection for each state is learned, and the reinforcement learning is completed once successful assembly results are obtained. Completing the reinforcement learning yields an action selection function that selects a good action for each state. In the processing of one episode, an action is selected for the initial state, and the next state is obtained. Another action is then selected to obtain the following state. In this way, the action selection is repeated step by step to advance the state, and if the action selection is wrong or the resulting state is not allowed, a negative reward is obtained and the episode ends. Conversely, if the assembly succeeds, the episode ends with a positive reward.

In reinforcement learning, episodes fail while the number of repetitions is still small, but as the number of repetitions increases and the learning progresses, episodes begin to succeed. Therefore, the reinforcement learning is terminated when episodes succeed a predetermined number of times (for example, three times) in a row.
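The termination rule described here (stop once a predetermined number of consecutive episodes succeed) can be sketched in Python as below. The function name `train_until_stable`, the callback `run_episode`, and the safety cap `max_episodes` are assumptions made for illustration; the description specifies only the consecutive-success criterion.

```python
def train_until_stable(run_episode, required_successes=3, max_episodes=10_000):
    """Repeat episodes until `required_successes` succeed in a row.

    `run_episode` is a hypothetical callable that runs one episode (including
    any learning update) and returns True on success, False on failure.
    Returns the number of episodes run, or None if the cap is reached first.
    """
    streak = 0
    for episode in range(1, max_episodes + 1):
        if run_episode():
            streak += 1
            if streak >= required_successes:
                return episode
        else:
            streak = 0  # a failed episode resets the consecutive count
    return None


# Toy usage with a fixed outcome list standing in for real episodes.
outcomes = iter([False, True, True, False, True, True, True])
print(train_until_stable(lambda: next(outcomes)))  # 7
```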

Next, the assembly work sequence generation unit 117 runs an episode from the initial state using the action selection function and the state value function obtained as a result of reinforcement learning, and sets the assembly work order by connecting the series of work selection results obtained until the product is completed (step S9).

According to the assembly work order planning process by the assembly work order planning device 10 described above, it is possible to plan the assembly order of the parts constituting the product and the work order in which the robot is the work subject.

<Second embodiment>
Next, FIG. 10 shows a configuration example of the assembly work sequence planning device 100 according to the second embodiment of the present invention.

The assembly work order planning device 100 is for planning an assembly work order when a product is completed by assembling parts by a work entity that may include a robot and a worker.

The assembly work sequence planning device 100 is obtained by dividing the assembly work definition unit 112 of the assembly work sequence planning device 10 (FIG. 1) according to the first embodiment of the present invention into an assembly state-based assembly work definition unit 112A and a work state-based assembly work definition unit 112B, and by adding a simulation instruction unit 118.

Further, the assembly work order planning device 100 adds constraint condition determination simulation information 125 and assembly work order simulation information 126 as information stored in the storage unit 12.

Further, a robot simulator 30 to which the assembly work sequence planning device 100 can be connected via the network 1 is added outside the assembly work sequence planning device 100. The robot simulator 30 executes a simulation of work by the robots R1 and R2 installed in the assembly work environment according to an instruction from the assembly work order planning device 100, and outputs the simulation result to the assembly work order planning device 100.

Of the components of the assembly work sequence planning device 100, those that are common to the components of the assembly work sequence planning device 10 (FIG. 1) are designated by the same reference numerals and the description thereof will be omitted.

The assembly state-based assembly work definition unit 112A defines the assembly work with reference to the assembly state transition information 122, similarly to the assembly work definition unit 112 in the assembly work sequence planning device 10 (FIG. 1).

The work state-based assembly work definition unit 112B refers to the assembly work environment / product information 121 and defines the work required for assembly with respect to the state of the assembly work environment. For example, if a driver hand is required to assemble a part, the work of attaching the driver hand to the robot is defined for the assembly state before the part is assembled. In this case, the work of assembling the part can be executed if the assembly state is the one before the part is assembled and the driver hand is attached to the robot.

The robot hand is not the only one that can set the state as an assembly work environment. For example, when there are a plurality of stages in the assembly work environment, it is possible to set the state of whether or not the stage is being used for assembly. Further, since the robot or the worker who is the work subject is carrying out a certain work, the type of the work being carried out by the work subject may be regarded as the state of the work subject and set.

The simulation instruction unit 118 instructs the externally provided robot simulator 30 to perform an individual assembly work simulation for setting constraint conditions, and acquires the simulation result. In this case, the constraint condition setting unit 113 can modify the constraint conditions based on the simulation result. Further, the simulation instruction unit 118 instructs the robot simulator 30 to perform an assembly work simulation according to the finally obtained assembly work order, and acquires the simulation result. The simulation result can be used to confirm that the finally obtained assembly work order is valid.

The constraint condition determination simulation information 125 is information including the conditions when the robot simulator 30 is instructed to perform the individual assembly work simulation for setting the constraint conditions, and the simulation result.

The assembly work order simulation information 126 is information including the conditions when the assembly work order simulation is instructed to the robot simulator 30 and the simulation result.

Next, FIG. 11 shows an example of the product 50 that plans the assembly work order by the assembly work order planning device 100.

The product 50 has a structure that is completed by arranging the box component B on the base component A, fastening and fixing it with the screw components C and D, and connecting the base component A and the box component B with the wiring components E and F.

Of the assembly works performed until the product 50 is completed, it is assumed that the robot executes the work of arranging the box component B on the base component A and the work of fastening the screw components C and D, and that the worker performs the work of connecting the wiring components E and F.

FIG. 12 shows a list of transitions to each assembly state in the assembly work of the product 50.

Due to the structure of the product 50, it is necessary to first assemble the box component B to the base component A so that the sub-assembly AB appears. The screw components C and D and the wiring components E and F may then be assembled to the sub-assembly AB in any order. Therefore, the number of assembly orders from single parts to the finished product is 1 × 4! = 24.

The product 50 has 22 assembly states in total: 6 states of the single parts (the base component A, the box component B, the screw components C and D, and the wiring components E and F); 1 state of the sub-assembly AB consisting of 2 parts; 4 states of ABC, ABD, ABE, and ABF consisting of 3 parts; 6 states of ABCD, ABCE, ABCF, ABDE, ABDF, and ABEF consisting of 4 parts; 4 states of ABCDE, ABCDF, ABCEF, and ABDEF consisting of 5 parts; and 1 state of the finished product ABCDEF consisting of 6 parts.

The transition of the product 50 to each assembly state is 33 ways of No1 to No33 shown in FIG. Of these, the transition for assembling parts E and F is performed according to the work by the worker, and the transition for assembling other parts is performed according to the work by the robot.
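The counts stated above (22 assembly states, 33 transitions, 24 assembly orders) can be checked by enumeration. The Python sketch below assumes only what the text states: AB must be formed first, after which C, D, E, and F may be added one at a time in any order. The variable names are illustrative.

```python
from itertools import combinations
from math import factorial

SINGLE_PARTS = ["A", "B", "C", "D", "E", "F"]
ADDABLE = ["C", "D", "E", "F"]  # parts that may be added once AB exists

# Sub-assemblies and the finished product: AB plus any subset of C, D, E, F.
assemblies = ["AB" + "".join(extra)
              for k in range(len(ADDABLE) + 1)
              for extra in combinations(ADDABLE, k)]
states = SINGLE_PARTS + assemblies

# Transitions: A + B -> AB, then one more part added to an existing assembly.
transitions = [("A", "B", "AB")]
for assembly in assemblies:
    for part in ADDABLE:
        if part not in assembly:
            transitions.append((assembly, part, "".join(sorted(assembly + part))))

orders = 1 * factorial(len(ADDABLE))
print(len(states), len(transitions), orders)  # 22 33 24
```

The transition count decomposes as 1 + 4 + 12 + 12 + 4 = 33, one term per number of parts already assembled.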

Hereinafter, the work subject will be one robot and one worker. The robot shall use the grip hand for arranging the box component B and the driver hand for fastening the screw components C and D.

In this case, the works defined for the work subjects, namely the robot and the worker, are as follows.

Work by robot
Work W0: Standby.
Work W1: Assembling box part B.
Work W2: Assembly of screw part C.
Work W3: Assembly of screw part D.
Work W4: Attaching the grip hand.
Work W5: Installation of the driver hand.
Work W6: Removal of the hand.

Work by workers
Work P0: Standby.
Work P1: Assembly of wiring component E.
Work P2: Assembling the wiring component F.

The following are examples of constraint conditions. For the robot, work W4 (attachment of the grip hand) or W5 (attachment of the driver hand) can be selected when no hand is attached, and work W6 (removal of the hand) can be selected when the grip hand or the driver hand is attached. To select work W1 (assembly of the box component B), the grip hand must be attached. To select work W2 (assembly of the screw component C) or W3 (assembly of the screw component D), the driver hand must be attached. For the worker, to select work P1 (assembly of the wiring component E) or P2 (assembly of the wiring component F), the state before the assembly state transition must be the sub-assembly AB, in which the box component B has been assembled to the base component A, or a later state.

The initial state of the assembly work is defined as a state in which only single parts exist, the base component A is placed on the stage 312 in advance, and neither the grip hand nor the driver hand is attached to the robot. The completed state of the assembly work is defined as a state in which the product ABCDEF is obtained and neither the grip hand nor the driver hand is attached to the robot.
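These example constraints can be written as functions that list the works selectable in a given state. The Python sketch below is a simplification under stated assumptions: the state is reduced to a hypothetical `hand` value (`None`, `"grip"`, or `"driver"`) and an `assembly` string (`""` before AB exists), standby is treated as always selectable, and the screw works are assumed to require the sub-assembly AB because the text says B must be assembled to A first.

```python
def selectable_robot_works(state):
    """List robot works allowed by the hand and assembly-state constraints."""
    works = ["W0"]                 # standby (assumed always selectable)
    hand = state["hand"]           # None, "grip", or "driver"
    assembly = state["assembly"]   # "" before AB exists, else "AB", "ABE", ...
    if hand is None:
        works += ["W4", "W5"]      # attach the grip hand or the driver hand
    else:
        works.append("W6")         # remove the attached hand
    if hand == "grip" and assembly == "":
        works.append("W1")         # assemble box component B
    if hand == "driver" and assembly.startswith("AB"):
        if "C" not in assembly:
            works.append("W2")     # assemble screw component C
        if "D" not in assembly:
            works.append("W3")     # assemble screw component D
    return works


def selectable_worker_works(state):
    """List worker works; wiring requires the sub-assembly AB or a later state."""
    works = ["P0"]
    assembly = state["assembly"]
    if assembly.startswith("AB"):
        if "E" not in assembly:
            works.append("P1")     # assemble wiring component E
        if "F" not in assembly:
            works.append("P2")     # assemble wiring component F
    return works


print(selectable_robot_works({"hand": "driver", "assembly": "ABE"}))
# ['W0', 'W6', 'W2', 'W3']
print(selectable_worker_works({"assembly": "ABE"}))
# ['P0', 'P2']
```

A learning loop could use such lists either to mask disallowed works or to assign a negative reward when a work outside the list is selected.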

FIG. 13 shows an example of the assembly work order set based on the assembly state transition (not shown) of the product 50 and the assembly work environment (hand state), and the assembly state for each work.

The assembly work order is set by reinforcement learning in which the work subjects are one robot and one worker, and a total of 10 types of work, namely the seven types of robot work W0 to W6 and the three types of worker work P0 to P2, are selected based on the assembly state and assembly work environment of the product 50.

The assembly work order shown in the figure is from work step 0 to 7, with 0 being the initial state and 7 being the completed state. The state values described corresponding to the component A and the like are the same as in the case of FIG.

For example, in the initial state of work step 0, only single parts exist, and neither the grip hand nor the driver hand is attached to the robot.

In the next work step 1, the robot executes work W4 (attachment of the grip hand), and the worker executes work P0 (standby). As a result, the work W1 (assembly of the box component B) by the robot becomes possible.

In the next work step 2, the robot executes the work W1 and the worker executes the work P0 (standby). As a result, the base component A and the box component B disappear, and the sub-assembly AB appears. In the next work step 3, the robot executes the work W6 (removal of the hand), and the worker executes the work P1 (assembly of the wiring component E). As a result, the wiring component E and the sub-assembly AB disappear, and the sub-assembly ABE appears.

In the next work step 4, the robot executes the work W5 (attachment of the driver hand), and the worker executes the work P2 (assembly of the wiring component F). As a result, the work W2 (assembly of the screw component C) and W3 (assembly of the screw component D) by the robot can be executed. Further, the wiring component F and the sub-assembly ABE disappear, and the sub-assembly ABEF appears.

In the next work step 5, the robot executes work W2 (assembly of the screw component C), and the worker executes work P0 (standby). As a result, the screw component C and the sub-assembly ABEF disappear, and the sub-assembly ABCEF appears.

In the next work step 6, the robot executes the work W3 (assembly of the screw component D), and the worker executes the work P0 (standby). As a result, the screw component D and the sub-assembly ABCEF disappear, and the product ABCDEF appears.

In the next work step 7, the robot executes work W6 (removal of the hand), and the worker executes work P0 (standby). As a result, the completed state is obtained and the assembly work is completed.

It should be noted that the assembly work order can be planned even if the number of parts increases and the number of required assembly works increases, or if there are multiple robots or workers as the work subjects. Furthermore, the assembly work order can also be planned when the states of the trays 321 and 322, the stage 312, and the like are added to the assembly work environment in addition to the hand, and when the work of a transport device that transports parts from a predetermined position to another position is added to the work definition.

<Assembly work sequence planning process by the assembly work sequence planning device 100>
Next, FIG. 14 is a flowchart illustrating an example of the assembly work order planning process by the assembly work order planning device 100. Of the steps S21 to S32 of the assembly work order planning process, the processes of steps S21 to S23, S26, and S28 to S31 are the same as the processes of steps S1 to S3, S5, and S6 to S9 of the assembly work order planning process (FIG. 9) by the assembly work order planning device 10, and their description will be omitted as appropriate.

As a premise, it is assumed that the CAD system 20 has already modeled the product shape model and the shape model of the assembly work environment including the robot and the like, and that the assembly work environment / product information 121 and the assembly state transition information (AND / OR tree) 122 can be supplied to the assembly work sequence planning device 100.

First, the information acquisition unit 111 acquires the assembly work environment / product information 121 and the assembly state transition information 122 from the CAD system 20 and stores them in the storage unit 12 (steps S21 and S22). Next, the assembly work definition unit 112 sets the state of the assembly work environment before work (step S23).

Next, based on the assembly state transition information 122, the assembly state-based assembly work definition unit 112A defines an assembly work for each assembly state consisting of two parts or sub-assemblies before assembly, and stores the defined assembly work in the storage unit 12 as the assembly work information 123 (step S24).

Next, the work state-based assembly work definition unit 112B refers to the assembly work environment / product information 121 and defines the work required for assembly with respect to the state of the assembly work environment (step S25).

Next, the constraint condition setting unit 113 sets the constraint condition and stores it in the storage unit 12 as the constraint condition information 124 (step S26).

Next, the simulation instruction unit 118 instructs the robot simulator 30 to perform an individual assembly work simulation for setting constraint conditions, and acquires the simulation result (step S27). Specifically, the assembly state before each assembly work and the state of the assembly work environment are set in the robot simulator 30, and the simulation of the target assembly work is executed. If the simulation reveals a problem, such as the robot interfering (colliding) with another robot or the robot being unable to take the posture needed to assemble the target part because of an insufficient joint movable angle, the assembly work is judged to be infeasible. According to this simulation result, the infeasible assembly work is additionally set as a constraint condition. Alternatively, the definition of the assembly work judged to be infeasible may be deleted.

Next, the action selection / value function component 114 defines the action selection function and the value function used for reinforcement learning (step S28). Next, the reward setting unit 115 sets a reward for the work selected for each assembly state (step S29).

Next, the learning unit 116 executes reinforcement learning by repeating episodes. An episode is a series of actions that starts from the initial state of assembly and ends either when a work selection for an assembly state fails and the assembly fails, or when the work selections succeed, the assembly work is completed, and the assembly succeeds. The reinforcement learning ends when episodes succeed a predetermined number of times (for example, three times) in a row (step S30).

Next, the assembly work sequence generation unit 117 runs an episode from the initial state using the action selection function and the state value function obtained as a result of reinforcement learning, and generates an assembly work order by connecting the series of work selection results obtained until the product is completed (step S31).

Next, the simulation instruction unit 118 instructs the robot simulator 30 to simulate the assembly work order obtained in step S31, acquires the simulation result, and confirms that there is no problem (step S32).

According to the assembly work order planning process by the assembly work order planning device 100 described above, it is possible to plan the assembly order of the parts constituting the product and the work order of the work subjects including the robot and the worker.

<About reinforcement learning>
Next, the reinforcement learning adopted in each embodiment of the present invention described above will be described.

In general, reinforcement learning methods are based on the Markov decision process (MDP) model, in which an action is selected in a given state and the state is updated, and in which each action or state is good or bad, that is, has an associated value.

Known methods of reinforcement learning include Q-learning, which selects actions using the values in an action value table and updates the table values by learning the action selection; DQN (Deep Q-Network), which implements the action value function as a deep neural network and selects actions from the action values; and A3C (Asynchronous Advantage Actor-Critic), which implements the action selection function (policy) and the state value function as a deep neural network.

There are many other methods, but all of them are modeled by state, behavior, and reward as problems of reinforcement learning.

Hereinafter, the A3C adopted in the above-described embodiments will be described in detail. In reinforcement learning, at each step in one episode, an action is selected for the state to transition to the next state, and a reward is given for the result of the action selection or for the resulting state. If the reward meets the end condition of the episode, the episode ends. The reinforcement learning ends when, after repeating episodes, the correct action can be selected for each state, or when the action selected for each state becomes deterministic.

In A3C, as the word Asynchronous implies, multiple agents (also called actors) run episodes individually, and each agent trains (learns) a single shared action selection function and state value function and uses them for action selection.

FIG. 15 is a diagram for explaining the outline of A3C. In the case of the figure, there are three agents. The action selection function and the state value function 1401 are configured as a shared deep neural network (DNN), referred to as the shared DNN. Further, in the case of the figure, the input layer has 3 state variables and the action output layer has 3 actions. The value output layer always has exactly one output (the state value).

There is one intermediate layer, and it has four nodes. The activation functions are softmax for the action output layer, linear for the value output layer, and ReLU (Rectified Linear Unit) for the intermediate layer. Each agent has a memory for storing states, actions, next states, and rewards, and a DNN for selecting its own actions. For example, the agent 1 includes a memory 1411 and a DNN 1421. The agent 2 includes a memory 1412 and a DNN 1422, and the agent 3 includes a memory 1413 and a DNN 1423.
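The network shape described above (3 state inputs, one intermediate layer of 4 ReLU nodes, a 3-way softmax action output, and a single linear value output) can be illustrated with a forward pass in plain Python. This is a sketch under assumptions: the weight initialization, the fixed random seed, and all function names are illustrative, and no training is performed.

```python
import math
import random

N_STATE, N_HIDDEN, N_ACTION = 3, 4, 3
random.seed(0)  # fixed seed only so the sketch is repeatable


def init_layer(n_in, n_out):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


hidden_layer = init_layer(N_STATE, N_HIDDEN)
policy_head = init_layer(N_HIDDEN, N_ACTION)
value_head = init_layer(N_HIDDEN, 1)


def forward(state):
    """Return (policy probabilities, state value) for one state vector."""
    h = [max(0.0, v) for v in linear(hidden_layer, state)]  # ReLU
    policy = softmax(linear(policy_head, h))                # action output
    value = linear(value_head, h)[0]                        # linear value output
    return policy, value


policy, value = forward([1.0, 0.0, 1.0])
print(len(policy), round(sum(policy), 6))  # 3 1.0
```

Both heads share the intermediate layer, which is what allows a single set of parameters to serve as the action selection function and the state value function.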

Each agent individually trains the shared DNN once step data has accumulated during its episode processing. After the training, the agent copies the shared DNN to its own DNN and uses its own DNN to select actions for states. Because the timing of DNN training and action selection is asynchronous between agents, this configuration keeps training and action selection from conflicting.

When performing reinforcement learning with a single agent, it is not necessary to provide a shared DNN. The learning method in this case is called A2C (Advantage Actor-Critic). The advantage in A2C is the difference between the action value and the state value, that is, a quantity indicating how good the action selection is. Actor-Critic means that the action selection and the state value evaluation are calculated separately.

Next, FIG. 16 is a diagram for explaining the processing contents of the training of the action selection function and the state value function, and schematically shows the DNN (deep neural network) 1501.

The DNN 1501 takes the state s as an input, outputs the policy π (s) for action selection, and outputs the state value V (s). Let θ be the parameter of the functions (activation functions) defined for the nodes in the DNN 1501. The policy is also called a stochastic policy, and is also expressed as the probability π (a | s) of taking action a in the state s.

Training the action selection function and the state value function means optimizing the parameter θ of the DNN 1501 so that it captures the relationship among the data of the state s, the action a, the next state s', and the reward r. In particular, it is important to be able to correctly estimate the relationship between state value and action.

The state value function V (s) is expressed by the following equation (1) under the policy π (s).

Here, E denotes the expected value under the policy π (s) given as its subscript. The discount rate γ is a coefficient for converting the next state value (future value) to the current value. The value of the discount rate γ is 0.99 as an example, but it can be adjusted as a parameter of the reinforcement learning.

The value of the policy is expressed by the following equation (2) as the expected value in the distribution ρ of the state s.

Equation (2) is called the discounted reward function. The gradient of the discounted reward function with respect to the parameter θ of the DNN is given by the policy gradient theorem expressed by the following equation (3).

Here, ∇ means the gradient (first-order partial derivatives with respect to the parameter θ). The expected value is taken over states s following the distribution ρ under the policy π and over actions a following the policy π for the state s. The action value function Q (s, a) and the advantage function A (s, a) are as shown in the following equations (4) and (5), and can be calculated from the data.
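Equations (1) to (5) are not reproduced in this text. The LaTeX below is a reconstruction under the standard actor-critic definitions, consistent with the symbols defined above (V, π, γ, ρ, θ, Q, A and the data s, a, s', r); it should be read as an assumed form, not as a verbatim copy of the original equations.

```latex
\begin{align}
V(s) &= \mathbb{E}_{\pi(s)}\!\left[\, r + \gamma V(s') \,\right] \tag{1}\\
J(\theta) &= \mathbb{E}_{s \sim \rho}\!\left[\, V(s) \,\right] \tag{2}\\
\nabla_{\theta} J(\theta) &= \mathbb{E}_{s \sim \rho,\; a \sim \pi}\!\left[\, \nabla_{\theta} \log \pi(a \mid s)\,\bigl(Q(s,a) - V(s)\bigr) \,\right] \tag{3}\\
Q(s,a) &= r + \gamma V(s') \tag{4}\\
A(s,a) &= Q(s,a) - V(s) \tag{5}
\end{align}
```

The symbol J for the discounted reward function is introduced here for illustration. With (4) and (5), the bracketed factor in (3) equals the advantage A(s, a), which is why both quantities can be computed from the stored step data.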

Then, if the network parameter θ is optimized so as to maximize the value of the discounted reward function (Equation (2)), the relationship between the policy and the state value can be obtained so that a good reward for the state can be obtained.

If the negative (minus) of the discounted reward function is taken as the policy loss, the policy loss should be minimized. In addition to the policy loss, it is desirable that the action value and the state value match, so there is also a value loss that represents the magnitude of the advantage. In addition, it is desirable that the policy is uniquely determined for the state, and since the policy is determined stochastically, a regularization term modeled by entropy can also be used for optimization. Therefore, the loss function L is defined by the following equation (6) using the policy loss L_π, the value loss L_v, and the regularization term L_reg, and the loss function is minimized.

Here, c_v and c_reg are coefficients. The policy loss L_π is as shown in the following equation (7) from the definition of the discounted reward function. N used for training is the number of data in the step.

The value loss L_v is the square of the advantage function A (s, a) as shown in the following equation (8).

The regularization term is obtained by calculating the entropy H (π (s)) as shown in the following equations (9) and (10).

Here, n_action is the number of actions.
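Equations (6) to (10) are likewise not reproduced in this text. The LaTeX below is a reconstruction that follows the verbal definitions above (a weighted sum of three terms, a policy loss averaged over N step data, a squared-advantage value loss, and an entropy-based regularization term). The sign of L_reg is an assumption following the common A3C convention of rewarding higher entropy; the opposite sign would instead push the policy toward a deterministic choice, which the text also names as desirable.

```latex
\begin{align}
L &= L_{\pi} + c_{v} L_{v} + c_{\mathrm{reg}} L_{\mathrm{reg}} \tag{6}\\
L_{\pi} &= -\frac{1}{N} \sum_{i=1}^{N} \log \pi(a_i \mid s_i)\, A(s_i, a_i) \tag{7}\\
L_{v} &= \frac{1}{N} \sum_{i=1}^{N} A(s_i, a_i)^{2} \tag{8}\\
L_{\mathrm{reg}} &= -\frac{1}{N} \sum_{i=1}^{N} H\bigl(\pi(s_i)\bigr) \tag{9}\\
H\bigl(\pi(s)\bigr) &= -\sum_{k=1}^{n_{\mathrm{action}}} \pi(a_k \mid s)\, \log \pi(a_k \mid s) \tag{10}
\end{align}
```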

In the case of deep neural networks, training by optimization calculation uses the gradient method. This training step itself is the same as ordinary training of a deep neural network, independent of reinforcement learning, and can be realized by utilizing existing deep neural network technology.
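One update of the gradient method can be illustrated with a very small stand-in for the network. The sketch below assumes a softmax policy whose parameters are the logits themselves, rather than a deep neural network, and applies one gradient-descent step to the loss -A · log π(a). The function names and the learning rate are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits into action probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(
    logits: Sequence[float], action: int, advantage: float, lr: float = 0.1
) -> List[float]:
    """One gradient-descent step on the loss -advantage * log pi(action)."""
    probs = softmax(logits)
    # d(-A * log pi_a) / d logit_i = -A * (1[i == a] - pi_i)
    grad = [
        -advantage * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]
    return [x - lr * g for x, g in zip(logits, grad)]


# Hypothetical case: three actions, action 1 was taken and had a positive advantage.
old_logits = [0.0, 0.0, 0.0]
new_logits = policy_gradient_step(old_logits, action=1, advantage=1.0)
old_prob = softmax(old_logits)[1]
new_prob = softmax(new_logits)[1]
```

After the step, the probability of the action with a positive advantage is higher than before, and a negative advantage would lower it instead.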

With the various reinforcement learning algorithms, a method that balances search against the use of learning results (exploration-exploitation), such as the ε-greedy method, may be used as a known technique in the field of reinforcement learning.
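The ε-greedy method mentioned above can be sketched as follows. With probability ε a random action is chosen (search), and otherwise the action with the highest score is chosen (use of learning results). The function name, the score values, and the way the random number generator is passed in are assumptions made for illustration.

```python
import random
from typing import Sequence


def epsilon_greedy(
    action_scores: Sequence[float], epsilon: float, rng: random.Random
) -> int:
    """Return an action index: random with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(action_scores))
    # Exploitation: pick the action with the highest score.
    return max(range(len(action_scores)), key=lambda i: action_scores[i])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = [0.1, 0.7, 0.2]  # hypothetical scores for three defined works
greedy_choice = epsilon_greedy(scores, epsilon=0.0, rng=rng)
random_choice = epsilon_greedy(scores, epsilon=1.0, rng=rng)
```

Setting ε to 0 always returns the highest-scoring action, while setting it to 1 always returns a random action.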

For work selection in assembly work order generation, each defined work is treated as an action. The state consists of the assembly state and the state of the assembly work environment, and each state variable takes the value 0 or 1. As an example, the positive reward may be 1, the negative reward may be -1, and the reward may be 0 when no reward is generated.
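The setting above can be illustrated with a toy episode. The sketch assumes three hypothetical works that must be performed in a fixed order, a state made of 0/1 variables recording which works are done, and the example reward values 1, -1, and 0. An episode ends with a negative reward when the selected work violates a constraint and with a positive reward when all works are completed. The environment, the constraint, and all names are assumptions and are far simpler than an actual assembly work environment.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
N_WORKS = 3  # hypothetical number of defined works (actions)


def step(state: State, action: int) -> Tuple[State, float, bool]:
    """Apply one work and return (next_state, reward, episode_done)."""
    already_done = state[action] == 1
    # Assumed constraint: work i requires that work i-1 is already done.
    precondition_met = action == 0 or state[action - 1] == 1
    if already_done or not precondition_met:
        return state, -1.0, True  # constraint violated: negative reward, end
    next_state = tuple(1 if i == action else s for i, s in enumerate(state))
    if all(next_state):
        return next_state, 1.0, True  # work completed: positive reward, end
    return next_state, 0.0, False  # no reward generated


def run_episode(
    select_action: Callable[[State], int], max_steps: int = 10
) -> List[Tuple[State, int, float]]:
    """Run one episode and collect (state, action, reward) data for training."""
    state: State = (0,) * N_WORKS  # initial state: nothing assembled
    trajectory = []
    for _ in range(max_steps):
        action = select_action(state)
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# A policy that always selects the first work not yet done completes the assembly.
good = run_episode(lambda s: s.index(0))
# A policy that always selects the last work violates the constraint immediately.
bad = run_episode(lambda s: N_WORKS - 1)
```

The collected state, action, and reward data are what the action selection function and the state value function would be trained on.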

This concludes the description of the training of the action selection function and the state value function, and of the reinforcement learning adopted in the embodiment of the present invention.

The present invention is not limited to the above-described embodiment, and various modifications are possible. For example, the above-described embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to embodiments having all of the described configurations. Further, it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment, or to add the configuration of another embodiment to the configuration of one embodiment.

Further, each of the above configurations, functions, processing units, processing means, etc. may be realized by hardware by designing a part or all of them by, for example, an integrated circuit. Further, each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be placed in a memory, a recording device such as a hard disk or SSD, or a recording medium such as an IC card, SD card, or DVD. In addition, the control lines and information lines indicate those that are considered necessary for explanation, and do not necessarily indicate all the control lines and information lines in the product. In practice, it can be considered that almost all configurations are interconnected.

10 ... Assembly work order planning device, 11 ... Calculation unit, 111 ... Information acquisition unit, 112 ... Assembly work definition unit, 112A ... Assembly state-based assembly work definition unit, 112B ... Work state-based assembly work definition unit, 113 ... Constraint condition setting unit, 114 ... Action selection / value function configuration unit, 115 ... Reward setting unit, 116 ... Learning unit, 117 ... Assembly work order generation unit, 118 ... Simulation instruction unit, 12 ... Storage unit, 121 ... Assembly work environment / product information, 122 ... Assembly state transition information, 123 ... Assembly work information, 124 ... Constraint condition information, 125 ... Constraint condition judgment simulation information, 126 ... Assembly work order simulation information, 13 ... Input unit, 14 ... Output unit, 15 ... Communication unit, 20 ... CAD system, 30 ... Robot simulator, 40, 50 ... Product, 100 ... Assembly work sequence planning device, 303, 304 ... Hand, 311 ... Pedestal, 312 ... Stage, 321, 322 ... Tray, 331, 332 ... Hand installation stand, 333, 334 ... Replacement hand

Claims

An assembly work sequence planning device comprising:
an information acquisition unit that acquires assembly state transition information including information representing the process by which a plurality of parts are assembled into sub-assemblies and then into a final product;
an assembly work definition unit that defines, based on the assembly state transition information, work that a work subject can perform on an assembly state consisting of two of the parts or sub-assemblies before assembly;
a constraint condition setting unit that sets a constraint condition regarding whether or not the work can be executed;
a learning unit that performs reinforcement learning of a method of selecting the work for the assembly state in accordance with the constraint condition; and
an assembly work order generation unit that generates an assembly work order of the product based on the result of the reinforcement learning.
The assembly work sequence planning device according to claim 1, wherein
the information acquisition unit acquires assembly work environment / product information including information on objects existing in the assembly work environment and information on a plurality of parts constituting the product,
the assembly work definition unit defines, based on the assembly work environment / product information, the work that can be executed for the state of the assembly work environment, and
the learning unit performs reinforcement learning of a method of selecting the work for the assembly state and the state of the assembly work environment in accordance with the constraint condition.
The assembly work sequence planning device according to claim 2, wherein, in the reinforcement learning, the learning unit:
defines the relationship among the assembly work environment, the work, and the constraint condition by an action selection function and a state value function;
defines an initial state and a work completion state of the assembly work environment;
in learning by episode, repeats a step of selecting an action of the work subject for the state of the assembly work environment during the assembly work and obtaining the next state;
gives a negative reward and ends the episode if the selected action does not satisfy the constraint condition of the work;
gives a positive reward and ends the episode if the work is completed by the selected action;
trains the action selection function and the state value function from the obtained state, action, and reward data; and
learns the method of selecting the work for the state of the assembly work environment by repeating the episode.
The assembly work sequence planning device according to claim 2, wherein
the assembly work environment includes at least one of a robot as a work subject, a hand of the robot, a stage, a tray, and a transfer device.
The assembly work sequence planning device according to claim 1, wherein
the constraint condition setting unit sets, as the constraint condition, the assembly state before execution of the work that the work subject can execute on the assembly state, the work being defined based on the assembly state transition information.
The assembly work sequence planning device according to claim 2, wherein
the constraint condition setting unit sets, as the constraint condition, the assembly state and the state of the assembly work environment before execution of the work that can be executed for the state of the assembly work environment, the work being defined based on the assembly work environment / product information.
The assembly work sequence planning device according to claim 2, wherein
the assembly work environment includes at least one robot and at least one worker as the work subjects.
The assembly work sequence planning device according to claim 1, further comprising
a simulation instruction unit that instructs a robot simulator to perform an individual assembly work simulation for setting the constraint condition, wherein
the constraint condition setting unit corrects the constraint condition based on the result of the individual assembly work simulation.
The assembly work sequence planning device according to claim 8, wherein
the simulation instruction unit instructs the robot simulator to perform an assembly work simulation according to the generated assembly work order.
An assembly work sequence planning method using an assembly work sequence planning device, the method comprising the steps of:
acquiring assembly state transition information including information representing the process by which a plurality of parts are assembled into sub-assemblies and then into a final product;
defining, based on the assembly state transition information, work that a work subject can perform on an assembly state consisting of two of the parts or sub-assemblies before assembly;
setting a constraint condition regarding whether or not the work can be executed;
performing reinforcement learning of a method of selecting the work for the assembly state in accordance with the constraint condition; and
generating an assembly work order of the product based on the result of the reinforcement learning.