WO2022014128A1 - Assembly work sequence planning device and assembly work sequence planning method - Google Patents

Assembly work sequence planning device and assembly work sequence planning method

Info

Publication number
WO2022014128A1
Authority
WO
WIPO (PCT)
Prior art keywords
assembly
work
state
assembly work
product
Prior art date
Application number
PCT/JP2021/017731
Other languages
English (en)
Japanese (ja)
Inventor
利浩 森澤
Original Assignee
Hitachi, Ltd. (株式会社日立製作所)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. (株式会社日立製作所)
Publication of WO2022014128A1

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B23 - MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23P - METAL-WORKING NOT OTHERWISE PROVIDED FOR; COMBINED OPERATIONS; UNIVERSAL MACHINE TOOLS
    • B23P21/00Machines for assembling a multiplicity of different parts to compose units, with or without preceding or subsequent working of such parts, e.g. with programme control
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00Programme-control systems
    • G05B19/02Programme-control systems electric
    • G05B19/418Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • the present invention relates to an assembly work sequence planning device and an assembly work sequence planning method.
  • This application claims priority from Japanese Patent Application No. 2020-121910 filed on July 16, 2020, and for designated countries where incorporation by reference of documents is permitted, the contents described in that application are incorporated into this application by reference.
  • For assembly work, an efficient work method and order are planned in advance.
  • The worker and the robot may each work individually, or the worker and the robot may work together.
  • For example, Patent Document 1 describes a technique in which part attributes, part arrangements, and adjacency relationship information with other parts are extracted for each part from a three-dimensional CAD (Computer Aided Design) model of a product, a priority relationship of the connections between parts is generated as a directed graph, an assembly graph is generated from the adjacency information, and an assembly order is generated as the reverse of the decomposition order obtained for the parts.
  • Patent Document 2 describes a technique of inputting 3D CAD data, performing assembly work planning with a task planner, modeling the assembly work with a Petri net, and searching for an optimum route to generate a robot program offline.
  • Patent Document 3 describes a technique in which, when a simulation of a work operation by a robot is executed and a control command is determined based on the execution result, result labels for cases where the determination result is good and cases where it is bad are used as training data to learn the control commands.
  • Patent Document 1 generates an assembly order by using a directed graph and an assembly graph based on the priority relationships and adjacency relationships of the connections between parts extracted from the three-dimensional CAD data; however, a huge number of assembly sequences are generated according to the number of parts in the product.
  • Moreover, the work sequence of the work subject cannot be generated.
  • Patent Document 2 includes an assembly work planning method, but the assembly order is layered into a product state level, a target movement level, and a hand movement level, and is modeled by a Petri net.
  • Because the assembly order is determined at the stage when the Petri net is configured, and because it is not a method of automatically configuring (automatically modeling) the Petri net, it is not possible to generate an assembly work order in which robots and workers are the work subjects.
  • Patent Document 3 uses a reinforcement learning technique to learn control commands of machines including robots, and the learning is advanced by executing simulations of the work operation. Since the assembly work sequence is already defined when the series of work operations is defined, it does not generate an assembly work sequence.
  • The present invention has been made in view of the above points, and an object of the present invention is to make it possible to plan the assembly order of the parts constituting a product and the work order of work subjects including robots and workers.
  • the present application includes a plurality of means for solving at least a part of the above problems, and examples thereof are as follows.
  • For example, the assembly work sequence planning device is characterized by including: an information acquisition unit that acquires assembly state transition information including information indicating a process of assembling a plurality of parts, through sub-assemblies in which the parts are assembled, into a final product; an assembly work definition unit that, based on the assembly state transition information, defines work that can be performed by a work subject for an assembly state consisting of two parts or sub-assemblies before assembly; a constraint condition setting unit that sets a constraint condition regarding whether or not the work can be executed; a learning unit that performs reinforcement learning on how to select the work for the assembly state according to the constraint condition; and an assembly work order generation unit that generates an assembly work order of the product based on the result of the reinforcement learning.
  • FIG. 1 is a diagram showing a configuration example of an assembly work sequence planning device according to the first embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of an assembly work environment.
  • FIG. 3 is a diagram showing an example of a product composed of a plurality of parts.
  • FIG. 4 is a diagram showing an AND / OR tree representing the assembly state transition of the product shown in FIG.
  • FIG. 5 is a diagram showing a list of transitions to each assembly state corresponding to FIG. 4.
  • FIG. 6 is a diagram showing an example of an assembly work order in which the work subject is set as one robot based on the assembly state transition of the product, and an assembly state for each work.
  • FIG. 7 is a diagram showing an example of an assembly work order in which a work subject is set as one robot based on an assembly state transition and an assembly work environment, and an assembly state for each work.
  • FIG. 8 is a diagram showing an example of an assembly work order in which the work subject is set as two robots based on the assembly state transition and the assembly work environment, and the assembly state for each work.
  • FIG. 9 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device according to the first embodiment.
  • FIG. 10 is a diagram showing a configuration example of an assembly work sequence planning device according to a second embodiment of the present invention.
  • FIG. 11 is a diagram showing an example of a product composed of a plurality of parts.
  • FIG. 12 shows a list of transitions to each assembly state corresponding to the assembly work of FIG. 11.
  • FIG. 13 is a diagram showing an example of an assembly work order in which the work subjects are one robot and one worker, set based on the assembly state transition of the product and the assembly work environment, and the assembly state for each work.
  • FIG. 14 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device according to the second embodiment.
  • FIG. 15 is a diagram for explaining the outline of A3C.
  • FIG. 16 is a diagram for explaining the processing contents of the training of the action selection function and the state value function.
  • FIG. 1 shows a configuration example of the assembly work sequence planning device 10 according to the first embodiment of the present invention.
  • The assembly work order planning device 10 plans the assembly work order for a case where parts are assembled and a product is completed by work subjects including robots and workers.
  • the assembly work order planning device 10 includes each functional block of a calculation unit 11, a storage unit 12, an input unit 13, an output unit 14, and a communication unit 15.
  • The assembly work sequence planning device 10 is composed of a general computer such as a personal computer equipped with a processor such as a CPU (Central Processing Unit), a memory such as a DRAM (Dynamic Random Access Memory), storage such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), an input device such as a keyboard, a mouse, or a touch panel, an output device such as a display, and a communication module such as an NIC (Network Interface Card).
  • The calculation unit 11 is realized by the processor of the computer.
  • The calculation unit 11 includes the functional blocks of an information acquisition unit 111, an assembly work definition unit 112, a constraint condition setting unit 113, an action selection/value function configuration unit 114, a reward setting unit 115, a learning unit 116, and an assembly work order generation unit 117.
  • These functional blocks are realized by the processor of the computer executing a predetermined program loaded in memory. However, a part or all of these functional blocks may be realized as hardware by an integrated circuit or the like.
  • The information acquisition unit 111 connects, via the communication unit 15, to the CAD (Computer Aided Design) system 20 on the network 1, which includes the Internet and mobile phone communication networks, acquires the assembly work environment/product information 121 and the assembly state transition information 122, and stores them in the storage unit 12.
  • The assembly work environment/product information 121 includes information indicating the shapes and positions of objects existing in the assembly work environment (for example, a robot, a stage, a tray for holding parts, a transfer device, a hand or tool attached to a robot arm, a frame structure, and the like), which are modeled in advance as CAD data by the CAD system 20. The assembly work environment/product information 121 also includes information on the plurality of parts constituting the product.
  • The assembly state transition information 122 includes information indicating the process from individual parts, through sub-assemblies in which a plurality of parts are assembled, to the final product, as well as contact constraint conditions.
  • the contact constraint condition represents a constraint in the moving direction of the parts due to contact between the parts. According to the contact constraint condition, the transition relationship of the assembly state in the assembly process can be obtained.
  • The assembly work definition unit 112 refers to the assembly state transition information 122, defines the assembly work for assembly states consisting of two parts or sub-assemblies before assembly, and stores the defined assembly work in the storage unit 12 as the assembly work information 123.
  • the assembly work definition unit 112 can set the state of the assembly work environment (robot hand, tray, stage, etc.) before work. Further, the assembly work definition unit 112 can set a work for transitioning the state of the assembly work environment such as a stage.
  • The constraint condition setting unit 113 sets constraint conditions and stores them in the storage unit 12 as the constraint condition information 124. Setting a constraint condition means associating work that cannot be performed with an assembly state.
  • Since the assembly work is defined from the assembly state transitions, when a work is selected for an assembly state during reinforcement learning and that assembly state is not one expected before the work, it is judged that the work cannot be selected due to the constraint. The same applies to the selection of work for the state of the assembly work environment.
  • Constraints may also be set on the order of assembly works. For example, if a task must be performed prior to assembling a component, that task must be selected while the component is still present; conversely, if a task must be performed after assembly of a component, that task must be selected once the component is absent.
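  • As an illustration only (the names and data layout below are not from the patent), constraint-condition information of this kind can be held as a mapping from each work to the assembly states it requires or forbids, and checked whenever a work is selected during reinforcement learning:

```python
# Minimal sketch of constraint-condition information (hypothetical encoding).
# A single part "exists" (state value 1) until it is assembled, after which it
# disappears and the resulting sub-assembly exists instead (cf. FIG. 4).
MUST_EXIST = {
    # work -> at least one of these assembly states must exist before the work
    "W4": {"AB", "ABC", "ABCE"},   # example from the text: screw D needs B on A
}
MUST_BE_ABSENT = {
    # work -> none of these may exist (e.g. a hypothetical task that is only
    # allowed after part "B" has been assembled and has therefore disappeared)
    "hypothetical_post_task": {"B"},
}

def work_is_selectable(work, existing):
    """Return True if `work` does not conflict with the constraint conditions.
    `existing` is the set of parts / sub-assemblies whose state value is 1."""
    need = MUST_EXIST.get(work)
    if need is not None and not (need & existing):
        return False
    return not (MUST_BE_ABSENT.get(work, set()) & existing)
```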
  • The action selection/value function configuration unit 114 configures an action selection function that expresses the relationship between states and actions, and a value function that expresses the relationship between states and state values, which are used in the reinforcement learning that selects work for assembly states in order to plan the assembly work order. To configure these functions, the items of state and action are required, and these are obtained from the assembly state transition information 122 and the assembly work information 123.
  • The assembly state corresponds to the "state" and the work corresponds to the "action" as terms in the field of reinforcement learning.
  • In this embodiment, A3C (Asynchronous Advantage Actor-Critic), an on-policy method that is an example of deep reinforcement learning using a deep neural network as an approximation function, is adopted.
  • In A3C, a neural network is constructed that takes a state as input and outputs the action selection and the value function. The number of intermediate layers of the neural network and the number of nodes in each layer are set in advance.
  • Other methods, such as DQN (Deep Q-Network) or table-based Q-learning, in which a Q table (array) associating states with action values is constructed, may also be adopted. Whichever method is adopted, in order to perform reinforcement learning it is necessary to construct a mechanism that obtains, from the state, the action selection and a value for evaluating that selection. Reinforcement learning will be described later with reference to FIGS. 15 and 16.
  • The reward setting unit 115 sets a negative reward when a work that cannot be performed because it conflicts with the constraint conditions is selected for an assembly state. In addition, a positive reward is set when the product is completed and the assembly work is finished.
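  • A minimal sketch of this reward setting, reusing the hypothetical work_is_selectable check from the earlier sketch and the illustrative reward values (+1, -1, 0) mentioned near the end of this text:

```python
def reward(work, state_before, state_after, product="ABCDE"):
    """Negative reward for a constraint violation, positive reward once the
    product exists, zero otherwise (illustrative values only)."""
    if not work_is_selectable(work, state_before):
        return -1.0   # selected work conflicts with a constraint condition
    if product in state_after:
        return +1.0   # product completed -> assembly work finished
    return 0.0
```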
  • The learning unit 116 performs reinforcement learning by repeating episodes, where an episode means a series of actions that ends either when a work selection for an assembly state fails and the assembly fails, or when the work selections for the assembly states succeed, the assembly work is completed, and the assembly succeeds.
  • By repeating episodes, the action selection for each state is learned, and when successful assembly results are obtained, the reinforcement learning is completed.
  • As a result, an action selection function that selects a good action for each state is obtained.
  • In one episode, an action is selected for the initial state to obtain the next state, and another action is then selected for that state to obtain the state after that.
  • The action selection is repeated step by step to advance the state; if the action selection is wrong or the state is not allowed, a negative reward is obtained and the episode ends. Conversely, if the assembly succeeds, the episode ends with a positive reward.
  • learning may be referred to as training.
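  • The episode structure described above can be condensed into the following sketch; a random choice stands in for the learned action selection function, and apply_work / reward_fn stand for the assembly state transition and the reward setting (all names are illustrative):

```python
import random

def run_episode(initial_state, works, apply_work, reward_fn, max_steps=50):
    """Select works step by step until a negative reward (failed assembly)
    or a positive reward (completed product) ends the episode."""
    state = set(initial_state)
    history = []
    for _ in range(max_steps):
        work = random.choice(works)            # stand-in for the action selection function
        next_state = apply_work(state, work)   # assembly state transition
        r = reward_fn(work, state, next_state)
        history.append((frozenset(state), work, r))
        if r < 0:
            return history, False              # wrong or disallowed selection: failure
        if r > 0:
            return history, True               # product completed: success
        state = next_state
    return history, False
```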
  • the assembly work order generation unit 117 sets the assembly work order based on the result of reinforcement learning by the learning unit 116. Specifically, based on the action selection function obtained as a result of reinforcement learning, a series of actions selected in the process from the initial state of assembly to the state of successful assembly in which the assembly work is completed are generated as the assembly work order. It also generates the assembly status and assembly work environment status at each work stage.
  • the storage unit 12 is realized by the memory and storage of the computer.
  • the storage unit 12 stores the assembly work environment / product information 121, the assembly state transition information 122, the assembly work information 123, and the constraint condition information 124. Information other than these may be stored in the storage unit 12.
  • the input unit 13 is realized by an input device of a computer.
  • the input unit 13 receives various operations from the operator (user).
  • the output unit 14 is realized by an output device of a computer.
  • The output unit 14 displays, for example, an operation input screen. The communication unit 15 is realized by the communication module of the computer.
  • the communication unit 15 connects to the CAD system 20 via the network 1 and receives predetermined information from the CAD system 20.
  • the CAD system 20 supplies the assembly work environment / product information 121 and the assembly state transition information 122 in response to the request from the assembly work sequence planning device 10.
  • FIG. 2 shows an example of an assembly work environment.
  • the hand of the robot R1 is replaceable, and the hand 303 is attached in the figure.
  • the robot R2 has a replaceable hand, and the hand 304 is attached in the figure.
  • the hand means an end effector of a robot (manipulator), and is not necessarily limited to a hand having a gripping structure.
  • the hand also includes, for example, a tool such as a screwdriver.
  • the trays 321 and 322 contain parts that are moved to and placed on the stage 312 in the next work and are subject to assembly work.
  • On the hand installation stands 331 and 332, replacement hands 333 and 334, which can be exchanged with the currently mounted hands 303 and 304, are placed.
  • two robots R1 and R2 can operate at the same time to perform the work of assembling the parts placed on the stage 312.
  • FIG. 3 shows an example of a product 40 for planning an assembly work order with the assembly work order planning device 10.
  • the product 40 has a structure completed by arranging plate parts B and C on the base part A and fastening and fixing each of them with screw parts D and E.
  • FIG. 4 shows an AND / OR tree representing the assembly state transition of the product 40 shown in FIG.
  • The AND/OR tree has a tree structure in which each assembly state, representing a single part, a sub-assembly in which two or more parts are assembled, or the product, is represented by a node indicated by an ellipse.
  • The transition from one assembly state to the next is represented by an edge connecting one node to another.
  • The assembly states of the product 40 include the five single-part states of the base part A, the plate parts B and C, and the screw parts D and E, the two two-part states AB and AC, three-part states such as ABC and ACE, the four-part states ABCD and ABCE, and the completed five-part product ABCDE.
  • the node A represents an assembled state in which the base component A exists as a single unit, and by performing the work of assembling the plate component B to the base component A, the node A transitions to the node AB representing the assembled state of the component AB.
  • the node AB represents an assembled state as a component in which the plate component B is assembled to the base component A, and by performing the work of assembling the plate component C to the component AB, the node AB is assembled. Transition to the node ABC representing the assembly state of ABC.
  • the transition between nodes in the AND / OR tree reflects the contact constraint conditions between the parts.
  • The assembly work of the screw component D cannot be performed unless the plate component B is arranged on the base component A. Therefore, in order to select the assembly work of the screw component D, an assembly state in which the plate component B is arranged on the base component A, such as the node AB (sub-assembly AB) or the node ABC (sub-assembly ABC), must exist.
  • FIG. 5 shows a list of transitions to each assembly state in the AND / OR tree shown in FIG.
  • The product 40 may be assembled initially from either the plate part B or C onto the base part A. Further, the sub-assembly ABC may be assembled with either the screw part D or E first; therefore, there are six assembly sequences until the finished product is obtained from the single parts. There are 12 transitions to the assembly states, No. 1 to No. 12, as shown in FIG. 5.
  • the work defined based on the assembly state transition in FIG. 4 is the following work W1 to W5.
  • Work W1: Assembly of base component A.
  • Work W2: Assembly of plate part B.
  • Work W3: Assembly of plate part C.
  • Work W4: Assembly of screw part D.
  • Work W5: Assembly of screw part E.
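  • The works and assembly state transitions of the product 40 can be written down directly from FIGS. 4 and 5. The sketch below is an illustrative encoding (not the patent's data format) that records, for a subset of the 12 transitions, which parts or sub-assemblies each work consumes and which sub-assembly it produces, following the disappear/appear convention of FIG. 6:

```python
# work -> list of (states consumed, state produced); a representative subset
# of the 12 transitions listed in FIG. 5.
TRANSITIONS = {
    "W2": [({"A", "B"}, "AB"), ({"AC", "B"}, "ABC"), ({"ACE", "B"}, "ABCE")],
    "W3": [({"A", "C"}, "AC"), ({"AB", "C"}, "ABC")],
    "W4": [({"ABC", "D"}, "ABCD"), ({"ABCE", "D"}, "ABCDE")],
    "W5": [({"ABC", "E"}, "ABCE"), ({"ABCD", "E"}, "ABCDE")],
    # ... remaining transitions omitted
}

def apply_work(state, work):
    """Apply `work` if one of its transitions matches the current state:
    the consumed items disappear and the produced sub-assembly appears."""
    for consumed, produced in TRANSITIONS.get(work, []):
        if consumed <= state:
            return (state - consumed) | {produced}
    return set(state)  # no matching transition; the caller treats this as a violation
```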
  • Constraints and rewards for assembly work are directly derived from the assembly state transition.
  • The work W4 (assembly of the screw component D) requires that one of the sub-assemblies AB, ABC, or ABCE exists as the assembly state before its execution. Therefore, if the work W4 is selected in a state where none of the sub-assemblies AB, ABC, or ABCE exists as the assembly state, a negative reward is set.
  • For example, a setting may also be made that gives a negative reward depending on whether the work W1 is selected first in the assembly work. As states of the base component A, a state in which it is placed on the tray 321 (or 322) and a state in which it is placed on the stage 312 may be provided, and the work W1 may change the state from being placed on the tray 321 (or 322) to being placed on the stage 312.
  • FIG. 6 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40, and the assembly state for each work.
  • the assembly work order is set by reinforcement learning in which the work subject is only one robot and five types of work W1 to W5 are selected based on the assembly state of the product 40.
  • the assembly work order shown in the figure is from work steps 0 to 4, where 0 is the initial state and 4 is the completed state.
  • The state value 0 described for each part or the like means that the corresponding part or the like does not exist, and the state value 1 means that it exists.
  • the initial state of assembly work is defined as the state in which each part exists as a single unit and the components and products do not exist.
  • the completed state of the assembly work is defined as the state in which the product ABCDE exists.
  • Work step 0 is the initial state.
  • The robot executes work W2 (assembly of the plate component B).
  • As a result, the base component A and the plate component B disappear, and the assembly product AB appears.
  • the robot executes work W3 (assembly of plate parts C). As a result, the plate component C and the substructure AB disappear, and the substructure ABC appears.
  • the robot executes work W4 (assembly of the screw component D). As a result, the screw component D and the component ABC disappear, and the component ABCD appears.
  • the robot executes work W5 (assembly of the screw component E). As a result, the screw component E and the assembly ABCD disappear, and the product ABCDE appears. As a result, the completed state is obtained and the assembly work is completed.
  • Either work W2 (assembly of plate part B) or work W3 (assembly of plate part C) may be executed first; which one is chosen cannot be determined from the assembly state transitions alone.
  • Work W2 can be selected only in the assembly state of the base component A alone, the assembly AC, or the assembly ACE. If the constraint condition that the plate component B must be assembled before the plate component C is imposed, the assembly state that is the premise of the work W2 is only the base component A alone.
  • When assembling the screw parts D and E, the driver hand shall be attached to the robot, and when assembling the plate parts B and C, the grip hand shall be attached to the robot.
  • Work W6: Attachment of the grip hand.
  • Work W7: Attachment of the driver hand.
  • Work W8: Removal of the hand.
  • work W2 (assembly of plate part B) and W3 (assembly of plate part C) can be performed only when the grip hand is attached.
  • work W4 (assembly of the screw component D) and W5 (assembly of the screw component E) can be performed only when the driver hand is attached.
  • The initial state of the assembly work is defined as the state in which each part exists as a single unit and neither the grip hand nor the driver hand is attached to the robot.
  • the completed state of the assembly work is defined as the state in which the product ABCDE is obtained and the hand is not attached to the robot.
  • a hand other than the driver hand and the grip hand may be prepared in the assembly work environment. Further, one robot may have a plurality of arms so that a driver hand and a grip hand can be attached at the same time.
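  • The hand state of the robot can be folded into the constraint check in the same way. A small sketch follows (names are illustrative; the assumption that a hand can only be attached while no hand is mounted follows the work order shown in FIG. 7):

```python
HAND_REQUIRED = {"W2": "grip", "W3": "grip", "W4": "driver", "W5": "driver"}

def hand_constraint_ok(work, hand):
    """`hand` is None, "grip" or "driver" (state of the assembly work environment)."""
    if work in HAND_REQUIRED:
        return hand == HAND_REQUIRED[work]
    if work in ("W6", "W7"):   # attaching a hand: assumed to require no hand mounted
        return hand is None
    if work == "W8":           # removing the hand requires a hand to be mounted
        return hand is not None
    return True
```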
  • FIG. 7 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40, the assembly work environment (the state of the robot hand), and the assembly state for each work.
  • the assembly work order is set by reinforcement learning in which the work subject is only one robot and eight types of work from work W1 to W8 are selected based on the assembly state and assembly work environment of the product 40.
  • the assembly work order shown in the figure is from work steps 0 to 8, where 0 is the initial state and 8 is the completed state.
  • the state values described corresponding to the component A and the like are the same as in the case of FIG.
  • the robot executes work W6 (attachment of the grip hand).
  • As a result, the work W2 (assembly of the plate component B) and the work W3 (assembly of the plate component C) become executable.
  • the robot executes the work W2.
  • the base component A and the plate component B disappear, and the assembly product AB appears.
  • the robot executes the work W3.
  • the plate component C and the substructure AB disappear, and the substructure ABC appears.
  • the robot executes work W8 (removal of the hand).
  • As a result, the work W6 (attachment of the grip hand) and the work W7 (attachment of the driver hand) become executable.
  • the robot executes the work W7.
  • As a result, the work W4 (assembly of the screw component D) and the work W5 (assembly of the screw component E) become executable.
  • the robot executes work W4. As a result, the screw component D and the component ABC disappear, and the component ABCD appears.
  • the robot executes the work W5. As a result, the screw component E and the assembly ABCD disappear, and the product ABCDE appears.
  • the robot executes work W8 (removal of the hand). As a result, the completed state is obtained and the assembly work is completed.
  • a "grip hand” and a “driver hand” for each of the hands of the two robots are added as the state of the assembly work environment.
  • the actions that the two robots can select are the tasks W1 to W8 defined as described above. However, since there may be a case where one of the two robots does not work (cannot work), the following work W0 is additionally defined. Work W0: Standby.
  • The same work may be selected for the two robots at the same time. However, when there is a restriction on the pre-work state and that pre-work state disappears due to the work selected for one robot, a negative reward is set if the same work is also selected for the other robot.
  • the initial state of the assembly work is defined as the state in which each part exists as a single unit and the hand is not attached to each of the two robots.
  • the completed state of the assembly work is defined as the state in which the product ABCDE is obtained and the hands are not attached to each of the two robots.
  • FIG. 8 shows an example of the assembly work order set based on the assembly state transition (FIG. 4) of the product 40 and the assembly work environment (hand state), and the assembly state for each work.
  • The assembly work order is set by reinforcement learning in which the work subjects carrying out the assembly work of the product 40 are two robots, and nine types of work, W0 to W8, are selected based on the assembly state of the product 40 and the assembly work environment.
  • the two robots are the robots R1 and R2, the state of the hand of the robot R1 is the R1 grip and the R1 driver, and the state of the hand of the robot R2 is the R2 grip and the R2 driver.
  • the assembly work order shown in the figure is from work steps 0 to 5, with 0 being the initial state and 5 being the completed state.
  • the state values described corresponding to the component A and the like are the same as in the case of FIG.
  • The robot R1 executes the work W6 (attachment of the grip hand), and the robot R2 executes the work W7 (attachment of the driver hand).
  • As a result, the work W2 (assembly of the plate part B) and W3 (assembly of the plate part C) by the robot R1 become executable, and the work W4 (assembly of the screw part D) and W5 (assembly of the screw part E) by the robot R2 become executable.
  • the robot R1 executes the work W2.
  • the base component A and the plate component B disappear, and the assembly product AB appears.
  • the robot R2 executes work W0 (standby).
  • the robot R1 executes the work W3
  • the robot R2 executes the work W4 (assembly of the screw component D).
  • the plate component C, the screw component D, and the substructure AB disappear, and the substructure ABCD appears.
  • the robot R1 executes the work W8 (removal of the hand), and the robot R2 executes the work W5 (assembly of the screw component E). As a result, the screw component E and the assembly ABCD disappear, and the product ABCDE appears.
  • the robot R1 executes the work W0 (standby), and the robot R2 executes the work W8 (removal of the hand). As a result, the completed state is obtained and the assembly work is completed.
  • It is also possible to set three or more robots as the work subjects in the assembly work order. Further, as the assembly work environment, states such as the trays 321 and 322 and the stage 312 may be added in addition to the hands. Furthermore, the work of a transport device that transports parts from a predetermined position to another position may be added to the work definitions.
  • FIG. 9 is a flowchart illustrating an example of the assembly work sequence planning process by the assembly work sequence planning device 10.
  • It is assumed that the CAD system 20 has already modeled the shape model of the product and the shape model of the assembly work environment including the robots and the like, and can supply the assembly work environment/product information 121 and the assembly state transition information (AND/OR tree) 122 to the assembly work sequence planning device 10.
  • the assembly work sequence planning process is started, for example, in response to a predetermined operation from the user.
  • the information acquisition unit 111 acquires the assembly work environment / product information 121 from the CAD system 20 and stores it in the storage unit 12 (step S1). Next, the information acquisition unit 111 acquires the assembly state transition information 122 from the CAD system 20 and stores it in the storage unit 12 (step S2).
  • Next, the assembly work definition unit 112 sets the pre-work state of the assembly work environment (robot hands, trays, stage, and the like) (step S3). Then, based on the assembly state transition information 122, the assembly work definition unit 112 defines assembly works for assembly states consisting of two parts or sub-assemblies before assembly, and stores the defined assembly works in the storage unit 12 as the assembly work information 123 (step S4).
  • the constraint condition setting unit 113 sets the constraint condition and stores it in the storage unit 12 as the constraint condition information 124 (step S5).
  • the action selection / value function component 114 defines the action selection function and the value function used for reinforcement learning (step S6).
  • the reward setting unit 115 sets a reward for the work selected for each assembly state (step S7).
  • Next, the learning unit 116 executes reinforcement learning by repeating episodes, each of which means a series of actions that ends either when a work selection for an assembly state fails and the assembly fails, or when the work selections succeed and the assembly work is completed and the assembly succeeds (step S8).
  • As described above, episodes are repeated and the action selection for each state is learned, so that an action selection function that selects a good action for each state is obtained; within an episode, actions are selected step by step, and the episode ends with a negative reward when a wrong or disallowed action is selected, or with a positive reward when the assembly succeeds.
  • While the number of episode repetitions is still small, episodes do not lead to success and fail, but as the number of repetitions increases and the learning progresses, episodes begin to succeed. Therefore, when episodes succeed a predetermined number of times in a row (for example, three times), the reinforcement learning is terminated.
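  • A sketch of this termination criterion (illustrative; run_episode_fn stands for the execution of one reinforcement learning episode, including the updates of the action selection and value functions, and returns True on success):

```python
def train(run_episode_fn, required_successes=3, max_episodes=10_000):
    """Stop the reinforcement learning once episodes have succeeded the
    predetermined number of times in a row."""
    streak = 0
    for episode in range(max_episodes):
        success = run_episode_fn()
        streak = streak + 1 if success else 0
        if streak >= required_successes:
            return episode + 1   # number of episodes actually executed
    return max_episodes
```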
  • Finally, the assembly work sequence generation unit 117 uses the action selection function and the state value function obtained as a result of the reinforcement learning to run an episode from the initial state, and sets the assembly work order by connecting the series of work selection results obtained until the product is completed (step S9).
  • According to the assembly work order planning process by the assembly work order planning device 10 described above, it is possible to plan the assembly order of the parts constituting the product and the work order of the robots serving as the work subjects.
  • FIG. 10 shows a configuration example of the assembly work sequence planning device 100 according to the second embodiment of the present invention.
  • the assembly work order planning device 100 is for planning an assembly work order when a product is completed by assembling parts by a work entity that may include a robot and a worker.
  • In the assembly work sequence planning device 100, the assembly work definition unit 112 of the assembly work sequence planning device 10 (FIG. 1) according to the first embodiment is divided into an assembly state-based assembly work definition unit 112A and a work state-based assembly work definition unit 112B, and a simulation instruction unit 118 is added.
  • the assembly work order planning device 100 adds constraint condition determination simulation information 125 and assembly work order simulation information 126 as information stored in the storage unit 12.
  • a robot simulator 30 to which the assembly work sequence planning device 100 can be connected via the network 1 is added to the outside of the assembly work sequence planning device 100.
  • The robot simulator 30 executes a simulation of work by the robots R1 and R2 installed in the assembly work environment according to instructions from the assembly work order planning device 100, and outputs the simulation result to the assembly work order planning device 100.
  • the assembly state-based assembly work definition unit 112A defines the assembly work with reference to the assembly state transition information 122, similarly to the assembly work definition unit 112 in the assembly work sequence planning device 10 (FIG. 1).
  • the work state-based assembly work definition unit 112B refers to the assembly work environment / product information 121 and defines the work required for assembly for the state of the assembly work environment. For example, if a driver hand is required to assemble a part, the work of attaching the driver hand to the robot in the assembled state before assembling the part is defined. In this case, if the assembly state before assembling the parts and the driver hand is attached to the robot, the work of assembling the parts can be executed.
  • the robot hand is not the only one that can set the state as an assembly work environment. For example, when there are a plurality of stages in the assembly work environment, it is possible to set the state of whether or not the stage is being used for assembly. Further, since the robot or the worker who is the work subject is carrying out a certain work, the type of the work being carried out by the work subject may be regarded as the state of the work subject and set.
  • the simulation instruction unit 118 instructs the robot simulator 30 to perform an individual assembly work simulation for setting constraint conditions to the robot simulator 30 provided externally, and acquires the simulation result.
  • the constraint condition setting unit 113 can modify the constraint condition based on the simulation result.
  • the simulation instruction unit 118 instructs the robot simulator 30 to perform an assembly work simulation according to the finally obtained assembly work order, and acquires the simulation result.
  • the simulation results can be used to confirm that the finally obtained assembly work sequence is valid.
  • the constraint condition determination simulation information 125 is information including the conditions when the robot simulator 30 is instructed to perform the individual assembly work simulation for setting the constraint conditions, and the simulation result.
  • the assembly work order simulation information 126 is information including the conditions when the assembly work order simulation is instructed to the robot simulator 30 and the simulation result.
  • FIG. 11 shows an example of the product 50 that plans the assembly work order by the assembly work order planning device 100.
  • the product 50 is completed by arranging the box component B on the base component A, fastening and fixing it with the screw components C and D, and connecting the base component A and the box component B with the wiring components E and F. Has a structure.
  • It is assumed that the robot executes the work of arranging the box part B on the base part A and the work of fastening the screw parts C and D, and that the worker performs the work of connecting the wiring parts E and F.
  • FIG. 12 shows a list of transitions to each assembly state in the assembly work of the product 50.
  • The assembly states of the product 50 include the six single-part states of the base part A, the box part B, the screw parts C and D, and the wiring parts E and F, the one two-part state of the sub-assembly AB, and further sub-assembly states of three or more parts up to the product ABCDEF.
  • the transition of the product 50 to each assembly state is 33 ways of No1 to No33 shown in FIG. Of these, the transition for assembling parts E and F is performed according to the work by the worker, and the transition for assembling other parts is performed according to the work by the robot.
  • the work subject will be one robot and one worker.
  • the robot shall use the grip hand for arranging the box component B and the driver hand for fastening the screw components C and D.
  • The work defined for the robot and the worker, which are the work subjects, is as follows.
  • Work by the robot:
  • Work W0: Standby.
  • Work W1: Assembly of box part B.
  • Work W2: Assembly of screw part C.
  • Work W3: Assembly of screw part D.
  • Work W4: Attachment of the grip hand.
  • Work W5: Attachment of the driver hand.
  • Work W6: Removal of the hand.
  • Work by the worker:
  • Work P0: Standby.
  • Work P1: Assembly of wiring component E.
  • Work P2: Assembly of wiring component F.
  • The assembly state before these transitions must be at least the state of the sub-assembly AB, in which the box component B has been assembled to the base component A, or a later state.
  • The initial state of the assembly work is defined as a state in which only single parts exist, the base part A is pre-mounted on the stage 312, and neither the grip hand nor the driver hand is attached to the robot.
  • the completed state of the assembly work is defined as a state in which the product ABCDEF is obtained and neither the grip hand nor the driver hand is attached to the robot.
  • FIG. 13 shows an example of the assembly work order set based on the assembly state transition (not shown) of the product 50 and the assembly work environment (hand state), and the assembly state for each work.
  • The assembly work order is set by reinforcement learning in which the work subjects are one robot and one worker, and a total of ten types of work, that is, the seven types of robot work W0 to W6 and the three types of worker work P0 to P2, are selected based on the assembly state and the assembly work environment of the product 50.
  • the assembly work order shown in the figure is from work step 0 to 7, with 0 being the initial state and 7 being the completed state.
  • the state values described corresponding to the component A and the like are the same as in the case of FIG.
  • the robot executes work W4 (attachment of the grip hand), and the worker executes work P0 (standby).
  • the robot executes the work W1 and the worker executes the work P0 (standby). As a result, the base component A and the box component B disappear, and the assembly product AB appears.
  • the robot executes the work W6 (removal of the hand), and the worker executes P1 (assembly of the wiring component E). As a result, the wiring component E and the substructure AB disappear, and the substructure ABE appears.
  • the robot executes the work W5 (attaching the driver hand), and the worker executes the work P2 (assembling the wiring component F).
  • the work W2 (assembly of the screw component C) and W3 (assembly of the screw component D) by the robot can be performed.
  • the wiring component F and the substructured product ABE disappear, and the substructured product ABEF appears.
  • the robot executes work W2 (assembly of screw parts C), and the worker executes work P0 (standby).
  • the robot executes the work W3 (assembly of the screw component D), and the worker executes the work P0 (standby).
  • the screw component D and the component ABCEF disappear, and the product ABCDEF appears.
  • the robot executes work W6 (removal of the hand), and the worker executes work P0 (standby). As a result, the completed state is obtained and the assembly work is completed.
  • The assembly work order can be planned even if the number of parts increases and the number of required assembly works increases, or if there are multiple robots or workers as the work subjects. Furthermore, the assembly work order can also be planned by adding, as the assembly work environment, the states of the trays 321 and 322, the stage 312, and the like in addition to the hand, and by adding, to the work definitions, the work of a transport device that transports parts from a predetermined position to another position.
  • FIG. 14 is a flowchart illustrating an example of the assembly work order planning process by the assembly work order planning device 100.
  • The processes of steps S21 to S23, S26, and S28 to S31 are the same as the processes of steps S1 to S3, S5, and S6 to S9 of the assembly work order planning process (FIG. 9) by the assembly work order planning device 10, so their description will be omitted as appropriate.
  • It is assumed that the CAD system 20 has already modeled the shape model of the product and the shape model of the assembly work environment including the robots and the like, and can supply the assembly work environment/product information 121 and the assembly state transition information (AND/OR tree) 122 to the assembly work sequence planning device 100.
  • the assembly work sequence planning process is started, for example, in response to a predetermined operation from the user.
  • the information acquisition unit 111 acquires the assembly work environment / product information 121 and the assembly state transition information 122 from the CAD system 20 and stores them in the storage unit 12 (steps S21 and S22).
  • the assembly work definition unit 112 sets the state of the assembly work environment before work (step S23).
  • Next, based on the assembly state transition information 122, the assembly state-based assembly work definition unit 112A defines the assembly work for assembly states consisting of two parts or sub-assemblies before assembly, and stores the defined work in the storage unit 12 as the assembly work information 123 (step S24).
  • the work state-based assembly work definition unit 112B refers to the assembly work environment / product information 121, and thus defines the work required for assembly with respect to the state of the assembly work environment (step S25).
  • the constraint condition setting unit 113 sets the constraint condition and stores it in the storage unit 12 as the constraint condition information 124 (step S26).
  • Next, the simulation instruction unit 118 instructs the robot simulator 30 to perform individual assembly work simulations for setting the constraint conditions, and acquires the simulation results (step S27). Specifically, the assembly state before each assembly work and the state of the assembly work environment are set in the robot simulator 30, and a simulation of the target assembly work is executed. If the simulation reveals a problem, such as the robot interfering with (colliding with) another robot, or the robot being unable to take the posture required to assemble the target parts due to insufficient joint movable ranges, the assembly work is judged to be infeasible. According to this simulation result, the assembly work that cannot be performed is additionally set as a constraint condition. Alternatively, the definition of the assembly work judged to be infeasible may be deleted.
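  • A hedged sketch of step S27 follows; the interface of the robot simulator 30 is not specified in this text, so set_state, run, and the result fields below are hypothetical:

```python
def refine_constraints(works, pre_states, simulator, constraints):
    """For each work and each pre-work state, simulate the work and mark it
    infeasible on a collision or an unreachable posture (hypothetical API)."""
    for work in works:
        for pre_state in pre_states[work]:
            simulator.set_state(pre_state)   # assembly state + work environment state
            result = simulator.run(work)     # simulate the target assembly work
            if result.collision or not result.reachable:
                constraints.mark_infeasible(work, pre_state)
                # alternatively, the definition of the work could be deleted
    return constraints
```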
  • the action selection / value function component 114 defines the action selection function and the value function used for reinforcement learning (step S28).
  • the reward setting unit 115 sets a reward for the work selected for each assembly state (step S29).
  • Next, the learning unit 116 performs reinforcement learning by repeating episodes, each of which means a series of actions that ends either when a work selection for an assembly state fails and the assembly fails, or when the work selections succeed and the assembly work is completed and the assembly succeeds. The reinforcement learning ends when episodes succeed a predetermined number of times in a row (for example, three times) (step S30).
  • Next, the assembly work sequence generation unit 117 uses the action selection function and the state value function obtained as a result of the reinforcement learning to run an episode from the initial state, and generates an assembly work order in which the series of work selection results obtained until the product is completed are connected (step S31).
  • Finally, the simulation instruction unit 118 instructs the robot simulator 30 to simulate the assembly work order obtained in step S31, acquires the simulation result, and confirms that there is no problem (step S32).
  • The method of reinforcement learning is based on the model of a Markov decision process (MDP), that is, a model in which an action is selected in a certain state and the state is updated, and in which a value expressing how good or bad the action or state is accompanies it.
  • Known methods include Q-learning, in which action selection is executed using the values of an action value table and the table values are updated by learning the action selections; DQN (Deep Q-Network), in which actions are selected from action values using an action value function implemented as a deep neural network; and A3C (Asynchronous Advantage Actor-Critic).
  • At each step in one episode, an action is selected for the state to transition to the next state, and a reward is given for the result of the action selection or for the state. If the reward satisfies the end condition of the episode, the episode ends. When episodes are repeated and the correct action can be selected for each state, or when the action selected for each state becomes deterministic, the reinforcement learning ends.
  • In A3C, as the word "Asynchronous" implies, multiple agents (also called actors) execute episodes individually, and each agent trains (learns) a single shared action selection function and state value function and uses them for action selection.
  • FIG. 15 is a diagram for explaining the outline of A3C.
  • the action selection function and the state value function 1401 are configured as a shared deep neural network (DNN). This is referred to as a shared DNN.
  • the state variables of the input layer are set to 3 types, and the actions of the output layer are set to 3 types.
  • the value (state value) of the output layer is always one.
  • the middle layer is one layer, and the number of nodes is four.
  • As the activation functions of the nodes, softmax is set in the action output layer, a linear function in the value output layer, and ReLU (Rectified Linear Unit) in the intermediate layer.
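  • A sketch of a shared DNN with the illustrative sizes given above (three state variables, one intermediate layer of four nodes with ReLU, a three-action softmax head, and a linear value head). PyTorch is used here only as an example framework; it is not named in the text:

```python
import torch
import torch.nn as nn

class SharedA3CNet(nn.Module):
    def __init__(self, n_state=3, n_hidden=4, n_action=3):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_state, n_hidden), nn.ReLU())
        self.policy_head = nn.Linear(n_hidden, n_action)  # action selection function
        self.value_head = nn.Linear(n_hidden, 1)           # state value function

    def forward(self, state):
        h = self.hidden(state)
        policy = torch.softmax(self.policy_head(h), dim=-1)  # pi(s): softmax output
        value = self.value_head(h)                            # V(s): linear output
        return policy, value

# usage example: policy, value = SharedA3CNet()(torch.zeros(1, 3))
```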
  • Each agent has a memory for storing states, actions, next states, rewards, and a DNN for selecting its own action.
  • the agent 1 includes a memory 1411 and a DNN 1421.
  • the agent 2 includes a memory 1412 and a DNN 1422, and the agent 3 includes a memory 1413 and a DNN 1423.
  • Each agent individually trains the shared DNN once the step data accumulated in its own episode processing is available. After the training, the shared DNN is copied to the agent's own DNN, and the action for the state is selected using that own DNN. Since the timing of DNN training and action selection is asynchronous between the agents, this configuration prevents the training and the action selection from conflicting.
  • Regarding the A2C (Advantage Actor-Critic) part of the name: the advantage is the difference between the action value and the state value, that is, a quantity indicating the goodness of the action selection, and Actor-Critic is a method in which the action selection and the state value evaluation are computed separately.
  • FIG. 16 is a diagram for explaining the processing contents of the training of the action selection function and the state value function, and schematically shows the DNN (deep neural network) 1501.
  • DNN deep neural network
  • The DNN 1501 takes the state s as an input, the output (policy) of the action selection is π(s), and the output of the state value is V(s).
  • Let θ be the parameter of the functions (activation functions) defined for the nodes in the DNN 1501.
  • The policy is also called a probabilistic policy, and is also expressed as the probability π(a|s) of selecting the action a in the state s.
  • The training of the action selection function and the state value function optimizes the parameter θ of the DNN 1501 so that these relationships can be predicted from the data of the state s, the action a, the next state s', and the reward r. In particular, it is important that the state value and the goodness of the action can be estimated correctly.
  • The state value function V(s) is expressed by the following equation (1) under the policy π(s).
  • E denotes the expected value under the policy π(s) indicated in the subscript.
  • the discount rate ⁇ is a coefficient for correcting the next state value (future value) to the current value.
  • the value of the discount rate ⁇ is 0.99 as an example, but it can be adjusted as a parameter for reinforcement learning.
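  • Equation (1) is not reproduced here; a standard form consistent with the description above (expectation under the policy π(s) of the one-step reward r plus the discounted next state value) would be:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi(s)}\left[\, r + \gamma\, V^{\pi}(s') \,\right] \tag{1}
```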
  • the value of the policy is expressed by the following equation (2) as the expected value in the distribution ⁇ of the state s.
  • Equation (2) is called the discounted reward function.
  • For the change of the discounted reward function with respect to the parameter θ of the DNN, there is the policy gradient theorem expressed by the following equation (3).
  • Here, ∇ denotes the gradient (first-order partial derivatives with respect to the parameter), the expected value is taken over the state s following the distribution ρ under the policy π, and the action a ranges over the policy π for the state s.
  • The action value function Q(s, a) and the advantage function A(s, a) are as shown in the following equations (4) and (5), and can be calculated from the data.
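  • Equations (2) to (5) are likewise not reproduced here; standard forms consistent with the surrounding description (discounted reward function, policy gradient theorem, and action value and advantage computed from the step data) are:

```latex
J(\theta) = \mathbb{E}_{s \sim \rho}\bigl[\, V^{\pi_\theta}(s) \,\bigr] \tag{2}
\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim \rho,\; a \sim \pi_\theta}\bigl[\, \nabla_{\theta} \log \pi_\theta(a \mid s)\, A(s, a) \,\bigr] \tag{3}
Q(s, a) = r + \gamma\, V(s') \tag{4}
A(s, a) = Q(s, a) - V(s) \tag{5}
```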
  • the policy loss should be minimized.
  • Since the action value and the state value should match, there is also a value loss, which expresses the magnitude of the absolute value of the advantage.
  • Further, since the policy is determined stochastically rather than uniquely for a state, a regularization term modeled by entropy can also be used in the optimization. Therefore, the loss function L is defined by the following equation (6) using the policy loss L_π, the value loss L_v, and the regularization term L_reg, and this loss function is minimized.
  • Here, c_v and c_reg are coefficients.
  • The policy loss L_π follows from the definition of the discounted reward function, as shown in the following equation (7).
  • N is the number of step data used for the training.
  • The value loss L_v is the square of the advantage function A(s, a), as shown in the following equation (8).
  • The regularization term is obtained by calculating the entropy H(π(s)), as shown in the following equations (9) and (10).
  • n_action is the number of actions.
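  • Equations (6) to (10) are not reproduced here either; standard A3C forms consistent with the description (one common sign convention, with N step data and n_action actions) are:

```latex
L = L_{\pi} + c_{v} L_{v} + c_{reg} L_{reg} \tag{6}
L_{\pi} = -\frac{1}{N} \sum_{i=1}^{N} \log \pi(a_i \mid s_i)\, A(s_i, a_i) \tag{7}
L_{v} = \frac{1}{N} \sum_{i=1}^{N} A(s_i, a_i)^{2} \tag{8}
L_{reg} = -\frac{1}{N} \sum_{i=1}^{N} H\bigl(\pi(s_i)\bigr) \tag{9}
H\bigl(\pi(s)\bigr) = -\sum_{a} \pi(a \mid s)\, \log \pi(a \mid s) \tag{10}
```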
  • Training by optimization calculation uses the gradient method in the case of deep neural networks. This training itself is similar to learning of a deep neural network, which is different from reinforcement learning, and can be realized by utilizing deep neural network technology.
  • a technique from the field of reinforcement learning, such as the ε-greedy method, which balances exploration of the search space with exploitation of the learning results, may also be used.
  • the state consists of the assembly state and the state of the assembly work environment, and each variable value is 0 or 1.
  • as an example, the positive reward value may be 1, the negative reward value may be -1, and 0 may be set when no reward is generated.
  • the present invention is not limited to the above-described embodiment, and various modifications are possible.
  • the above-described embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and are not necessarily limited to those having all the described configurations.
  • each of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, for example, by designing a part or all of them as an integrated circuit. Each of the above configurations, functions, and the like may also be realized in software by a processor interpreting and executing a program that implements each function. Information such as programs, tables, and files that implement each function can be placed in a memory, in a recording device such as a hard disk or SSD, or on a recording medium such as an IC card, SD card, or DVD.
  • the control lines and information lines indicate those that are considered necessary for explanation, and do not necessarily indicate all the control lines and information lines in the product. In practice, it can be considered that almost all configurations are interconnected.
  • Assembly status transition information, 123 ... Assembly work information, 124 ... Constraint condition information, 125 ... Constraint condition judgment simulation information, 126 ... Assembly work order simulation information, 13 ... Input unit, 14 ... Output unit, 15 ... Communication unit, 20 ... CAD system, 30 ... Robot simulator, 40, 50 ... Product, 100 ... Assembly work sequence planning device, 303, 304 ... Hand, 311 ... Pedestal, 312 ... Stage, 321, 322 ... Tray, 331, 332 ... Hand installation stand, 333, 334 ... Replacement hand

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Robotics (AREA)
  • Manipulator (AREA)
  • Automatic Assembly (AREA)
  • General Factory Administration (AREA)

Abstract

The present invention plans an assembly order of the parts constituting a product and a work order for work subjects, which may include robots and workers. To this end, the present invention provides an assembly work order planning device characterized by comprising: an information acquisition unit that acquires assembly state transition information including information indicating a process of assembling a plurality of parts into a subassembly product and then into a final product; an assembly work definition unit that defines, on the basis of the assembly state transition information, work that can be performed by the work subject with respect to an assembly state comprising the parts before assembly or the subassembly product; a constraint condition definition unit that defines a constraint condition relating to whether or not the work can be performed; a learning unit that performs reinforcement learning on a method of selecting the work for the assembly state according to the constraint condition; and an assembly work order generation unit that generates an assembly work order for the product on the basis of the result of the reinforcement learning.
PCT/JP2021/017731 2020-07-16 2021-05-10 Dispositif de planification d'ordre de travail d'assemblage et procédé de planification d'ordre de travail d'assemblage WO2022014128A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020121910A JP7474653B2 (ja) 2020-07-16 2020-07-16 組立作業順序計画装置、及び組立作業順序計画方法
JP2020-121910 2020-07-16

Publications (1)

Publication Number Publication Date
WO2022014128A1 true WO2022014128A1 (fr) 2022-01-20

Family

ID=79554634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/017731 WO2022014128A1 (fr) 2020-07-16 2021-05-10 Dispositif de planification d'ordre de travail d'assemblage et procédé de planification d'ordre de travail d'assemblage

Country Status (2)

Country Link
JP (1) JP7474653B2 (fr)
WO (1) WO2022014128A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1058280A (ja) * 1996-08-16 1998-03-03 Hitachi Ltd 加工工程設計システム
JP2018086711A (ja) * 2016-11-29 2018-06-07 ファナック株式会社 レーザ加工ロボットの加工順序を学習する機械学習装置、ロボットシステムおよび機械学習方法
JP2018140471A (ja) * 2017-02-28 2018-09-13 ファナック株式会社 制御装置及び機械学習装置
JP6599069B1 (ja) * 2018-12-13 2019-10-30 三菱電機株式会社 機械学習装置、加工プログラム生成装置および機械学習方法

Also Published As

Publication number Publication date
JP2022018654A (ja) 2022-01-27
JP7474653B2 (ja) 2024-04-25

Similar Documents

Publication Publication Date Title
Darvish et al. A hierarchical architecture for human–robot cooperation processes
CN111221312A (zh) 机器人在生产线的优化方法、系统及在数字孪生的应用
Ota et al. Trajectory optimization for unknown constrained systems using reinforcement learning
Ren et al. Extended tree search for robot task and motion planning
CN115916477A (zh) 机器人演示学习的技能模板分发
JP4900642B2 (ja) 学習制御装置、学習制御方法、およびプログラム
Nilles et al. Robot design: Formalisms, representations, and the role of the designer
US11577392B2 (en) Splitting transformers for robotics planning
US11747787B2 (en) Combining transformers for robotics planning
WO2022014128A1 (fr) Dispositif de planification d'ordre de travail d'assemblage et procédé de planification d'ordre de travail d'assemblage
US11787048B2 (en) Robot planning from process definition graph
Stan et al. Reinforcement learning for assembly robots: A review
US20210060773A1 (en) Robot planning from process definition graph
CN115666871A (zh) 分布式机器人演示学习
JP2021122899A (ja) 軌道生成装置、多リンクシステム、及び軌道生成方法
JP2021035714A (ja) 制御装置、制御方法、及び制御プログラム
CN115114683A (zh) 用于将自主技能执行中的约束反馈到设计中的系统与方法
Fujita Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem
Xiang et al. Rmbench: Benchmarking deep reinforcement learning for robotic manipulator control
Amirnia et al. A context-aware real-time human-robot collaborating reinforcement learning-based disassembly planning model under uncertainty
JP2020060996A (ja) シミュレーション装置、シミュレーション方法及びシミュレーションプログラム
KR20200097896A (ko) 매니퓰레이터 urdf파일 생성장치 및 방법
Schellenberg A Workflow for Training Robotic End-to-End Visuomotor Policies in Simulation
WO2021033472A1 (fr) Dispositif de commande, procédé de commande et programme de commande
WO2022185759A1 (fr) Dispositif, procédé et programme de conception de système de cellules de robot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21843283

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21843283

Country of ref document: EP

Kind code of ref document: A1