EP4326500A1 - Control of an industrial robot for a gripping task - Google Patents

Control of an industrial robot for a gripping task

Info

Publication number
EP4326500A1
Authority
EP
European Patent Office
Prior art keywords
training
post
ann
neural network
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22809169.0A
Other languages
German (de)
English (en)
Inventor
Jonathan Balzer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vathos GmbH
Original Assignee
Vathos GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vathos GmbH filed Critical Vathos GmbH
Publication of EP4326500A1
Legal status: Pending

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1669Programme controls characterised by programming, planning systems for manipulators characterised by special application, e.g. multi-arm co-operation, assembly, grasping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/008Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/653Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/39Robotics, robotics to robotics hand
    • G05B2219/39543Recognize object and plan hand shapes in grasping movements
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/40Robotics, robotics mapping to robotics vision
    • G05B2219/40532Ann for vision processing
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/40Robotics, robotics mapping to robotics vision
    • G05B2219/40564Recognize shape, contour of object, extract position and orientation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present invention relates to a distributed, at least partially computer-implemented system for controlling at least one robot in a gripping task for gripping objects of different object types, an operating method for operating such a system, a central training computer, a central operating method for operating the central training computer, a local processing unit and a local operating method for operating the local processing unit, and a computer program.
  • an industrial robot has the task of removing parts from a box fully automatically; in practice, the parts are usually arranged chaotically within the box and may not all be of the same type.
  • the removed parts should be sorted and placed on a conveyor belt/pallet or similar for further processing.
  • a production machine is to be equipped with them.
  • a precise grip requires an equally precise determination of the position and orientation (and also "recognition") of the object or component based on images from a 3D camera that can be mounted at the end of the robot arm or above the box.
  • a feature is defined as a transformation of the raw image data into a low-dimensional space.
  • the design of a feature aims to reduce the search space by filtering out irrelevant content and interference.
  • the method described in the document Lowe, DG Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision 60, 91-110 (2004) is invariant to scaling the image, because it is initially irrelevant for the recognition of an object how far it is from the camera.
  • Such manually designed features are qualitatively and quantitatively strongly tailored to specific object classes and environmental conditions. Optimizing them for the specific application requires expert knowledge and thus severely limits the flexibility of the user.
  • Machine learning methods are generic in the sense that they can be trained on any object class and environmental condition by only providing a sufficient number of data/images that reflect these environmental conditions.
  • the recording of data can also be carried out by laypersons without any understanding of (optimal) features.
  • so-called deep learning methods learn not only the ability to recognize objects on the basis of their features, but also the optimal structure of the features themselves; they have in common that they train exclusively on real data (image data of the objects to be grasped).
  • the disadvantage of these methods is that annotated training data of the object type to be grasped must be provided, which is time-consuming and labor-intensive.
  • the present invention has set itself the task of demonstrating a way in which the gripping task for gripping objects of different object types in an industrial (e.g. manufacturing) process can be made more flexible and improved.
  • the process for training the system should also be shortened.
  • the gripping task should also be carried out qualitatively better with different object types.
  • This object is achieved by the attached independent patent claims, in particular by a distributed, at least partially computer-implemented system for controlling at least one robot in a gripping task for gripping objects of different object types, an operating method for operating such a system, a central training computer, a central operating method for operating a central training computer, a local processing unit and a local operating method for operating the local processing unit, and a computer program.
  • the present invention relates to a distributed, at least partially computer-implemented system for controlling at least one robot in a gripping task for gripping objects of different object types (e.g. screws, workpieces of different shape and/or size, packaging with or without contents, and/or components of a production system) which are arranged in a working area of the robot.
  • the system includes:
  • a central training computer with a memory on which an instance of a neural network is stored, the training computer being intended for pre-training and post-training of the neural network (ANN); wherein the ANN is trained for object recognition and position detection, including detection of an orientation of the object, in order to calculate gripping instructions for an end effector unit of the robot for gripping the object; wherein the central training computer is designed to detect an object type and to carry out a pre-training exclusively with synthetically generated object data serving as pre-training data, which are generated by means of a geometric, object-type-specific 3D model of the objects, and, as a result of the pre-training, to transmit pre-training parameters of a pre-trained ANN to at least one local processing unit via a network interface; and wherein the central training computer is also designed to continuously and cyclically carry out post-training of the ANN and, as a result of the post-training, to transmit post-training parameters of a post-trained ANN via the network interface to the at least one local processing unit;
  • a set of local resources interacting over a local network, including:
    o the robot with a robot controller, a manipulator and the end effector unit, wherein the robot controller is intended to control the robot and in particular its end effector unit to carry out the gripping task for one object of the respective object type;
    o an optical capture device used to capture image data of objects in the working area of the robot;
    o at least one local processing unit for interacting with the robot controller, wherein the at least one local processing unit is intended to store different instances of the ANN, in that the at least one local processing unit receives the pre-training parameters and post-training parameters determined by the central training computer;
  • the network interface for data exchange between the central training computer and the set of local processing units, with data exchange taking place via an asynchronous protocol.
  • the neural network is based on learning processes from the class of supervised learning processes.
  • the neural network is trained on objects to be grasped in different arrangement states (side by side or partially on top of each other, superimposed, in a box or on a conveyor belt, etc.), so that the system can react to them by calculating grasping instructions specific to the arrangement state.
  • the arrangement state is characterized by the type/class of the respective objects (object identification), their position (position recognition) and their orientation (orientation recognition) in relation to the respective working area of the robot.
  • the system and the method enable the six-dimensional position recognition and orientation recognition of general objects in space and in particular in the working space of the robot, which can be designed differently (conveyor belt, box, etc.).
  • the retraining takes place continuously and cyclically on the basis of real image data of the objects to be gripped.
  • the initial training or pre-training takes place exclusively on the basis of pre-training data, namely unreal or synthetic object data, which are generated from the 3D model for the specific object type using computer-graphic methods (in particular a synthesis algorithm).
  • the communication between the central training computer and the at least one local processing unit takes place by means of asynchronous synchronization.
  • the pre-trained neural network (also called network for short) allows the detection of objects or parts based on real image data in a simple environment (flat surface) that is less demanding than the target environment (box).
  • the robot is already interacting with the object by putting it down in different positions and in different orientations, or at least executing the dropping phase of the target process.
  • Additional training data, the post-training data, is obtained in this way, but this time under realistic conditions. These data are transmitted back to the central training computer via the network interface, in particular via the WAN.
  • the training is continuously improved, taking into account the real image data, by post-training that is carried out on the central training computer; the result of the post-training, i.e. the weights, is transferred back to the local processing unit connected to the robot for the end application (e.g. bin picking).
  • the solution described here combines all the advantages of a data-driven object recognition approach, such as high reliability and flexibility through simple programming and/or parameterization without any effort for the generation of training data and without sacrificing accuracy due to the discrepancy between synthetically generated training data and the real operational environment.
  • the neural network can be stored and/or implemented and/or used in different instances, in particular on the local processing unit.
  • “Instance” here refers to the training state.
  • a first instance could be a pre-trained state, a second instance a first post-trained state, a third instance a second post-trained state, with the post-training data always being generated on the basis of image data actually captured by the optical acquisition device and the pre-training data being generated exclusively from synthetically generated object data (which are also rendered and are therefore also image data).
  • the instance or the state is represented by the weights, i.e. the pre- and post-training parameters, which are transmitted from the central training computer to the local processing units after each training run.
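For illustration, a minimal sketch of how such an instance could be materialised locally purely from transmitted weights (assumptions: PyTorch and a hypothetical GraspNet placeholder class standing in for the actual network architecture; neither name comes from the patent):

```python
# Illustrative sketch (not the patent's implementation): a local processing unit
# instantiates successive "instances" of the ANN purely from transmitted weights.
import io
import torch
import torch.nn as nn

class GraspNet(nn.Module):
    """Placeholder architecture; the real network is a Votenet-like model."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_channels, 128), nn.ReLU(), nn.Linear(128, 7)
        )  # 7 outputs = 3D position + quaternion orientation (example labels)

    def forward(self, x):
        return self.layers(x)

def instantiate_from_weights(weight_bytes: bytes, in_channels: int = 3) -> GraspNet:
    """Build a new ANN instance from the parameter blob sent by the training computer."""
    model = GraspNet(in_channels)
    state_dict = torch.load(io.BytesIO(weight_bytes), map_location="cpu")
    model.load_state_dict(state_dict)   # only the weights travel over the network
    model.eval()
    return model
```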
  • the work area can be a conveyor belt or other 2-dimensional structure for receiving or storing the objects.
  • the workspace can be a 3-dimensional structure for receiving or storing the objects, such as a box.
  • the object can thus be arranged, for example, on a surface or in a storage unit, for example in a box. It is preferably configurable, in particular via a setting parameter on a user interface, how densely the objects are placed.
  • In a first analysis phase, after implementation of the pre-trained ANN, it is possible to distribute the objects disjunctively in the workspace.
  • the objects can be distributed arbitrarily in the work area and also partially overlap or overlap.
  • the gripping instructions include at least a set of target positions for the set of end effectors that must be operated to perform the gripping task.
  • the gripping instructions can also include a time specification as to when which end effector must be activated synchronized with which other end effector(s) in order to fulfill the gripping task, such as holding together with 2 or more finger grippers.
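Purely as an illustration of what such gripping instructions might contain, a hedged sketch follows; the field names are invented for this example and are not taken from the patent:

```python
# Hypothetical record for a gripping instruction: target poses for the end effectors
# plus optional timing data for synchronised grips (e.g. two finger grippers).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EndEffectorTarget:
    effector_id: str                       # which gripper, e.g. "vacuum_1", "finger_2"
    position: List[float]                  # target position [x, y, z] in robot coordinates
    orientation: List[float]               # target orientation as quaternion [qx, qy, qz, qw]
    activate_at: Optional[float] = None    # relative time (s) for synchronised activation

@dataclass
class GrippingInstruction:
    object_class: str                                   # detected object type
    targets: List[EndEffectorTarget] = field(default_factory=list)

# Example: two finger grippers closing together on one object
instr = GrippingInstruction(
    object_class="screw_M8",
    targets=[
        EndEffectorTarget("finger_1", [0.42, 0.10, 0.05], [0, 0, 0, 1], activate_at=0.0),
        EndEffectorTarget("finger_2", [0.42, 0.12, 0.05], [0, 0, 0, 1], activate_at=0.0),
    ],
)
```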
  • the 3D model is a three-dimensional model that characterizes the surface of each type of object. It can be a CAD model.
  • the format of the 3D model is selectable and can be converted by a conversion algorithm, such as to a triangle mesh in OBJ format, where the surface of the object is approximated by a set of triangles.
  • the 3D model has an (intrinsic) coordinate system.
  • the render engine installed on the central training computer can position the intrinsic coordinate system of the 3D model in relation to the coordinate system of a virtual camera.
  • the 3D model is positioned on the central training computer in a pure simulation environment so that a depth image can be synthesized by the render engine based on this positioning.
  • the orientation and/or position of the object depicted therein is then uniquely assigned as a label to the image (object data) generated in this way.
  • Object type here means a product type, i.e. the digital specification of the application (e.g. the question of which specific objects should be gripped?).
  • the central training computer can then access the database with the 3D models in order to load the appropriate object type-specific 3D model, e.g. for screws of a certain type, the 3D model of this screw type.
  • the loaded 3D model is placed in all physically plausible or physically possible positions and/or orientations, in particular by a so-called render engine (electronic module on the central training computer).
  • a render engine is preferably implemented on the central training computer. In this case, geometric boundary conditions of the object, such as size, center of gravity, mass and/or degrees of freedom, etc., are taken into account by a synthesis algorithm.
  • An image is then rendered and a depth buffer is saved as a synthesized data object together with the labels (position and orientation and, optionally, the class) of the object depicted or represented in the image (quasi as a synthesized image).
  • the image synthesized in this way serves as pre-training data in a set of correspondingly generated images, which are used for pre-training the neural network on the central training computer.
  • the pre-training data are thus generated in an automatic, algorithmic process (synthesis algorithm) from the 3D model that matches the specific type of object or is specific to the type of object and that is stored in a model memory.
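As an illustration of this rendering-based synthesis, a minimal sketch follows (assumptions: trimesh and pyrender as the render engine, an OBJ triangle mesh as input, and illustrative camera intrinsics; the patent does not prescribe these tools). Each call renders one depth buffer of the model in a given pose and stores it together with the pose label as one synthetic pre-training sample:

```python
# Minimal sketch of generating one synthetic pre-training sample from the 3D model.
import numpy as np
import trimesh
import pyrender

def synthesize_sample(obj_path: str, object_pose: np.ndarray):
    """Render one depth image of the 3D model placed at `object_pose` (4x4 matrix).

    `object_pose` must place the object in front of the virtual camera
    (negative z in camera coordinates) for it to appear in the depth buffer.
    """
    mesh = trimesh.load(obj_path, force="mesh")             # object-type-specific 3D model
    scene = pyrender.Scene()
    scene.add(pyrender.Mesh.from_trimesh(mesh), pose=object_pose)
    camera = pyrender.IntrinsicsCamera(fx=600, fy=600, cx=320, cy=240)
    scene.add(camera, pose=np.eye(4))                       # virtual camera at the origin
    renderer = pyrender.OffscreenRenderer(640, 480)
    _, depth = renderer.render(scene)                       # depth buffer = synthetic image
    renderer.delete()
    label = {"pose": object_pose, "class": "example_object"}  # position/orientation label
    return depth, label
```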
  • the object data can in particular be images. Therefore, no real image data of the objects to be gripped have to be captured and transmitted to the central training computer for the pre-training.
  • the pre-training can thus advantageously be carried out autonomously on the central training computer.
  • the pre-training data is exclusively object data synthesized using the synthesis algorithm.
  • the post-training serves to retrain or improve the machine learning model with actually recorded image data that has been recorded on the local processing unit from the real objects in the working area of the robot.
  • the post-training data is annotated using an annotation algorithm based on generated reference image data or provided with a label (“labeled”).
  • the post-training data is therefore in particular annotated image data based on real image data that has been recorded locally with the optical recording device.
  • the weights of the ANN from the pre-training are first loaded. Based on this, a stochastic gradient descent method is continued over the post-training data. The error functional and the gradient are calculated over the set of all training data points.
  • the size and properties of the input data influence the position of the global minimum and thus also the weights (or parameters) of the ANN. Specifically, where RGB images exist, six (6) coordinates (3 point coordinates and 3 color channels) are fed into the input layer of the ANN; otherwise, three (3) coordinates are fed into the input layer of the ANN.
  • the post-training takes place cyclically. During operation, post-training data based on real image data is continuously recorded. The more real image data and thus post-training data are available, the less synthetic data (object data) the method requires. The synthetic data/real image data ratio is continuously reduced until there is no longer any synthetic data in the post-training dataset.
  • pre-training parameters and/or post-training parameters with weights for the neural network are generated (existing weights can be retained or adjusted during post-training).
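By way of illustration only, a minimal sketch of this cyclic post-training idea (assumptions: PyTorch, placeholder dataset and model objects, and a simple MSE loss; none of these names come from the patent). It continues gradient descent from the pre-trained weights on a mix of real and synthetic samples and shrinks the synthetic share from cycle to cycle:

```python
# Hedged sketch: continue SGD from the pre-trained weights on mixed data and
# reduce the synthetic fraction as more real, annotated image data arrives.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def post_train(model, synthetic_ds, real_ds, synthetic_fraction: float, epochs: int = 5):
    n_syn = min(int(len(real_ds) * synthetic_fraction), len(synthetic_ds))
    mixed = ConcatDataset([real_ds, Subset(synthetic_ds, range(n_syn))])
    loader = DataLoader(mixed, batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)   # continue SGD from pre-training
    loss_fn = torch.nn.MSELoss()                                # placeholder loss
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model.state_dict()   # post-training parameters sent back to the edge device
```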
  • An important technical advantage can be seen in the fact that only the weights in the form of the pre-training parameters and/or post-training parameters have to be transmitted from the central training computer to the local processing unit, which results in transmission in compressed form and helps to save network resources.
  • the end effector unit can be arranged on a manipulator of the robot, which can be designed as a robot arm, for example, in order to carry the end effectors.
  • Manipulators with different kinematics are possible (6-axis robot with 6 degrees of freedom, linear unit with only 3 translational degrees of freedom, etc.).
  • the end effector unit can include a plurality of and also different end effectors.
  • An end effector can be designed, for example, as a vacuum suction device or as a pneumatic gripper. Alternatively or cumulatively, magnetic, mechanical and/or adhesive grippers can be used.
  • Several end effectors can also be activated simultaneously to carry out a coordinated gripping task.
  • the end effector assembly may include one or more end effectors, such as 2- or 3-finger grippers and/or suction-cup grippers.
  • the local computing unit can be designed as an edge device, for example.
  • the neural network is an artificial neural network (ANN).
  • the software comprises the algorithms, e.g. the modified ICP algorithm, an automatic method for labeling the camera images, etc.
  • the local processing unit is designed to interact with the robot controller, in particular to exchange data with the controller of the robot.
  • the local processing unit controls the robot at least indirectly by instructing the robot controller accordingly.
  • the local resources thus include two different controllers: on the one hand, the robot controller (on the industrial robot) and, on the other hand, the local computing unit, in particular an edge device, which is set up in particular to evaluate the locally recorded images.
  • the robot controller queries the position of the objects from the local processing unit in order to then "control" the robot. In this respect, the robot controller has control over the local processing unit.
  • the edge device controls the robot indirectly.
  • the modified ICP algorithm is mainly used to provide annotations to the result data of the neural network as reference image data in order to enable an external evaluation of the machine result.
  • the modification of the classic ICP algorithm is that between the iterations of the algorithm, not only the correspondences (in the form of nearest neighbors) are recalculated, but also one of the two point clouds by rendering a depth image of the model from the currently estimated relative location/orientation of model and camera.
  • the amount of error to be minimized is calculated from the distances between corresponding points in space, with the correspondences also being determined in each iteration on the basis of the shortest distances.
  • the chicken-and-egg problem is solved by iterative execution (similar to the concept of iterative training described here).
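A strongly simplified sketch of this modified ICP idea follows (assumptions: NumPy/SciPy, a point-to-point error minimised by a standard Kabsch/SVD step, and a caller-supplied function that re-generates the model point cloud from the current pose estimate, standing in for the re-rendered depth image). This is an illustration, not the patented algorithm:

```python
# In every iteration the model point cloud is re-generated from the current pose
# estimate before nearest-neighbour correspondences are recomputed.
import numpy as np
from scipy.spatial import cKDTree

def modified_icp(camera_points, render_model_points, pose, iterations=30):
    """camera_points: Nx3 measured cloud; render_model_points(pose) -> Mx3 model cloud."""
    for _ in range(iterations):
        model_pts = render_model_points(pose)                 # re-render from current estimate
        _dists, idx = cKDTree(model_pts).query(camera_points) # nearest-neighbour correspondences
        src, dst = model_pts[idx], camera_points
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)             # minimise point-to-point error
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                              # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst.mean(0) - R @ src.mean(0)
        delta = np.eye(4)
        delta[:3, :3], delta[:3, 3] = R, t
        pose = delta @ pose                                   # refined object pose estimate
    return pose
```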
  • a result data set with the labels, in particular the position and orientation of the object in the work area and optionally the class, is determined from the captured image data in which the object to be gripped is depicted.
  • the result record is an intermediate result.
  • the modified ICP algorithm is applied to the result data set to calculate a refined result data set, which serves as the final result.
  • the final result is sent to the robot controller and to the central training computer for follow-up training.
  • a “processing unit” or a “computer” can be understood to mean, for example, a machine or an electronic circuit.
  • the method is then executed "embedded".
  • the local operating method can be closely coupled to the robot controller of the robot.
  • a processor can in particular be a central processing unit (CPU), a microprocessor or a microcontroller, for example an application-specific integrated circuit or a digital signal processor, possibly in combination with a memory unit for storing program instructions, etc.
  • a processor can, for example, also be an IC (integrated circuit), in particular an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit), or, for example, a multi-chip module, e.g. a 2.5D or 3D multi-chip module in which, in particular, several so-called dies are connected to each other directly or via an interposer, or a DSP (digital signal processor) or a graphics processing unit (GPU).
  • a processor can also be understood to mean a virtualized processor, a virtual machine or a soft CPU.
  • the term "result data set” or “refined result data set” refers to a data set in which the label, i.e. in particular the location and/or orientation or position and optionally the object type (e.g. screw, workpiece plate) of the object is encoded.
  • the gripping instructions can be calculated on the basis of the (refined) result data set.
  • grip: transformation matrix from gripper to object coordinates, FGO
  • detected object position/orientation: transformation from object to robot coordinates, FOR.
  • Gripping instructions can be processed by the robot controller in order to control the robot with its end effectors to carry out the object type-specific gripping task.
  • the (refined) result data set must be combined with a grip or with grip positions.
  • the grip for the respective object is transmitted as a data set from the central training computer to the local processing unit.
  • the grip encodes the intended relative position and/or orientation of the gripper relative to the object to be gripped. This relationship (position/orientation of the gripper—position/orientation of the object) is advantageously calculated independently of the location and/or position of the object in space (coordinate transformation).
  • the local computing unit "only" calculates the target position and/or orientation of the end effector for the grip. This data set is transferred to the robot controller. The robot controller calculates a path to bring the end effector from the current state to the target state and converts this into axis angles using inverse kinematics.
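A minimal numerical sketch of this coordinate transformation follows (assumptions: 4x4 homogeneous matrices, FGO read as the gripper-to-object transformation, FOR as the object-to-robot transformation, and composition read right to left; the patent does not fix a convention, and the function and example values are illustrative only):

```python
# Combining the detected object pose with the taught grip to get the target
# gripper pose in robot coordinates: F_GR = F_OR @ F_GO (convention assumed).
import numpy as np

def target_gripper_pose(F_OR: np.ndarray, F_GO: np.ndarray) -> np.ndarray:
    """Return the gripper target pose in robot coordinates as a 4x4 matrix."""
    return F_OR @ F_GO

# Example: object detected 0.5 m in front of the robot, grip taught 0.1 m above the object
F_OR = np.eye(4); F_OR[0, 3] = 0.5
F_GO = np.eye(4); F_GO[2, 3] = 0.1
print(target_gripper_pose(F_OR, F_GO)[:3, 3])   # -> [0.5, 0. , 0.1]
```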
  • the network interface is used to transmit parameters (weights), in particular pre-training parameters and/or post-training parameters for instantiating the pre-trained or post-trained ANN, from the central computer to the at least one local processing unit.
  • the network interface can be used to transmit the refined result data record generated on the at least one local processing unit as a post-training data record to the central training computer for post-training.
  • the network interface can be used to load the geometric, object-type-specific 3D model on the local processing unit. This can be triggered via the user interface, e.g. by selecting a specific object type.
  • the 3D model, e.g. a CAD model, can be loaded from a model store or from the central training computer.
  • labeled or annotated post-training data can be generated on the local computing unit, in an automatic process, namely an annotation algorithm, from the image data recorded locally with the optical detection device, which were supplied to the ANN for evaluation, and the synthesized reference image data; these post-training data are then transmitted to the central training computer for the purpose of post-training.
  • the modified ICP algorithm compensates for the weaknesses of the (only) pre-trained network by using strong geometric boundary conditions.
  • the system has a user interface.
  • the user interface can be designed as an application (app) and/or as a web interface, via which an API exchanges all data relevant to the method between the central training computer and the actor.
  • the user interface is intended to provide at least one selection field to determine an object type of the objects to be grabbed. This can be evaluated as a trigger signal in order to transmit the specific object type to the central training computer, so that the central training computer loads the object-type-specific 3D model from a model memory in response to the specific object type in order to use a synthesis algorithm to generate object-type-specific images in all physically plausible positions and/or orientations that serve as the basis for pre-training the neural network.
  • the synthesis algorithm processes mechanical and/or physical data of the object type such as center of gravity, size and/or stable position data in order to only render physically plausible positions and/or orientations of the object.
  • object data that represent the object in physically possible positions should be rendered.
  • Physically possible positions are in particular the calculated stable positions.
  • An unstable position is excluded; e.g. the screw is not rendered standing on its point.
  • This has the advantage that computing capacity can be saved and unnecessary data storage and data processing can be avoided!
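One possible way to restrict rendering to such stable positions is sketched below (assumption: the trimesh library and its stable-pose estimation; the patent does not name a specific tool, and the threshold value is illustrative):

```python
# Keep only resting poses the object can realistically come to rest in, so that
# unstable configurations are never rendered as pre-training data.
import trimesh

def stable_object_poses(obj_path: str, min_probability: float = 0.01):
    mesh = trimesh.load(obj_path, force="mesh")
    # candidate resting poses on a plane plus their estimated probabilities
    transforms, probabilities = mesh.compute_stable_poses()
    return [T for T, p in zip(transforms, probabilities) if p >= min_probability]
```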
  • the neural network is trained to output, for at least one (depth) image (alternatively or cumulatively to an RGB image) of an object of an object type, object recognition and position detection of the object in the coordinate system of the robot's work area, including detection of the orientation of the object, as a result data set.
  • the neural network can preferably be additionally trained to provide a reliability of the output in the form of a reliability data set.
  • the neural network can be implemented as a deep neural network (DNN).
  • the neural network may have a Votenet architecture.
  • Votenet comprises three modules: first, a backbone for learning local features, second, an evaluation module for evaluating and/or accumulating the individual feature vectors, and third, a conversion module that is designed to convert a result of the accumulation into object detections.
  • the network interface can comprise synchronization means with a broker implemented as a microservice (e.g. RabbitMQ).
  • the data exchange between the local resources and the central training computer can take place exclusively via the local processing unit, which serves as a gateway.
  • on the one hand, the local processing unit acts as a gateway; on the other hand, it also carries out all "transactions" because it is the only instance that can "reach" both interaction partners or sides, i.e. both the central training computer and the modules or units of the local resources.
  • the local processing unit or the edge device in the local network is usually not accessible for applications in the cloud without additional technical effort (e.g. a VPN).
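A hedged sketch of such broker-based, asynchronous exchange follows (RabbitMQ is named above only as an example; pika is one common Python client, and the queue names used here are invented for illustration). The edge device pushes annotated post-training data whenever a connection exists and picks up new weight sets independently of the robot's real-time cycle:

```python
import pika

def exchange_with_training_computer(host: str, post_training_blob: bytes):
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue="post_training_data", durable=True)
    channel.queue_declare(queue="model_weights", durable=True)

    # upload locally annotated real image data for the next post-training cycle
    channel.basic_publish(exchange="", routing_key="post_training_data",
                          body=post_training_blob)

    # non-blocking check whether a new (post-)trained weight set is available
    method, _props, body = channel.basic_get(queue="model_weights", auto_ack=True)
    connection.close()
    return body if method else None      # None -> keep using the current ANN instance
```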
  • the gripping instructions can include an identification data record which is used to identify at least one end effector suitable for the object from a set of end effectors of the end effector unit.
  • the gripping instructions generated on the local processing unit contain "only" the target location and/or orientation of the end effector for the grip.
  • the gripping instructions are then further processed on the robot controller to calculate a trajectory to bring the end effector from the current state to the target state.
  • the robot controller converts these into axis angles using inverse kinematics. So-called "process knowledge", ie data that defines the application of the robotic process, can preferably be supplied to the system.
  • the process knowledge includes, among other things, the identification data record that indicates the type of gripper and that is read in before the calculation of the gripping position and/or orientation (a vacuum gripper grips differently than a 2-finger gripper).
  • the process knowledge can be implemented in a motion program that runs on the robot controller.
  • the robot also receives the object class and can choose a gripper depending on it.
  • the optical detection device comprises a device for the detection of depth images and optionally for the detection of intensity images in the visible or in the infrared spectrum.
  • the intensity images can preferably be used to verify the depth images. The quality of the method can thus be improved.
  • the acquisition device for acquiring the depth images and the intensity images can be implemented in a common device. Typically, depth and intensity cameras are integrated into one device. The depth image is calculated using all common measurement principles from one or more intensity images, e.g. using a fringe projection method.
  • the calculated gripping instructions can be supplied to a visualization algorithm, which is used to calculate a virtualized visualization of the gripping instructions that shows the specific gripper gripping the object, the calculated virtualized visualization of the gripping instructions being output on a user interface.
  • the output of the virtualized representation makes it possible to carry out a manual verification, for example in order to avoid wrong grips due to an incorrectly determined object type and subsequent incorrect training.
  • a visualization of the grip (gripper relative to the object) is implemented so that the reliability of the detection can be checked before it is put into operation.
  • this visualization can be omitted or is optional during operation and during the recording of data for post-training.
  • the visualization is rendered on a user interface that is assigned to the local resources in the robot's environment.
  • the post-training of the neural network is carried out iteratively and cyclically following a transmission of post-training data in the form of refined result data sets, which include image data recorded locally by the optical detection device that have been automatically annotated by the local computing unit and transmitted to the central training computer.
  • the post-training is thus carried out on the central training computer, but the post-training data for this are accumulated on the local processing unit from actually recorded image data of the real objects in the robot's work area.
  • the post-training thus enables a specific adaptation to the respective object type. Even if the pre-training was already related to the object type by determining the 3D model, the objects to be grabbed within the object type can still differ; e.g. screws can have a different thread and/or a different length.
  • the post-training data set for post-training the neural network is gradually and continuously expanded by image data recorded locally by sensors at the local resources of the robot.
  • the invention relates to an operating method for operating a system according to one of the preceding claims, with the following method steps:
  • At least one local computing unit reading in pre-training parameters or post-training parameters of a pre-trained or post-trained ANN via the network interface in order to implement the pre-trained or post-trained ANN;
  • At least one local computing unit applying the pre-trained or post-trained ANN to the captured image data in order to determine the result data set;
  • At least one local processing unit executing a modified ICP algorithm which, as input data, evaluates firstly the image data of the optical detection device that was supplied to the implemented ANN and secondly reference image data, and compares them with each other in order to minimize the errors and to generate a refined result data set, wherein the reference image data is a synthesized and/or rendered image that is rendered, on the basis of the 3D model, to match the result data set determined by the ANN;
  • On the at least one local computing unit: calculating gripping instructions for the end effector unit of the robot based on the generated refined result data set;
  • the flow of generating real training data is as follows:
  • On the central training computer: acquiring post-training data via the network interface, the post-training data comprising the labeled or annotated real image data which have been acquired with the optical acquisition device;
  • the invention relates to a central operating method for operating a central training computer in a system as described above.
  • the central operating method corresponds to the system; the system corresponds to a hardware solution, while the method represents the software implementation.
  • the procedure comprises the following procedural steps:
  • Accessing a model memory in order to load the 3D model assigned to the detected object type, in particular a CAD model, and to generate synthetic object data therefrom, in particular using a synthesis algorithm, and to use them for the purpose of pre-training;
  • the post-training data comprising a refined set of results based on image data of objects in the working area of the robot that are actually recorded with the optical recording device and that have been annotated in an automatic process on the local computing unit;
  • the steps of acquiring post-training data, post-training and transmitting the post-training parameters are preferably carried out iteratively on the basis of newly acquired post-training data. This makes it possible for the system or method for object detection (what (class)?, position?, orientation?) for the automatic generation of gripping instructions to be continuously improved.
  • the invention relates to a local operating method for operating local computing units in a system, as described above, with the following method steps:
  • the acquisition of the image data for the respective object is triggered before the gripping instructions for the object are executed.
  • the acquisition of the image data can also be carried out while grasping objects that have already been recognized.
  • the end result in the form of the refined result data record is transmitted to the robot or its controller for executing the gripping instruction and is transmitted to the central training computer at the same time or in parallel or with a time delay.
  • the upload of the image data preferably runs parallel to the primary process (control of the robot).
  • the objects can be arranged in the training phase without compliance with restriction conditions and in particular in a box and partially superimposed, from which they are to be picked up by the end effectors. With each grip, image data is automatically recorded, which serves as training data. The training data are thus generated automatically with this method.
  • the objects when using the pre-trained ANN in a pre-training phase, can be arranged while complying with restriction conditions (ie simplified), in particular on one level and disjunctively in the workspace.
  • the objects in the working area can be arranged without observing restriction conditions (ie quite complex, eg overlapping, partially obscuring, crooked, etc.).
  • the invention relates to a central training computer, as described above, having a memory on which an instance of a neural network is stored, the training computer being intended for pre-training and post-training of the neural network, which is trained for object recognition and position detection, including detecting an orientation of the object, in order to calculate gripping instructions for an end effector unit of the robot for gripping the object;
  • the central training computer is designed to read in an object type
  • the central training computer has a model interface to a model memory in which a geometric 3D model of the objects of the object type is stored for each object type and
  • the central training computer is designed to carry out a pre-training exclusively with synthetically generated object data, which serve as pre-training data, which are generated by means of the geometric, object-type-specific 3D model of the objects of the detected object type, and as a result of the pre-training pre-training parameters of a pre-trained neural network (ANN) are transmitted via a network interface to at least one local processing unit and
  • the central training computer is further designed to continuously and cyclically carry out a post-training of the neural network on the basis of post-training data and, as a result of the post-training, to transmit post-training parameters of a post-trained ANN via the network interface to the at least one local computing unit.
  • the invention relates to a local processing unit in a distributed system, as described above, wherein the local processing unit is intended for data exchange with a controller of the robot for controlling the robot, and in particular its end effector unit, for performing the gripping task for one object at a time, and wherein the local processing unit is intended to store different instances of the neural network, in that the local processing unit is intended to receive pre-training parameters and post-training parameters from the central training computer, in order in particular to implement a pre-trained ANN that is continuously and cyclically replaced by a post-trained ANN until a convergence criterion is met, and
  • the pre-trained or post-trained ANN is used in an inference phase by determining a result data set for the image data captured with the optical capture device
  • a modified ICP algorithm is executed, which, as input data, firstly evaluates the image data of the optical detection device, which was supplied to the implemented ANN for use, and secondly reference image data and compares them with one another in order to minimize the errors and to generate a refined result data set, wherein the reference image data is a synthesized or rendered image that is rendered to the result data set determined by the ANN on the basis of the 3D model and the refined result data set serves as a basis for calculating the gripping instructions for the end effector unit for gripping the object and sending them to the robot controller of the robot to carry out the gripping task.
  • the local processing unit comprises a graphics processing unit (GPU) which is used to implement the neural network.
  • the operating method for execution on the local processing unit can be executed on the GPU.
  • the invention relates to a computer program, wherein the computer program can be loaded into a memory unit of a processing unit and contains program code sections in order to cause the processing unit to execute the method as described above when the computer program is executed in the local processing unit.
  • the computing unit can be the central training computer for executing the central operating method or the local computing unit for executing the local operating method.
  • the invention relates to a computer program product.
  • the computer program product can be stored on a data carrier or a computer-readable storage medium.
  • FIG. 1 shows an interaction diagram showing the data exchange between the entities involved in a system according to a preferred embodiment of the invention
  • FIG. 3 shows a schematic representation of a central training computer in communication with a number of local processing units
  • FIG. 4 shows a further schematic representation of a gripping process of the robot when gripping an object with the calculated gripping instructions
  • FIG. 5 shows a further schematic representation of a gripping process of the robot when gripping objects in a more complex arrangement in the work area, in particular in a box;
  • FIG. 6 shows a further schematic representation of a gripping process of the robot when gripping objects and when sorting the gripped objects, in particular when sorting into a box;
  • Figure 7 is a flow chart of a method of operation for execution on a system
  • FIG. 8 shows a flow chart of a central operating method for execution on a central training computer
  • FIG. 9 shows a flowchart of a local operating method for execution on a local processing unit
  • FIG. 10 shows an exemplary preferred implementation of the neural network as a Votenet architecture
  • FIG. 11 shows a schematic representation of a synchronization mechanism for the asynchronous exchange of messages between the central training computer ZTR and the respective local processing unit LRE;
  • FIG. 12 shows a UML diagram for the operation of the system, in particular during inference
  • FIG. 14 shows an example of a robot arm with an end effector unit consisting of 4 vacuum grippers and
  • FIG. 15 shows an exemplary representation of a 2-finger gripper.
  • the invention relates to the computer-implemented control of an industrial robot for a gripping task for gripping objects of different object types, such as screws, workpieces or intermediate products as part of a production process.
  • FIG. 1 shows an interaction diagram for data exchange between different electronic units for performing the gripping task mentioned above.
  • the system includes a central training computer ZTR and a number of local resources. “Local” here means in the area around the robot, i.e. arranged locally on the robot.
  • the local resources can include at least one local computing unit LRE, at least one optical detection device K and a robot R with a robot controller RS.
  • the local resources are in data exchange via a local network, in particular a wireless network, for example a radio network.
  • the local resources are connected to the central training computer ZTR via a WAN network (Wide Area Network, for example the Internet).
  • the system is designed with a user interface UI, via which a user, referred to as an actor in FIG. 1, can interact with the system.
  • the method can be triggered by an object type or a specific object type being entered (for example screws) on the user interface UI.
  • the object type data set is transmitted to the central training computer ZTR.
  • the appropriate 3D model (for example as a CAD model) is loaded onto the central training computer ZTR.
  • a synthesis algorithm A1 can then be executed on the central training computer ZTR in order to synthesize or render images, based on the loaded 3D model, in selected positions and/or orientations.
  • the synthesis algorithm A1 is designed to bring the object into all physically plausible positions and/or orientations that are physically possible. In particular, the center of gravity of the object, its size and/or the respective working area are taken into account. Further technical details on the synthesis algorithm are explained below.
  • the synthesis algorithm A1 is used to visualize stable states of the respective object and to synthesize object data, which are the sole basis (input) for the pre-training of the neural network.
  • Grips can also be defined on the user interface UI, which are transmitted to the central training computer ZTR in the form of a grip data record.
  • the pre-training data thus generated, based solely on CAD model data for the specific object type, is then used to pre-train an artificial neural network (ANN).
  • weights of the network ANN can be provided, with which it becomes possible to implement the neural network ANN.
  • the weights are transmitted to the local processing unit LRE.
  • the 3D model of the specific object type is also loaded on the local computing unit LRE.
  • the pre-trained neural network ANN can then be implemented on the local computing unit LRE.
  • An annotation method can then be carried out on the local computing unit LRE on the basis of actually recorded image data bd that have been recorded with the optical recording device and in particular with the camera K.
  • the annotation method is used to generate post-training data, which are transmitted to the central training computer for post-training.
  • the training, whether pre-training or post-training, takes place exclusively on the central training computer.
  • the data aggregation for post-training is carried out on the local processing unit LRE.
  • the robot controller RS triggers the process with an initialization signal that is sent to the local computer unit LRE.
  • the local processing unit LRE then initiates the triggering of an image recording by the camera K.
  • the camera K is preferably set up in such a way that it can capture the work area FB, B, T of the robot R.
  • the camera K can be designed to capture depth images and, if necessary, intensity images.
  • the image data bd actually captured from the object O are transmitted from the camera K to the local processing unit LRE in order to be evaluated there, ie on the local processing unit LRE. This is done using the previously implemented neural network ANN.
  • the result data record 100 is an intermediate result and includes annotations for the object O depicted in the image data bd.
  • the annotations can also be referred to as labels and include a position detection data record and an orientation data record and optionally a type or class of the respective object O.
  • a modified ICP algorithm A2 is used for fine localization.
  • the result of the modified ICP algorithm A2 serves as the final result and is represented in a refined result data set 200 and improves or refines the intermediate result that originates from the neural network calculation.
  • the gripping instructions with the specific gripping positions can then be calculated from this refined result data set 200.
  • the gripping instructions can be transmitted to the robot controller RS for execution, so that it can calculate the motion planning of an end effector unit EE.
  • the robot controller RS can then cause the robot R to carry out the movement.
  • the image data bd captured by the camera K are transmitted from the local processing unit LRE to the central training computer ZTR for the purpose of post-training.
  • the final result data with the refined result data record 200 are transmitted from the local processing unit LRE to the central training computer ZTR for the purpose of post-training.
  • a post-training of the neural network ANN can then be carried out on the central training computer ZTR.
  • the post-training is thus based on image data actually captured by the camera K, in which the object O to be gripped is represented.
  • post-training parameters are provided in the form of changed weights g'.
  • the post-training parameters g′ are transmitted from the central training computer ZTR to the local processing unit LRE, so that the post-trained neural network ANN can be implemented and used on the local processing unit LRE.
  • the steps of image acquisition, application of the neural network ANN, execution of the modified ICP algorithm A2, transmission of the image data bd and execution of the post-training on the central training computer ZTR, and transmission of post-training parameters g' can be repeated iteratively or cyclically until a convergence criterion is met and the neural network ANN is optimally matched to the object to be gripped.
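A schematic sketch of this iterative cycle is given below; every callable is passed in as a parameter because each one merely stands in for a component described above (camera, ANN, modified ICP, grip calculation, robot controller, network interface), and none of the names are taken from the patent:

```python
from typing import Callable, Optional

def operate_cycle(model,
                  capture_image: Callable,
                  run_ann: Callable,
                  refine_with_icp: Callable,
                  compute_grip: Callable,
                  send_to_robot: Callable,
                  upload_post_training: Callable,
                  poll_new_weights: Callable[[], Optional[bytes]],
                  load_weights: Callable,
                  converged: Callable[[], bool]):
    while not converged():
        image = capture_image()                        # depth (and optional intensity) image
        coarse = run_ann(model, image)                 # intermediate result data set (100)
        refined = refine_with_icp(image, coarse)       # refined result data set (200)
        send_to_robot(compute_grip(refined))           # gripping instructions to the robot controller
        upload_post_training(image, refined)           # annotated real data for post-training
        new_weights = poll_new_weights()               # post-training parameters g'
        if new_weights is not None:
            model = load_weights(model, new_weights)   # swap in the post-trained instance
    return model
```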
  • FIG. 2 shows, in a further schematic illustration, the structural arrangement of the electronic components involved according to a preferred embodiment of the invention.
  • the central training computer ZTR interacts with the local resources via a network interface NS, comprising the local computing unit LRE, at least one camera K, the robot controller RS and the robot R with a manipulator M and a number of end effectors, such as grippers.
  • the local resources interact via a local area network LNW.
  • the robot controller RS can also exchange data directly with the central training computer via the network interface NS, provided it has an HTTPS client.
  • “tunneled” communication via the asynchronous protocol via the local processing unit LRE is preferred. This has the advantage that the system and thus the production remain operational, especially when the Internet connection is temporarily unavailable or not available in sufficient quality.
  • the local processing unit LRE acts as a cache, so to speak, since it can exchange data asynchronously with the central training computer ZTR whenever there is a connection.
  • a translation app is installed on the edge device or the local processing unit LRE.
  • FIG. 3 schematically shows the central training computer ZTR, which interacts with the local resources arranged on the robot R via a WAN, for example.
  • FIG. 4 shows the robot R in a schematic representation with an end effector unit EE, which is also indicated only schematically, and the camera K, which is aligned in such a way that it can see the object O to be gripped in the working area of the robot (FoV, field of view).
  • FIG. 5 shows the scenario similar to that in FIG. 4, except that the objects O are arranged in the working area of the robot R without any restriction conditions, ie for example in a box in a partially overlapping form. This arrangement of the objects O without restrictions makes the object identification and the detection of the position and/or orientation of the object O and thus also the calculation of the grasping instructions more complex.
  • Figure 6 again shows a scenario similar to that shown in Figures 4 and 5, with the difference that the objects O can be arranged here on a conveyor belt FB, shown schematically, or in a container B, and the robot R is tasked with placing the gripped objects O in a transport container T.
  • Figure 7 is a flow chart of an operating method for operating a system as described above. The method is executed in a distributed manner and includes method steps that are executed on the central training computer ZTR and on the local processing unit LRE.
  • an object type is read in in step S1. This takes place on the central training computer ZTR.
  • a model memory MEM-M is accessed in order to load the 3D model.
  • a render engine (renderer) is used to generate synthetic object data or synthesize image data on the basis of the 3D model.
  • the synthesis algorithm A1 is preferably used for this purpose.
  • the depth image is saved as a result together with the labels (i.e. in particular position and orientation and optionally class).
  • the pre-training of the neural network ANN is carried out on the basis of the previously generated synthetic object data in order to provide pre-training parameters in step S5.
  • the provided pre-training parameters, in the form of weights, are transmitted to the at least one local processing unit LRE.
  • In step S7, the pre- or post-training parameters are read into the local computing unit LRE.
  • In step S8, the neural network ANN is then implemented or instantiated using the weights (parameters) that have been read in.
  • image data bd are captured with the camera K, which are supplied as input in step S10 to the currently implemented instance of the neural network ANN.
  • In step S11, the neural network ANN provides a result data record 100, which can function as an intermediate result.
  • a modified ICP algorithm A2 can be applied to the result data record 100 in order to calculate or generate a refined result data record 200 in step S13.
  • In step S14, gripping instructions are calculated from the generated refined result data set 200, which are exchanged with the robot controller RS in step S15 or are transmitted to the same.
  • In step S16, post-training data is generated using an annotation algorithm A3.
  • In step S17, the post-training data generated locally on the local processing unit LRE are transmitted to the central training computer ZTR.
  • In step S18, the transmitted post-training data is recorded in order to carry out post-training in step S19 on the basis of image data bd actually recorded on the local resources, so that in step S20 post-training parameters can be provided on the central training computer ZTR. These can then be transmitted to the at least one local processing unit LRE in step S21.
  • These post-training parameters can then be received and processed on the local processing unit LRE, which implements a post-trained neural network that can subsequently be used with new image data.
  • the method may iteratively perform the steps related to the post-training until the method converges (indicated in Figure 7 as a dashed arrow pointing back from S21 to S7), or else it may terminate.
  • FIG. 8 shows a flow chart of a method that is executed on the central training computer ZTR. It relates to steps S1 to S6 and S18 to S21 of the steps described in connection with FIG. 7.
  • FIG. 9 shows a flowchart of a method that is executed on the local processing unit LRE. It relates to steps S7 to S17 of the steps described in connection with FIG. 7.
  • the system or method can perform a number of algorithms.
  • a synthesis algorithm A1 is applied, which serves to synthesize object data in the form of image data.
  • the synthesized object data are generated from the respective object type-specific 3D model.
  • a modified ICP algorithm A2 can be applied, which serves to generate reference (image) data in order to "score" or annotate the result data generated by application of the neural network.
  • an annotation algorithm A3 can be applied to generate this reference image data.
  • the annotation algorithm A3 is used to generate annotated post-training data.
  • for this purpose, the result data calculated by applying the neural network is accessed, namely the labels, in particular position data, orientation data and possibly class identification data.
  • the 3D model is then accessed for this data in order to determine the image data that "matches" or is assigned to it. This image data is then used as reference image data.
  • FIG. 10 shows a preferred architecture of the neural network ANN.
  • the architecture of the neural network used essentially follows that of Votenet and consists of three modules:
  • an evaluation module for the evaluation and/or accumulation of individual feature vectors with layers for the interpolation of 3D points
  • the backbone is used to learn (optimal) local features.
  • each feature vector casts a vote for the presence of an object.
  • the evaluation module converts the votes from the voting module into object detections.
  • the neural network is preferably a deep neural network (DNN).
  • a feature is to be understood as a transformation applied to the raw data with the aim of filtering the disturbances from the raw data (e.g. influences of lighting and viewing angle) but at the same time preserving all information relevant to the task to be solved (e.g. the object position).
  • Feature extraction takes place in Votenet's input module ( Figure 10a).
  • Input to the first layer of the network is a point cloud, i.e. a set of points in three-dimensional space.
  • this point cloud, however, carries hardly any topological information: the neighborhood relation between two points is not immediately apparent.
  • the calculation of a feature, on the other hand, relies on topological information. The gray value of a single pixel may occur hundreds of times in one and the same image and is therefore not very meaningful on its own. Only together with the gray values in its vicinity can a pixel be clearly distinguished from pixels of other image regions and can objects accordingly - at a higher level of abstraction - be differentiated from the background or from other objects.
  • the lack of topological information is compensated for in the backbone by selecting M seed points, which are chosen to sample the point cloud uniformly.
  • a fixed number of points in the vicinity of each seed point are aggregated and converted into a C-dimensional feature vector using multi-layer perceptrons (consisting of convolution operators coupled with a non-linearity).
  • the input into the voting module thus consists of the 3D positions of the M seed points and one feature vector each. These vote for the presence of an object by shifting towards the centroid of a possible detection. An accumulation of shifted seed points indicates the actual presence of an object at that location. The set of shifts is modeled by concatenating several perceptrons. At the output of the voting module, the M shifted seed points are available in combination with their feature vectors.
  • the result of the voting must be converted into the B desired outputs of the entire network, including the object class, the position and orientation of a cuboid enveloping body around the object ("bounding box"), and an uncertainty of the respective estimate.
  • Similar sampling and grouping mechanisms come into play here as in the backbone, this time applied to the set of seed points and their features:
  • the network can detect a maximum of K different objects within the point cloud fed in, i.e. each seed point is assigned to one of K ≪ M cluster centers.
  • a B-dimensional output vector is calculated from the elements of a cluster with the help of perceptrons.
  • the values M, N, C, K are so-called hyperparameters and are permanently selected in advance of the training.
  • the number and combination of individual layers within the individual modules are optimized for the application at hand.
  • the actual optimization is carried out using stochastic gradient descent. For more details, see Kingma, Diederik P., and Jimmy Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980 (2014). A shape-level sketch of the overall pipeline is given below.
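  • The following shape-level sketch (not part of the original disclosure) illustrates the three-module pipeline described above with strongly simplified placeholder modules; the module names, layer sizes and the crude grouping strategy are illustrative assumptions only and do not reproduce the actual Votenet implementation.

```python
# Minimal, shape-level sketch of the backbone / voting / evaluation pipeline.
# NOT the original Votenet code; all dimensions and modules are placeholders.
import torch
import torch.nn as nn

M, C, K, B = 256, 128, 16, 10  # seeds, feature dim, clusters, output dim (hyperparameters)

class TinyBackbone(nn.Module):
    """Samples M seed points and computes a C-dimensional feature per seed."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, C))
    def forward(self, points):                        # points: (N, 3)
        idx = torch.randperm(points.shape[0])[:M]     # stand-in for uniform/farthest-point sampling
        seeds = points[idx]                           # (M, 3)
        return seeds, self.mlp(seeds)                 # seeds and local features (M, C)

class TinyVoting(nn.Module):
    """Each seed votes by predicting an offset towards an object centroid."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(C, 64), nn.ReLU(), nn.Linear(64, 3))
    def forward(self, seeds, feats):
        return seeds + self.mlp(feats), feats         # shifted seed points (M, 3)

class TinyEvaluation(nn.Module):
    """Groups votes into K clusters and predicts a B-dimensional output per cluster."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + C, 64), nn.ReLU(), nn.Linear(64, B))
    def forward(self, votes, feats):
        # crude grouping: split the M votes into K contiguous chunks
        chunks = torch.chunk(torch.cat([votes, feats], dim=1), K, dim=0)
        return torch.stack([self.mlp(c).mean(dim=0) for c in chunks])  # (K, B)

points = torch.rand(2048, 3)                          # input point cloud with N points
seeds, feats = TinyBackbone()(points)
votes, feats = TinyVoting()(seeds, feats)
detections = TinyEvaluation()(votes, feats)           # class / bounding box / uncertainty per cluster
print(detections.shape)                               # torch.Size([16, 10])
```

  • The printed shape (K, B) corresponds to at most K candidate detections with B output values each, matching the role of the hyperparameters described above.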
  • the synchronization or data exchange between the central training computer ZTR and the local processing unit LRE is described below with reference to FIG.
  • a so-called message broker (e.g. RabbitMQ) is run as a microservice on both the central training computer ZTR and the local processing unit LRE. It provides a FIFO queue (First In, First Out) on each side. This is shown schematically in the figure. Orders for uploading or downloading can be stored in the queue.
  • a service on the local processing unit LRE processes the first message from both queues as soon as there are messages and there is a network connection to the respective broker.
  • Messages from the local broker's queue initiate a transfer from the local to the central processing unit (equivalent to the central training computer ZTR) and vice versa. After the transfer, the original and copy of the file are checked for integrity and equality. The message is only deleted from the respective queue after the transfer has been successfully completed.
  • the data exchange is based on an asynchronous communication protocol. This has the advantage that the method can also be operated on the local processing unit when there is no Internet connection to the central training computer (a minimal queue-processing sketch is given below).
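  • The following minimal sketch models the synchronization worker described above with a plain in-memory FIFO queue; a real deployment would use a message broker such as RabbitMQ and real transfer and connectivity checks, so all helper names here are illustrative assumptions.

```python
# Minimal model of the synchronization worker: process the first transfer order
# whenever the queue is non-empty and the link is up, verify integrity, then
# remove the message. queue.Queue stands in for a broker such as RabbitMQ.
import hashlib
import queue
import shutil
import time
from pathlib import Path

jobs = queue.Queue()                     # FIFO queue of (source, destination) transfer orders

def is_connected() -> bool:
    """Placeholder for a real reachability check of the remote broker."""
    return True

def checksum(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sync_worker() -> None:
    while True:
        if jobs.empty() or not is_connected():
            time.sleep(1.0)              # no job or no WAN link: retry later
            continue
        src, dst = jobs.queue[0]         # peek at the first message without removing it
        shutil.copy(src, dst)            # placeholder for the actual upload/download
        if checksum(src) == checksum(dst):   # check original and copy for integrity and equality
            jobs.get()                   # delete the message only after a successful transfer
```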
  • the image from the camera K is first fed into the neural network, which optionally outputs the class, but at least the position and orientation of one or more recognized components.
  • the result is usually too imprecise for a reliable grip.
  • it is further refined with the help of a modified ICP algorithm or a registration method, in that the expected depth image and the measured depth image (i.e. the one sensorially recorded with the camera K) are compared with one another.
  • Robot and camera share a common coordinate system as a result of an initial calibration process ("hand-eye calibration"). This allows the detected position and orientation of an object, together with the desired position and orientation of the gripper relative to the object, to be converted into a gripping pose and handed over to the robot, which then takes over the planning and execution of the actual grip (see the transform sketch below).
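  • The chain of transforms implied by the hand-eye calibration can be sketched as follows; the 4x4 matrix names (T_base_cam, T_cam_obj, T_obj_grip) are assumptions introduced for illustration, not the notation of the original disclosure.

```python
# Minimal sketch of converting a detection into a gripping pose using
# homogeneous 4x4 transforms: calibration (base<-camera), localisation
# (camera<-object) and the user-defined handle (object<-gripper).
import numpy as np

def grasp_pose(T_base_cam: np.ndarray,
               T_cam_obj: np.ndarray,
               T_obj_grip: np.ndarray) -> np.ndarray:
    """Express the gripper target pose in robot base coordinates."""
    return T_base_cam @ T_cam_obj @ T_obj_grip

# Example with identity placeholders; a real system fills these from the
# hand-eye calibration, the neural network / ICP result and the stored handle.
T = grasp_pose(np.eye(4), np.eye(4), np.eye(4))
print(T)   # 4x4 pose handed over to the robot controller
```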
  • Before operation, the neural network must be trained on the basis of a data set of tuples, each consisting of an image and the component or object contained therein.
  • the parameters of the neural network are optimized using a stochastic gradient descent method in such a way that the deviation between the output expected according to the training data set and the output calculated by the network is minimized (a minimal training-loop sketch is given below).
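  • The following minimal PyTorch training-loop sketch uses the Adam optimizer cited above; the model, data and loss function are placeholders standing in for the actual detection network and its training set.

```python
# Minimal supervised training loop: minimise the deviation between the output
# expected from the training data and the output computed by the network.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 7))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()          # stand-in for the actual detection loss

# synthetic (input, label) tuples standing in for (image, object pose/class)
data = TensorDataset(torch.rand(512, 3), torch.rand(512, 7))
loader = DataLoader(data, batch_size=32, shuffle=True)

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # deviation between expected and computed output
        loss.backward()               # stochastic gradient
        optimizer.step()
```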
  • the trained network is a generic model for object recognition which, in the sense of a black box, assigns to an input image the position, orientation and class of all components (objects) contained in it.
  • the recording of training data is time-consuming and expensive.
  • the training data is generated exclusively synthetically, i.e. by simulation in a computer graphics system.
  • the synthesis falls back on physical a priori knowledge, such as the stable positions under which an object can occur at all, or its symmetry properties.
  • Physical analysis, training data synthesis and the training are of high memory and runtime complexity and are executed on the central training computer with the appropriate performance.
  • the result of the training (weights/parameters of the neural network) is then distributed to one or more local computing units located in the local network of one or more robots.
  • the camera is also connected to the local processing units.
  • the robot now transmits a message via the local network to trigger image acquisition and evaluation.
  • the local processing unit responds with a gripping position. If more than one object is located, the system prioritizes the gripping positions according to certain criteria such as accessibility, efficiency, etc.
  • image analysis essentially consists of two steps:
  • Step 1 is often not accurate enough to perform the grip due to a lack of real training data.
  • Step 2 delivers very precise results, but requires sufficient initialization by step 1.
  • the disadvantages of both steps can be mutually compensated for by the following procedure: in a bootstrapping phase, the robot first localizes (and grasps) parts under simpler environmental conditions, i.e. under restriction conditions, for example with the parts arranged disjointly on a plane instead of overlapping one another in a box. Under these circumstances, the purely synthetically trained network is sufficient for initializing the registration algorithm.
  • the images of the bootstrapping phase can be annotated with the exact result from step 2 and transferred to the central training computer as a real training data set, where the neural network is re-optimized and the updated weights are transferred back to the local processing unit. This process can be continued iteratively even when the system is already operating in its target environment (e.g. the box) in order to further increase accuracy and reliability (see the sketch below).
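  • The following self-contained sketch only illustrates the control flow of this bootstrapping loop; every class and function in it (DummyNetwork, refine_with_icp, retrain, the convergence threshold) is a hypothetical stand-in rather than an API of the described system.

```python
# High-level sketch of the bootstrapping cycle: detect coarsely, refine with
# registration, collect annotated real data, retrain centrally, repeat.
import random

class DummyNetwork:
    def __init__(self, accuracy=0.5):
        self.accuracy = accuracy
    def detect(self, image):
        return {"pose": image, "confidence": self.accuracy}     # coarse estimate

def refine_with_icp(image, coarse):
    return {**coarse, "pose": image}          # stand-in for the registration step

def retrain(network, dataset):
    # stand-in for post-training on the central training computer
    return DummyNetwork(min(1.0, network.accuracy + 0.05 * len(dataset)))

network, dataset = DummyNetwork(), []
for cycle in range(10):                        # bootstrapping iterations
    image = random.random()                    # stand-in for a captured depth image
    refined = refine_with_icp(image, network.detect(image))
    dataset.append((image, refined))           # annotated real training example
    network = retrain(network, dataset)        # updated weights flow back to the local unit
    if network.accuracy >= 0.95:               # stop once the model is good enough
        break
```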
  • the central training computer can be one or more virtual machines in a public or private cloud, or just a single high-performance PC in the user's network. It forms a cluster (even in the limiting case of a single instance), whose elements communicate with each other via a network.
  • a number of microservices are executed on the central training computer using orchestration software (e.g. Kubernetes), e.g. for data storage, geometric analysis of CAD models, data synthesis and training of neural networks (see below).
  • the central training computer communicates with one or more local computing units via a WAN (e.g. the Internet).
  • It is characteristic of the local processing unit that it is always connected to one or more robot controllers RS via a local network.
  • the connection to the central training computer via the WAN can be temporarily interrupted without disrupting the operation of the entire system.
  • the local processing unit in turn, can consist of one or more instances (VMs, PCs, industrial PCs, embedded systems) that form a (Kubernetes) cluster, similar to the central one. Again, all of the software runs as microservices in the form of containers.
  • the camera is at least capable of taking three-dimensional images of the scene, but also intensity images in the visible or infrared spectrum.
  • a three-dimensional image consists of elements (pixels) whose value is assigned to the distance or depth of the scene point that is depicted in each case.
  • the procedure for determining the depth information is irrelevant for the method described here. It is also independent of the choice of manufacturer.
  • the local processing unit downloads a suitable driver from the central training computer and runs it automatically as soon as a known camera is connected.
  • the robot is preferably a standard six-axis industrial robot, but simpler programmable manipulators, such as one or more linear units combined with one another, can also be used.
  • the choice of manufacturer is not relevant as long as the robot controller can communicate with the local processing unit via the network on the transport layer (OSI layer 4). Differences in the communication protocol (OSI layers 5-7) from different manufacturers are compensated for by a special translation microservice on the local processing unit.
  • the robot is equipped with a preferably generic gripping tool such as a vacuum suction cup or a jaw gripper.
  • special grippers are also conceivable, which are equipped with additional, for example tactile, sensors or are adapted to the geometry of the object in order to produce a form fit when gripping.
  • a polygonal geometric model of the part e.g. a triangular mesh
  • one or more handles, i.e. possible positions/orientations of a suitable gripper relative to the coordinate system of the 3D model, and optionally:
  • This data is transmitted by the user to the central training computer either via a website or an app together with metadata about the product/object to be recognized (name, customer, article number, etc.).
  • the method is based on the following kinematic model of a component or object: it is first assumed that all objects lie on a plane P. This restrictive assumption can later be relaxed for removal from a box (see below). The position of an object relative to the local coordinate system of this plane is described by a Euclidean transformation (R, t) consisting of a rotation R and a translation t.
  • an object can only assume a finite, discrete number of orientations.
  • a cube, for example, can only rest on one of its six faces on a plane. Its training should therefore be limited to these six states.
  • in each of the stable orientations (states), the object can additionally be rotated around the respective vertical axis.
  • the stable states R are determined using a Monte Carlo method. This can be done as follows: the 3D model is placed in a physical simulation environment at a fixed distance above the plane and dropped. Optionally, the density inside the model (or parts of it) can be specified. The simulation system solves the equations of motion of the falling object and its collisions with the ground. The process is repeated over a large number of randomly selected start orientations. A histogram is calculated over all final orientations modulo the rotation around the vertical axis. The maxima of this histogram correspond to the stable states sought. The sampling of the rotation group SO(3) when choosing the start orientations must be done with great care in order to avoid a distortion of the estimator (bias). A minimal simulation sketch is given below.
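  • A minimal sketch of such a drop simulation, assuming the pybullet physics engine, a mesh file named part.obj and an arbitrary histogram resolution; note that the uniform Euler-angle sampling used here is exactly the kind of biased SO(3) sampling the text warns against and would have to be replaced by a careful sampling scheme.

```python
# Monte Carlo drop simulation sketch: drop the part many times, record the
# final roll/pitch (yaw discarded) and take the most frequent outcomes as
# candidate stable states.
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                                   # the supporting plane P

col = p.createCollisionShape(p.GEOM_MESH, fileName="part.obj")   # illustrative file name
body = p.createMultiBody(baseMass=1.0, baseCollisionShapeIndex=col,
                         basePosition=[0, 0, 0.3])

final_orientations = []
for _ in range(500):                                       # many random start orientations
    # WARNING: uniform Euler angles are a biased sampling of SO(3); shown only for brevity
    quat = p.getQuaternionFromEuler(np.random.uniform(0, 2 * np.pi, size=3))
    p.resetBasePositionAndOrientation(body, [0, 0, 0.3], quat)
    for _ in range(1000):                                  # let the part fall and settle
        p.stepSimulation()
    _, q = p.getBasePositionAndOrientation(body)
    roll, pitch, _ = p.getEulerFromQuaternion(q)           # drop the yaw (vertical axis)
    final_orientations.append((round(roll, 1), round(pitch, 1)))

# histogram over final orientations modulo the rotation about the vertical axis;
# its maxima correspond to the stable states sought
states, counts = np.unique(final_orientations, axis=0, return_counts=True)
print(states[np.argsort(counts)[::-1][:5]])
```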
  • the object's pose relative to the plane's coordinate system is also recorded.
  • since the z-axis of this coordinate system is orthogonal to the plane, the x and y components of t can be discarded; it is precisely these that must be determined when localizing the object. In practice, however, their values are also bounded, either by the finite extent of the plane P or by the field of view of the camera.
  • the z component of t is saved for further processing. It is used to place the object on the plane without any gap during the synthesis of the training data (see below).
  • the value range of the angle of rotation φ can also be further limited. Due to periodicity, it generally lies in the interval [0, 2π]. If there is rotational symmetry around the vertical axis, this interval becomes smaller. In the extreme case of a cylinder standing on its base, it shrinks to the single value 0: the geometric image of the cylinder is independent of its rotation around the vertical axis.
  • for the cube, for example, the value range is, as can easily be seen, [0, 0.5π].
  • the range of values of φ is determined fully automatically as follows: in the simulation environment already described above, a series of top views is rendered for each stable state by varying the angle φ. By calculating the distance of each image from the first image of the series in terms of the L2 norm, one obtains a scalar-valued function s over the rotation angle. This is first freed of its mean value and transformed into the frequency domain by means of a fast Fourier transform. The maximum of the resulting spectrum indicates the dominant period of s and thus the order of rotational symmetry, from which the reduced range of φ follows (a minimal sketch is given below).
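  • A minimal numpy sketch of this symmetry analysis; render_top_view is a hypothetical placeholder that here simply fabricates a four-fold symmetric pattern so that the script runs stand-alone.

```python
# Symmetry analysis sketch: L2 distance of each rendered top view to the first
# one gives s(phi); the dominant frequency of the mean-free s reveals the
# rotational symmetry order and thus the reduced range of phi.
import numpy as np

def render_top_view(phi: float) -> np.ndarray:
    """Placeholder: would render a top view of the part rotated by phi."""
    u = np.linspace(0, 2 * np.pi, 64)
    return np.outer(np.cos(4 * (u + phi)), np.sin(u))   # fake 4-fold symmetric image

angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
reference = render_top_view(angles[0])
s = np.array([np.linalg.norm(render_top_view(a) - reference) for a in angles])  # L2 distances

spectrum = np.abs(np.fft.rfft(s - s.mean()))       # remove mean, go to frequency domain
order = int(np.argmax(spectrum[1:]) + 1)           # dominant period count over [0, 2*pi)
print("symmetry order:", order, "-> phi range: [0,", 2 * np.pi / order, ")")
```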
  • the search space reduced as a result of the geometric analysis must be scanned evenly. Only positions/orientations shown during training can also be reliably recognized during operation.
  • the local coordinate system of the 3D model is placed at a distance t_i,z from the plane, on a Cartesian grid with values between min t_x and max t_x and between min t_y and max t_y.
  • the angle of rotation φ is also varied in the range [min φ, max φ] determined during the analysis phase.
  • a depth image is rendered from the discretized search space for each object position and orientation using a virtual camera whose projection properties match those of the real camera used in operation.
  • Each image is provided with information about the position and orientation (stable state i, rotation matrix R_i, and lateral position (t_x, t_y) on the plane) of the components it contains.
  • the network can only learn invariance to disturbances if representative images are contained in the training data set. For each of the constellations described above, additional images are therefore generated by simulating various disruptive influences (a sketch of the pose-grid enumeration used for synthesis is given below).
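  • A minimal sketch of enumerating the reduced, discretized search space; the grid bounds, step counts and the render_depth_image placeholder are illustrative assumptions, since the real implementation renders with a virtual camera matched to the real one.

```python
# Enumerate the discretised search space (stable state, lateral position, phi)
# and attach the corresponding label to each synthesised depth image.
import itertools
import numpy as np

stable_states = [np.eye(3)]                      # rotation matrices R_i from the analysis phase
t_z = {0: 0.05}                                  # per-state height above the plane
phi_range = (0.0, 0.5 * np.pi)                   # reduced by the symmetry analysis

def render_depth_image(R, t, phi):
    """Placeholder for rendering a depth image of the 3D model in pose (R, t, phi)."""
    return np.zeros((128, 128))

dataset = []
xs = np.linspace(-0.2, 0.2, 5)
ys = np.linspace(-0.2, 0.2, 5)
phis = np.linspace(phi_range[0], phi_range[1], 8, endpoint=False)

for i, R in enumerate(stable_states):
    for t_x, t_y, phi in itertools.product(xs, ys, phis):
        depth = render_depth_image(R, (t_x, t_y, t_z[i]), phi)
        label = {"state": i, "R": R, "t": (t_x, t_y), "phi": phi}
        dataset.append((depth, label))           # image plus its annotation ("label")
```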
  • this data has been stored in a (local) database before operation on the local computing unit LRE.
  • this request can come from an app/website, the robot controller RS or another overarching entity (e.g. a PLC), possibly with the help of a translation microservice (see above).
  • All data associated with the product and relevant for object recognition is loaded from the database into the main memory, including:
  • a number of handles, i.e. physically possible positions/orientations of the gripper relative to the object, which were defined by the user for the product before operation, e.g. via the web interface.
  • in the bootstrapping phase, the first batches of the product are laid out on a plane, at the place where they would otherwise be located, rather than in the target container.
  • the parts are essentially arranged disjointly, but can otherwise occur in all possible positions and orientations. This corresponds to an application under restriction conditions.
  • the robot controller RS sends a signal to the local computing unit LRE. This triggers the recording of a depth image and, if a corresponding sensor is present, also the recording of a two-dimensional intensity image. All images are stored in the memory of the local processing unit and are transferred from the synchronization service to the central processing unit as soon as there is a network connection. Each saved image is provided with a unique identification number to enable assignment.
  • Each detected object position (location, orientation and stable state) is added to a queue and prioritized according to the reliability of the estimate, which also comes from the neural network.
  • Another service, implementing in particular the modified ICP algorithm, processes this queue according to priority.
  • It refines the initial position/orientation estimate by minimizing the error between the measured depth image and the depth image rendered from the 3D model, using a variant of the iterative closest point (ICP) algorithm. The result is the position/orientation of the 3D model relative to the camera's coordinate system (a generic ICP sketch is given below).
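  • A generic point-to-point ICP sketch illustrating the principle of the refinement step; the disclosed method works on rendered versus measured depth images and uses a modified variant, so this is an illustration of the underlying idea rather than the actual algorithm.

```python
# Generic point-to-point ICP: iteratively match closest points and solve for
# the best rigid transform (Kabsch/SVD) until the alignment converges.
import numpy as np
from scipy.spatial import cKDTree

def icp(source: np.ndarray, target: np.ndarray, iterations: int = 20):
    """Refine the pose of `source` (Nx3) so that it aligns with `target` (Mx3)."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iterations):
        _, idx = tree.query(src)                 # closest target point for each source point
        matched = target[idx]
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)    # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:            # avoid reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_t - R_step @ mu_s
        src = (R_step @ src.T).T + t_step        # apply the incremental transform
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```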
  • the robot controller RS now obtains the next best handle from this queue via further network requests. From this, the robot controller calculates a linear change in the joint angle (point-to-point movement) or a Cartesian linear path on which the gripper can reach the target state without obstacles, and executes this.
  • the gripper is activated at the target position (e.g. by building up a vacuum, closing the gripper jaws, etc.), so that the work cycle can then be completed with the desired manipulation (e.g. inserting into a machine, placing on a pallet, etc.).
  • As soon as the robot has successfully picked up a part, it signals this to the vision system, so that, in addition to the identification number of the current image (or images, if an intensity image was also taken), the final position/orientation of the object relative to the camera can be stored as a "label" in a database on the local processing unit LRE. These labels are also sent by the synchronization service to the central training computer ZTR.
  • The connection between image data and labels can later be established via the identification number.
  • the process starts again with a new image acquisition as soon as the queue is empty and no more handles can be obtained.
  • FIG. 12 shows a UML diagram for the operation/inference of the neural network ANN.
  • the object to be grasped is placed in the field of view of the camera K either with (e.g. on a plane, disjointly distributed) or without restriction conditions (e.g. in a box, stacked or nested arbitrarily).
  • An image recording by the camera K is then triggered.
  • two processes are initiated, which are shown in two parallel strands in FIG.
  • the main process is shown on the left; it is used to calculate the grasping instructions for the end effector unit EE by evaluating the image data using Votenet.
  • the generation of post-training data is shown in the right strand.
  • the annotation algorithm A3 can be used for this.
  • the annotation algorithm A3 serves to annotate image data bd captured with the camera K.
  • the annotation is based on synthesized reference image data, which are calculated from the 3D model on the basis of the determined result data set (evaluation of the neural network).
  • the annotated image data is then saved and transmitted to the central training computer ZTR as post-training data (a minimal annotation sketch is given below).
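  • A minimal sketch of how such an annotation step could store a captured image together with its computed label and a synthesized reference image; render_from_model, the file layout and the field names are illustrative assumptions only.

```python
# Annotation sketch: render a reference image from the 3D model in the refined
# pose and persist the real image, the reference and the label under one id.
import json
import uuid
import numpy as np

def render_from_model(pose: dict) -> np.ndarray:
    """Placeholder: would render the 3D model in the refined pose."""
    return np.zeros((128, 128))

def annotate(image: np.ndarray, refined_result: dict, out_dir: str = ".") -> str:
    image_id = str(uuid.uuid4())                        # unique identification number
    reference = render_from_model(refined_result)       # synthesized reference image
    np.save(f"{out_dir}/{image_id}_image.npy", image)
    np.save(f"{out_dir}/{image_id}_reference.npy", reference)
    with open(f"{out_dir}/{image_id}_label.json", "w") as fh:
        json.dump(refined_result, fh)                   # position/orientation/class label
    return image_id                                     # later uploaded as post-training data
```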
  • the neural network ANN is applied to the original image data bd captured by the camera K in order to determine the result data set.
  • the modified ICP algorithm A2 can then be applied to generate the refined result data set.
  • the gripping instructions are calculated and/or the respective grip for the object to be gripped is selected.
  • the handle in robot coordinates can be output and, in particular, transmitted to the robot controller RS.
  • the calculated labels, which are represented in the refined results data set, are saved and can be transmitted to the central training computer ZTR for the purpose of post-training.
  • FIG. 13 shows a UML diagram for training the neural network ANN.
  • the optical detection device and in particular the camera K can be designed in a preferred embodiment of the invention to detect both depth images and intensity images of the object O to be gripped.
  • the algorithm checks whether registered intensity images are present; if so, the detected depth images and the detected intensity images are aggregated.
  • post-training can be carried out on the Votenet architecture with a 6-dimensional input layer (3 spatial coordinates, 3 color channels).
  • the depth images are aggregated and saved.
  • the post-training carried out on the central training computer ZTR leads to the generation of post-training parameters, which are distributed to the local processing unit LRE via the synchronization mechanism described above.
  • the neural network or the AI model can be refined by continuing the training.
  • the individual images are aggregated into two separate files, the first file containing the actual data for training and the second file containing independent data that validate the recognition performance of the neural network ANN based on metrics (validation data or reference data).
  • examples of such metrics are detection-rate metrics, which evaluate four different outcomes over the validation data set: 1. an existing object is detected ("true positive"); 2. an existing object is missed ("false negative"); 3. an irrelevant object is detected ("false positive"); 4. an irrelevant object is ignored ("true negative").
  • these metrics differ from the loss function used to optimize the weights over the training data set. Small values of the loss function do not generally imply good metrics, so training success must always be evaluated on the basis of both criteria (a minimal metrics sketch is given below).
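  • A minimal sketch computing standard quantities from the four outcome counts mentioned above; precision, recall and accuracy are shown only as examples of such metrics.

```python
# Detection-rate metrics from true/false positive and negative counts.
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0   # how many detections were correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # how many existing objects were found
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

print(detection_metrics(tp=90, fn=10, fp=5, tn=45))
```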
  • the total number of real images should exceed the number of synthetically generated images from the first training run.
  • the input data for the neural network, in particular the Votenet (cf. FIG. 10), consists of a set of points, each with 3 coordinates. If images from an intensity camera are available and this camera is calibrated relative to the depth camera, the color/intensity of a point can be used as an additional input feature.
  • the training is initialized with the weights/parameters from the first round. From this initialization, the gradient descent continues until it converges to a new stationary point. If necessary, individual layers are masked completely or temporarily, i.e. excluded from the calculation of the gradient (a minimal sketch is given below).
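  • A minimal PyTorch sketch of initializing post-training from the previous weights and masking an individual layer from the gradient computation; the file name and the choice of layer are illustrative assumptions.

```python
# Resume training from pre-training weights and freeze (mask) one layer so it
# is excluded from the gradient calculation during post-training.
import torch

model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 7))
model.load_state_dict(torch.load("pretraining_weights.pt"))   # weights from the first round

for param in model[0].parameters():      # mask the first layer completely
    param.requires_grad = False          # excluded from the gradient calculation

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
# ...continue the gradient descent from this initialization until it converges
```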
  • the architecture of the network is slightly adapted for the follow-up training: rotations whose axis runs tangentially to the plane are not regarded as disturbances (cf. above, the calculations for generating the object data on the basis of physically plausible positions and orientations) but are also learned, since, firstly, it can be assumed that such constellations are contained in the real data and, secondly, after the parameters have been synchronized with the local computing unit LRE, the process runs in the target environment, where the parts are placed in arbitrary orientations, e.g. in a box.
  • FIG. 14 shows an example of an end effector unit EE with 4 vacuum grippers that are mounted on a robot arm and, in this application, are set up to remove packaging (crates) from a conveyor belt.
  • FIG. 15 shows another example of an end effector unit EE with a 2-finger gripper.
  • the different types of gripping tools are suitable for different gripping tasks. Therefore, in a preferred embodiment of the invention, it is provided that, in response to the specific object type detected on the local computing unit LRE (here, for example, individual screws or nails), an identification data record is generated in order to select the type of gripper for performing the respective gripping task from a set of grippers. Only then can the gripper-type-specific gripping instructions be calculated with the position information. This is usually done on the robot controller RS in response to the specific object type.
  • the invention can be applied not only to the examples of end effectors mentioned, but also to other handling tools of the robot that have to be instructed by means of the gripping instructions.
  • the components of the local computing unit LRE and/or the central training computer can be realized distributed over a number of physical-technical products.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Robotics (AREA)
  • Computational Linguistics (AREA)
  • Mechanical Engineering (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Manipulator (AREA)

Abstract

According to one aspect, the present invention relates to a distributed system for controlling at least one robot (R) in a gripping task for picking up objects (O) of different object types that are arranged in a working area (FB, B, T) of the robot (R). The system comprises a central training computer (ZTR), which is designed for pre-training and for post-training, and at least one local computing unit (LRE) on which real image data of the object (O) are captured and used to generate post-training data.
EP22809169.0A 2021-11-04 2022-11-02 Commande d'un robot industriel pour une tâche de préhension Pending EP4326500A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21206501.5A EP4177013A1 (fr) 2021-11-04 2021-11-04 Commande d'un robot industriel pour une tâche de préhension
PCT/EP2022/080483 WO2023078884A1 (fr) 2021-11-04 2022-11-02 Commande d'un robot industriel pour une tâche de préhension

Publications (1)

Publication Number Publication Date
EP4326500A1 true EP4326500A1 (fr) 2024-02-28

Family

ID=78528741

Family Applications (2)

Application Number Title Priority Date Filing Date
EP21206501.5A Withdrawn EP4177013A1 (fr) 2021-11-04 2021-11-04 Commande d'un robot industriel pour une tâche de préhension
EP22809169.0A Pending EP4326500A1 (fr) 2021-11-04 2022-11-02 Commande d'un robot industriel pour une tâche de préhension

Family Applications Before (1)

Application Number Title Priority Date Filing Date
EP21206501.5A Withdrawn EP4177013A1 (fr) 2021-11-04 2021-11-04 Commande d'un robot industriel pour une tâche de préhension

Country Status (2)

Country Link
EP (2) EP4177013A1 (fr)
WO (1) WO2023078884A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824064B (zh) * 2023-07-14 2024-06-04 湖南大学 点云数据模型生成方法、装置、计算设备及存储介质
CN117381800B (zh) * 2023-12-12 2024-02-06 菲特(天津)检测技术有限公司 一种手眼标定方法及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11941719B2 (en) * 2018-01-23 2024-03-26 Nvidia Corporation Learning robotic tasks using one or more neural networks
RU2700246C1 (ru) * 2019-03-21 2019-09-20 Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк) Способ и система захвата объекта с помощью роботизированного устройства
DE102020128653B4 (de) * 2019-11-13 2024-03-14 Nvidia Corporation Greifbestimmung für ein Objekt in Unordnung

Also Published As

Publication number Publication date
WO2023078884A1 (fr) 2023-05-11
EP4177013A1 (fr) 2023-05-10

Similar Documents

Publication Publication Date Title
EP4326500A1 (fr) Commande d'un robot industriel pour une tâche de préhension
DE102018215057B4 (de) Maschinelles-Lernen-Vorrichtung, Robotersystem und maschinelles-Lernen-Verfahren
DE102019130048B4 (de) Ein Robotersystem mit Srückverlustverwaltungsmechanismus
DE102013203381B4 (de) Verfahren und system zum trainieren eines roboters unter verwendung einer von menschen unterstützten aufgabendemonstration
DE102020101767B4 (de) Steuerverfahren und steuerung für ein robotersystem
DE102016122678B4 (de) Werkstückpositions-/-Stellungsberechnungssystem und Handhabungssystem
DE112018002565B4 (de) System und Verfahren zum direkten Anlernen eines Roboters
DE202017106506U1 (de) Einrichtung für tiefes Maschinenlernen zum Robotergreifen
DE112017007398B4 (de) Steuervorrichtung, Greifsystem, Verteilersystem, Programm und Steuerverfahren
DE112017007397B4 (de) Steuervorrichtung, Greifsystem, Verteilersystem, Programm, Steuerverfahren und Herstellungsverfahren
DE102015111080B4 (de) Robotervorrichtung mit maschinellem Sehen
DE112017007399B4 (de) Steuervorrichtung, Greifsystem, Verteilersystem, Programm, Steuerverfahren und Herstellungsverfahren
DE112017007392B4 (de) Steuervorrichtung, Greifsystem, Verteilersystem, Programm, Steuerverfahren und Herstellungsverfahren
DE102014108287A1 (de) Schnelles Erlernen durch Nachahmung von Kraftdrehmoment-Aufgaben durch Roboter
DE102021107568A1 (de) Adaptive planung von zugriffen zum aufheben von behältern
DE102014102943A1 (de) Robotersystem mit Funktionalität zur Ortsbestimmung einer 3D- Kiste
CN108247637A (zh) 一种工业机器人手臂视觉防撞操控方法
DE102020116803A1 (de) System und verfahren zur objekterkennung auf der grundlage von bilddaten
DE102020214633A1 (de) Vorrichtung und Verfahren zum Steuern einer Robotervorrichtung
DE102020112099A1 (de) Ein robotersystem mit einem koordinierten übertragungsmechanismus
DE112022001108T5 (de) Systeme, vorrichtungen und verfahren für universalroboter
DE102019007186A1 (de) Robotersystem und Robotersteuerungsverfahren für kooperatives Arbeiten mit Menschen
DE112018007729B4 (de) Maschinelle Lernvorrichtung und mit dieser ausgestattetes Robotersystem
DE102021109036A1 (de) Vorrichtung und verfahren zum lokalisieren von stellen von objekten aus kamerabildern der objekte
DE102020214301A1 (de) Vorrichtung und verfahren zum steuern eines roboters zum aufnehmen eines objekts in verschiedenen lagen

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20231120

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR