WO2020142495A1 - Multiple robot and/or positioner object learning system and method - Google Patents

Multiple robot and/or positioner object learning system and method

Info

Publication number
WO2020142495A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
robot
images
capturing
parameter
Prior art date
Application number
PCT/US2019/069077
Other languages
French (fr)
Inventor
Zhou TENG
Remus Boca
Thomas Fuhlbrigge
Johnny Holmberg
Magnus Wahlstrom
Original Assignee
Abb Schweiz Ag
Priority date
Filing date
Publication date
Application filed by Abb Schweiz Ag
Publication of WO2020142495A1


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls
    • B25J 9/1694 - Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697 - Vision controlled systems
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls
    • B25J 9/1628 - Programme controls characterised by the control loop
    • B25J 9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/10 - Terrestrial scenes
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00 - Program-control systems
    • G05B 2219/30 - Nc systems
    • G05B 2219/40 - Robotics, robotics mapping to robotics vision
    • G05B 2219/40532 - Ann for vision processing
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00 - Program-control systems
    • G05B 2219/30 - Nc systems
    • G05B 2219/40 - Robotics, robotics mapping to robotics vision
    • G05B 2219/40543 - Identification and location, position of components, objects
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00 - Program-control systems
    • G05B 2219/30 - Nc systems
    • G05B 2219/40 - Robotics, robotics mapping to robotics vision
    • G05B 2219/40613 - Camera, laser scanner on end effector, hand eye manipulator, local
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30108 - Industrial image inspection
    • G06T 2207/30164 - Workpiece; Machine component

Definitions

  • a method of training a robot system may include providing a first robot and a second robot, the first robot being coupled with a first camera and the second robot being coupled with a second camera, providing at least one training object, capturing a first plurality of training images with the first camera of the at least one training object while varying a parameter of the first robot or the first camera, capturing a second plurality of training images with the second camera of the at least one training object while varying a parameter of the second robot or the second camera, training a system to recognize the at least one training object based on the first plurality of training images or the second plurality of training images, and evaluating the system using pre-selected test images.
  • the capturing and training steps may be repeated if the evaluating indicates a deficiency in the system.
  • the method may also include processing the first or second plurality of training images with an edge detector and communicating data from the edge detector to a computer server.
  • the method may include positioning and/or orienting the at least one training object with at least one positioner.
  • the at least one positioner may be structured to place the at least one training object in a plurality of positions and/or orientations.
  • the method may also include radiating the at least one training object with a light source when capturing the first or second plurality of training images.
  • the at least one training object may include a first training object and a second training object.
  • the first plurality of training images may be of the first training object, and the second plurality of training images may be of the second training object.
  • the first training object and the second training object may be matching training objects.
  • the parameter that is varied while capturing the first plurality of training images may be a different parameter than the parameter that is varied while capturing the second plurality of training images.
  • the parameter that is varied while capturing the first plurality of training images may be a robot parameter, including robot motion.
  • the parameter that is varied while capturing the second plurality of training images may be a camera parameter, including lighting condition.
  • the first training object and the second training object may also be different training objects.
  • the parameter that is varied while capturing the first plurality of training images may be the same parameter as the parameter that is varied while capturing the second plurality of training images.
  • the at least one training object may be one training object that is disposed between the first and second robots, and the first and second plurality of training images may be captured simultaneously.
  • the parameter that is varied while capturing the first plurality of training images may be a different parameter than the parameter that is varied while capturing the second plurality of training images.
  • the system may be trained with the first plurality of training images, and the system may be evaluated with the second plurality of training images.
  • the second plurality of training images may be the pre-selected test images.
  • Evaluating the system may include assigning a score based on a number of the pre-selected test images that are correctly identified by the system after the training.
  • the system may be trained based on the first plurality of training images and the second plurality of training images.
  • the method may also include providing a third robot coupled with a third camera, capturing a third plurality of training images with the third camera of the at least one training object while varying a parameter of the third robot or the third camera, the system being evaluated with the third plurality of training images, the third plurality of training images thereby being the pre-selected test images.
  • additional constraints can be placed upon the system to further speed and/or improve the training and evaluation process.
  • the training and evaluation procedure in the previously introduced system can be very general, and attempts to achieve the best performance in all kinds of possible application scenarios.
  • the learned system is very knowledgeable about target objects/parts, but acquiring such knowledge can consume a large amount of time and hardware resources to finish the training and evaluation.
  • most applications do not require such a knowledgeable system about target objects/parts, because there are only a limited number of different kinds of cases to deal with in most specific applications. For most specific applications, therefore, such a knowledgeable system is overqualified, and it wastes a lot of time and hardware resources to achieve such a knowledgeable system about target objects/parts.
  • the training and evaluation procedure of the system can be driven by specific application cases; all robot/sensor/object/scene settings in the training and evaluation procedure are optimized based on the requirements of specific application cases.
  • a method of training a robot system may include providing a robot coupled with a camera, setting a robot or camera parameter, capturing a training image of a training object with the camera using the robot or camera parameter, changing the setting and capturing another training image and repeating such setting and capturing to obtain a plurality of training images based on different settings, training a system to recognize the training object based on the plurality of training images, and evaluating the system using pre-selected test images.
  • the setting may be a robot position which is limited to a range of motion when the robot is repositioned for the capturing steps.
  • the setting may be a camera distance from the training object which is limited during the capturing steps.
  • the setting may be an exposure time of the camera which is limited during the capturing steps.
  • the setting may be a pose of the training object which is limited when repositioning the robot and/or positioner for the capturing steps.
  • the setting may be a lighting condition which is restricted to an upper and/or lower limit during the capturing steps.
  • a worktime object may be introduced into the training over the course of one or more iterations as the robot and/or positioner are repositioned and the training images are collected.
  • the worktime object may be a bin.
  • the training object may be located within the bin.
  • the method may also include radiating the training object with a light source when capturing the training images with the camera.
  • the plurality of training images may be split into a first group of training images and a second group of training images.
  • the system may then be trained with the first group and the system may be evaluated with the second group.
  • the second group may be treated as pre-selected test images.
  • the system may also assign a score based on a number of the pre-selected test images that are correctly identified by the system after the training to evaluate the system.
  • the method may also include determining an operational constraint of the robot or camera for a particular target application, such as a range of robot motion, camera distance, exposure time, training object pose, or lighting condition, that will limit use of the robot in the target application. Changes to the robot or camera parameter setting may then be limited to correspond to the operational constraint when capturing images for training and evaluating the system (a sketch of such constraint-limited training and scoring appears at the end of this section).
  • the embodiments described above can be extended to a system that accounts for objects of the same class.
  • the training object 56 that is used to develop a system that robustly recognizes the object in various poses, lighting conditions, etc can itself be a type of object that fits within a broader class of objects.
  • the training object 56 may be a threaded machine screw having a rounded head that fits within a larger class of “screws.”
  • the embodiments described above can be extended to train a system on one particular type of screw, further train the system on another type of screw (e.g., a wood screw), and so on. Any number of different objects within a class can be used for training purposes, with the intent that the trained system will be able to recognize any of the objects from within the class of objects.
  • a second training object 56 that falls within the same class as the original training object 56 can be put through the same range of robotic image capture locations, positioner orientations, lighting conditions, etc. to develop its own range of images useful in the creation of a dataset for that particular object.
  • the ability to form a visual data set on a class of objects using the procedures described herein can be applied to any of the embodiments (one or more robots, one or more positioners, edge detectors, etc).
  • a system can be developed and provided to an end user which takes into account and utilizes information provided from the various training features described above.
  • a vision system integrated with industrial robots typically requires extensive training and evaluation for each new task or part. The training and evaluation are performed by a knowledgeable and experienced robot operator.
  • perception controller visual memory and perception functions provide the ability to deliver a robot system that can work out of the box, eliminating the need for training and evaluation. This can result in cost and time reductions, simplification of the robot interaction, and possible deployment of a large number of robots in a short period of time.
  • visual memory will be understood to include any suitable model, dataset, and/or algorithm which is capable of being referenced and evaluated when a worktime image is collected and an attempt is made to identify the object in the worktime image from the visual memory.
  • “worktime” can be contrasted with “training time” in that “worktime” involves tasks associated with a production process intended to result in a sales or servicing event for a customer, while “training time” involves tasks associated with training a robot and which may not result in a sales or servicing event.
  • “worktime” may result in no training, and a “training time” event may result in no actual work production.
  • Industrial robots can utilize vision/perception systems to perform flexible and adaptive tasks.
  • the vision systems integrated with industrial robots typically need to be trained before use. There are many tasks a vision system integrated with industrial robots can be used for, such as: material handling, machine tending, welding, dispensing, painting, machining, and so on.
  • the vision system is used, for example, to recognize and locate parts/objects of interest in the scene. These parts of interest usually are standard within a customer, across segments or industries. Examples of such parts are: screws, nuts, bolts, plates, gears, chains, hinges, latches, hasps, catches, and so on.
  • the vision system integrated with industrial robots typically needs to be trained and tested for each part type in production conditions. For example, the training usually consists of selecting an area of interest in an image to select a part or section of the part, or of collecting many images with different scales, viewpoints, and light conditions, labeling them, and then using a neural network to calculate the object model or models.
  • Embodiments described herein include a robotic perception system that has built-in perception tasks.
  • such a robot, for example, is able to recognize, locate, sort, count, and calculate grasps or relative positions of the tool to the part of interest with built-in visual memory functions that are deployed with a (brand new) robot.
  • in addition to being able to perform perception tasks, such a robot can have a tool specialized for handling screws.
  • such a robot, for example, is a robot that knows how to handle screws.
  • such a system will include all the hardware and software components to handle only a class of parts, such as: vision sensors mounted on the arm or statically mounted, computational hardware, a robot tool, an end effector, and software and algorithms.
  • a robot tool suitable to handle the class of objects is built into the system.
  • perception algorithms suited for the class of parts present in the visual memory are the critical components of a robot system that knows how to pick parts without training.
  • a robot system that knows how to handle parts, e.g., a robot that knows how to pick screws, or nuts, or bolts, or gears ... out of the box.
  • the robot includes all the tools, hardware and software, to start working the moment it is set ready to work.
  • this robot system will know how to handle/manipulate the objects that exist in the robot visual memory, and this means that the robot perception functions can identify, recognize, check for presence, count, sort, locate, and generate grasps for the known objects.
  • the robot perception functions are optimized for the known objects from the visual memory.
  • the robot system knows how to handle a specific class of parts.
  • the robot system knows, for example, how to pick a screw from a box of screws, and the robot operator needs to specify what the robot does with the screw after it is picked.
  • the logic of the robot sequence after the screw is picked, for example, is specific to each installation and has to be introduced by the robot operator.
  • the robot arm can have visual sensors integrated. There is an option to provide visual sensors that are statically mounted; in this case, additional steps are needed to install the visual sensors before operation. In both cases, the robot visual memory includes complete functionality to solve perception tasks for the known objects;
  • a robot controller is responsible for the robot arm motion and necessary functions;
  • a perception controller is responsible for keeping the class-of-objects database and knowledge about the robot geometric structure with its kinematic parameters. It also provides the connections to the visual sensors.
  • the perception tasks/functions are implemented in the perception controller.
  • the robot visual memory is a collection of one or more models, e.g., neural network models, 3D models ..., that allow a wide range of perception tasks to be solved;
  • robot perception tasks include a set of perception functions that can be solved using the models from the robot visual memory and the visual data measured with the visual sensors;
  • robot geometric and kinematic data stored on the perception controller can include: CAD models of the robot arm and robot tool, and kinematic parameters of the robot arm and tool.
  • the perception controller is in communication with the robot controller (structured to control one or more functions of the robot, e.g., activate an actuator), the visual sensors used to detect robot surroundings (e.g., a camera), and a visual memory which includes the class-of-objects dataset, as well as the perception tasks.
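A minimal sketch of how a perception controller might organize a visual memory and its perception functions is given below. The class names, the feature-vector "models," and the nearest-template matching rule are illustrative assumptions for this sketch, not the disclosed implementation (which may use neural network models, 3D models, and so on).

```python
from dataclasses import dataclass, field

@dataclass
class VisualMemory:
    """Class-of-objects dataset deployed with the robot: object name -> stored model.
    Here a 'model' is simply a feature vector; a deployed system could instead store
    neural network weights, 3D models, grasp data, etc."""
    models: dict = field(default_factory=dict)

    def add(self, name, feature_vector):
        self.models[name] = feature_vector

@dataclass
class PerceptionController:
    memory: VisualMemory

    def recognize(self, feature_vector):
        """Return the known object whose stored model is closest to the measurement."""
        def distance(item):
            return sum((a - b) ** 2 for a, b in zip(item[1], feature_vector))
        name, _ = min(self.memory.models.items(), key=distance)
        return name

    def check_presence(self, feature_vector, tolerance=1.0):
        """Presence check: the best match must also lie within a distance tolerance."""
        name = self.recognize(feature_vector)
        stored = self.memory.models[name]
        return sum((a - b) ** 2 for a, b in zip(stored, feature_vector)) <= tolerance

# Example: a visual memory shipped with two known part classes.
memory = VisualMemory()
memory.add("screw", [0.9, 0.1])
memory.add("nut", [0.2, 0.8])
controller = PerceptionController(memory)
assert controller.recognize([0.85, 0.15]) == "screw"
```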
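The application-driven constraints and the score-based evaluation described earlier in this section (limiting parameter changes to the operational constraints of the target application, then scoring the trained system by the number of pre-selected test images it identifies correctly) might be sketched as follows. The constraint values and the example labels are hypothetical placeholders, not values from the disclosure.

```python
import random

# Hypothetical operational constraints for a particular target application.
CONSTRAINTS = {
    "camera_distance_m": (0.3, 0.6),     # allowed range of camera distance
    "exposure_ms": (5, 20),              # allowed range of exposure time
    "light_level": ["medium", "high"],   # lighting conditions used in production
}

def sample_constrained_setting(rng):
    """Draw a robot/camera setting only from within the application's constraints."""
    return {
        "camera_distance_m": rng.uniform(*CONSTRAINTS["camera_distance_m"]),
        "exposure_ms": rng.uniform(*CONSTRAINTS["exposure_ms"]),
        "light_level": rng.choice(CONSTRAINTS["light_level"]),
    }

def evaluation_score(predicted_labels, true_labels):
    """Score = number of pre-selected test images identified correctly."""
    return sum(p == t for p, t in zip(predicted_labels, true_labels))

rng = random.Random(0)
training_settings = [sample_constrained_setting(rng) for _ in range(10)]
score = evaluation_score(["screw", "screw", "nut"], ["screw", "nut", "nut"])  # -> 2
```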

Abstract

A robot system includes one or more moveable robots and one or more positioners, along with one or more cameras, which are used in conjunction to image a training or target object. An illumination source such as a light can also be used to illuminate the training or target object. A controller can command the robot(s), camera(s), and illumination source(s) to take a number of training images to assist in building a visual memory. The training process can include any variety of changes in position and/or orientation of the robot(s), positioner(s), and camera(s) as well as changes in illumination intensity, direction, etc. The training dataset can be evaluated against pre-determined images to ascertain the robustness of the trained model.

Description

MULTIPLE ROBOT AND/OR POSITIONER OBJECT LEARNING SYSTEM
AND METHOD
TECHNICAL FIELD
The present invention generally relates to robots and robot training, and more particularly, but not exclusively, to robotic systems and training for robotic systems.
BACKGROUND
Providing flexibility and increased productivity in robotic systems and training for robotic systems remains an area of interest. Some existing systems have various shortcomings relative to certain applications. Accordingly, there remains a need for further contributions in this area of technology.
SUMMARY
One embodiment of the present invention is a unique robotic system. Other embodiments include apparatuses, systems, devices, hardware, methods, and combinations for training robotic systems. Further embodiments, forms, features, aspects, benefits, and advantages of the present application shall become apparent from the description and figures provided herewith.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 depicts an embodiment of a robot system.
FIG. 2 depicts an embodiment of a computer.
FIG. 3 depicts an embodiment of a robot system.
FIG. 4 depicts a flow chart embodiment of the present application.
FIG. 5 depicts an embodiment of a robot which can be provided direct to a customer with minimal to no training required.
DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS
For the purposes of promoting an understanding of the principles of the invention, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Any alterations and further modifications in the described embodiments, and any further applications of the principles of the invention as described herein are contemplated as would normally occur to one skilled in the art to which the invention relates.
With reference to FIG. 1, a schematic of a robot 50 is shown which includes a number of moveable robot components 52 along with an effector 54 useful to manipulate and/or sense a target 56. In some forms the robot 50 can include a single moveable robot component 52. Additionally and/or alternatively, the robot 50 need not include an end effector 54. The robot 50 can be mobile in some embodiments, but in other embodiments the robot 50 can be mounted upon a stationary base (e.g., FIG. 1). The robot components 52 can take any variety of forms such as arms, links, beams, etc., which can be used to position the effector 54. The robot 50 can include any number of moveable components 52 which can take on different sizes, shapes, and other features. The components 52, furthermore, can be interconnected with one another through any variety of useful mechanisms such as links and gears 58, to set forth just two examples. The components 52 can be actuated via any suitable device such as electric actuators, pneumatic or hydraulic pistons, etc. The effector 54 can take any variety of forms such as a gripper, suction effector, belt, etc.
In many embodiments described herein, the target 56 will be understood to be a training object 56 which will be imaged to obtain training and evaluation data useful in the creation of a system capable of quickly recognizing and/or locating objects in working runtime. The training object 56 can be imaged using an image capture device 57 which can take the form of any variety of devices/systems, including cameras, radar, light curtains, etc. Although a single image capture device 57 is depicted in FIG. 1, it will be appreciated that other embodiments may include two or more devices 57. As will be understood, the term “camera” can refer to a variety of devices capable of detecting electromagnetic radiation, whether in the visible range, infrared range, etc. Such “cameras” can also refer to 2D and/or 3D cameras. Reference may be made below to image capture device 57 as a camera 57, but no limitation is hereby intended that such device 57 is limited to a “camera” unless explicitly or inherently understood to the contrary.
The position and/or orientation of the training object 56 can be manipulated by a positioner 59. The positioner 59 can take a variety of forms useful to manipulate the training object 56. In one form the positioner 59 can include one or more moveable components and/or effectors useful to change the orientation of the training object 56 for purposes of being imaged by the camera 57 at different orientations. As with the robot 50, the positioner 59 can be stationary, but in some forms may also be moveable. The robot 50, camera 57, and/or positioner 59 can be operated via a controller 55. The controller 55 can be one or more different devices useful to control one or more of the robot 50, camera 57, and positioner 59.
Turning now to FIG. 2, and with continued reference to FIG. 1, a schematic diagram is depicted of a computer 60 suitable to host the controller 55 for operating the robot 50. Computer 60 includes a processing device 64, an input/output device 66, memory 68, and operating logic 70. Furthermore, computer 60 can be configured to communicate with one or more external devices 72.
The input/output device 66 may be any type of device that allows the computer 60 to communicate with the external device 72. For example, the input/output device may be a network adapter, network card, or a port (e.g., a USB port, serial port, parallel port, VGA, DVI, HDMI, FireWire, CAT 5, or any other type of port). The input/output device 66 may be comprised of hardware, software, and/or firmware. It is contemplated that the input/output device 66 includes more than one of these adapters, cards, or ports.
The external device 72 may be any type of device that allows data to be inputted or outputted from the computer 60. In one non-limiting example the external device 72 can be the camera 57, positioner 59, etc. To set forth just a few additional non-limiting examples, the external device 72 may be another computer, a server, a printer, a display, an alarm, an illuminated indicator, a keyboard, a mouse, mouse button, or a touch screen display. Furthermore, it is contemplated that the external device 72 may be integrated into the computer 60. For example, the computer 60 may be a smartphone, a laptop computer, or a tablet computer. It is further contemplated that there may be more than one external device in communication with the computer 60. The external device can be co-located with the computer 60 or alternatively located remotely from the computer.
Processing device 64 can be of a programmable type, a dedicated, hardwired state machine, or a combination of these; and can further include multiple processors, Arithmetic-Logic Units (ALUs), Central Processing Units (CPUs), or the like. For forms of processing device 64 with multiple processing units, distributed, pipelined, and/or parallel processing can be utilized as appropriate. Processing device 64 may be dedicated to performance of just the operations described herein or may be utilized in one or more additional applications. In the depicted form, processing device 64 is of a programmable variety that executes algorithms and processes data in accordance with operating logic 70 as defined by programming instructions (such as software or firmware) stored in memory 68. Alternatively or additionally, operating logic 70 for processing device 64 is at least partially defined by hardwired logic or other hardware. Processing device 64 can be comprised of one or more components of any type suitable to process the signals received from input/output device 66 or elsewhere, and provide desired output signals. Such components may include digital circuitry, analog circuitry, or a combination of both.
Memory 68 may be of one or more types, such as a solid-state variety, electromagnetic variety, optical variety, or a combination of these forms. Furthermore, memory 68 can be volatile, nonvolatile, or a mixture of these types, and some or all of memory 68 can be of a portable variety, such as a disk, tape, memory stick, cartridge, or the like. In addition, memory 68 can store data that is manipulated by the operating logic 70 of processing device 64, such as data representative of signals received from and/or sent to input/output device 66 in addition to or in lieu of storing programming instructions defining operating logic 70, just to name one example. As shown in FIG. 2, memory 68 may be included with processing device 64 and/or coupled to the processing device 64.
The operating logic 70 can include the algorithms and steps of the controller, whether the controller includes the entire suite of algorithms necessary to effect movement and actions of the robot 50, or whether the controller includes just those necessary to receive data from the camera 57, determine a point cloud, find edges, utilize object recognition, and/or resolve position of the objects, among other functions. The operating logic can be saved in a memory device whether of the volatile or nonvolatile type, and can be expressed in any suitable type such as but not limited to source code, object code, and machine code.
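As one non-limiting illustration of the point cloud determination mentioned above, the following sketch back-projects a depth image into 3D points. The pinhole-camera model, the intrinsics (fx, fy, cx, cy), and the image size are illustrative assumptions rather than values taken from the disclosure.

```python
import numpy as np

def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 3 point cloud
    using an assumed pinhole-camera model with intrinsics fx, fy, cx, cy."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth_m.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop pixels with no depth reading

# Example with synthetic data: a flat surface 1 m from the camera.
cloud = depth_to_point_cloud(np.full((480, 640), 1.0),
                             fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```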
The edge computer 76 can be structured to be useful in image processing and/or the learning described herein, and is contemplated to be a distributed device node (e.g., a smart device or edge device akin to the geographic distribution of nodes such as in the Internet of Things) as opposed to those functions being performed in a centralized cloud environment. Any data generated herein, however, can be shared to remote stations such as through cloud computing. In one form the edge device can be structured to receive data generated from the camera 57 and operate upon the data to generate point clouds, detect edges, etc. In one nonlimiting form the edge computer can be structured to detect edges in objects, such as by detecting discontinuities in brightness within an image. Such discontinuities can correspond to discontinuities in depth and/or surface orientation, changes in material property, variations in scene illumination, etc. Edge detection can sometimes, but not exclusively, be grouped in two categories: search-based and zero-crossing. The edge computer 76 may also be useful in filtering an image, as well as edge thinning to remove unwanted spurious points on the edges in an image. In some forms the edge computer 76 can also be based upon second-order derivatives of the intensity. Still further, the edge computer 76 can utilize techniques such as the phase stretch transform. In short, the edge computer 76 can be constructed using a variety of techniques and employ a number of associated computations which assist in identifying edges of a training object 56.
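As a non-limiting illustration of the search-based (first-order gradient) category described above, a minimal sketch follows; the Sobel kernels and the relative threshold are illustrative assumptions, not part of the disclosed edge computer.

```python
import numpy as np

def sobel_edges(image, threshold=0.25):
    """Search-based edge detection: flag pixels whose first-order brightness
    gradient magnitude exceeds a fraction of the maximum gradient."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # Sobel x kernel
    ky = kx.T                                                          # Sobel y kernel
    img = image.astype(float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-9)           # normalize to [0, 1]

    # 3x3 neighborhoods of every interior pixel (avoids external dependencies)
    windows = np.lib.stride_tricks.sliding_window_view(img, (3, 3))
    gx = (windows * kx).sum(axis=(-2, -1))
    gy = (windows * ky).sum(axis=(-2, -1))
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold * magnitude.max()                     # boolean edge map
```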
FIG. 3 also depicts an illumination source 80 useful to radiate the training object 56 which is positioned using the positioner 59. The illumination source 80 can be constructed to radiate in any number of manners, from focused to diffuse, low to high brightness, narrow to broad-based electromagnetic radiation, etc. Such electromagnetic radiation can occur over any range of suitable frequencies, from infrared, visible, ultraviolet, etc. In one non-limiting form, the illumination source emits light in the visible electromagnetic band at a variety of intensities. Additionally and/or alternatively, the illumination source 80 can be positioned at a variety of locations around the training object 56, at a variety of distances, and be oriented to project at a variety of angles relative to the training object 56.
Robots are widely used in assembly and production lines today to automate many manufacturing procedures. However, to execute pre-programmed manufacturing operations on target parts/objects, robots must first locate them very accurately in the assembly or production lines.
Most existing solutions can be categorized into two methods: one takes advantage of fixtures to always place target parts/objects in fixed locations with fixed poses, and the other develops robust part/object features based on geometry or color to locate them through mounted sensors. In both methods, however, different solutions must be specifically developed for different kinds of parts/objects in different kinds of applications, because of the constraints of the specially designed fixtures or manually selected part/object features. The weak generality and reusability of existing methods therefore add considerable development time and cost for new parts/objects or new applications. In extreme cases, everything has to be redone from the beginning.
Embodiments disclosed herein introduce a self-adaptive optimization system for robots with mounted sensors, which can automatically learn new target parts/objects and locate them in the working runtime.
During training, target parts/objects are placed on a positioner in different pose/scene settings; sensor(s) mounted on the robot (e.g., on the robot arms) are moved around the target parts/objects to capture a number of representative images in different robot/sensor settings; then, all the captured images are preprocessed by edge computers and sent to computer servers for training. Based on the evaluation results of the trained system from existing images, the computer servers might ask for more images of the target parts/objects to continue the training until robust performance of the trained system is achieved.
After the learning is finished, the trained system with the learned knowledge of target parts/objects can be directly used to locate them in the working runtime.
Embodiments described herein may be used to:
(1) Automatically optimize robot/sensor/scene settings to collect new images for object training, which can improve the runtime performance of the trained system;
(2) Automatically optimize robot/sensor/scene settings to collect new images of objects to evaluate the runtime performance of the trained system in all different kinds of application scenarios;
(3) A complete control loop which manages and optimizes both the training and the evaluation of the system until robust performance of the system is achieved;
(4) All image data collection in the training and the evaluation is optimized. The system tries to use the least amount of image data of target objects in training and evaluation to achieve the best runtime performance; and
(5) Automatically learn a system which tries to achieve the best runtime performance in different kinds of application scenarios.
In many embodiments five separate hardware functions/components are included, such as:
(1) A positioner. Target parts/objects are placed on it in different pose/scene settings for learning;
(2) Sensors. Sensors are mounted on robots and used to capture a number of representative images of target parts/objects in different settings. Multiple sensors could be mounted on the same robot arm and be of different types to capture different types of part/object information;
(3) Robot arms and controllers (robots). Robots are used to move the sensors around target parts/objects to capture images in different robot/sensor settings;
(4) Edge computers. Edge computers control the motion of robot arms through robot controllers, preprocess all captured images, and send images to computer servers for learning; and
(5) Computer servers. Servers run machine learning algorithms based on all collected images to model the knowledge of target parts/objects for different kinds of application purposes.
The training and evaluation useful to develop the generalized object learning described herein can involve the capture of several images at different orientations of the training object 56 set by the positioner 59, and/or as set using different lighting of the illumination source 80, and/or as set using different positions of the robot 50 and camera 57. The training can be conducted in any number of manners, with the step of evaluation useful to determine when training is sufficient. In one form a matrix of image capture locations, positioner orientations, and/or lighting conditions from the illuminator can be used to develop a range of images useful in the embodiments herein in the creation of a dataset for a class-of-objects library. A validation dataset of images can be used to test whether the training matrix has sufficiently covered the training object 56.
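One way to realize such a matrix of image capture locations, positioner orientations, and lighting conditions, and to hold out a validation subset, is sketched below. The discrete values are hypothetical placeholders rather than values taken from the disclosure.

```python
import itertools
import random

# Hypothetical discretization of the training matrix: robot/camera viewpoints,
# positioner orientations of the training object, and illumination intensities.
robot_viewpoints = ["front", "left", "right", "top"]
positioner_angles_deg = [0, 45, 90, 135, 180, 225, 270, 315]
light_levels = ["low", "medium", "high"]

settings = [
    {"viewpoint": v, "object_angle_deg": a, "light": l}
    for v, a, l in itertools.product(robot_viewpoints, positioner_angles_deg, light_levels)
]

# Hold out a subset of settings whose images form the validation dataset.
random.seed(0)
random.shuffle(settings)
split = int(0.8 * len(settings))
training_settings, validation_settings = settings[:split], settings[split:]
```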
Training based on the images obtained during the training can include any number of different methodologies as will be appreciated. Learning can take the form of supervised, semi-supervised, or unsupervised learning, although many embodiments herein contemplate the use of unsupervised learning. Principal component analysis and cluster analysis can be used. Artificial neural networks, support vector machines, Bayesian networks, and genetic algorithms are contemplated in the learning stage. In one non-limiting embodiment, deep learning techniques are also contemplated. In short, any techniques and methods useful to conduct learning of the training object are contemplated herein. The learning can be conducted in the same computer (server 78, controller 55, etc.) or can be separate from the machine that performs any other particular function of the embodiments described herein.
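As one example of the unsupervised techniques mentioned above, a principal component analysis sketch is shown; the image shape and the number of components are illustrative assumptions.

```python
import numpy as np

def pca_descriptors(images, n_components=8):
    """Project flattened training images onto their first principal components,
    yielding a compact descriptor per image (unsupervised; no labels required)."""
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    X -= X.mean(axis=0)                                   # center the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)      # rows of vt are principal directions
    return X @ vt[:n_components].T                        # shape: (num_images, n_components)

# Example with random stand-in images (64 images of 32x32 pixels).
descriptors = pca_descriptors(np.random.rand(64, 32, 32), n_components=8)
```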
FIG. 4 depicts an embodiment of a self-adaptive optimization procedure of object learning. The procedure starts with the selection of a robot/sensor/object/scene setting. Such a setting can include a position and/or orientation of the robot 50 (with an accompanying position and/or orientation of the camera 57), a position and/or orientation of the training object 56 via the positioner 59, and one or more parameters associated with the illumination source 80. Images are then taken via the camera 57 which are used to train the model. A validation data set (one or more images designated to test the trained model) is used to evaluate the robustness of the trained model. If the robustness test fails, the procedure loops to the beginning to select a new robot/sensor/object/scene setting. The ability to pick a new robot/sensor/object/scene setting can be through serial testing of a pre-defined test matrix, but can also be adaptive in the sense that intermediate robot/sensor/object/scene settings can be selected based upon the results of the evaluation step. It is also possible that entirely new robot/sensor/object/scene settings are developed based upon the results of the evaluation step. Once the evaluation step is passed, the procedure can cease and a validated, trained model can be used.
Additionally and/or alternatively, the learning can include:
(1 ) Place target parts/objects in certain pose/scene settings on the positioner;
(2) Adjust robot/sensor/scene parameters;
(3) Move the robot arm and collect new images of target parts/objects;
(4) New images are preprocessed by edge computers and sent to computer servers for new or continuous object training;
(5) Run machine learning algorithms to train new object models. Depending on the application purposes, different kinds of training criteria and methods can be used;
(6) Evaluate the performance of the trained system. For example, an optimized set of new images of target parts/objects can be collected for evaluation, combined with all failure images from previous evaluation tests (see the sketch following this list);
(7) If the required system performance has not been achieved, optimize robot/sensor/object/scene parameters to collect new images for more training of object models.
Repeat the procedures (1)-(7) until the required system performance is achieved.
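Step (6) above can be illustrated with the following minimal sketch, in which each evaluation set is built from a fresh batch of images plus every image that failed in earlier evaluation rounds, so that known hard cases are always re-tested. The data structures are assumptions for illustration only.

failure_pool = []   # images the trained system previously got wrong

def build_evaluation_set(new_images):
    return list(new_images) + list(failure_pool)

def record_failures(evaluation_results):
    """evaluation_results: iterable of (image, passed) pairs from the evaluator."""
    failure_pool.extend(img for img, passed in evaluation_results if not passed)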
Embodiments described herein can be structured to robustly conduct object recognition, object detection, and pose estimation.
Building upon the embodiments described above with respect to FIGS. 1-4, an extension of the embodiments above (including all variations in the embodiments above) includes adding one or more robots to the single robot 50 above and/or one or more positioners to the single positioner 59. Such an extension can greatly speed up the training and evaluation of the previous system and improve the performance of the system in the working runtime. The robots 50 can be the same, but can also be different. Likewise, the positioners 59 used can be the same but can also be different. The extension of the embodiments above by the inclusion of multiple robots 50 and/or multiple positioners 59 can be characterized by:
(1) Multiple robots and positioners are utilized in the training and evaluation of the system;
(2) Training and evaluation of the system can be done simultaneously;
(3) Images for object training and evaluation in different robot/sensor/object/scene settings can be collected simultaneously;
(4) Images of multiple different target objects/parts can be collected and the knowledge of them can be trained and evaluated by the same system simultaneously;
(5) The system can be trained and evaluated for multiple application purposes simultaneously.
To speed up the training and evaluation procedure, multiple robots 50 and positioners 59 can be utilized in the following ways:
(1) Robot(s): at least two robots are used if only one positioner is used;
(2) Positioner(s): multiple positioners with the same or different target objects/parts can be used if only one robot is used;
(3) A robot with the current robot and sensor settings can collect images of different settings of same/different target objects/parts from multiple positioners;
(4) A positioner with the current pose/scene setting of a target object/part can be used to collect images by multiple robots in different robot and/or sensor parameters simultaneously;
(5) Training and evaluation of the system can be done by multiple robots simultaneously. In a simple case of two robots with one positioner, for example, one robot can be used to collect new images of target objects/parts with different robot and/or sensor settings for training, and the other can be used to collect new images of target objects/parts to evaluate the performance of the trained system simultaneously (see the sketch following this list);
(6) More than one robot can be used to collect new images of target objects/parts with different robot and/or sensor parameters for training simultaneously;
(7) More than one robot can be used to collect new images of target objects/parts with different robot and/or sensor parameters to evaluate the performance of the trained system simultaneously;
(8) With multiple positioners and robots, images of different settings of same target objects/parts with different scene settings on different positioners can be collected for training and evaluation simultaneously;
(9) With multiple positioners and robots, different target objects/parts can be set on different positioners for image collecting simultaneously; therefore, different target objects/parts can be trained and evaluated simultaneously;
(10) With multiple positioners and robots, different settings of same/different target objects/parts for different application purposes can be trained and evaluated by the same system simultaneously.
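The two-robot, one-positioner case of item (5) above can be sketched as follows, with one robot collecting training images while the other simultaneously collects evaluation images; the two collection loops are modeled here with threads, and the capture callables are assumed stand-ins for the robot/camera interfaces.

import threading

def collect(capture_fn, settings, out):
    for s in settings:
        out.append(capture_fn(s))

def parallel_collection(capture_robot_1, capture_robot_2,
                        training_settings, evaluation_settings):
    training_images, evaluation_images = [], []
    t1 = threading.Thread(target=collect,
                          args=(capture_robot_1, training_settings, training_images))
    t2 = threading.Thread(target=collect,
                          args=(capture_robot_2, evaluation_settings, evaluation_images))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return training_images, evaluation_images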
A method of training a robot system may include providing a first robot and a second robot, the first robot being coupled with a first camera and the second robot being coupled with a second camera, providing at least one training object, capturing a first plurality of training images with the first camera of the at least one training object while varying a parameter of the first robot or the first camera, capturing a second plurality of training images with the second camera of the at least one training object while varying a parameter of the second robot or the second camera, training a system to recognize the at least one training object based on the first plurality of training images or the second plurality of training images, and evaluating the system using pre-selected test images.
The capturing and training steps may be repeated if the evaluating indicates a deficiency in the system. The method may also include processing the first or second plurality of training images with an edge detector and communicating data from the edge detector to a computer server. The method may include positioning and/or orienting the at least one training object with at least one positioner. The at least one positioner may be structured to place the at least one training object in a plurality of positions and/or orientations. The method may also include radiating the at least one training object with a light source when capturing the first or second plurality of training images.
The at least one training object may include a first training object and a second training object. The first plurality of training images may be of the first training object, and the second plurality of training images may be of the second training object. The first training object and the second training object may be matching training objects. The parameter that is varied while capturing the first plurality of training images may be a different parameter than the parameter that is varied while capturing the second plurality of training images. The parameter that is varied while capturing the first plurality of training images may be a robot parameter, including robot motion. The parameter that is varied while capturing the second plurality of training images may be a camera parameter, including lighting condition. The first training object and the second training object may also be different training objects. The parameter that is varied while capturing the first plurality of training images may be the same parameter as the parameter that is varied while capturing the second plurality of training images.
The at least one training object may be one training object that is disposed between the first and second robots, and the first and second plurality of training images may be captured simultaneously. The parameter that is varied while capturing the first plurality of training images may be a different parameter than the parameter that is varied while capturing the second plurality of training images.
The system may be trained with the first plurality of training images, and the system may be evaluated with the second plurality of training images. Thus, the second plurality of training images may be the pre-selected test images. Evaluating the system may include assigning a score based on a number of the pre-selected test images that are correctly identified by the system after the training.
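A hedged sketch of the scoring step is shown below: the score is simply the count (or fraction) of pre-selected test images that the trained system identifies correctly. The predict callable is an assumed stand-in for the trained model, not a disclosed interface.

def evaluate_system(predict, test_images, expected_labels):
    correct = sum(1 for img, label in zip(test_images, expected_labels)
                  if predict(img) == label)
    return correct, correct / len(test_images)   # raw score and pass rate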
The system may be trained based on the first plurality of training images and the second plurality of training images. The method may also include providing a third robot coupled with a third camera, and capturing a third plurality of training images with the third camera of the at least one training object while varying a parameter of the third robot or the third camera, the system being evaluated with the third plurality of training images, the third plurality of training images thereby being the pre-selected test images. With continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc.), additional constraints can be placed upon the system to further speed and/or improve the training and evaluation process. The training and evaluation procedure in the previously introduced system can be very general, and attempts to achieve the best performance in all kinds of possible application scenarios. The learned system is very knowledgeable about target objects/parts, but such knowledge can consume a large amount of time and hardware resources to finish the training and evaluation. On the other hand, most applications do not require such a knowledgeable system about target objects/parts, because there are only a limited number of different kinds of cases to deal with in most specific applications. Therefore, for most specific applications, such a knowledgeable system is overqualified, and a lot of time and hardware resources are wasted to achieve it.
Again, with continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc.), further refinements introduce an application-case driven approach for object learning with robots in the previous system. The main idea is that, for most specific applications, many settings/parameters about robots, sensors, target objects/parts and runtime scenes are known and fixed within certain specific ranges. All of these known settings/parameters should be used in the training and evaluation procedure of the system. By adding more constraints to the training and evaluation procedure, it can be made much faster while still achieving the same robust performance in the working runtime for target applications.
Given the additional constraints in the refinements of the embodiments described above, the following features will be appreciated in these refinements:
(1) The training and evaluation procedure of the system can be driven by specific application cases; all robot/sensor/object/scene settings in the training and evaluation procedure are optimized based on the requirements of specific application cases;
(2) Other specific knowledge of each application which can help improve the system performance is also utilized in object learning of the system for that application.
The training and evaluation procedure in the refinements discussed immediately above can be sped up in the following ways, based on the constraints and requirements of different specific application cases:
(1) Restrict the motion range of target objects/parts in the images (a minimal sketch of these restrictions follows this list). In most specific application cases, target objects/parts might only appear within a specific range of locations in the images of the working runtime. Therefore, it is not necessary to always run the training and evaluation algorithms on the whole image, and less background distraction is introduced;
(2) Restrict the settings/parameters of sensors, for example, distance, exposure time, viewpoints, and so on. Images for training and evaluation of target objects/parts only need to be captured at the application-case required settings/parameters of the sensors;
(3) Restrict the settings of target objects/parts. Target objects/parts might only be set in certain ranges of poses in the working runtime. As a result, it is not necessary to consider all possible poses of target objects/parts in the training and evaluation;
(4) Restrict the settings of the scenes according to specific application cases, for example, lighting conditions and occlusions.
Restrictions (2), (3) and (4) all reduce the variance in the appearance of target objects/parts in the training and evaluation;
(5) If certain objects always co-appear with target objects/parts in the application scenarios (for example, bins), they can also be used to help locate target objects/parts, and they can be included in the training and evaluation;
and/or
(6) With more constraints as above and smaller regions of the images to focus on, high-resolution images that provide more details of target objects/parts can be used for training and evaluation, which can help improve the performance of the system.
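Constraints (1)-(4) above can be illustrated, under assumed values, by restricting training and evaluation to a region of interest in the image and clipping candidate capture parameters to application-specific ranges; the ROI and the limits below are illustrative only.

import numpy as np

ROI = (slice(200, 600), slice(300, 900))        # rows, cols where parts appear at runtime

APP_LIMITS = {
    "camera_distance_m": (0.4, 0.8),
    "exposure_ms":       (5.0, 20.0),
    "part_yaw_deg":      (-30.0, 30.0),
    "light_level":       (0.5, 1.0),
}

def crop_to_roi(image: np.ndarray) -> np.ndarray:
    return image[ROI]                            # only this region is trained/evaluated

def clip_setting(setting: dict) -> dict:
    return {k: float(np.clip(v, *APP_LIMITS[k]))
            for k, v in setting.items() if k in APP_LIMITS}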
A method of training a robot system may include providing a robot coupled with a camera, setting a robot or camera parameter, capturing a training image of a training object with the camera using the robot or camera parameter, changing the setting and capturing another training image and repeating such setting and capturing to obtain a plurality of training images based on different settings, training a system to recognize the training object based on the plurality of training images, and evaluating the system using pre-selected test images. The setting may be a robot position which is limited to a range of motion when the robot is repositioned for the capturing steps. The setting may be a camera distance from the training object which is limited during the capturing steps. The setting may be an exposure time of the camera which is limited during the capturing steps. The setting may be a pose of the training object which is limited when repositioning the robot and/or positioner for the capturing steps. The setting may be a lighting condition which is restricted to an upper and/or lower limit during the capturing steps.
A worktime object may be introduced into the training over the course of one or more iterations as the robot and/or positioner are repositioned and the training images are collected. The worktime object may be a bin. The training object may be located within the bin.
The method may also include radiating the training object with a light source when capturing the training images with the camera.
The plurality of training images may be split into a first group of training images and a second group of training images. The system may then be trained with the first group and the system may be evaluated with the second group. Thus, the second group may be treated as pre-selected test images. The system may also assign a score based on a number of the pre-selected test images that are correctly identified by the system after the training to evaluate the system.
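A small sketch of this split, with an assumed 80/20 ratio, is shown below; the held-out group then serves as the pre-selected test images and can be scored as in the earlier evaluation sketch.

import random

def split_images(images, train_fraction=0.8, seed=0):
    images = list(images)
    random.Random(seed).shuffle(images)
    cut = int(train_fraction * len(images))
    return images[:cut], images[cut:]            # (training group, evaluation group)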
The method may also include determining an operational constraint of the robot or camera for a particular target application, such as a range of robot motion, camera distance, exposure time, training object pose or lighting condition, that will limit use of the robot in the target application. Changes to the robot or camera parameter setting may then be limited to correspond to the operational constraint when capturing images for training and evaluating the system.
The embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc.) as well as the additional constraints discussed immediately above to further speed and/or improve the training and evaluation process (e.g. the application-specific cases) can be extended to a system that accounts for objects of the same class. For example, the training object 56 that is used to develop a system that robustly recognizes the object in various poses, lighting conditions, etc., can itself be a type of object that fits within a broader class of objects. To set forth just one possibility, the training object 56 may be a threaded machine screw having a rounded head that fits within a larger class of “screws.” The embodiments described above can be extended to train a system on one particular type of screw, further train the system on another type of screw (a wood screw), and so on. Any number of different objects within a class can be used for training purposes with the intent that the trained system will be able to recognize any of the objects from within the class of objects. For example, a second training object 56 that falls within the same class as the original training object 56 can be put through the same range of robotic image capture locations, positioner orientations, lighting conditions, etc., to develop its own range of images useful in the creation of a dataset for that particular object. The ability to form a visual dataset on a class of objects using the procedures described herein can be applied to any of the embodiments (one or more robots, one or more positioners, edge detectors, etc.).
With continued reference to the embodiments described above (single robot, multiple robots, edge detectors, training, camera(s), etc.) as well as the additional constraints discussed immediately above to further speed and/or improve the training and evaluation process (e.g. the application-specific cases) and the creation of a system capable of detecting objects within a class of objects, a system can be developed and provided to an end user which takes into account and utilizes information provided from the various training features described above. A vision system integrated with industrial robots typically requires extensive training and evaluation for each new task or part. The training and evaluation is performed by a knowledgeable and experienced robot operator. In this patent application, we develop a system including a built-in perception controller and visual memory in which models for a class of objects are known. By providing the knowledge of known objects, the training of the robotic vision system for each part is removed because the class of parts/objects is already defined in the robot visual memory. The improvement of having a built-in perception controller, visual memory and perception functions provides the ability to deliver a robot system that can work out of the box, eliminating the need for training and evaluation. This can result in cost and time reductions, simplification of the robot interaction, and possible deployment of a large number of robots in a short period of time. As used herein, the term “visual memory” will be understood to include any suitable model, dataset, and/or algorithm which is capable of being referenced and evaluated when a worktime image is collected and an attempt is made to identify the object in the worktime image from the visual memory.
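A non-limiting illustration of consulting the visual memory at worktime is sketched below: a worktime image is compared against the stored class-of-object models and the best match above a confidence threshold is returned. The feature extractor, similarity measure, and stored models are placeholders, not the disclosed implementation.

def identify_from_visual_memory(worktime_image, visual_memory, extract_features,
                                similarity, threshold=0.8):
    features = extract_features(worktime_image)
    best_name, best_score = None, 0.0
    for name, model in visual_memory.items():    # e.g. {"screw": model, "nut": model, ...}
        score = similarity(features, model)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)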
Also as used herein, “worktime” can be contrasted with “training time” in that “worktime” involves tasks associated with a production process intended to result in a sales or servicing event for a customer, while “training time” involves tasks associated with training a robot and which may not result in a sales or servicing event. In this regard “worktime” may result in no training, and a “training time” event may result in no actual work production.
Industrial robots can utilize vision/perception systems to perform flexible and adaptive tasks. The vision systems integrated with industrial robots typically need to be trained before use. There are many tasks a vision system integrated with industrial robots can be used for, such as material handling, machine tending, welding, dispensing, painting, machining, and so on. The vision system is used, for example, to recognize and locate parts/objects of interest in the scene. These parts of interest are usually standard within a customer, across segments or industries. Examples of such parts are screws, nuts, bolts, plates, gears, chains, hinges, latches, hasps, catches, and so on. The vision system integrated with industrial robots typically needs to be trained and tested for each part type in production conditions. For example, the training usually consists of selecting an area of interest in an image to select a part or a section of the part, or collecting many images with different scales, viewpoints, and light conditions, labeling them, and then using a neural network to calculate the object model or models.
In order to simplify the robot operation for standard parts, described herein is a robotic perception system that has built-in perception tasks. A robot, for example, is able to recognize, locate, sort, count, and calculate grasps or relative positions of the tool to the part of interest with built-in/visual memory functions that are deployed with a (brand new) robot. In addition to being able to perform perception tasks, such a robot has a tool specialized for handling screws. Such a robot is a robot that knows how to handle screws, for example. In conclusion, such a system will include all the hardware and software components to handle only a class of parts, such as vision sensors mounted on the arm or statically mounted, computational hardware, a robot tool, an end effector, software, and algorithms.
It will be appreciated that the systems described herein that can provide a robot “out of the box” ready to detect and manipulate objects can provide the following features:
1 ) Complete robotic system, ready to be deployed for handling standard parts;
2) Built-in visual memory for a class of standard parts;
3) Robot tool suitable to handle the class of objects built into the system; and/or
4) Perception algorithms suited for the class of parts present in the visual memory. A built-in visual memory for a class of objects, the associated robot tool or tools to handle/manipulate the objects from the visual memory, and the perception tasks/functions integrated in the robot controller are the critical components of a robot system that knows how to pick parts without training. Such a system is a robotic system that knows how to handle parts, e.g., a robot that knows how to pick screws, or nuts, or bolts, or gears ... out of the box. The robot includes all the tools, hardware and software, to start working the moment it is set ready to work.
This robot system will know how to handle/manipulate the objects that exist in the robot visual memory, which means that the robot perception functions can identify, recognize, check for presence, count, sort, locate, and generate grasps for the known objects. The robot perception functions are optimized for the known objects from the visual memory.
The robot system knows how to handle a specific class of parts. The robot system knows, for example, how to pick a screw from a box of screws, and the robot operator needs to specify what the robot does with the screw after it is picked. The logic of the robot sequence after the screw is picked, for example, is specific to each installation and has to be introduced by the robot operator.
Details about the hardware components:
a) Robot arm can have visual sensors integrated. There is an option to provide visual sensors that are statically mounted; in this case, additional steps are needed to install the visual sensors before operation. In both cases, the robot visual memory includes complete functionality to solve perception tasks for the known objects;
b) Robot tool suitable for the known objects;
c) Robot controller responsible for the robot arm motion and necessary functions; and/or
d) Perception controller responsible for keeping the class-of-objects database and knowledge about the robot geometric structure with the kinematic parameters. It also provides the connections to the visual sensors. The perception tasks/functions are implemented in the perception controller.
Details about the software controller (a schematic sketch follows this list):
a) Robot visual memory is a collection of one or more models, e.g., neural network models, 3D models ... that allow a wide range of perception tasks to be solved;
b) Robot perception tasks include a set of perception functions that can be solved using the models from the robot visual memory and the visual data measured with the visual sensors; and/or
c) Robot geometric and kinematic data stored on the perception controller that can include: CAD models of the robot arm and robot tool, kinematic parameters of the robot arm and tool.
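The schematic sketch below organizes the software components listed above into simple data structures: a visual memory holding per-class models, the robot geometric/kinematic data, and a perception controller exposing the perception tasks. The field names and the task set are assumptions for illustration, not a prescribed implementation.

from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class VisualMemory:
    models: Dict[str, Any] = field(default_factory=dict)      # e.g. {"screw": nn_model, ...}

@dataclass
class RobotData:
    cad_models: Dict[str, Any] = field(default_factory=dict)
    kinematic_parameters: Dict[str, float] = field(default_factory=dict)

@dataclass
class PerceptionController:
    memory: VisualMemory
    robot_data: RobotData
    tasks: Dict[str, Callable] = field(default_factory=dict)  # recognize, locate, count, sort, grasp

    def run(self, task_name: str, sensor_data: Any):
        return self.tasks[task_name](sensor_data, self.memory)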
An illustration of the hardware components of an “out of the box” robot capable of detecting and manipulating objects from a class of objects is depicted in FIG. 5. The perception controller is in communication with the robot controller structured to control one or more functions of the robot (e.g. activate an actuator), the visual sensors used to detect robot surroundings (e.g. a camera), the visual memory which includes the class-of-objects dataset, as well as the perception tasks.
While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only the preferred embodiments have been shown and described and that all changes and modifications that come within the spirit of the inventions are desired to be protected. It should be understood that while the use of words such as preferable, preferably, preferred or more preferred utilized in the description above indicate that the feature so described may be more desirable, it nonetheless may not be necessary and embodiments lacking the same may be contemplated as within the scope of the invention, the scope being defined by the claims that follow. In reading the claims, it is intended that when words such as “a,” “an,” “at least one,” or “at least one portion” are used there is no intention to limit the claim to only one item unless specifically stated to the contrary in the claim. When the language “at least a portion” and/or “a portion” is used the item can include a portion and/or the entire item unless specifically stated to the contrary. Unless specified or limited otherwise, the terms “mounted,” “connected,” “supported,” and “coupled” and variations thereof are used broadly and encompass both direct and indirect mountings, connections, supports, and couplings. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
(a) providing a first robot and a second robot, the first robot being coupled with a first camera and the second robot being coupled with a second camera;
(b) providing at least one training object;
(c) capturing a first plurality of training images with the first camera of the at least one training object while varying a parameter of the first robot or the first camera;
(d) capturing a second plurality of training images with the second camera of the at least one training object while varying a parameter of the second robot or the second camera;
(e) training a system to recognize the at least one training object based on the first plurality of training images or the second plurality of training images; and
(f) evaluating the system using pre-selected test images.
2. The method of claim 1, repeating steps (c) through (e) if the evaluating indicates a deficiency in the system.
3. The method of claim 1, which further includes processing the first or second plurality of training images with an edge detector.
4. The method of claim 3, which further includes communicating data from the edge detector to a computer server.
5. The method of claim 1, which further includes positioning and/or orienting the at least one training object with at least one positioner.
6. The method of claim 5, wherein the at least one positioner is structured to place the at least one training object in a plurality of positions and/or orientations.
7. The method of claim 1, wherein the at least one training object comprises a first training object and a second training object, the first plurality of training images being of the first training object, and the second plurality of training images being of the second training object.
8. The method of claim 7, wherein the first training object and the second training object are matching training objects.
9. The method of claim 8, wherein the parameter varied while capturing the first plurality of training images is a different parameter than the parameter varied while capturing the second plurality of training images.
10. The method of claim 9, wherein the parameter varied while capturing the first plurality of training images is a robot parameter and the parameter varied while capturing the second plurality of training images is a camera parameter.
11. The method of claim 10, wherein the parameter varied while capturing the first plurality of training images is a robot motion and the parameter varied while capturing the second plurality of training images is a lighting condition.
12. The method of claim 7, wherein the first training object and the second training object are different training objects.
13. The method of claim 12, wherein the parameter varied while capturing the first plurality of training images is a same parameter as the parameter varied while capturing the second plurality of training images.
14. The method of claim 1, wherein the at least one training object comprises one training object disposed between the first and second robots, the first and second plurality of training images being captured simultaneously.
15. The method of claim 14, wherein the parameter varied while capturing the first plurality of training images is a different parameter than the parameter varied while capturing the second plurality of training images.
16. The method of claim 1, wherein the system is trained with the first plurality of training images and the system is evaluated with the second plurality of training images, the second plurality of training images thereby being the pre-selected test images.
17. The method of claim 16, wherein evaluating the system includes assigning a score based on a number of the pre-selected test images that are correctly identified by the system after the training.
18. The method of claim 1, wherein the system is trained based on the first plurality of training images and the second plurality of training images.
19. The method of claim 18, which further includes providing a third robot coupled with a third camera, capturing a third plurality of training images with the third camera of the at least one training object while varying a parameter of the third robot or the third camera, the system being evaluated with the third plurality of training images, the third plurality of training images thereby being the pre-selected test images.
20. The method of claim 1, which further includes radiating the at least one training object with a light source when capturing the first or second plurality of training images.
PCT/US2019/069077 2018-12-31 2019-12-31 Multiple robot and/or positioner object learning system and method WO2020142495A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862786701P 2018-12-31 2018-12-31
US62/786,701 2018-12-31

Publications (1)

Publication Number Publication Date
WO2020142495A1 true WO2020142495A1 (en) 2020-07-09

Family

ID=71407429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/069077 WO2020142495A1 (en) 2018-12-31 2019-12-31 Multiple robot and/or positioner object learning system and method

Country Status (1)

Country Link
WO (1) WO2020142495A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE2250085A1 (en) * 2022-01-31 2023-08-01 Husqvarna Ab Improved operation for a robotic work tool system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020111709A1 (en) * 2001-02-14 2002-08-15 Xerox Corporation Apparatus and method for determining a reference position for an industrial robot
US20100274390A1 (en) * 2007-12-27 2010-10-28 Leica Geosystems Ag Method and system for the high-precision positioning of at least one object in a final location in space
US20100303337A1 (en) * 2009-05-29 2010-12-02 Aaron Wallack Methods and Apparatus for Practical 3D Vision System
JP2011036975A (en) * 2009-08-18 2011-02-24 Honda Motor Co Ltd Control method for robot
US20110288684A1 (en) * 2010-05-20 2011-11-24 Irobot Corporation Mobile Robot System
US20120197438A1 (en) * 2009-12-02 2012-08-02 Canon Kabushiki Kaisha Dual arm robot
US20170252924A1 (en) * 2016-03-03 2017-09-07 Google Inc. Deep machine learning methods and apparatus for robotic grasping
US10039219B1 (en) * 2015-09-28 2018-07-31 Western Digital Technologies, Inc. Method and devices for picking and placing workpieces into devices under manufacture using dual robots


Similar Documents

Publication Publication Date Title
CN110719828B (en) Method performed by data center robot and data center robot system
WO2020142499A1 (en) Robot object learning system and method
US5579444A (en) Adaptive vision-based controller
Maiolino et al. Flexible robot sealant dispensing cell using RGB-D sensor and off-line programming
Astanin et al. Reflective workpiece detection and localization for flexible robotic cells
WO2020142498A1 (en) Robot having visual memory
Cruz‐Ramírez et al. Vision‐based hierarchical recognition for dismantling robot applied to interior renewal of buildings
Farag et al. Real-time robotic grasping and localization using deep learning-based object detection technique
CN110315529A (en) Machine vision and robot mounting system and method
CN111149067A (en) Detecting robot positioning in a workspace via fiducials
WO2020142495A1 (en) Multiple robot and/or positioner object learning system and method
WO2020142496A1 (en) Application-case driven robot object learning
Wojciechowski et al. Optical scanner assisted robotic assembly
EP0380513B1 (en) An adaptive vision-based controller
Chala et al. The Use of Neural Networks for the Technological Objects Recognition Tasks in Computer-Integrated Manufacturing
Doroftei et al. Robotic system design and development for automated dismantling of PCB waste
Seppälä et al. Feature-Based Object Detection and Pose Estimation Based on 3D Cameras and CAD Models for Industrial Robot Applications
US20200202178A1 (en) Automatic visual data generation for object training and evaluation
AlSalman et al. Speech driven robotic arm for sorting objects based on colors and shapes
CN116157753A (en) Automated system engineering using virtual objects with embedded information
Kim et al. Bin picking method using multiple local features
US11931910B2 (en) Use of artificial intelligence models to identify fasteners and perform related operations
MRÁZEK Reconstruction of a 3D Scene for Bin-picking
Affes et al. Detection and Location of Sheet Metal Parts for Industrial Robots
US11961218B2 (en) Machine vision systems and methods for automatically generating one or more machine vision jobs based on region of interests (ROIs) of digital images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907143

Country of ref document: EP

Kind code of ref document: A1