US20210205986A1 - Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose


Info

Publication number
US20210205986A1
Authority
US
United States
Prior art keywords
robot
operator
task
machine learning
pose
Prior art date
Legal status
Abandoned
Application number
US17/146,885
Inventor
Simon Kalouche
Current Assignee
Nimble Robotics Inc
Original Assignee
Nimble Robotics Inc
Priority date
Filing date
Publication date
Application filed by Nimble Robotics Inc filed Critical Nimble Robotics Inc
Priority to US17/146,885
Assigned to QOOWA, INC. Assignors: KALOUCHE, SIMON (assignment of assignors interest)
Assigned to Nimble Robotics, Inc. (change of name from QOOWA, INC.)
Publication of US20210205986A1

Classifications

    • B25J 9/163 - Programme controls characterised by the control loop: learning, adaptive, model based, rule based expert control
    • B25J 9/1697 - Vision controlled systems
    • B25J 13/00 - Controls for manipulators
    • B25J 9/0063 - Programme-controlled manipulators having parallel kinematics with kinematics chains having an universal joint at the base
    • B25J 9/1689 - Teleoperation
    • G05B 2219/36184 - Record actions of human expert, teach by showing
    • G05B 2219/40002 - Camera, robot follows direction movement of operator head, helmet, headstick
    • G05B 2219/40116 - Learn by operator observation, symbiosis, show, watch
    • G05B 2219/40413 - Robot has multisensors surrounding operator, to understand intention of operator
    • Y10S 901/03 - Teaching system
    • Y10S 901/09 - Closed loop, sensor feedback controls arm movement

Definitions

  • the disclosure relates generally to teleoperation of robots and specifically to teleoperation of robots based on a pose of a human operator.
  • teleoperation of robots having multiple degrees of freedom is accomplished using complex controllers that may be specifically designed for a particular robot arm.
  • these controllers may be as simple as using a joystick, but more commonly these controllers are complicated devices, such as body worn exoskeletons that map the exoskeleton's joint angles to the robot's joint angles.
  • handheld or worn hardware is used to teleoperate the robot.
  • the teleoperation of a high DOF robot is challenging, not intuitive, and slow because of the lack of direct mapping from joysticks and buttons to the many degrees of freedom of the robot.
  • while controllers provide a relatively cheap method of teleoperating a robot, they require significant training or automation to handle low-level functionality and are typically not time efficient.
  • a robot having two or more legs (a high DOF system) operated in real-time using a controller would require low-level algorithms for balancing the robot to be handled autonomously, while the controller or joystick would be used for high-level commands (e.g., in which direction and at what speed the robot should ambulate).
  • controlling a robot arm using joysticks requires mapping six or more DOF of the robot onto the 2 or 3 DOF interface of the joystick, which is not intuitive and can lead to slow teleoperating speeds for even simple tasks.
  • an exoskeleton can be worn to control a robot, which may allow for more intuitive and direct control of a robot arm with a morphology that is similar to the arm of a human operator.
  • This method of teleoperation is easier for the operator to learn and can integrate haptic feedback to allow the operator to feel forces that the robot is sensing when it interacts with its environment.
  • exoskeletons are complex systems that are expensive, not easily donned or doffed, not portable or mobile, and typically not accommodating for differences in limb or body size from one operator to another.
  • Another alternative for teleoperation is the use of motion capture systems.
  • Embodiments relate to teleoperation of a robot of a robotic system based on a pose of an operator. Teleoperation indicates operation of a system or machine at a distance.
  • the system includes an image capturing device and an operator system controller that are remotely located from a robotic system controller and a robot.
  • the image capturing device captures an image of a subject (i.e., operator).
  • the operator system controller is coupled to the image capturing device and maps a processed version of the captured image to a three-dimensional skeleton model of the subject.
  • the operator system controller generates body pose information of the subject in the captured image.
  • the body pose information indicates a pose of the subject in the captured image.
  • the robotic system controller communicates with the operator system controller over a network.
  • the robotic system controller generates a plurality of kinematic parameters of a robot by processing the body pose information received from the operator system controller based on a configuration of the robot.
  • the robotic system controller controls one or more actuators of the robot according to the plurality of kinematic parameters, causing the robot to take a pose corresponding to the pose of the subject in the captured image.
  • FIG. 1 illustrates a block diagram of a system for teleoperation of robotic systems, according to an embodiment.
  • FIG. 2 illustrates a block diagram of an operator system controller, according to one embodiment.
  • FIG. 3 illustrates a block diagram of a robotic system controller, according to one embodiment.
  • FIG. 4 illustrates a flowchart of a method for teleoperating a robot by mapping a pose of an operator, according to one embodiment.
  • FIG. 5 illustrates a schematic block diagram of a training phase of an imitation learning engine, according to one embodiment.
  • FIG. 6 illustrates a schematic block diagram of an operational phase of the imitation learning engine, according to one embodiment.
  • Embodiments relate to allowing an operator to wirelessly and intuitively control the joint space and/or end-effector space of a remotely located robot by simply moving one's hands, arms, legs, etc. without the need for traditional external calibrated motion capture systems, worn exoskeletons/sensors, or traditional but unintuitive joysticks.
  • tasks that robots are currently unable to accomplish autonomously can be executed semi-autonomously via human teleoperation, while the recorded data of how the human operator guided the robot to accomplish the arbitrary task can be used as training examples that enable robots to learn how to accomplish similar tasks in the future.
  • One embodiment for a method of teleoperating a robot based on a pose of a subject includes two major steps: (i) generating body pose information of the subject in a captured image, and (ii) generating a plurality of kinematic parameters of the robot based on the generated body pose information of the subject in the captured image.
  • an algorithm is used to localize an array of body parts of the subject in the captured image.
  • the algorithm projects the localized body parts of the subject onto a three-dimensional (3D) skeleton model of the subject.
  • the 3D skeleton model is output as an estimate of the pose and is used for estimating and tracking the poses of the subject in a next captured image.
  • the 3D skeleton model is then mapped, directly or indirectly, to a configuration of the robot to determine a plurality of joint angles of the robot that correspond to the position and/or orientation of the subject's pose.
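As a concrete illustration of step (ii), the following is a minimal sketch, assuming the 3D skeleton is available as named joint positions and that the robot has an anthropomorphic arm and leg (direct mapping). The function names and the joint names (shoulder, elbow, wrist, hip, knee, ankle) are illustrative placeholders, not an interface defined in this disclosure.

```python
# Minimal sketch: recover robot joint angles from a 3D skeleton (direct mapping).
import numpy as np

def joint_angle(parent: np.ndarray, joint: np.ndarray, child: np.ndarray) -> float:
    """Angle (radians) at `joint` formed by the two adjacent links."""
    u = parent - joint
    v = child - joint
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def skeleton_to_joint_angles(skeleton_3d: dict[str, np.ndarray]) -> dict[str, float]:
    """Map a 3D skeleton (name -> xyz, meters) to a few robot joint angles."""
    return {
        "elbow": joint_angle(skeleton_3d["shoulder"], skeleton_3d["elbow"], skeleton_3d["wrist"]),
        "knee": joint_angle(skeleton_3d["hip"], skeleton_3d["knee"], skeleton_3d["ankle"]),
    }

# Usage with made-up keypoints in the operator's workspace frame:
skeleton = {
    "shoulder": np.array([0.0, 1.4, 0.0]), "elbow": np.array([0.3, 1.1, 0.0]),
    "wrist": np.array([0.6, 1.3, 0.0]), "hip": np.array([0.0, 0.9, 0.0]),
    "knee": np.array([0.1, 0.5, 0.0]), "ankle": np.array([0.1, 0.0, 0.0]),
}
print(skeleton_to_joint_angles(skeleton))
```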
  • a subject herein refers to any moving object that has more than one pose.
  • the moving objects include, among other objects, animals, people, and robots.
  • although embodiments herein are described with reference to humans as the subject, note that the present invention can be applied in essentially the same manner to any other object or animal having more than one pose.
  • the subject may also be referred to as an operator.
  • the localized body parts herein refer to any portion of the subject that can be conceptually identified as one or more joints and links.
  • the localized body parts include, among other parts, a head, a torso, a left arm, a right arm, a left hand, a right hand, a left leg, and a right leg.
  • the localized body parts can be subdivided into other parts (e.g., a left arm has a left upper arm and a left forearm, a left hand has a left thumb and left fingers).
  • the one or more body parts may be localized relative to a camera, an external landmark, or another point on the subject's body. Note that the number of localized body parts is not limited and can be increased or decreased according to the purposes of the pose estimation and tracking. Body parts may also be referred to herein as limbs, segments, and links, and vice versa.
  • a model herein refers to a representation of the subject by joints and links.
  • the model is a human body represented as a hierarchy of joints and links with a skin mesh attached.
  • Various models with joints and links can be used as the model of the subject.
  • the model is a subset of joints and links of the human body.
  • the model may be a hand that includes one or more of the following: a palm, a thumb, and a finger.
  • the skeleton model is referred to throughout, but it is understood that the skeleton model may not represent the full human body and instead may represent a portion of the human body.
  • FIG. 1 illustrates a block diagram of a system 100 for teleoperation of robotic systems 115 a - 115 d , according to an embodiment.
  • the system 100 includes, among other components, a network 105 that connects operator systems 110 a - 110 d (collectively referred to as “operator systems 110 ” and also individually referred to as “operator system 110 ”), robotic systems 115 a - 115 d (collectively referred to as “robotic systems 115 ” and also individually referred to as “robotic system 115 ”), and a processing server 120 .
  • the network 105 provides a communication infrastructure between the operator systems 110 , the robotic systems 115 , and the processing server 120 .
  • the network 105 is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, or a virtual private network.
  • the network 105 enables users in different locations to teleoperate robots of robotic systems, for example, for the purposes of robotic labor.
  • the operator system 110 enables an operator to teleoperate one or more corresponding robotic systems 115 .
  • the operator system 110 may be located at a distance from its corresponding one or more robotic systems 115 .
  • the operator system 110 is controlled by the operator, who may be the subject of one or more captured images.
  • the subject and the operator are referred to interchangeably, but it is also understood that, in some embodiments, the subject in the captured images may be a separate subject from the operator of the operator system 110 .
  • the operator takes one or more poses, and a robot mimics a processed mapping of the poses.
  • the operator may take a specific series of continuous or non-continuous poses that causes the robot to accomplish a certain task.
  • the operator system 110 captures images of the subject and generates body pose information of the subject in the captured images.
  • the generated body pose information is a representation of the pose of the subject in the captured images, which dictates a pose that a robot of a corresponding robotic system 115 takes.
  • the operator system 110 then transmits the generated body pose information to the corresponding robotic system 115 via the network 105 .
  • the operator system 110 a corresponds to robotic system 115 a
  • the operator system 110 b corresponds to robotic system 115 b
  • the operator system 110 c corresponds to robotic system 115 c
  • the operator system 110 d corresponds to robotic system 115 d .
  • one operator system 110 may correspond to two or more robotic systems 115 .
  • the operator system 110 includes an image capturing device 125 and an operator system controller 130 .
  • the image capturing device 125 captures images and/or video of the subject whose pose is to be mapped to a robot of a corresponding robotic system 115 .
  • the image capturing device 125 may comprise one or more cameras positioned and/or oriented to capture part or all of the subject's body.
  • the image capturing device 125 may be positioned on the subject's body and oriented such that segments of the subject's body are within a field of view of the image capturing device 125 .
  • the image capturing device 125 may be positioned external to the subject's body such that all or portions of the subject's body are within the field of view of the image capturing device 125 .
  • the image capturing device 125 may be part of a camera assembly, an external mobile device, a virtual reality (VR) or augmented reality (AR) headset, a standalone VR or AR camera assembly, a similar portable imaging device, or some combination thereof.
  • the field of view of the image capturing device 125 may vary to capture more or less of the subject's body.
  • the image capturing device 125 may comprise standard lenses or wide angle lenses (e.g., a fisheye lens).
  • the image capturing device 125 may capture two-dimensional (2D) images.
  • the image capturing device 125 may comprise one or more depth cameras or cameras in stereo to capture images with depth information.
  • the image capturing device 125 may capture images of the operator at a random or specified interval.
  • the operator may take a series of poses that cause the robot to accomplish a task.
  • the image capturing device 125 may capture images as it detects movement of the operator. In some embodiments, the image capturing device 125 sends the captured images to the operator system controller 130 . In alternative embodiments, the image capturing device 125 is integrated with the operator system controller 130 .
  • the image capturing device 125 captures images and/or video of equipment that is worn or manipulated by an operator.
  • the operator may be wearing a glove or holding a wand or a controller that includes visual markers.
  • the image capturing device 125 may detect and capture a pose or motion of the visual markers, which can then be mapped to the robot of the corresponding robotic system 115 .
  • This configuration may be beneficial for robots including an end-effector or an instrument that resembles the glove or wand/controller manipulated by the operator.
  • the wand/controller may include buttons or switches as additional input for robot control, which may improve intuitive control and/or efficiency of the operator.
  • the operator system controller 130 generates body pose information of the subject in the captured image.
  • the generated body pose information indicates a pose of the subject in the captured image.
  • the operator system controller 130 may be a desktop, a laptop, a mobile device, or a similar computing device.
  • the operator system controller 130 receives the captured images from the image capturing device 125 .
  • the operator system controller 130 may execute an algorithm that localizes an array of body parts of the subject in the captured image.
  • the algorithm projects the localized body parts of the subject onto a three-dimensional (3D) skeleton model of the subject.
  • the 3D skeleton model is output as the estimate of the pose and is used for estimating and tracking the poses of the subject in a next captured image.
  • the operator system controller 130 may execute an algorithm that directly predicts an estimate of the pose of the subject.
  • the operator system controller 130 transmits the body pose information of the subject to the corresponding robotic system 115 .
  • the operator system controller 130 may transmit additional teleoperation data to one or more corresponding robotic systems 115 .
  • the teleoperation data may be parameters associated with each captured image and/or processed image that are transmitted throughout teleoperation or may be calibration parameters that are transmitted before or during initial stages of teleoperation.
  • the parameters may be manually set by an operator (e.g., via a user interface), automatically determined by the operator system 110 or robotic system 115 , and/or could be updated throughout teleoperation.
  • the teleoperation data may be transmitted as a set of one or more parameters.
  • Parameters may relate to motion scaling or sensitivity, pause functionality, origin reset, Cartesian or joint axis locking and unlocking, bounding volumes, ‘home’ positions and orientations, quick-snap orientations and positions and other similar features.
  • Pause functionality enables the teleoperator to perform a gesture or use a specific pose that, when detected by the image capturing device 125 , pauses motion and/or operation of the robot arm, which effectively pauses tracking between the teleoperator pose and the robot arm.
  • a counter-gesture or counter-pose may be performed by the teleoperator to resume motion and/or operation of the robot arm. This feature may be used by the teleoperator to change or adjust their position, for example, to improve their comfort during teleoperation.
  • Origin reset enables the teleoperator to modify the reference point to which the robot's motion or pose is relative. In one embodiment, this enables the teleoperator to keep the robot's motion within a comfortable range of human arm motion.
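A minimal sketch of how the pause and origin-reset parameters above could be handled on the operator side, assuming gestures have already been recognized and encoded as string labels; the class name and gesture labels are hypothetical and not defined in this disclosure.

```python
import numpy as np

class TeleopSession:
    def __init__(self) -> None:
        self.tracking = True
        self.origin = np.zeros(3)  # reference point that robot motion is relative to

    def handle_gesture(self, gesture: str, hand_position: np.ndarray) -> None:
        if gesture == "pause":
            self.tracking = False               # robot holds its current pose
        elif gesture == "resume":
            self.tracking = True
        elif gesture == "reset_origin":
            self.origin = hand_position.copy()  # keep robot motion in a comfortable range

    def operator_to_robot_delta(self, hand_position: np.ndarray) -> np.ndarray:
        """Displacement relative to the current origin; zero while paused."""
        if not self.tracking:
            return np.zeros(3)
        return hand_position - self.origin

# Usage: reset the origin to the current hand position, then move slightly.
session = TeleopSession()
session.handle_gesture("reset_origin", np.array([0.20, 1.00, 0.40]))
print(session.operator_to_robot_delta(np.array([0.25, 1.05, 0.40])))
```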
  • Motion scaling enables motion from the operator to be mapped to motion of the robot on a different scale. For example, certain precise tasks performed by the robot may include small-scale motion (e.g., sub-millimeter motion) while the operator may move on a relatively larger scale (e.g., a centimeter scale); by scaling the motion of the operator, a robot may then move on a relatively smaller scale (e.g., a micron scale).
  • a large robot may perform large motions; motion of the operator may occur on a relatively smaller scale (e.g., the centimeter scale), which may be scaled to correspond to motion of the robot on a relatively larger scale (e.g., a meter scale).
  • Motion scaling may be applied linearly or non-linearly to individual axes in Cartesian space or joint space. Cartesian or joint-axis locking enables an operator to constrain the motion of a robot to a plane, a line, or point in 3D space. It may also be used to lock orientation of one or more segments and/or end-effectors of the robot along one or more axes. Bounding volumes may constrain a robot to only move within a certain subspace of its total workspace.
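The following sketch illustrates per-axis motion scaling combined with Cartesian axis locking, under the assumption that operator motion is expressed as a Cartesian displacement; the parameter names and values are illustrative.

```python
import numpy as np

def scale_and_lock(operator_delta: np.ndarray,
                   scale: np.ndarray,
                   locked_axes: tuple[bool, bool, bool]) -> np.ndarray:
    """Map an operator Cartesian displacement to a robot displacement.

    scale       -- per-axis scaling coefficients (e.g., 0.01 maps cm-scale
                   operator motion down to sub-millimeter robot motion)
    locked_axes -- True for any axis whose motion should be suppressed
    """
    robot_delta = operator_delta * scale
    robot_delta[np.array(locked_axes)] = 0.0
    return robot_delta

# Operator moved 2 cm in x and 1 cm in z; z is locked, motion scaled down 1:100.
print(scale_and_lock(np.array([0.02, 0.0, 0.01]),
                     scale=np.array([0.01, 0.01, 0.01]),
                     locked_axes=(False, False, True)))
```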
  • Quick-snap orientations or positions may enable the robot to take a predefined pose or a pose calculated based on a vision system of the robot. If the vision system of the robot identifies a target object in the environment, the operator system controller 130 may suggest a pose based on the target object to the teleoperator who can then select for the robot to snap to the suggested pose. These features may be used in any combination and may apply to the entire robot or a portion of the robot (e.g., one or more segments and/or end-effectors). The operator system controller 130 is discussed in further detail with regards to FIG. 2 .
  • the robotic system 115 controls the robot and causes the robot to move in accordance with a pose of the operator.
  • the robotic system 115 receives the generated body pose information of the subject in the captured images and, based on the generated body pose information, determines mapping parameters and one or more kinematic parameters of the robot.
  • the robotic system 115 includes a robot 135 , an image capturing device 140 , and a robotic system controller 145 .
  • the robot 135 is a machine comprising one or more segments and one or more joints that are designed to manipulate, ambulate, or both in the case of mobile manipulation.
  • the robot 135 may have an anthropomorphic design (having a human morphology) or similarly dimensioned segments resembling a human operator.
  • the robot 135 may have segments and joints that resemble body parts (e.g., limbs such as an arm, a leg, etc.) of the human operator and are designed to ambulate in a similar way.
  • the robot 135 may have an end-effector that resembles a human hand (e.g., having several fingers, joints, and degrees of freedom) or that functions similar to a hand (e.g., a claw, a 3-finger gripper, an adaptive gripper, an internal or external gripper, etc.).
  • the robot may not have an anthropomorphic design, where the robot's joints and segments do not closely align to joints and segments on the human operator's body.
  • the robot 135 may have one or more ambulating segments (achieving mobility via wheels, legs, wheeled legs, or similar methods), a stationary arm with an end-effector, a combination of one or more ambulating segments and an end-effector, or some combination thereof.
  • each joint may have one or more actuators.
  • the robot 135 may include a gripper at the end-effector.
  • the robot end-effector is gripper agnostic and can be used with several existing or custom grippers with varying number of degrees of freedom.
  • the robot or robot arm may be equipped with a mobile base for locomoting around its environment using wheels, tracks, legs, or a multi-modal design incorporating legs with wheels or treads or any combination thereof.
  • the teleoperation interface is robot agnostic and need not be paired with any particular robot arm to work as intended.
  • the image capturing device 140 captures images and/or video of the robot 135 and a local area surrounding the robot 135 .
  • the local area is the environment that surrounds the robot 135 .
  • the local area may be a room that the robot 135 is inside.
  • the image capturing device 140 captures images of the local area to identify objects that are near the robot 135 . Identifying nearby objects enables the robotic system 115 to determine if there are any objects the robot will interact with to perform a task or if there are any constraints to the range of motion of the robot 135 .
  • the robot 135 may be located in a small room near one or more walls, near one or more other robots, or other similar objects that the robot 135 aims to avoid during ambulation or manipulation.
  • the image capturing device 140 may capture images at a random, continuous, or specified interval to determine changes in the environment and subsequently update any constraints that need to be placed on the range of motion of the robot 135 .
  • the image capturing device 140 may be positioned and/or oriented to capture all or a portion of the robot 135 and its environment. In embodiments in which the image capturing device 140 comprises one or more cameras, the cameras may be mounted directly on various parts of the robot or may be external to the robot.
  • the image capturing device 140 may be part of an imaging assembly, an external mobile device, a virtual reality headset, a standalone virtual reality camera assembly, a similar portable imaging device, a computer webcam, dedicated high-resolution camera(s), or some combination thereof.
  • the field of view of the image capturing device 140 may vary to capture more or less of the robot 135.
  • the image capturing device 140 may comprise standard lenses or wide angle lenses (e.g., a fisheye lens).
  • the image capturing device 140 may capture two-dimensional images.
  • the image capturing device 140 may comprise one or more depth cameras or cameras in stereo to capture images with depth information.
  • the robotic system controller 145 receives the generated body pose information from its corresponding operator system 110 and accordingly determines a set of mapping parameters and kinematic parameters to control the motion of the robot 135 .
  • the body pose information may be in the form of a 3D skeleton model of the subject based on a pose of the subject in one or more captured images.
  • the robotic system controller 145 maps the 3D skeleton model to the configuration of the robot 135.
  • the robotic system controller 145 may have one or more control modes for mapping the arm and/or leg poses and joint angles to segments and joint angles of the robot 135 .
  • a first control mode may be a direct mapping if the robot 135 has an anthropomorphic design or similarly dimensioned arms and/or legs to the operator.
  • a second control mode may be an indirect mapping if the robot 135 does not have an anthropomorphic design.
  • the robotic system controller 145 is able to map an operator pose to a robot with any type of configuration.
  • the robotic system controller 145 determines one or more kinematic parameters for the robot 135 .
  • These kinematic parameters may include x-, y-, and z-coordinates; roll, pitch, and yaw; and joint angles for each segment and joint of the robot 135 .
  • the workspace coordinates of the robot 135 may be selected or pre-determined.
  • the robotic system controller 145 may also receive and process force and/or haptic feedback from sensors on the robot 135 ; the robotic system controller 145 may transmit the force and/or haptic feedback to the operator system 110 , which enables the operator to feel forces that the robot 135 is sensing as it moves and interacts with its environment.
  • the force and/or haptic feedback from the robot 135 may be conveyed to the operator by visual or audible modalities, for example, in the form of augmented reality features on the operator system 110 .
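A minimal sketch of how force feedback from the robot 135 might be processed before being relayed to the operator system 110, assuming a 6-axis force/torque reading; the clamping limit and contact threshold are illustrative values, and the message fields are hypothetical.

```python
import numpy as np

MAX_FORCE_N = 50.0  # clamp before display so outliers do not saturate the operator UI

def process_force_feedback(raw_wrench: np.ndarray) -> dict:
    """Convert a raw 6-axis force/torque reading into a message for the operator system."""
    wrench = np.clip(raw_wrench, -MAX_FORCE_N, MAX_FORCE_N)
    magnitude = float(np.linalg.norm(wrench[:3]))
    return {
        "wrench": wrench.tolist(),     # forwarded for haptic rendering
        "force_magnitude": magnitude,  # e.g., shown as an AR bar or color overlay
        "contact": magnitude > 2.0,    # simple contact flag for audible cues
    }

print(process_force_feedback(np.array([1.5, 0.0, 12.0, 0.1, 0.0, 0.0])))
```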
  • the robotic system controller 145 may be a desktop, a laptop, a mobile device, or a similar computing device. The robotic system controller 145 is discussed in further detail with regards to FIG. 3 .
  • the processing server 120 enables users to operate the operator systems 110 and robotic systems 115 via the network 105 .
  • the processing server 120 may be embodied in a single server or multiple servers. Further, each server may be located at different geographic locations to serve users of the operator system 110 or the robotic system 115 in different geographic locations. In the embodiment of FIG. 1 , the processing server 120 may host the platform that allows users of the operator system 110 and the robotic system 115 to access and control each system without needing to install or download the platform onto their own devices.
  • the processing server 120 processes the data collected from the operator systems 110 and robotic systems 115 .
  • the processing server 120 executes a machine learning algorithm that learns from examples of robots being teleoperated to accomplish a variety of tasks in various environments and applications.
  • the system 100 may be used as a control input to crowdsourcing teleoperation of robotic labor. Because crowdsourcing leverages the network effect, the teleoperative nature of the system 100 enables the creation of a large data set of diverse demonstration tasks in diverse environments (which does not currently exist and is difficult/expensive to generate). In this configuration, the system 100 enables the use of powerful tools such as crowdsourced data collection and deep imitation learning and meta-learning algorithms (which require large amounts of data) to teach a robot to accomplish certain tasks. This learning process becomes possible when a robot is exposed to thousands of examples of how to properly (and improperly) accomplish a task.
  • the processing server 120 includes the imitation learning engine 150 .
  • the imitation learning engine 150 implements an algorithm to learn how a robot can perform different tasks based on the examples from human operators.
  • the imitation learning engine 150 inputs into its model the data consisting of thousands of examples of robots executing a pose or performing a task based on the subject performing the tasks through teleoperation.
  • a few examples of specific algorithms that may be employed are neural networks, imitation learning, meta-learning, deep multi-modal embedding, deep reinforcement learning, and other similar learning algorithms.
  • the imitation learning engine 150 learns and extracts representations from these examples to determine appropriate movements for the robot to perform similar and unseen tasks in the same or different environments as provided in the demonstration training dataset. Accordingly, the imitation learning engine 150 stores a “label” corresponding to each task that includes the determined appropriate movements for each task.
  • the imitation learning engine 150 can exist locally on the robotic system controller of a robot, on the operator system controller of an operator, or in the cloud running on a cloud server.
  • the data collected from each robot-teleoperator pair can be shared collectively in a database that enables data sharing for parallelized learning such that a first robot in a first environment performs a task, and, once the task is learned by the imitation learning engine 150 , a second robot in a second environment may also learn the motions to perform the same task (as well as a third robot in a third environment, a fourth robot in a fourth environment, and so on, until an Nth robot in an Nth environment).
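One plausible instance of the learning described above is supervised behavior cloning on the recorded teleoperation data; the sketch below assumes demonstrations are stored as (observation, commanded joint angles) pairs, and the network size, dimensions, and hyperparameters are illustrative rather than taken from this disclosure.

```python
import torch
from torch import nn

OBS_DIM, ACTION_DIM = 64, 7  # e.g., encoded image features -> 7 joint angles (assumed)

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(observations: torch.Tensor, expert_actions: torch.Tensor) -> float:
    """One supervised update: imitate the operator's commanded joint angles."""
    predicted = policy(observations)
    loss = loss_fn(predicted, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random stand-in batch of demonstration data:
obs = torch.randn(32, OBS_DIM)
actions = torch.randn(32, ACTION_DIM)
print(train_step(obs, actions))
```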
  • FIG. 2 illustrates a block diagram of the operator system controller 130 , according to one embodiment.
  • the operator system controller 130 generates body pose information of a subject in a captured image.
  • the operator system controller 130 may be a desktop, a laptop, a mobile device, or a similar computing device.
  • One or more of the components in the operator system controller 130 may be embodied as software that may be stored in a computer-readable storage medium, such as memory 205 .
  • the memory 205 stores, among others, a user device communication module 210 , a pose estimation module 215 , a user interface module 220 , a robotic system controller interface 225 , and an imitation learning system interface 230 .
  • the computer-readable storage medium for storing the software modules may be volatile memory such as RAM, non-volatile memory such as a flash memory or a combination thereof.
  • a bus 240 couples the memory 205 and the processor 235 .
  • the bus 240 additionally couples the memory 205 to an image capturing device interface 245 , a user interface circuit 250 , and a network interface 255 .
  • Some embodiments of the operator system controller 130 have different components than those described here. Similarly, in some cases, functions can be distributed among the components in a different manner than is described here.
  • the user device communication module 210 is software, firmware, or a combination thereof for communicating with user devices via the network 105 .
  • a user device may be a device that an operator uses as part of the operator system 110 .
  • a user device may be a mobile computing device, and the operator system controller 130 may be a desktop or a laptop that communicates with the user device.
  • the user device communication module 210 receives commands and requests from the user device to access and control the operator system 110 .
  • the pose estimation module 215 estimates a body pose of a subject in a captured image.
  • the pose estimation module 215 may include, among others, an image processor 260 , a skeletal model mapper 265 , and a tracking module 270 as described below in detail.
  • the image processor 260 receives and processes the images captured by the image capturing device 125 .
  • the image processor 260 identifies a subject and the subject's body parts in a captured image. For example, the image processor 260 identifies hands, fingers, arms, elbows, shoulders, legs, knees, a head, etc. of the subject.
  • the image processor 260 may use a machine learning model (e.g., a pre-trained deep learning model or convolutional neural network) to identify these body parts in each captured image. Additionally, the machine learning model localizes body parts and the dimensions between adjacent body parts or joints. In embodiments in which the captured images lack depth information, the localized body parts are two-dimensional characteristics of the pose of the subject.
  • the machine learning model may use spatial motion information from an IMU on the mobile device, exploiting the relationship between the changing image perspective and the 6-axis motion of the image capturing device 125 (in an embodiment in which the image capturing device and the IMU are embedded in the same device and do not move relative to one another).
  • the operator may manually set the subject's body part dimensions.
  • the machine learning model may track certain body parts, joints, or segments relative to other body joints, parts, or segments, relative to an external landmark, or relative to the image capturing device 140 .
  • the skeletal model mapper 265 projects the two-dimensional localized body parts to a three-dimensional skeleton model of the operator.
  • the skeletal model mapper 265 executes an algorithm that enhances the alignment between a 2D pixel location of each body part in the captured image and the 3D skeleton model.
  • the 3D skeleton model of the operator may be calibrated for operators of different sizes.
  • the 3D skeleton model may include several parameters, such as body part dimensions (e.g., limb lengths), joint angles between adjacent body parts (e.g., limbs), and other relevant pose information.
  • An output of the 3D skeleton model may be estimated pose information, which may include x-, y-, and z-coordinate positions with respect to a coordinate system (i.e., workspace) of each body part of the operator; roll, pitch, and yaw of the one or more body parts of the operator; and joint angles between adjacent body parts.
  • the skeletal model mapper 265 creates the 3D skeleton model during a calibration process, where the 3D skeleton model represents an initial estimated pose of the operator.
  • the 3D skeleton model may receive as input the two-dimensional localized body parts from subsequent captured images of the subject and may output pose information for the pose of the subject in the subsequent captured images. In this configuration, the 3D skeleton model can be used to estimate and track poses of the subject based on subsequent captured images of the subject.
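One simple way a 2D keypoint could be lifted onto a calibrated 3D skeleton is sketched below; the scaled-orthographic camera assumption and the sign chosen for the recovered depth are simplifications for illustration, not the specific algorithm of the skeletal model mapper 265.

```python
import numpy as np

def lift_child_joint(parent_3d: np.ndarray,
                     parent_2d: np.ndarray,
                     child_2d: np.ndarray,
                     limb_length: float,
                     pixels_per_meter: float) -> np.ndarray:
    """Place the child joint in 3D so its limb has the calibrated length.

    Under a scaled-orthographic camera, the in-image displacement fixes x and y;
    the remaining depth offset is recovered from the known limb length.
    """
    dxy = (child_2d - parent_2d) / pixels_per_meter     # meters in the image plane
    planar = float(np.linalg.norm(dxy))
    dz = np.sqrt(max(limb_length**2 - planar**2, 0.0))  # toward-camera sign assumed
    return parent_3d + np.array([dxy[0], dxy[1], dz])

# Usage: elbow at a known 3D location, wrist observed only in 2D pixels.
elbow_3d = np.array([0.30, 1.10, 0.50])
wrist_3d = lift_child_joint(elbow_3d,
                            parent_2d=np.array([640.0, 360.0]),
                            child_2d=np.array([700.0, 330.0]),
                            limb_length=0.26,        # calibrated forearm length (m)
                            pixels_per_meter=500.0)
print(wrist_3d)
```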
  • the tracking module 270 tracks the poses of the subject in subsequent images captured by the image capturing device 125 .
  • the tracking module 270 receives one or more processed images from the image processor 260 and uses them to estimate pose information of the subject in the processed images.
  • the one or more processed images may be images that were captured subsequent to the captured images used to generate the 3D skeleton model.
  • the pose estimation module 215 is able to estimate a pose of a subject in real-time as images are captured by the image capturing device 125 .
  • the pose estimation of the subject is transmitted to the corresponding robotic system controller 145 . This enables a robot of a corresponding robotic system to take a pose in accordance with the subject in real-time.
  • the pose estimation module 215 may directly input one or more captured images into a machine learning model.
  • the machine learning model may then output an estimation of the pose of the subject in the captured images or may then output a prediction of a pose or a motion of the robot.
  • the pose estimation module 215 does not separately localize body parts of the subject in the captured images and generate a corresponding 3D skeleton model.
  • the user interface module 220 may update a user interface that allows the user to interact with and control the operator system 110 .
  • the user interface module 220 may provide a graphical user interface (GUI) that displays the robot 135 .
  • the GUI may display the robot 135 in its current environment and/or a simulated model of the robot in a simulated environment.
  • the GUI may include a manual controller that allows individual control of each of the robot's joint angles as well as the position and orientation of an end-effector of the robot 135 .
  • the GUI may additionally include a point-and-click function that enables the operator to select, via a mouse or a touchscreen on the user device, objects in the robot's environment.
  • the system 100 may infer how the operator would like that object manipulated or handled by the robot.
  • a simulation of that action may then be shown to the user via the user interface (e.g., mobile screen, monitor, AR/VR, etc.) before the robot executes the task.
  • the GUI may include options for the user to approve or reject the simulated action. In this configuration, the operator ensures that the autonomy of completing the specified task is correct before allowing the robot to move.
  • the GUI may include options to enable or disable modes that dictate the autonomy of the robot 135 .
  • the operator system controller 130 or the corresponding robotic system controller 145 may store automated motions that have been pre-defined, programmed, or previously-learned. These modes may increase the speed and efficiency of the operator.
  • the GUI may provide suggestions to an operator that may further streamline teleoperation of the robot 135 .
  • Suggestions may include poses or “snap” poses for the robot 135 to take. These poses may be pre-defined, programmed, or previously-learned poses.
  • a “snap” pose may snap one or more segments and/or end-effectors of the robot 135 into a pose or to an object to perform a dexterous task.
  • for learned graspable objects (e.g., door handles, writing instruments, utensils, etc.), the robot 135 may be able to manipulate objects quickly and minimize fine robot control by an operator.
  • the user interface module 220 may present an image and/or video stream of the robot 135 in the GUI on a monitor, mobile device, a head set (AR, VR, and/or MR), or similar.
  • the user interface module 220 may overlay onto the video stream a simulation of the robot 135 or a portion of the robot 135 (e.g., an end-effector of the robot 135 ).
  • an operator may be able to position and/or orient the robot 135 in 6D space.
  • An operator may be able to add one or more set points that define a pose or motion of the robot 135 .
  • the set points may be ordered in a defined sequence. Each set point may be associated with one or more types that each indicate an action that the robot may take at the set point.
  • the robot 135 may then move through the set points in the defined sequence.
  • the user interface module 220 may provide a simulation of the defined sequence in the GUI as an overlay on the image and/or video stream of the robot 135 .
  • Example set point types may include contact, grasping, trajectory, or other similar actions, or some combination thereof.
  • a contact set point may define that the robot 135 contacts an object, tool, or area within its environment.
  • a grasping set point may define that the robot 135 grasp an object when it reaches the set point.
  • a trajectory set point may be used as a waypoint in a trajectory to ensure that the robot 135 moves through a target trajectory, for example, to avoid collisions with itself and/or the environment.
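A minimal sketch of a set-point sequence with the example types above (contact, grasping, trajectory); the data-structure fields and the stubbed robot calls are illustrative assumptions, not an interface defined in this disclosure.

```python
from dataclasses import dataclass
from enum import Enum, auto

class SetPointType(Enum):
    CONTACT = auto()     # touch an object, tool, or area
    GRASPING = auto()    # close the gripper at this pose
    TRAJECTORY = auto()  # waypoint used only to shape the motion path

@dataclass
class SetPoint:
    position: tuple[float, float, float]     # x, y, z in the robot workspace
    orientation: tuple[float, float, float]  # roll, pitch, yaw
    kind: SetPointType

def execute_sequence(set_points: list[SetPoint]) -> None:
    """Move through the set points in their defined order (robot calls are stubbed)."""
    for sp in set_points:
        # move_to(sp.position, sp.orientation)  # hypothetical robot motion call
        if sp.kind is SetPointType.GRASPING:
            pass  # close_gripper()              # hypothetical gripper call
        print(f"reached {sp.kind.name.lower()} set point at {sp.position}")

execute_sequence([
    SetPoint((0.4, 0.0, 0.3), (0.0, 0.0, 0.0), SetPointType.TRAJECTORY),
    SetPoint((0.5, 0.1, 0.1), (0.0, 1.57, 0.0), SetPointType.GRASPING),
])
```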
  • the user interface module 220 may also provide one or more suggestions for snap poses that each correspond to a target pose.
  • the user interface module 220 may also provide one or more snap regions that correspond to each snap pose.
  • An operator may select a snap pose and, in some embodiments, a snap region.
  • the GUI may provide a simulation of the robot 135 snapping to the pose. The operator may select to accept or reject the simulation. If the simulation is accepted, the user interface module 220 may add the snap pose as a set point.
  • the user interface module 220 may additionally communicate depth information of the robot 135 and its environment to the operator.
  • a VR headset may be used to project stereo images into each eye that were captured using a stereo image capturing device on the robot 135 .
  • in this configuration, the human brain perceives depth information as the eyes naturally would without a VR headset.
  • the user interface module 220 may use a mobile device, a monitor, or a head set (AR, VR, and/or MR) to display a video stream from the image capturing device 140 of the robot 135 to the operator.
  • additional features may be added to enhance depth perception of a 3D world projected onto a 2D computer monitor or mobile device.
  • a processed depth stream from a depth camera may be displayed in depth form or as a point cloud to the operator.
  • Multiple videos may be displayed from the image capturing device 140 of the robot 135 , which may include multiple cameras with different perspectives (top view, side view, isometric view, gripper camera view, etc.) of the robot 135 .
  • Augmented reality (AR) features may be overlaid in real-time onto the video stream from the image capturing device 140 of the robot 135 to enhance depth perception.
  • Example AR features may include depth-based augmented reality boxes, lines, shapes, and highlighting; square grids that align with 3D features in the environment of the robot 135; a real or augmented laser pointer projected from an end-effector of the robot 135 to objects in the environment of the robot 135 with a measured distance reading to that object; use of background, foreground, stripes, and masking to distinguish objects of interest from the background; use of chromostereopsis methods where glasses with different colored lenses and processed display videos may be used to create an illusion of depth; use of processed images via spatio-temporal blur and focus rendering; use of a homunculus control panel with one or more camera feeds; a simulated robot configuration rendered over a transformed perspective of the point cloud image; and/or one or more of the previously described depth-enhancing features. These features may be integrated into the user interface module 220 individually or in some combination thereof.
  • the AR features may be generated using stereo or depth sensing cameras of the image capturing device 140 .
  • the robotic system controller interface 225 couples the operator system controller 130 to the robotic system 115 via the network 105 .
  • the robotic system controller interface 225 may transmit data to the robotic system controller 145 and receive data from the robotic system controller 145 .
  • the robotic system controller interface 225 transmits the generated pose estimation of the subject and tracking information to the robotic system 115 .
  • the robotic system controller interface 225 may transmit additional data, such as the images captured by the image capturing device 125 and/or commands or requests input by the user via the user device.
  • the robotic system controller interface 225 may receive captured images of the robot 135 captured by the image capturing device 140 and haptic feedback from the robotic system controller 145 .
  • the robotic system controller interface 225 may transmit data in real-time or at specified or random intervals.
  • the imitation learning system interface 230 provides data from the operator system 110 to the imitation learning engine 150 online or offline.
  • the imitation learning system interface 230 transmits data associated with a subject performing a task, such as the captured images, the 3D skeleton model, the pose tracking information, and/or other relevant information.
  • the imitation learning system interface 230 may transmit this data in real-time or at specified or random intervals. This enables the imitation learning engine 150 to continually improve online in real-time in a parallelized framework with every additional teleoperational task completed, which enables the robots connected within the system 100 to become more capable of autonomously performing tasks and to require fewer human interventions.
  • the image capturing device interface 245 is software, firmware, hardware, or a combination thereof that couples the operator system controller 130 to the image capturing device 125 .
  • the image capturing device interface 245 may be a USB cable that couples to the bus 240 .
  • image capturing device interface 245 may enable a wireless connection to the image capturing device 125 , e.g., via the network 105 , Bluetooth, or a similar connection.
  • the user interface circuit 250 is software, firmware, hardware, or a combination thereof that couples the user interface to the operator system controller 130 .
  • the user interface circuit 250 may couple a keyboard and/or a mouse to the operator system controller 130 via the bus 240 .
  • the user interface circuit 250 may enable a touchscreen or monitor on a user device of the operator system 110 .
  • the network interface 255 is a hardware component that couples the operator system controller 130 to the network 105 .
  • the network interface 255 may be a network interface card, a network adapter, a LAN adapter, or a physical network interface that couples to the bus 240 .
  • FIG. 3 illustrates a block diagram of a robotic system controller, according to one embodiment.
  • the robotic system controller 145 receives the generated body pose information from its corresponding operator system 110 and accordingly determines a set of kinematic parameters to move the robot 135 .
  • the robotic system controller 145 may be a desktop, a laptop, custom computer, a mobile device, or a similar computing device.
  • the robotic system controller 145 includes components that are stored in a computer-readable storage medium, such as memory 305 . In the embodiment of FIG.
  • the memory 305 stores an operator system controller interface 310 , a robot mapping module 315 , a robot kinematics module 320 , a feedback module 325 , and an imitation learning system interface 330 . Instructions of the software modules are retrieved and executed by a processor 335 .
  • the computer-readable storage medium for storing the software modules may be volatile memory such as RAM, non-volatile memory such as a flash memory or a combination thereof.
  • a bus 340 couples the memory 305 and the processor 335 .
  • the bus 340 additionally couples the memory 305 to an image capturing device interface 345 , a robot interface 350 , and a network interface 355 .
  • Some embodiments of the robotic system controller 145 have different components than those described here. Similarly, in some cases, functions can be distributed among the components in a different manner than is described here.
  • the operator system controller interface 310 enables communication between the robotic system 115 and the operator system controller 130 via the network 105 .
  • the operator system controller interface 310 may transmit data to the operator system controller 130 and receive data from the operator system controller 130 .
  • the operator system controller interface 310 receives the generated pose estimation of the subject and tracking information from the operator system 110 .
  • the operator system controller interface 310 may transmit captured images of the robot 135 and its environment captured by the image capturing device 140 and feedback from the robot 135 including but not limited to force, torque, position, velocity, and other sensory feedback from the robot's joints, end-effector, segments, or externally in the robot's environment.
  • the operator system controller interface 310 transmits additional data, such as the configuration of the robot 135 , current or previous states of the robot 135 including kinematic parameters for each state, information regarding the local area surrounding the robot 135 , or some combination thereof.
  • the operator system controller interface 310 may transmit data in real-time or at specified or random intervals.
  • the robot mapping module 315 maps the estimated pose of the operator to the configuration of the robot 135 .
  • mapping the estimated pose to the robot 135 is performed by aligning and potentially scaling the limbs and joint angles of the operator to the segments and joint angles of the robot 135 .
  • the robot mapping module 315 may create a set of mapping parameters, which may include scaling coefficients, relationships of corresponding joints or segments, and other relevant information.
  • the robot mapping module may have several control modes for mapping. For example, in a first control mode, direct mapping may be employed if the robot 135 has an anthropomorphic design or similarly dimensioned arms, legs, and/or fingers. Direct mapping maps the limbs and joint angles of the operator directly to the segments and joint angles of the robot 135 . In this configuration, control of the robot 135 may be intuitive to the operator, especially if a virtual reality headset is used by the operator.
  • indirect mapping may be employed if the robot 135 does not have an anthropomorphic design or similarly dimensioned arms, legs, and/or fingers.
  • Indirect mapping may use a linear or non-linear function to map an estimate of the limbs and joint angles of the operator to the segments and joint angles of the robot 135 .
  • Indirect mapping may be used if 1) the robot's dimensions are on a different scale compared to the operator's body, 2) the robot has a different kinematic configuration or number of joints compared to the operator's body, or 3) it is desired to have varying levels of control sensitivity in joint or end-effector space.
  • end-effector mapping may be employed if the robot 135 has an arm or leg that includes an end-effector where only the end-effector ambulates in accordance with the operator.
  • End-effector mapping may track the poses of the operator's hand rather than the operator's limbs.
  • the position and/or orientation of the fingers and/or the joint angles of the operator's hands are mapped to the position and/or orientation of the segments and/or joint angles of the end-effector.
  • control of just the end-effector of the robot 135 may be intuitive when the robot 135 does not have an anthropomorphic design.
  • the arm or leg of the robot 135 may be stationary or may ambulate according to the first or second control mode.
  • the robot mapping module 315 may use one or more control modes simultaneously for different portions of the robot 135.
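The three control modes described above could be dispatched as in the sketch below; the linear indirect map, the function names, and the dictionary keys are illustrative assumptions, since the actual mapping would depend on the robot's kinematic configuration.

```python
import numpy as np

def direct_map(operator_joint_angles: dict[str, float]) -> dict[str, float]:
    """Anthropomorphic robot: copy operator joint angles one-to-one."""
    return dict(operator_joint_angles)

def indirect_map(operator_joint_angles: dict[str, float],
                 gain: float = 0.5, offset: float = 0.1) -> dict[str, float]:
    """Non-anthropomorphic robot: apply a (here linear) function per joint."""
    return {name: gain * angle + offset for name, angle in operator_joint_angles.items()}

def end_effector_map(hand_pose: np.ndarray) -> np.ndarray:
    """Track only the operator's hand pose (x, y, z, roll, pitch, yaw) for the end-effector."""
    return hand_pose.copy()

def map_operator_to_robot(mode: str, operator_state: dict):
    if mode == "direct":
        return direct_map(operator_state["joint_angles"])
    if mode == "indirect":
        return indirect_map(operator_state["joint_angles"])
    if mode == "end_effector":
        return end_effector_map(operator_state["hand_pose"])
    raise ValueError(f"unknown control mode: {mode}")

# Usage with made-up operator joint angles (radians):
print(map_operator_to_robot("indirect", {"joint_angles": {"elbow": 1.2, "shoulder": 0.4}}))
```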
  • the operator's pose corresponds to a velocity or force controller rather than a position or pose tracker.
  • an origin position may be defined by the operator or automatically set to a default point in the operator's workspace.
  • if the operator's hand is within a threshold distance of the origin, the robot 135 may not move.
  • if the operator's hand is positioned at a distance greater than the threshold distance from the origin along one or more axes, the robot 135 may move at a velocity along each such axis proportional to the distance of the operator's hand from the origin.
  • the operator may use a user interface to toggle between position and orientation control.
  • a rotation vector connecting the origin to a point on the operator's body part (e.g., a palm center) and a norm of the rotation vector control a rotation axis and a proportional angular velocity about that rotation vector.
  • a hand tracker may set thresholds relating to the operator's hand orientation such that when the hand orientation is within an angular threshold in roll, pitch, and yaw, the angular velocity of the robot 135 is zero. If the hand orientation exceeds those thresholds, the angular velocity of the robot 135 becomes proportional to an angular pose of the operator's hand relative to a coordinate frame at the origin.
  • the operator may control the position and orientation of the one or more segments and/or end-effectors of the robot 135 in velocity mode, allowing the operator to maintain his/her hand in a comfortable position.
  • the pose of the operator may still be tracked, but in this embodiment, the position of the operator's hand relative to a defined origin maps to the velocity of the robot 135, as opposed to the position of the operator's body mapping to the position of the robot 135.
  • a user interface may display this functionality to make operator control more intuitive.
  • the user interface may display a marker (e.g., a dot, simulated hand, or coordinate frame) that corresponds to the operator's hand position, which may have a coordinate frame overlaid onto it to illustrate the orientation of the operator's hand relative to a coordinate frame at the origin.
  • the marker may be surrounded by a circle that defines the velocity threshold such that if the marker is within the circle, the robot 135 remains stationary in its current pose. If the marker is outside of the circle, then the robot 135 moves in the direction of the vector from the origin to the marker at a velocity proportional to a function of the norm of that vector.
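  • For illustration only, a minimal sketch of the velocity control mode described above, assuming a spherical deadzone around the origin and a linear gain; the threshold radius and gain values are hypothetical:

```python
import numpy as np

def velocity_command(hand_position: np.ndarray,
                     origin: np.ndarray,
                     deadzone_radius: float = 0.05,
                     gain: float = 2.0) -> np.ndarray:
    """Map the offset of the operator's hand from the defined origin to a robot
    velocity: inside the deadzone (the displayed circle) the robot holds its pose;
    outside it, the robot moves along the origin-to-hand vector at a speed
    proportional to how far the hand is beyond the threshold."""
    offset = hand_position - origin
    distance = np.linalg.norm(offset)
    if distance <= deadzone_radius:
        return np.zeros(3)                      # marker inside the circle: stay put
    direction = offset / distance
    return gain * (distance - deadzone_radius) * direction

# Example: hand 12 cm to the right of the origin with a 5 cm deadzone.
v = velocity_command(np.array([0.12, 0.0, 0.0]), np.zeros(3))
```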
  • the operator may provide third person demonstrations that the robot mapping module 315 interprets such that the robot 135 performs higher level task-related motions.
  • the operator may manipulate an object in his/her environment, which the robot mapping module 315 maps to the robot manipulating a corresponding object (which may or may not be the same object as the operator's) in its environment in accordance with a processed version of the operator's motion.
  • the robot mapping module 315 may not map the exact poses or trajectory of the operator but rather may infer poses or a trajectory to achieve a similar high level task.
  • the operator may have a test object in his/her environment. The operator may specify an object in the environment of the robot 135 that corresponds to the test object.
  • the robotic system controller 145 may infer the object correspondence.
  • the operator may manipulate the test object in some way, such as picking it up and placing it in a bin, which provides high-level task information to the robotic system controller 145 indicating that the robot 135 should place the corresponding object in its environment in a bin.
  • the objects in the operator's environment may not correspond identically with those in the environment of the robot 135 .
  • the bins in the operator's environment and the robot's environment might be different sizes, shapes, colors, may appear differently, and may be placed in different locations relative to the test/corresponding object and/or operator/robot.
  • the robot 135 may have a higher level of intelligence and may be trained to extract higher level task-related information from the operator demonstration, as opposed to relying on fine motor control commands from the operator that map explicitly to motion.
  • This task-mapping mode may also be used to manipulate objects in a lower-level control mode such that, however the operator manipulates the test object, the robot 135 manipulates the corresponding object in the same or similar (inferred) way.
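  • For illustration only, a minimal sketch of extracting a high-level task specification from a third-person demonstration rather than copying the operator's exact trajectory; the event-log format, field names, and object identifiers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """High-level task inferred from an operator demonstration (hypothetical fields)."""
    verb: str                 # e.g., "place"
    source_object_id: str     # object the robot should manipulate
    destination_id: str       # e.g., a bin in the robot's environment

def infer_task(demo_events: list[dict],
               object_correspondence: dict[str, str]) -> TaskSpec:
    """Interpret a demonstration log ("picked test object", "placed it in a bin")
    as a task over the robot's corresponding objects, not as explicit motion."""
    picked = next(e for e in demo_events if e["action"] == "pick")
    placed = next(e for e in demo_events if e["action"] == "place")
    return TaskSpec(verb="place",
                    source_object_id=object_correspondence[picked["object"]],
                    destination_id=object_correspondence[placed["target"]])

# Example: the operator picks up a test object and drops it into a bin.
demo = [{"action": "pick", "object": "test_object"},
        {"action": "place", "target": "operator_bin"}]
correspondence = {"test_object": "robot_object_7", "operator_bin": "robot_bin_2"}
task = infer_task(demo, correspondence)
```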
  • the robot kinematics module 320 determines one or more kinematic parameters for the robot 135 .
  • the kinematic parameters correspond to a position and an orientation for each segment and/or joint of the robot 135 .
  • the kinematic parameters may include one or more of the following: a set of x-, y-, and z-coordinates with respect to the coordinate system (i.e., workspace) of the robot 135 ; roll, pitch, and yaw describing orientation of one or more segments of the robot 135 ; joint angles between adjacent segments; a set of transformation coefficients between the body of the operator and the configuration of the robot 135 .
  • the robot kinematics module 320 determines these kinematic parameters based on the mapping parameters from the robot mapping module 315 that maps the body pose of the operator to the configuration of the robot 135 .
  • the robot kinematics module 320 may send the kinematic parameters to the robot interface 350 for motion of the robot 135 in accordance with the kinematic parameters.
  • the robot kinematics module 320 determines a set of kinematic parameters for each subsequent pose. For the subsequent poses that the robot 135 may take, the robot kinematics module 320 may consider an initial state of the robot 135 (e.g., current pose) and a target state of the robot 135 (corresponding to the pose of the subject) to determine a movement to transition the robot 135 from the current state to the target state. The robot kinematics module 320 may generate an intermediate set of parameters that represent the transitional movement (i.e., a motion trajectory).
  • In the embodiment of FIG. 3, the robot kinematics module 320 may perform an optimization algorithm to determine the optimal transitional movement.
  • the robot kinematics module 320 may consider any constraints placed on the robot 135 , for example to prevent self-collision or collisions with objects in the local area of the robot 135 as determined from the image capturing device 140 .
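  • For illustration only, a minimal sketch of generating the intermediate parameters (a motion trajectory) between an initial and a target state; linear interpolation with a joint-limit clamp stands in for the optimization-based planning and collision constraints described above, and the limits and step count are assumptions:

```python
import numpy as np

def plan_trajectory(initial: np.ndarray,
                    target: np.ndarray,
                    joint_limits: np.ndarray,
                    steps: int = 20) -> np.ndarray:
    """Generate intermediate joint configurations that transition the robot from
    its current state to the target state, clamping every waypoint to joint
    limits as a simple stand-in for constraint handling."""
    waypoints = np.linspace(initial, target, steps)     # steps x num_joints
    low, high = joint_limits[:, 0], joint_limits[:, 1]
    return np.clip(waypoints, low, high)

# Example: a 3-joint arm moving to a target pose within +/- 2.5 rad joint limits.
limits = np.array([[-2.5, 2.5]] * 3)
trajectory = plan_trajectory(np.zeros(3), np.array([1.0, -0.5, 0.8]), limits)
```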
  • the operator system controller interface 310 sends the kinematic parameters and intermediate parameters to the operator system controller 130 such that a simulation of the movement is displayed in a user interface of the user device, enabling the operator to approve or reject the simulated movement before the robot 135 takes the pose.
  • the feedback module 325 receives and processes feedback from the robot 135 .
  • the robot 135 may include sensors on each segment or at each joint, such as torque sensors, encoders, cameras, IMUs, and other possible sensors.
  • the feedback module 325 may monitor the feedback from the sensors to ensure that the detected feedback stays within an acceptable range. For example, monitoring feedback from the torque sensors ensures that the segments and/or joints of the robot 135 do not experience excessive load-bearing forces.
  • the feedback module 325 may constrain a motion or a pose of the robot 135 if the feedback module 325 detects feedback that is outside of an acceptable range.
  • In the embodiment of FIG. 3, the operator system controller interface 310 may transmit force or haptic feedback from the feedback module 325 to the operator system 110, which may enable the operator to feel forces that the robot 135 is sensing as it moves and interacts with its environment.
  • the operator system 110 may update a user interface of the user device to inform the operator of the feedback and if any detected feedback is outside of an acceptable range.
  • the operator system 110 may provide multisensory feedback (e.g., visual or audio feedback) through, for example, AR or display features.
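  • For illustration only, a minimal sketch of the feedback monitoring described above, assuming joint torque readings and a single acceptable-range limit; the limit value and the halt-the-joint response are illustrative assumptions:

```python
import numpy as np

def constrain_motion(joint_torques: np.ndarray,
                     commanded_velocities: np.ndarray,
                     torque_limit: float = 40.0) -> np.ndarray:
    """Monitor torque feedback from each joint; if any reading falls outside the
    acceptable range, constrain the commanded motion (here by halting the
    offending joints) so segments do not bear excessive loads."""
    over_limit = np.abs(joint_torques) > torque_limit
    constrained = commanded_velocities.copy()
    constrained[over_limit] = 0.0        # freeze joints reporting excessive torque
    return constrained

# Example: joint 2 reports 55 Nm against a 40 Nm limit, so its motion is halted.
safe_cmd = constrain_motion(np.array([10.0, 55.0, 20.0]),
                            np.array([0.2, 0.4, -0.1]))
```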
  • the imitation learning system interface 330 provides data from the robotic system 115 to the imitation learning engine 150 .
  • the imitation learning system interface 330 transmits data such as images captured by the image capturing device 140 of the robot 135 and its environment, images captured by the image capturing device 125 of the operator, mapping parameters, kinematic parameters, corresponding initial and target states and the associated intermediate parameters, sensor feedback, and other relevant information such as an embedding or information of the type of task being performed.
  • the imitation learning engine 150 learns and labels the poses for a robot to accomplish each task.
  • the imitation learning system interface 330 may transmit this data in real-time or at specified or random intervals.
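  • For illustration only, a minimal sketch of the kind of per-demonstration record the imitation learning system interface 330 might transmit; the class and field names are hypothetical groupings of the data listed above:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DemonstrationRecord:
    """One training example forwarded to the imitation learning engine 150."""
    task_label: str                      # e.g., "place_item_in_bin"
    operator_images: list[Any]           # images of the operator (device 125)
    robot_images: list[Any]              # images of the robot's local area (device 140)
    mapping_parameters: dict[str, Any]   # scaling coefficients, joint correspondences
    initial_state: list[float]           # robot kinematic state before the pose
    target_state: list[float]            # robot kinematic state after the pose
    intermediate_parameters: list[list[float]] = field(default_factory=list)
    sensor_feedback: dict[str, list[float]] = field(default_factory=dict)
```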
  • the image capturing device interface 345 is software, firmware, hardware, or a combination thereof that couples the robotic system controller 145 to the image capturing device 140.
  • the image capturing device interface 345 may be a USB cable that couples to the bus 340 .
  • the image capturing device interface 345 may enable a wireless connection to the image capturing device 140, e.g., via the network 105, Bluetooth, or a similar connection.
  • the robot interface 350 may be software, firmware, hardware, or a combination thereof that couples the robotic system controller 145 to the robot 135 .
  • the robot interface 350 may be a power cable, USB cable, or a similar connection.
  • the robot interface 350 may be a wireless connection via the network 105 , Bluetooth, or a similar wireless connection.
  • the robotic system controller 145 transmits the intermediate parameters and the kinematic parameters to one or more actuators at the respective joints of the robot 135 . In this configuration, the actuators move the robot 135 in accordance with the parameters received.
  • the robot 135 may additionally send sensor feedback to the robotic system controller 145 via the robot interface 350 .
  • the network interface 355 is a hardware component that couples the robotic system controller 145 to the network 105 .
  • the network interface 355 may be a network interface card, a network adapter, a LAN adapter, or a physical network interface that couples to the bus 340 .
  • FIG. 4 illustrates a flowchart of a method 400 for teleoperating a robot by mapping a pose of an operator, according to one embodiment.
  • the method 400 can be performed using a computer system (e.g., system 100 ).
  • An image capturing device (e.g., image capturing device 125 ) captures 405 an image of a subject.
  • the image capturing device may be part of an imaging assembly, an external mobile device, a virtual reality headset, a standalone virtual reality camera assembly, a webcam, a similar portable imaging device, or some combination thereof.
  • the image capturing device may be positioned on the subject's body and oriented such that segments of the subject's body are within a field of view of the image capturing device, or the image capturing device may be positioned external to the subject's body such that all or portions of the subject's body are within the field of view of the image capturing device.
  • In the embodiment of FIG. 4, the image capturing device captures images that are two-dimensional (i.e., without depth information).
  • the image capturing device captures 405 images of the subject as the subject takes a series of poses, which are to be mapped to a robot of a robotic system, causing the robot to perform a task.
  • An image processor processes 410 the captured image(s) to localize one or more body parts of the subject.
  • the image processor identifies the subject and the subject's body parts in the captured image. For example, the image processor identifies hands, fingers, arms, elbows, shoulders, legs, knees, a head, etc. of the subject.
  • the image processor may use a machine learning model (e.g., a pre-trained deep learning model or convolutional neural network) to identify these body parts in each captured image. Additionally, the machine learning model localizes body parts and the dimensions between adjacent body parts or joints.
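  • For illustration only, a minimal sketch of reducing per-body-part heatmaps, as produced by a typical keypoint-detection network, to 2D pixel locations; the heatmap format and the body-part list are assumptions, since the disclosure refers generically to a machine learning model:

```python
import numpy as np

BODY_PARTS = ["head", "left_shoulder", "left_elbow", "left_wrist"]  # illustrative subset

def localize_body_parts(heatmaps: np.ndarray) -> dict[str, tuple[int, int]]:
    """Convert per-part heatmaps (parts x height x width) into 2D pixel
    coordinates by taking the argmax of each map."""
    locations = {}
    for part, heatmap in zip(BODY_PARTS, heatmaps):
        row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        locations[part] = (int(col), int(row))   # (x, y) pixel location
    return locations

# Example with random arrays standing in for network output on one image.
demo_heatmaps = np.random.rand(len(BODY_PARTS), 64, 64)
keypoints_2d = localize_body_parts(demo_heatmaps)
```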
  • a skeletal model mapper maps 415 the localized body parts to a human body skeletal model.
  • the skeletal model mapper projects the two-dimensional localized body parts to a three-dimensional skeleton model of the operator.
  • the skeletal model mapper executes an optimization algorithm that maximizes the alignment between a 2D pixel location of each body part in the captured image and the 3D skeleton model.
  • the 3D skeleton model represents an initial estimated pose of the operator.
  • the 3D skeleton model may include several parameters, such as body part dimensions (e.g., limb lengths), joint angles between adjacent body parts (e.g., limbs), and other relevant pose information.
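  • For illustration only, a minimal sketch of the alignment step: 3D joint positions are optimized so that their projections match the 2D pixel detections while limb lengths stay close to the calibrated skeleton model; the pinhole camera model, focal length, and regularization weight are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

FOCAL = 500.0  # assumed pinhole focal length in pixels

def project(points_3d: np.ndarray) -> np.ndarray:
    """Project 3D joints (N x 3, camera frame, z > 0) to 2D pixels."""
    return FOCAL * points_3d[:, :2] / points_3d[:, 2:3]

def fit_skeleton(keypoints_2d: np.ndarray,
                 bones: list,
                 bone_lengths: np.ndarray,
                 init_3d: np.ndarray) -> np.ndarray:
    """Maximize 2D/3D alignment by minimizing reprojection error plus a penalty
    that keeps limb lengths near the skeleton model's dimensions."""
    def cost(x):
        joints = x.reshape(-1, 3)
        reproj = np.sum((project(joints) - keypoints_2d) ** 2)
        limb = sum((np.linalg.norm(joints[a] - joints[b]) - length) ** 2
                   for (a, b), length in zip(bones, bone_lengths))
        return reproj + 100.0 * limb
    result = minimize(cost, init_3d.ravel(), method="L-BFGS-B")
    return result.x.reshape(-1, 3)

# Example: a 3-joint chain (shoulder-elbow-wrist) with known limb lengths.
detected_2d = np.array([[0.0, 0.0], [40.0, 5.0], [80.0, 10.0]])
skeleton_3d = fit_skeleton(detected_2d, bones=[(0, 1), (1, 2)],
                           bone_lengths=np.array([0.30, 0.25]),
                           init_3d=np.array([[0.0, 0.0, 2.0],
                                             [0.1, 0.0, 2.0],
                                             [0.2, 0.0, 2.0]]))
```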
  • a pose estimation module (e.g., pose estimation module 215 ) generates 420 body pose information of the subject.
  • the body pose information of the subject is generated based on the skeletal model.
  • a machine learning model estimates the body pose information based on the captured image(s) or a processed version of the captured image(s) of the subject. The machine learning model is used to estimate and track poses of the subject for subsequently received captured images of the subject.
  • a robot mapping module maps 425 the body pose estimates to a configuration of a robot (e.g., robot 135 ).
  • the robot mapping module maps the body pose estimates of the operator to the configuration of the robot.
  • the robot mapping module may create a set of mapping parameters, which may include scaling coefficients, relationships of corresponding joints or segments, and other relevant information.
  • the robot mapping module may use one or more control modes (e.g., direct mapping, indirect mapping, end-effector mapping) for mapping.
  • a robot kinematics module (e.g., robot kinematics module 320 ) generates 430 kinematic parameters of the robot (e.g., robot 135 ).
  • the kinematic parameters correspond to a position and an orientation for each segment and/or joint of the robot.
  • the kinematic parameters may include one or more of the following: a set of x-, y-, and z-coordinates with respect to the coordinate system (i.e., workspace) of the robot 135 ; roll, pitch, and yaw of one or more segments of the robot; joint angles between adjacent segments; a set of transformation coefficients between the body of the operator and the configuration of the robot.
  • the robot kinematics module determines these kinematic parameters based on the mapping parameters from the robot mapping module that maps the 3D skeleton model of the operator to the configuration of the robot.
  • a robotic system controller (e.g., robotic system controller 145 ) sends 435 the generated kinematic parameters to one or more actuators of the robot (e.g., robot 135 ).
  • the actuators ambulate the one or more segments and joints to a target pose (corresponding to the pose of the subject).
  • a feedback module detects 440 sensor feedback of the robot (e.g., robot 135 ).
  • the feedback module monitors the feedback from sensors on the robot to ensure that the detected feedback stays within an acceptable range.
  • the feedback module may constrain a motion or a pose of the robot if the feedback module detects feedback that is outside of an acceptable range.
  • steps 410 , 415 , and 440 may be omitted.
  • the sequence of steps 430, 435, and 440 may be modified.
  • FIG. 5 illustrates a schematic block diagram of a training phase of the imitation learning engine 150 , according to one embodiment.
  • the imitation learning engine 150 implements a learning algorithm to learn how a robot can perform different tasks based on example demonstrations from human operators.
  • the imitation learning engine 150 inputs into its model a large number of examples of robots executing a pose or performing a task based on the subject performing the tasks.
  • the imitation learning engine 150 learns using these examples to determine appropriate movements for the robot to perform the same tasks. Accordingly, the imitation learning engine 150 stores a “label” for each task that includes the determined appropriate movements for each task.
  • the imitation learning engine inputs data from several examples of a human operator teleoperating a robot to perform a task.
  • an example includes a task label 505 associated with the task performed by the robot, captured images 510 , object information 515 , a robot state 520 of the robot before taking a pose, and kinematic parameters 525 associated with each robot state 520 .
  • the task label 505 indicates the task performed by the robot.
  • the captured images 510 are one or more images captured of the local area surrounding the robot.
  • the object information 515 includes data regarding objects located in the local area surrounding the robot.
  • the robot state 520 is an initial configuration of the robot before taking the pose corresponding to the pose of the subject.
  • the kinematic parameters 525 are the kinematic parameters associated with the configuration of the robot taking the pose corresponding to the pose of the subject.
  • the imitation learning engine 150 receives as input the task label 505 , the captured images 510 , the object information 515 , and the robot state 520 before each pose, and then, for each pose in the sequence of poses to complete the task, outputs a prediction of the kinematic parameters to achieve each pose or robot motion trajectory.
  • the imitation learning engine 150 performs error detection 530 and compares the predicted kinematic parameters to the actual kinematic parameters for each pose or robot motion trajectory. Based on a calculated difference 535 , the imitation learning engine 150 may adjust the coefficients of its machine learning model to reduce the detected error.
  • the imitation learning engine 150 may perform the training process multiple times for one or more task examples that it receives.
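  • For illustration only, a minimal sketch of the training loop of FIG. 5: predict kinematic parameters from the encoded inputs, compute the difference against the demonstrated parameters, and adjust the model coefficients; the toy linear model and random stand-in data are assumptions in place of the deep imitation learning algorithms named in the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for one demonstration: encoded inputs (task label 505, captured
# images 510, object information 515, robot state 520) and the demonstrated
# kinematic parameters 525 for each pose in the sequence.
inputs = rng.normal(size=(50, 16))     # 50 poses, 16-dimensional encoded context
targets = rng.normal(size=(50, 7))     # 7 kinematic parameters per pose

weights = np.zeros((16, 7))
learning_rate = 0.01

for epoch in range(200):
    predicted = inputs @ weights               # predicted kinematic parameters
    error = predicted - targets                # error detection 530
    difference = np.mean(error ** 2)           # calculated difference 535
    gradient = inputs.T @ error / len(inputs)
    weights -= learning_rate * gradient        # adjust the model coefficients
```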
  • FIG. 6 illustrates a schematic block diagram of an operational phase of the imitation learning engine 150 , according to one embodiment.
  • the imitation learning engine 150 determines the configuration of a robot at several time steps such that, when the configurations are executed in sequence, they enable the robot to perform a task.
  • the imitation learning engine 150 may be executed for one or more remotely located robots.
  • As illustrated in FIG. 6, the task label 605 indicates the task to be performed by the robot.
  • the captured images 610 are one or more images captured of the local area surrounding the robot.
  • the object information 615 includes data regarding objects located in the local area surrounding the robot (e.g., objects that the robot will interact with or will avoid).
  • the imitation learning engine 150 may output kinematic parameters 630 , a robot state 635 , and object information 635 for the robot at the subsequent time step.
  • These kinematic parameters 630 may include x-, y-, and z-coordinates; roll, pitch, and yaw; and joint angles for each segment and joint of the robot.
  • the robot state 635 represents the subsequent configuration of the robot.
  • the object information 635 may change from the previous time-step, for example, if the robot interacted with any objects in its environment or if the position or orientation of the robot changed with respect to the objects.
  • the imitation learning engine 150 may repeat this process for each subsequent time step, enabling the robot to accomplish the task associated with the task label 605 .
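  • For illustration only, a minimal sketch of the operational phase of FIG. 6: at each time step the trained model receives the task label, current images, object information, and robot state, and returns the kinematic parameters and state for the next time step; the model interface, sensor callbacks, and completion flag are hypothetical:

```python
import numpy as np

def run_task(model, task_label, get_images, get_object_info,
             initial_state, max_steps=100):
    """Query the trained imitation model once per time step and carry the
    predicted robot state forward until the task is reported complete."""
    state = initial_state
    for _ in range(max_steps):
        observation = {"task_label": task_label,       # task label 605
                       "images": get_images(),         # captured images 610
                       "objects": get_object_info(),   # object information 615
                       "state": state}                 # current robot state
        kinematic_params, state, done = model(observation)  # outputs for the next step
        # ...kinematic_params would be sent to the robot's actuators here...
        if done:
            break
    return state

# Example with trivial stand-ins for the trained model and the sensors.
dummy_model = lambda obs: (np.zeros(7), obs["state"], True)
final_state = run_task(dummy_model, "place_item_in_bin",
                       get_images=lambda: [], get_object_info=lambda: {},
                       initial_state=np.zeros(7))
```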
  • a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
  • Embodiments of the disclosure may also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus.
  • any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

Abstract

A system enables teleoperation of a robot based on a pose of a subject. The system includes an image capturing device and an operator system controller that are remotely located from a robotic system controller and a robot. The image capturing device captures images of the subject. The operator system controller maps a processed version of the captured image to a three-dimensional skeleton model of the subject and generates body pose information of the subject in the captured image. The robotic system controller communicates with the operator system controller over a network. The robotic system controller generates a plurality of kinematic parameters for the robot and causes the robot to take a pose corresponding to the pose of the subject in the captured image.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of U.S. Provisional Application No. 62/512,179, filed May 30, 2017, which is incorporated by reference in its entirety.
  • BACKGROUND
  • The disclosure relates generally to teleoperation of robots and specifically to teleoperation of robots based on a pose of a human operator.
  • Traditionally, teleoperation of robots having multiple degrees of freedom (DOF) is accomplished using complex controllers that may be specifically designed for a particular robot arm. In some instances, these controllers may be as simple as using a joystick, but more commonly these controllers are complicated devices, such as body worn exoskeletons that map the exoskeleton's joint angles to the robot's joint angles. In both situations, handheld or worn hardware is used to teleoperate the robot. In the case of handheld joysticks or traditional remote controllers, the teleoperation of a high DOF robot is challenging, not intuitive, and slow because of the lack of direct mapping from joysticks and buttons to the many degrees of freedom of the robot. While these controllers provide a relatively cheap method of teleoperating a robot, they require significant training or automation to handle low-level functionality and are typically not time efficient. For example, a robot having two or more legs (a high DOF system) operated in real-time using a controller would require low-level algorithms for balancing the robot to be autonomously handled, while the controller or joystick would be used for high-level commands (e.g., which direction and speed the robot should ambulate in). Similarly, controlling a robot arm using joysticks requires the joystick to map 6 DOF or more into 2 or 3 DOF interfaces of the joystick, which is not intuitive and can lead to slow teleoperating speeds for even simple tasks.
  • Alternatively, an exoskeleton can be worn to control a robot, which may allow for more intuitive and direct control of a robot arm with a morphology that is similar to the arm of a human operator. This method of teleoperation is easier for the operator to learn and can integrate haptic feedback to allow the operator to feel forces that the robot is sensing when it interacts with its environment. However, exoskeletons are complex systems that are expensive, not easily donned or doffed, not portable or mobile, and typically not accommodating for differences in limb or body size from one operator to another. Another alternative for teleoperation is the use of motion capture systems. However, current motion capture systems rely on either 1) optical systems that require retrofitting a room with an array of calibrated cameras and tagging the operator with reflective markers at body locations of interest for tracking or 2) wearable inertial measurement units (IMUs) that require precise calibration, are susceptible to drifting, and are tedious to don and doff.
  • SUMMARY
  • Embodiments relate to teleoperation of a robot of a robotic system based on a pose of an operator. Teleoperation indicates operation of a system or machine at a distance. The system includes an image capturing device and an operator system controller that are remotely located from a robotic system controller and a robot.
  • In one embodiment, the image capturing device captures an image of a subject (i.e., operator). The operator system controller is coupled to the image capturing device and maps a processed version of the captured image to a three-dimensional skeleton model of the subject. The operator system controller generates body pose information of the subject in the captured image. The body pose information indicates a pose of the subject in the captured image. The robotic system controller communicates with the operator system controller over a network. The robotic system controller generates a plurality of kinematic parameters of a robot by processing the body pose information received from the operator system controller based on a configuration of the robot. The robotic system controller controls one or more actuators of the robot according to the plurality of kinematic parameters, causing the robot to take a pose corresponding to the pose of the subject in the captured image.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a block diagram of a system for teleoperation of robotic systems, according to an embodiment.
  • FIG. 2 illustrates a block diagram of an operator system controller, according to one embodiment.
  • FIG. 3 illustrates a block diagram of a robotic system controller, according to one embodiment.
  • FIG. 4 illustrates a flowchart of a method for teleoperating a robot by mapping a pose of an operator, according to one embodiment.
  • FIG. 5 illustrates a schematic block diagram of a training phase of an imitation learning engine, according to one embodiment.
  • FIG. 6 illustrates a schematic block diagram of an operational phase of the imitation learning engine, according to one embodiment.
  • The figures depict embodiments of the present disclosure for purposes of illustration only. Alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
  • DETAILED DESCRIPTION
  • Embodiments relate to allowing an operator to wirelessly and intuitively control the joint space and/or end-effector space of a remotely located robot by simply moving one's hands, arms, legs, etc. without the need for traditional external calibrated motion capture systems, worn exoskeletons/sensors, or traditional but unintuitive joysticks. In a crowd-sourced teleoperation application, tasks that robots are currently unable to accomplish autonomously can be executed semi-autonomously via human teleoperation, while the recorded data of how the human operator guided the robot to accomplish the arbitrary task can be used as training examples to enable robots to learn how to accomplish similar tasks in the future.
  • One embodiment for a method of teleoperating a robot based on a pose of a subject includes two major steps: (i) generating body pose information of the subject in a captured image, and (ii) generating a plurality of kinematic parameters of the robot based on the generated body pose information of the subject in the captured image. In the step of generating body pose information, an algorithm is used to localize an array of body parts of the subject in the captured image. The algorithm then projects the localized body parts of the subject onto a three-dimensional (3D) skeleton model of the subject. The 3D skeleton model is output as an estimate of the pose and is used for estimating and tracking the poses of the subject in a next captured image. In the step of generating the plurality of kinematic parameters, the 3D skeleton model is then mapped, directly or indirectly, to a configuration of the robot to determine a plurality of joint angles of the robot that correspond to the position and/or orientation of the subject's pose.
  • A subject herein refers to any moving objects that have more than one pose. The moving objects include, among other objects, animals, people, and robots. Although embodiments herein are described with reference to humans as the subject, note that the present invention can be applied essentially in the same manner to any other object or animal having more than one pose. In several instances, the subject may also be referred to as an operator.
  • The localized body parts herein refer to any portion of the subject that can be conceptually identified as one or more joints and links. For example, in a human subject, the localized body parts include, among other parts, a head, a torso, a left arm, a right arm, a left hand, a right hand, a left leg, and a right leg. The localized body parts can be subdivided into other parts (e.g., a left arm has a left upper arm and a left forearm, a left hand has a left thumb and left fingers). The one or more body parts may be localized relative to a camera, an external landmark, or another point on the subject's body. Note that the number of localized body parts is not limited and can be increased or decreased according to the purposes of the pose estimation and tracking. Body parts may also be referred to herein as limbs, segments, and links, and vice versa.
  • A model herein refers to a representation of the subject by joints and links. In one embodiment, the model is a human body represented as a hierarchy of joints and links with a skin mesh attached. Various models with joints and links can be used as the model of the subject. In alternative embodiments, the model is a subset of joints and links of the human body. For example, the model may be a hand that includes one or more of the following: a palm, a thumb, and a finger. For the sake of clarity, the skeleton model is referred to throughout, but it is understood that the skeleton model may not represent the full human body and instead may represent a portion of the human body.
  • FIG. 1 illustrates a block diagram of a system 100 for teleoperation of robotic systems 115 a-115 d, according to an embodiment. The system 100 includes, among other components, a network 105 that connects operator systems 110 a-110 d (collectively referred to as “operator systems 110” and also individually referred to as “operator system 110”), robotic systems 115 a-115 d (collectively referred to as “robotic systems 115” and also individually referred to as “robotic system 115”), and a processing server 120. In the embodiment of FIG. 1, four operator systems 110 a, 110 b, 110 c, 110 d and four corresponding robotic systems 115 a, 115 b, 115 c, 115 d are illustrated, but it is understood that the number of each system is not limited and can be increased or decreased. Some embodiments of the system 100 have different components than those described here. Similarly, in some cases, functions can be distributed among the components in a different manner than is described here.
  • The network 105 provides a communication infrastructure between the operator systems 110, the robotic systems 115, and the processing server 120. The network 105 is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, or a virtual private network. The network 105 enables users in different locations to teleoperate robots of robotic systems, for example, for the purposes of robotic labor.
  • The operator system 110 enables an operator to teleoperate one or more corresponding robotic systems 115. The operator system 110 may be located at a distance from its corresponding one or more robotic systems 115. In the embodiment of FIG. 1, the operator system 110 is controlled by the operator, who may be the subject of one or more captured images. For the sake of clarity, it is understood that the subject and the operator are referred to interchangeably, but it is also understood that, in some embodiments, the subject in the captured images may be a separate subject from the operator of the operator system 110. Generally, the operator takes one or more poses, and a robot mimics a processed mapping of the poses. The operator may take a specific series of continuous or non-continuous poses that causes the robot to accomplish a certain task. As the operator takes the one or more poses, the operator system 110 captures images of the subject and generates body pose information of the subject in the captured images. The generated body pose information is a representation of the pose of the subject in the captured images, which dictates a pose that a robot of a corresponding robotic system 115 takes. The operator system 110 then transmits the generated body pose information to the corresponding robotic system 115 via the network 105. In the embodiment of FIG. 1, the operator system 110 a corresponds to robotic system 115 a, the operator system 110 b corresponds to robotic system 115 b, the operator system 110 c corresponds to robotic system 115 c, and the operator system 110 d corresponds to robotic system 115 d. In alternative embodiments, one operator system 110 may correspond to two or more robotic systems 115. In the embodiment of FIG. 1, the operator system 110 includes an image capturing device 125 and an operator system controller 130.
  • The image capturing device 125 captures images and/or video of the subject whose pose is to be mapped to a robot of a corresponding robotic system 115. The image capturing device 125 may comprise one or more cameras positioned and/or oriented to capture part or all of the subject's body. The image capturing device 125 may be positioned on the subject's body and oriented such that segments of the subject's body are within a field of view of the image capturing device 125. Alternatively, the image capturing device 125 may be positioned external to the subject's body such that all or portions of the subject's body are within the field of view of the image capturing device 125. For example, the image capturing device 125 may be part of a camera assembly, an external mobile device, a virtual reality (VR) or augmented reality (AR) headset, a standalone VR or AR camera assembly, a similar portable imaging device, or some combination thereof. The field of view of the image capturing device 125 may vary to capture more or less of the subject's body. For example, the image capturing device 125 may comprise standard lenses or wide angle lenses (e.g., a fisheye lens). The image capturing device 125 may capture two-dimensional (2D) images. In alternative embodiments, the image capturing device 125 may comprise one or more depth cameras or cameras in stereo to capture images with depth information. The image capturing device 125 may capture images of the operator at a random or specified interval. In some embodiments, the operator may take a series of poses that cause the robot to accomplish a task. The image capturing device 125 may capture images as it detects movement of the operator. In some embodiments, the image capturing device 125 sends the captured images to the operator system controller 130. In alternative embodiments, the image capturing device 125 is integrated with the operator system controller 130.
  • In some embodiments, the image capturing device 125 captures images and/or video of equipment that is worn or manipulated by an operator. For example, the operator may be wearing a glove or holding a wand or a controller that includes visual markers. The image capturing device 125 may detect and capture a pose or motion of the visual markers, which can then be mapped to the robot of the corresponding robotic system 115. This configuration may be beneficial for robots including an end-effector or an instrument that resembles the glove or wand/controller manipulated by the operator. In some embodiments, the wand/controller may include buttons or switches as additional input for robot control, which may improve intuitive control and/or efficiency of the operator.
  • The operator system controller 130 generates body pose information of the subject in the captured image. The generated body pose information indicates a pose of the subject in the captured image. The operator system controller 130 may be a desktop, a laptop, a mobile device, or a similar computing device. In the embodiment of FIG. 1, the operator system controller 130 receives the captured images from the image capturing device 125. The operator system controller 130 may execute an algorithm that localizes an array of body parts of the subject in the captured image. The algorithm then projects the localized body parts of the subject onto a three-dimensional (3D) skeleton model of the subject. The 3D skeleton model is output as the estimate of the pose and is used for estimating and tracking the poses of the subject in a next captured image. Alternatively, the operator system controller 130 may execute an algorithm that directly predicts an estimate of the pose of the subject. The operator system controller 130 transmits the body pose information of the subject to the corresponding robotic system 115.
  • The operator system controller 130 may transmit additional teleoperation data to one or more corresponding robotic systems 115. The teleoperation data may be parameters associated with each captured image and/or processed image that are transmitted throughout teleoperation or may be calibration parameters that are transmitted before or during initial stages of teleoperation. In some embodiments, the parameters may be manually set by an operator (e.g., via a user interface), automatically determined by the operator system 110 or robotic system 115, and/or could be updated throughout teleoperation. The teleoperation data may be transmitted as a set of one or more parameters. Parameters may relate to motion scaling or sensitivity, pause functionality, origin reset, Cartesian or joint axis locking and unlocking, bounding volumes, ‘home’ positions and orientations, quick-snap orientations and positions and other similar features. Pause functionality enables the teleoperator to perform a gesture or use a specific pose that, when detected by the image capturing device 125, pauses motion and/or operation of the robot arm, which effectively pauses tracking between the teleoperator pose and the robot arm. A counter-gesture or counter-pose may be performed by the teleoperator to resume motion and/or operation of the robot arm. This feature may be used by the teleoperator to change or adjust their position, for example, to improve their comfort during teleoperation. Origin reset enables the teleoperator to modify the reference point to which the robot's motion or pose is relative. In one embodiment, this enables the teleoperator to keep the robot's motion within a comfortable range of human arm motion. Motion scaling enables motion from the operator to be mapped to motion of the robot on a different scale. For example, certain precise tasks performed by the robot may include small-scale motion (e.g., sub-millimeter motion) while the operator may move on a relatively larger scale (e.g., a centimeter scale); by scaling the motion of the operator, a robot may then move on a relatively smaller scale (e.g., a micron scale). As another example, a large robot may perform large motions; motion of the operator may occur on a relatively smaller scale (e.g., the centimeter scale), which may be scaled to correspond to motion of the robot on a relatively larger scale (e.g., a meter scale). Motion scaling may be applied linearly or non-linearly to individual axes in Cartesian space or joint space. Cartesian or joint-axis locking enables an operator to constrain the motion of a robot to a plane, a line, or point in 3D space. It may also be used to lock orientation of one or more segments and/or end-effectors of the robot along one or more axes. Bounding volumes may constrain a robot to only move within a certain subspace of its total workspace. Quick-snap orientations or positions may enable the robot to take a predefined pose or a pose calculated based on a vision system of the robot. If the vision system of the robot identifies a target object in the environment, the operator system controller 130 may suggest a pose based on the target object to the teleoperator who can then select for the robot to snap to the suggested pose. These features may be used in any combination and may apply to the entire robot or a portion of the robot (e.g., one or more segments and/or end-effectors). The operator system controller 130 is discussed in further detail with regards to FIG. 2.
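  • For illustration only, a minimal sketch of per-axis motion scaling, one of the teleoperation parameters listed above; the scaling factors, the power-law form of the non-linear option, and the use of a zero scale for axis locking are illustrative assumptions:

```python
import numpy as np

def scale_motion(operator_delta: np.ndarray,
                 linear_scale: np.ndarray,
                 exponent: float = 1.0) -> np.ndarray:
    """Apply per-axis motion scaling to an operator displacement: centimeter-scale
    operator motion can be mapped down to sub-millimeter robot motion (or up to
    meter-scale motion), linearly or non-linearly along each Cartesian axis."""
    sign = np.sign(operator_delta)
    return sign * linear_scale * np.abs(operator_delta) ** exponent

# Example: 2 cm of operator motion becomes 0.2 mm of robot motion along x and y,
# while the z axis is locked with a scale of zero.
robot_delta = scale_motion(np.array([0.02, 0.02, 0.02]),
                           linear_scale=np.array([0.01, 0.01, 0.0]))
```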
  • The robotic system 115 controls the robot and causes the robot to move in accordance with a pose of the operator. The robotic system 115 receives the generated body pose information of the subject in the captured images and, based on the generated body pose information, determines mapping parameters and one or more kinematic parameters of the robot. In the embodiment of FIG. 1, the robotic system 115 includes a robot 135, an image capturing device 140, and a robotic system controller 145.
  • The robot 135 is a machine comprising one or more segments and one or more joints that are designed to manipulate, ambulate, or both in the case of mobile manipulation. The robot 135 may have an anthropomorphic design (having a human morphology) or similarly dimensioned segments resembling a human operator. For example, the robot 135 may have segments and joints that resemble body parts (e.g., limbs such as an arm, a leg, etc.) of the human operator and are designed to ambulate in a similar way. In some embodiments, the robot 135 may have an end-effector that resembles a human hand (e.g., having several fingers, joints, and degrees of freedom) or that functions similar to a hand (e.g., a claw, a 3-finger gripper, an adaptive gripper, an internal or external gripper, etc.). In other embodiments, the robot may not have an anthropomorphic design, where the robot's joints and segments do not closely align to joints and segments on the human operator's body. Generally, the robot 135 may have one or more ambulating segments (achieving mobility via wheels, legs, wheeled legs, or similar methods), a stationary arm with an end-effector, a combination of one or more ambulating segments and an end-effector, or some combination thereof. To move the robot 135, each joint may have one or more actuators.
  • In some embodiments, the robot 135 may include a gripper at the end-effector. The robot end-effector is gripper agnostic and can be used with several existing or custom grippers with varying number of degrees of freedom. The robot or robot arm may be equipped with a mobile base for locomoting around its environment using wheels, tracks, legs, or a multi-modal design incorporating legs with wheels or treads or any combination thereof. The teleoperation interface is robot agnostic and need not be paired with any particular robot arm to work as intended.
  • The image capturing device 140 captures images and/or video of the robot 135 and a local area surrounding the robot 135. The local area is the environment that surrounds the robot 135. For example, the local area may be a room that the robot 135 is inside. The image capturing device 140 captures images of the local area to identify objects that are near the robot 135. Identifying nearby objects enables the robotic system 115 to determine if there are any objects the robot will interact with to perform a task or if there are any constraints to the range of motion of the robot 135. For example, the robot 135 may be located in a small room near one or more walls, near one or more other robots, or other similar objects that the robot 135 aims to avoid during ambulation or manipulation. This enables safe use of the robot 135, especially if the robot 135 is in the presence of humans. The image capturing device 140 may capture images at a random, continuous, or specified interval to determine changes in the environment and subsequently update any constraints that need to be placed on the range of motion of the robot 135. The image capturing device 140 may be positioned and/or oriented to capture all or a portion of the robot 135 and its environment. In embodiments in which the image capturing device 140 comprises one or more cameras, the cameras may be located or mounted directly on varying parts of the robot or can be external to the robot. Similar to the image capturing device 125, the image capturing device 140 may be part of an imaging assembly, an external mobile device, a virtual reality headset, a standalone virtual reality camera assembly, a similar portable imaging device, a computer webcam, dedicated high-resolution camera(s), or some combination thereof. The field of view of the image capturing device 140 may vary to capture more or less of the robot 135. For example, the image capturing device 140 may comprise standard lenses or wide angle lenses (e.g., a fisheye lens). The image capturing device 140 may capture two-dimensional images. In alternative embodiments, the image capturing device 140 may comprise one or more depth cameras or cameras in stereo to capture images with depth information.
  • The robotic system controller 145 receives the generated body pose information from its corresponding operator system 110 and accordingly determines a set of mapping parameters and kinematic parameters to control the motion of the robot 135. As previously described, the body pose information may be in the form of a 3D skeleton model of the subject based on a pose of the subject in one or more captured images. The robotic system controller 145 maps the 3D skeleton model to the configuration of the robot 135. The robotic system controller 145 may have one or more control modes for mapping the arm and/or leg poses and joint angles to segments and joint angles of the robot 135. For example, a first control mode may be a direct mapping if the robot 135 has an anthropomorphic design or similarly dimensioned arms and/or legs to the operator. A second control mode may be an indirect mapping if the robot 135 does not have an anthropomorphic design. As such, the robotic system controller 145 is able to map an operator pose to a robot with any type of configuration. By mapping the 3D skeleton model to the configuration of the robot 135, the robotic system controller 145 determines one or more kinematic parameters for the robot 135. These kinematic parameters may include x-, y-, and z-coordinates; roll, pitch, and yaw; and joint angles for each segment and joint of the robot 135. The workspace coordinates of the robot 135 may be selected or pre-determined. The robotic system controller 145 may also receive and process force and/or haptic feedback from sensors on the robot 135; the robotic system controller 145 may transmit the force and/or haptic feedback to the operator system 110, which enables the operator to feel forces that the robot 135 is sensing as it moves and interacts with its environment. In an alternative embodiment, the force and/or haptic feedback from the robot 135 may be conveyed to the operator by visual or audible modalities, for example, in the form of augmented reality features on the operator system 110. The robotic system controller 145 may be a desktop, a laptop, a mobile device, or a similar computing device. The robotic system controller 145 is discussed in further detail with regards to FIG. 3.
  • The processing server 120 enables users to operate the operator systems 110 and robotic systems 115 via the network 105. The processing server 120 may be embodied in a single server or multiple servers. Further, each server may be located at different geographic locations to serve users of the operator system 110 or the robotic system 115 in different geographic locations. In the embodiment of FIG. 1, the processing server 120 may host the platform that allows users of the operator system 110 and the robotic system 115 to access and control each system without needing to install or download the platform onto their own devices.
  • In addition, the processing server 120 processes the data collected from the operator systems 110 and robotic systems 115. The processing server 120 executes a machine learning algorithm that learns from examples of robots being teleoperated to accomplish a variety of tasks in various environments and applications. In an example application, the system 100 may be used as a control input to crowdsourcing teleoperation of robotic labor. Because crowdsourcing leverages the network effect, the teleoperative nature of the system 100 enables the creation of a large data set of diverse demonstration tasks in diverse environments (which does not currently exist and is difficult/expensive to generate). In this configuration, the system 100 enables the use of powerful tools such as crowdsourcing data collection and deep imitation learning and meta-learning algorithms (which requires large amounts of data) to teach a robot to accomplish certain tasks. This learning process becomes possible when a robot is exposed to thousands of examples of how to properly (and not properly) accomplish a task. In the embodiment of FIG. 1, the processing server 120 includes the imitation learning engine 150.
  • The imitation learning engine 150 implements an algorithm to learn how a robot can perform different tasks based on the examples from human operators. The imitation learning engine 150 inputs into its model the data consisting of thousands of examples of robots executing a pose or performing a task based on the subject performing the tasks through teleoperation. A few examples of specific algorithms that may be employed are neural networks, imitation learning, meta-learning, deep multi-modal embedding, deep reinforcement learning, and other similar learning algorithms. The imitation learning engine 150 learns and extracts representations from these examples to determine appropriate movements for the robot to perform similar and unseen tasks in the same or different environments as provided in the demonstration training dataset. Accordingly, the imitation learning engine 150 stores a “label” corresponding to each task that includes the determined appropriate movements for each task. The imitation learning engine 150 can exist locally on the robotic system controller of a robot, on the operator system controller of an operator, or in the cloud running on a cloud server. In any embodiment, the data collected from each robot-teleoperator pair can be shared collectively in a database that enables data sharing for parallelized learning such that a first robot in a first environment performs a task, and, once the task is learned by the imitation learning engine 150, a second robot in a second environment may also learn the motions to perform the same task (as well as a third robot in a third environment, a fourth robot in a fourth environment, and so on, until an Nth robot in an Nth environment).
  • FIG. 2 illustrates a block diagram of the operator system controller 130, according to one embodiment. As described with regards to FIG. 1, the operator system controller 130 generates body pose information of a subject in a captured image. The operator system controller 130 may be a desktop, a laptop, a mobile device, or a similar computing device. One or more of the components in the operator system controller 130 may be embodied as software that may be stored in a computer-readable storage medium, such as memory 205. In the embodiment of FIG. 2, the memory 205 stores, among others, a user device communication module 210, a pose estimation module 215, a user interface module 220, a robotic system controller interface 225, and an imitation learning system interface 230. Instructions of the software modules are retrieved and executed by a processor 235. The computer-readable storage medium for storing the software modules may be volatile memory such as RAM, non-volatile memory such as a flash memory or a combination thereof. A bus 240 couples the memory 205 and the processor 235. The bus 240 additionally couples the memory 205 to an image capturing device interface 245, a user interface circuit 250, and a network interface 255. Some embodiments of the operator system controller 130 have different components than those described here. Similarly, in some cases, functions can be distributed among the components in a different manner than is described here.
  • The user device communication module 210 is software, firmware, or a combination thereof for communicating with user devices via the network 105. A user device may be a device that an operator uses as part of the operator system 110. For example, a user device may be a mobile computing device, and the operator system controller 130 may be a desktop or a laptop that communicates with the user device. The user device communication module 210 receives commands and requests from the user device to access and control the operator system 110.
  • The pose estimation module 215 estimates a body pose of a subject in a captured image. In the embodiment of FIG. 2, the pose estimation module 215 may include, among others, an image processor 260, a skeletal model mapper 265, and a tracking module 270 as described below in detail.
  • The image processor 260 receives and processes the images captured by the image capturing device 125. The image processor 260 identifies a subject and the subject's body parts in a captured image. For example, the image processor 260 identifies hands, fingers, arms, elbows, shoulders, legs, knees, a head, etc. of the subject. The image processor 260 may use a machine learning model (e.g., a pre-trained deep learning model or convolutional neural network) to identify these body parts in each captured image. Additionally, the machine learning model localizes body parts and the dimensions between adjacent body parts or joints. In embodiments in which the captured images are without depth information, the localized body parts are two-dimensional characteristics of the pose of the subject. The machine learning model may use spatial motion information from an IMU on the mobile device, derived from the relationship between a changing image perspective and the 6-axis motion of the image capturing device 125 (in an embodiment in which the image capturing device and the IMU are embedded in the same device and do not move relative to one another). In alternative embodiments, the operator may manually set the subject's body part dimensions. In some embodiments, the machine learning model may track certain body parts, joints, or segments relative to other body joints, parts, or segments, relative to an external landmark, or relative to the image capturing device 125.
  • The skeletal model mapper 265 projects the two-dimensional localized body parts to a three-dimensional skeleton model of the operator. In the embodiment of FIG. 2, the skeletal model mapper 265 executes an algorithm that enhances the alignment between a 2D pixel location of each body part in the captured image and the 3D skeleton model. The 3D skeleton model of the operator may be calibrated for operators of different sizes. In the embodiment of FIG. 2, the 3D skeleton model may include several parameters, such as body part dimensions (e.g., limb lengths), joint angles between adjacent body parts (e.g., limbs), and other relevant pose information. An output of the 3D skeleton model may be estimated pose information, which may include x-, y-, and z-coordinate positions with respect to a coordinate system (i.e., workspace) of each body part of the operator; roll, pitch, and yaw of the one or more body parts of the operator; and joint angles between adjacent body parts. In some embodiments, the skeletal model mapper 265 creates the 3D skeleton model during a calibration process, where the 3D skeleton model represents an initial estimated pose of the operator. The 3D skeleton model may receive as input the two-dimensional localized body parts from subsequent captured images of the subject and may output pose information for the pose of the subject in the subsequent captured images. In this configuration, the 3D skeleton model can be used to estimate and track poses of the subject based on subsequent captured images of the subject.
  • The tracking module 270 tracks the poses of the subject in subsequent images captured by the image capturing device 125. The tracking module 270 receives one or more processed images from the image processor 260 and uses them to estimate pose information of the subject in the processed images. In some embodiments, the one or more processed images may be images that were captured subsequent to the captured images used to generate the 3D skeleton model. In this configuration, the pose estimation module 215 is able to estimate a pose of a subject in real-time as images are captured by the image capturing device 125. The pose estimation of the subject is transmitted to the corresponding robotic system controller 145. This enables a robot of a corresponding robotic system to take a pose in accordance with the subject in real-time.
  • In alternative embodiments, the pose estimation module 215 may directly input one or more captured images into a machine learning model. The machine learning model may then output an estimation of the pose of the subject in the captured images or may then output a prediction of a pose or a motion of the robot. In this configuration, the pose estimation module 215 does not separately localize body parts of the subject in the captured images and generate a corresponding 3D skeleton model.
  • The user interface module 220 may update a user interface that allows the user to interact with and control the operator system 110. In the embodiment of FIG. 2, the user interface module 220 may provide a graphical user interface (GUI) that displays the robot 135. The GUI may display the robot 135 in its current environment and/or a simulated model of the robot in a simulated environment. The GUI may include a manual controller that allows individual control of each of the robot's joint angles as well as the position and orientation of an end-effector of the robot 135. The GUI may additionally include a point-and-click function that enables the operator to select, via a mouse or a touchscreen on the user device, objects in the robot's environment. Based on the object in the environment and past experiences with similar objects, the system 100 may infer how the operator would like that object manipulated or handled by the robot. A simulation of that action may then be shown to the user via the user interface (e.g., mobile screen, monitor, AR/VR, etc.) before the robot executes the task. The GUI may include options for the user to approve or reject the simulated action. In this configuration, the operator ensures that the autonomy of completing the specified task is correct before allowing the robot to move. The GUI may include options to enable or disable modes that dictate the autonomy of the robot 135. For example, the operator system controller 130 or the corresponding robotic system controller 145 may store automated motions that have been pre-defined, programmed, or previously-learned. These modes may increase the speed and efficiency of the operator. Similarly, the GUI may provide suggestions to an operator that may further streamline teleoperation of the robot 135. Suggestions may include poses or "snap" poses for the robot 135 to take. These poses may be pre-defined, programmed, or previously-learned poses. A "snap" pose may snap one or more segments and/or end-effectors of the robot 135 into a pose or to an object to perform a dexterous task. For example, learned graspable objects (e.g., door handles, writing instruments, utensils, etc.) may have corresponding snap poses that enable the robot 135 to grasp the object. In this configuration, the robot 135 may be able to manipulate objects quickly and minimize fine robot control by an operator.
  • In one embodiment, the user interface module 220 may present an image and/or video stream of the robot 135 in the GUI on a monitor, mobile device, a head set (AR, VR, and/or MR), or similar. The user interface module 220 may overlay onto the video stream a simulation of the robot 135 or a portion of the robot 135 (e.g., an end-effector of the robot 135). Using the GUI, an operator may be able to position and/or orient the robot 135 in 6D space. An operator may be able to add one or more set points that define a pose or motion of the robot 135. The set points may be ordered in a defined sequence. Each set point may be associated with one or more types that each indicate an action that the robot may take at the set point. The robot 135 may then move through the set points in the defined sequence. The user interface module 220 may provide a simulation of the defined sequence in the GUI as an overlay on the image and/or video stream of the robot 135. Example set point types may include contact, grasping, trajectory, or other similar actions, or some combination thereof. A contact set point may define that the robot 135 contacts an object, tool, or area within its environment. A grasping set point may define that the robot 135 grasp an object when it reaches the set point. A trajectory set point may be used as a waypoint in a trajectory to ensure that the robot 135 moves through a target trajectory, for example, to avoid collisions with itself and/or the environment. In this embodiment, the user interface module 220 may also provide one or more suggestions for snap poses that each correspond to a target pose. The user interface module 220 may also provide one or more snap regions that correspond to each snap pose. An operator may select a snap pose and, in some embodiments, a snap region. The GUI may provide a simulation of the robot 135 snapping to the pose. The operator may select to accept or reject the simulation. If the simulation is accepted, the user interface module 220 may add the snap pose as a set point.
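  • By way of illustration only, the sketch below shows one possible representation of such an ordered set-point sequence with typed actions (contact, grasping, trajectory). The class and callback names are hypothetical and stand in for the GUI and robot interfaces described above.

      # Minimal sketch of an ordered set-point sequence with typed actions.
      from dataclasses import dataclass, field
      from enum import Enum, auto
      from typing import List, Tuple

      class SetPointType(Enum):
          CONTACT = auto()     # touch an object, tool, or area
          GRASP = auto()       # close the gripper at this pose
          TRAJECTORY = auto()  # pass-through waypoint, e.g. for collision avoidance

      @dataclass
      class SetPoint:
          position: Tuple[float, float, float]     # x, y, z in the robot workspace
          orientation: Tuple[float, float, float]  # roll, pitch, yaw
          kind: SetPointType = SetPointType.TRAJECTORY

      @dataclass
      class SetPointSequence:
          points: List[SetPoint] = field(default_factory=list)

          def add(self, point: SetPoint) -> None:
              self.points.append(point)

          def execute(self, move_to, grasp) -> None:
              """Drive the robot through the points in order using injected callbacks."""
              for p in self.points:
                  move_to(p.position, p.orientation)
                  if p.kind is SetPointType.GRASP:
                      grasp()

      # Usage with stub callbacks standing in for the robot interface.
      seq = SetPointSequence()
      seq.add(SetPoint((0.4, 0.0, 0.3), (0.0, 1.57, 0.0)))
      seq.add(SetPoint((0.4, 0.0, 0.1), (0.0, 1.57, 0.0), SetPointType.GRASP))
      seq.execute(move_to=lambda pos, rpy: print("move to", pos, rpy),
                  grasp=lambda: print("close gripper"))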
  • The user interface module 220 may additionally communicate depth information of the robot 135 and its environment to the operator. In one embodiment, a VR headset may be used to project stereo images into each eye that were captured using a stereo image capturing device on the robot 135. In this configuration, the human brain perceives depth information as human eyes naturally do without a VR headset. In an alternative embodiment, the user interface module 220 may use a mobile device, a monitor, or a head set (AR, VR, and/or MR) to display a video stream from the image capturing device 140 of the robot 135 to the operator. In these embodiments, additional features may be added to enhance depth perception of a 3D world projected onto a 2D computer monitor or mobile device. A processed depth stream from a depth camera may be displayed in depth form or as a point cloud to the operator. Multiple videos may be displayed from the image capturing device 140 of the robot 135, which may include multiple cameras with different perspectives (top view, side view, isometric view, gripper camera view, etc.) of the robot 135. Augmented reality (AR) features may be overlaid in real-time onto the video stream from the image capturing device 140 of the robot 135 to enhance depth perception. Example AR features may include depth-based augmented reality boxes, lines, shapes, and highlighting; square grids that align with 3D features in the environment of the robot 135; a real or augmented laser pointer projected from an end-effector of the robot 135 to objects in the environment of the robot 135 with a measured distance reading to that object; use of background, foreground, stripes, and masking to distinguish objects of interest from the background; use of chromostereopsis methods where glasses with different colored lenses and processed display videos may be used to create an illusion of depth; use of processed images via spatio-temporal blur and focus rendering; use of a homunculus control panel with one or more camera feeds; a simulated robot configuration rendered over a transformed perspective of the point cloud image; and/or one or more of the previously described depth-enhancing features. These features may be integrated into the user interface module 220 individually or in some combination thereof. The AR features may be generated using stereo or depth sensing cameras of the image capturing device 140.
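  • By way of illustration only, the following sketch renders one of the depth cues mentioned above: an augmented laser-pointer line from the end-effector's pixel location to a selected object, labeled with the distance read from a depth image. The image size, pixel coordinates, and synthetic depth values are hypothetical placeholders for the streams of the image capturing device 140.

      # Minimal sketch of an augmented "laser pointer" overlay with a distance readout.
      import numpy as np
      import cv2

      frame = np.zeros((480, 640, 3), dtype=np.uint8)      # stand-in for the RGB stream
      depth = np.full((480, 640), 1.2, dtype=np.float32)   # stand-in depth map (metres)
      depth[200:280, 380:460] = 0.65                       # a nearer object in the scene

      end_effector_px = (320, 430)   # where the gripper appears in the image (assumed)
      target_px = (420, 240)         # object pixel selected by the operator (assumed)

      distance_m = float(depth[target_px[1], target_px[0]])
      cv2.line(frame, end_effector_px, target_px, (0, 255, 0), 2)
      cv2.putText(frame, f"{distance_m:.2f} m", target_px,
                  cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
      cv2.imwrite("laser_overlay.png", frame)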
  • The robotic system controller interface 225 couples the operator system controller 130 to the robotic system 115 via the network 105. The robotic system controller interface 225 may transmit data to the robotic system controller 145 and receive data from the robotic system controller 145. In the embodiment of FIG. 2, the robotic system controller interface 225 transmits the generated pose estimation of the subject and tracking information to the robotic system 115. In some embodiments, the robotic system controller interface 225 may transmit additional data, such as the images captured by the image capturing device 125 and/or commands or requests input by the user via the user device. The robotic system controller interface 225 may receive captured images of the robot 135 captured by the image capturing device 140 and haptic feedback from the robotic system controller 145. The robotic system controller interface 225 may transmit data in real-time or at specified or random intervals.
  • The imitation learning system interface 230 provides data from the operator system 110 to the imitation learning engine 150 online or offline. The imitation learning system interface 230 transmits data associated with a subject performing a task, such as the captured images, the 3D skeleton model, the pose tracking information, and/or other relevant information. The imitation learning system interface 230 may transmit this data in real-time or at specified or random intervals. This enables the imitation learning engine 150 to continually improve online in real-time in a parallelized framework with every additional teleoperational task completed, which enables the robots connected within the system 100 to become more capable of autonomously performing tasks and to require fewer human interventions.
  • The image capturing device interface 245 is software, firmware, hardware, or a combination thereof that couples the operator system controller 130 to the image capturing device 125. For example, the image capturing device interface 245 may be a USB cable that couples to the bus 240. In another embodiment, image capturing device interface 245 may enable a wireless connection to the image capturing device 125, e.g., via the network 105, Bluetooth, or a similar connection.
  • The user interface circuit 250 is software, firmware, hardware, or a combination thereof that couples the user interface to the operator system controller 130. For example, the user interface circuit 250 may couple a keyboard and/or a mouse to the operator system controller 130 via the bus 240. In another embodiment, the user interface circuit 250 may enable a touchscreen or monitor on a user device of the operator system 110.
  • The network interface 255 is a hardware component that couples the operator system controller 130 to the network 105. For example, the network interface 255 may be a network interface card, a network adapter, a LAN adapter, or a physical network interface that couples to the bus 240.
  • FIG. 3 illustrates a block diagram of a robotic system controller, according to one embodiment. As described with regard to FIG. 1, the robotic system controller 145 receives the generated body pose information from its corresponding operator system 110 and accordingly determines a set of kinematic parameters to move the robot 135. The robotic system controller 145 may be a desktop, a laptop, a custom computer, a mobile device, or a similar computing device. The robotic system controller 145 includes components that are stored in a computer-readable storage medium, such as memory 305. In the embodiment of FIG. 3, the memory 305 stores an operator system controller interface 310, a robot mapping module 315, a robot kinematics module 320, a feedback module 325, and an imitation learning system interface 330. Instructions of the software modules are retrieved and executed by a processor 335. The computer-readable storage medium for storing the software modules may be volatile memory such as RAM, non-volatile memory such as a flash memory, or a combination thereof. A bus 340 couples the memory 305 and the processor 335. The bus 340 additionally couples the memory 305 to an image capturing device interface 345, a robot interface 350, and a network interface 355. Some embodiments of the robotic system controller 145 have different components than those described here. Similarly, in some cases, functions can be distributed among the components in a different manner than is described here.
  • The operator system controller interface 310 enables communication between the robotic system 115 and the operator system controller 130 via the network 105. The operator system controller interface 310 may transmit data to the operator system controller 130 and receive data from the operator system controller 130. In the embodiment of FIG. 3, the operator system controller interface 310 receives the generated pose estimation of the subject and tracking information from the operator system 110. The operator system controller interface 310 may transmit images of the robot 135 and its environment captured by the image capturing device 140, as well as feedback from the robot 135 including, but not limited to, force, torque, position, velocity, and other sensory feedback from the robot's joints, end-effector, or segments, or from sensors located externally in the robot's environment. In some embodiments, the operator system controller interface 310 transmits additional data, such as the configuration of the robot 135, current or previous states of the robot 135 including kinematic parameters for each state, information regarding the local area surrounding the robot 135, or some combination thereof. The operator system controller interface 310 may transmit data in real-time or at specified or random intervals.
  • The robot mapping module 315 maps the estimated pose of the operator to the configuration of the robot 135. In one embodiment, mapping the estimated pose to the robot 135 is performed by aligning and potentially scaling the limbs and joint angles of the operator to the segments and joint angles of the robot 135. The robot mapping module 315 may create a set of mapping parameters, which may include scaling coefficients, relationships of corresponding joints or segments, and other relevant information. In the embodiment of FIG. 3, the robot mapping module may have several control modes for mapping. For example, in a first control mode, direct mapping may be employed if the robot 135 has an anthropomorphic design or similarly dimensioned arms, legs, and/or fingers. Direct mapping maps the limbs and joint angles of the operator directly to the segments and joint angles of the robot 135. In this configuration, control of the robot 135 may be intuitive to the operator, especially if a virtual reality headset is used by the operator.
  • In a second control mode, indirect mapping may be employed if the robot 135 does not have an anthropomorphic design or similarly dimensioned arms, legs, and/or fingers. Indirect mapping may use a linear or non-linear function to map an estimate of the limbs and joint angles of the operator to the segments and joint angles of the robot 135. Indirect mapping may be used if 1) the robot's dimensions are on a different scale compared to the operator's body, 2) the robot has a different kinematic configuration or number of joints compared to the operator's body, or 3) it is desired to have varying levels of control sensitivity in joint or end-effector space.
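  • By way of illustration only, the sketch below contrasts the first two control modes: a direct copy of operator joint angles for an anthropomorphic robot, and an indirect per-joint rescaling with joint-limit clamping for a robot with different dimensions or sensitivity. The scale, offset, and limit values are hypothetical.

      # Minimal sketch of direct and indirect (scaled) joint-angle mapping.
      import numpy as np

      def direct_map(operator_joint_angles: np.ndarray) -> np.ndarray:
          """Direct mapping: copy operator joint angles onto an anthropomorphic robot."""
          return operator_joint_angles.copy()

      def indirect_map(operator_joint_angles: np.ndarray,
                       scale: np.ndarray,
                       offset: np.ndarray,
                       limits: np.ndarray) -> np.ndarray:
          """Indirect mapping: per-joint linear (or, in general, non-linear) rescaling
          followed by clamping to the robot's joint limits."""
          robot_angles = scale * operator_joint_angles + offset
          return np.clip(robot_angles, limits[:, 0], limits[:, 1])

      # Example: three operator joints mapped onto a robot with different ranges.
      operator = np.array([0.8, -0.4, 1.2])                 # radians, from the 3D skeleton
      scale = np.array([1.0, 0.5, 2.0])                     # sensitivity per joint
      offset = np.array([0.0, 0.1, -0.2])
      limits = np.array([[-1.5, 1.5], [-1.0, 1.0], [-2.5, 2.5]])
      print(direct_map(operator))
      print(indirect_map(operator, scale, offset, limits))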
  • In a third control mode, end-effector mapping may be employed if the robot 135 has an arm or leg that includes an end-effector where only the end-effector ambulates in accordance with the operator. End-effector mapping may track the poses of the operator's hand rather than the operator's limbs. The position and/or orientation of the fingers and/or the joint angles of the operator's hands are mapped to the position and/or orientation of the segments and/or joint angles of the end-effector. In this configuration, control of just the end-effector of the robot 135 may be intuitive when the robot 135 does not have an anthropomorphic design. In some embodiments, the arm or leg of the robot 135 may be stationary or may ambulate according to the first or second control mode. The robot mapping module 315 may use one or more control modes simultaneously for different portions of the robot 135.
  • In a fourth control mode, the operator's pose corresponds to a velocity or force controller rather than a position or pose tracker. In this embodiment, an origin position may be defined by the operator or automatically set to a default point in the operator's workspace. When the operator's hand (or other body part) is within a certain threshold distance from the origin, the robot 135 may not move. When the operator's hand is positioned at a distance greater than the threshold distance from the origin along one or more axes, the robot 135 may move at a velocity along an axis proportional to the distance the operator's hand is from the origin. To control robot orientation, the operator may use a user interface to toggle between position and orientation control. In one embodiment, a rotation vector connecting the origin to a point on the operator's body part (e.g., a palm center) and a norm of the rotation vector control a rotation axis and a proportional angular velocity about that rotation vector. Alternatively, a hand tracker may set thresholds relating to the operator's hand orientation such that when the hand orientation is within an angular threshold in roll, pitch, and yaw, the angular velocity of the robot 135 is zero. If the hand orientation exceeds those thresholds, the angular velocity of the robot 135 becomes proportional to an angular pose of the operator's hand relative to a coordinate frame at the origin. In this configuration, the operator may control the position and orientation of the one or more segments and/or end-effectors of the robot 135 in velocity mode, allowing the operator to maintain his/her hand in a comfortable position. The pose of the operator may still be tracked, but in this embodiment, the position of the operator's hand relative to a defined origin maps to the velocity of the robot 135, as opposed to the position of the operator's body mapping to the position of the robot 135. A user interface may display this functionality to make operator control more intuitive. For example, the user interface may display a marker (e.g., a dot, simulated hand, or coordinate frame) that corresponds to the operator's hand position, which may have a coordinate frame overlaid onto it to illustrate the orientation of the operator's hand relative to a coordinate frame at the origin. The marker may be surrounded by a circle that defines the velocity threshold such that if the marker is within the circle, the robot 135 remains stationary in its current pose. If the marker is outside of the circle, then the robot 135 moves in the direction of the vector from the origin to the marker at a velocity proportional to a function of the norm of that vector.
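  • By way of illustration only, the following sketch maps the operator's hand offset from a workspace origin to a commanded robot velocity with a dead-zone threshold, as in the fourth control mode; the origin, threshold radius, and gain are hypothetical values.

      # Minimal sketch of position-to-velocity mapping with a dead zone around an origin.
      import numpy as np

      ORIGIN = np.zeros(3)        # default origin in the operator workspace (metres)
      DEAD_ZONE = 0.05            # radius within which the robot holds its pose
      GAIN = 0.8                  # commanded speed per metre of hand displacement

      def hand_to_velocity(hand_position: np.ndarray) -> np.ndarray:
          """Map the operator's hand offset from the origin to a robot velocity command."""
          offset = hand_position - ORIGIN
          distance = np.linalg.norm(offset)
          if distance < DEAD_ZONE:
              return np.zeros(3)                       # inside the circle: stay put
          direction = offset / distance
          return GAIN * (distance - DEAD_ZONE) * direction

      print(hand_to_velocity(np.array([0.02, 0.01, 0.0])))   # inside dead zone -> zero
      print(hand_to_velocity(np.array([0.20, 0.00, 0.05])))  # outside -> proportional velocity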
  • In a fifth control mode, the operator may provide third-person demonstrations that the robot mapping module 315 interprets such that the robot 135 performs higher level task-related motions. In this embodiment, the operator may manipulate an object in his/her environment, which the robot mapping module 315 maps to the robot manipulating a corresponding object (which may or may not be the same object as the operator's) in its environment in accordance with a processed version of the operator's motion. The robot mapping module 315 may not map the exact poses or trajectory of the operator but rather may infer poses or a trajectory to achieve a similar high level task. For example, the operator may have a test object in his/her environment. The operator may specify an object in the environment of the robot 135 that corresponds to the test object. In some embodiments, the robotic system controller 145 may infer the object correspondence. The operator may manipulate the test object in some way, such as picking it up and placing it in a bin, which provides high level task information to the robotic system controller 145 indicating that the robot 135 should place the corresponding object in its environment in a bin. The objects in the operator's environment may not correspond identically with those in the environment of the robot 135. In the example described, the bins in the operator's environment and the robot's environment might be different sizes, shapes, colors, may appear differently, and may be placed in different locations relative to the test/corresponding object and/or operator/robot. In this control mode, the robot 135 may have a higher level of intelligence and may be trained on extracting higher level task-related information from the operator demonstration, as opposed to fine motor control commands from the operator mapping explicitly to motion. This task-mapping mode may be used to manipulate objects in a lower-level control mode such that however the operator manipulates the test object, the robot 135 manipulates the corresponding object in the same or similar (inferred) way.
  • The robot kinematics module 320 determines one or more kinematic parameters for the robot 135. In the embodiment of FIG. 3, the kinematic parameters correspond to a position and an orientation for each segment and/or joint of the robot 135. The kinematic parameters may include one or more of the following: a set of x-, y-, and z-coordinates with respect to the coordinate system (i.e., workspace) of the robot 135; roll, pitch, and yaw describing orientation of one or more segments of the robot 135; joint angles between adjacent segments; a set of transformation coefficients between the body of the operator and the configuration of the robot 135. The robot kinematics module 320 determines these kinematic parameters based on the mapping parameters from the robot mapping module 315 that maps the body pose of the operator to the configuration of the robot 135. The robot kinematics module 320 may send the kinematic parameters to the robot interface 350 for motion of the robot 135 in accordance with the kinematic parameters.
  • As the operator takes a series of poses that collectively cause the robot 135 to perform a task, the robot kinematics module 320 determines a set of kinematic parameters for each subsequent pose. For the subsequent poses that the robot 135 may take, the robot kinematics module 320 may consider an initial state of the robot 135 (e.g., current pose) and a target state of the robot 135 (corresponding to the pose of the subject) to determine a movement to transition the robot 135 from the current state to the target state. The robot kinematics module 320 may generate an intermediate set of parameters that represent the transitional movement (i.e., a motion trajectory). In the embodiment of FIG. 3, the robot kinematics module 320 may perform an optimization algorithm to determine the optimal transitional movement. The robot kinematics module 320 may consider any constraints placed on the robot 135, for example to prevent self-collision or collisions with objects in the local area of the robot 135 as determined from the image capturing device 140. In some embodiments, the operator system controller interface 310 sends the kinematic parameters and intermediate parameters to the operator system controller 130 such that a simulation of the movement is displayed in a user interface of the user device, enabling the operator to approve or reject the simulated movement before the robot 135 takes the pose.
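  • By way of illustration only, the sketch below generates a set of intermediate parameters by plain joint-space interpolation between a current and a target configuration, with a joint-limit check; a full system might instead run a trajectory optimizer with self-collision and environment constraints as described above. The joint values and limits shown are hypothetical.

      # Minimal sketch of generating intermediate joint configurations between two states.
      import numpy as np

      def joint_space_trajectory(current: np.ndarray,
                                 target: np.ndarray,
                                 limits: np.ndarray,
                                 steps: int = 20) -> np.ndarray:
          """Return `steps` intermediate joint configurations from current to target."""
          alphas = np.linspace(0.0, 1.0, steps)[:, None]
          trajectory = (1.0 - alphas) * current + alphas * target
          if np.any(trajectory < limits[:, 0]) or np.any(trajectory > limits[:, 1]):
              raise ValueError("interpolated trajectory violates joint limits")
          return trajectory

      current = np.array([0.0, 0.5, -0.2])
      target = np.array([0.6, 0.1, 0.4])
      limits = np.array([[-1.5, 1.5], [-1.0, 1.0], [-2.0, 2.0]])
      print(joint_space_trajectory(current, target, limits).shape)  # (20, 3)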
  • The feedback module 325 receives and processes feedback from the robot 135. In the embodiments of FIGS. 1-3, the robot 135 may include sensors on each segment or at each joint, such as torque sensors, encoders, cameras, IMUs, and other possible sensors. The feedback module 325 may monitor the feedback from the sensors to ensure that the detected feedback stays within an acceptable range. For example, monitoring feedback from the torque sensors ensures that the segments and/or joints of the robot 135 do not experience excessive load-bearing forces. In some embodiments, the feedback module 325 may constrain a motion or a pose of the robot 135 if the feedback module 325 detects feedback that is outside of an acceptable range. In the embodiment of FIG. 3, the operator system controller interface 310 may transmit force or haptic feedback from the feedback module 325 to the operator system 110, which may enable the operator to feel forces that the robot 135 is sensing as it moves and interacts with its environment. In some embodiments, the operator system 110 may update a user interface of the user device to inform the operator of the feedback and if any detected feedback is outside of an acceptable range. The operator system 110 may provide multisensory feedback (e.g., visual or audio feedback) through, for example, AR or display features.
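  • By way of illustration only, the following sketch shows a simple range check on joint-torque feedback of the kind the feedback module 325 might perform; the torque limits and measurements are hypothetical, and the constraint action is reduced to flagging the offending joints.

      # Minimal sketch of monitoring torque feedback against per-joint limits.
      import numpy as np

      TORQUE_LIMITS_NM = np.array([40.0, 40.0, 25.0, 25.0, 10.0, 10.0])  # per joint (assumed)

      def check_torques(measured_nm: np.ndarray) -> np.ndarray:
          """Return a boolean mask of joints whose measured torque is out of range."""
          return np.abs(measured_nm) > TORQUE_LIMITS_NM

      measured = np.array([12.0, 38.5, 27.1, 3.0, 2.2, 11.4])
      over_limit = check_torques(measured)
      if over_limit.any():
          print("constrain motion, joints over limit:", np.flatnonzero(over_limit))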
  • The imitation learning system interface 330 provides data from the robotic system 115 to the imitation learning engine 150. The imitation learning system interface 330 transmits data such as images captured by the image capturing device 140 of the robot 135 and its environment, images captured by the image capturing device 125 of the operator, mapping parameters, kinematic parameters, corresponding initial and target states and the associated intermediate parameters, sensor feedback, and other relevant information such as an embedding or information of the type of task being performed. Based on the tasks performed by the operator and the corresponding states and kinematic parameters of the robot 135, the imitation learning engine 150 learns and labels the poses for a robot to accomplish each task. The imitation learning system interface 330 may transmit this data in real-time or at specified or random intervals. This enables the imitation learning engine 150 to continually improve online in real-time, in a parallelized framework where the robotic systems 115 collectively learn from their own and others' demonstrations and experiences. With every additional teleoperational task completed, the robots become more capable of autonomously performing tasks and require fewer human interventions.
  • The image capturing device interface 345 is software, firmware, hardware, or a combination thereof that couples the robotic system controller 145 to the image capturing device 140. For example, the image capturing device interface 345 may be a USB cable that couples to the bus 340. In another embodiment, the image capturing device interface 345 may enable a wireless connection to the image capturing device 140, e.g., via the network 105, Bluetooth, or a similar connection.
  • The robot interface 350 may be software, firmware, hardware, or a combination thereof that couples the robotic system controller 145 to the robot 135. For example, the robot interface 350 may be a power cable, USB cable, or a similar connection. In alternative embodiments, the robot interface 350 may be a wireless connection via the network 105, Bluetooth, or a similar wireless connection. In the embodiment of FIG. 3, the robotic system controller 145 transmits the intermediate parameters and the kinematic parameters to one or more actuators at the respective joints of the robot 135. In this configuration, the actuators move the robot 135 in accordance with the parameters received. The robot 135 may additionally send sensor feedback to the robotic system controller 145 via the robot interface 350.
  • The network interface 355 is a hardware component that couples the robotic system controller 145 to the network 105. For example, the network interface 355 may be a network interface card, a network adapter, a LAN adapter, or a physical network interface that couples to the bus 340.
  • FIG. 4 illustrates a flowchart of a method 400 for teleoperating a robot by mapping a pose of an operator, according to one embodiment. The method 400 can be performed using a computer system (e.g., system 100).
  • An image capturing device (e.g., image capturing device 125) captures 405 an image of a subject. The image capturing device may be part of an imaging assembly, an external mobile device, a virtual reality headset, a standalone virtual reality camera assembly, a webcam, a similar portable imaging device, or some combination thereof. The image capturing device may be positioned on the subject's body and oriented such that segments of the subject's body are within a field of view of the image capturing device, or the image capturing device may be positioned external to the subject's body such that all or portions of the subject's body are within the field of view of the image capturing device. In the embodiment of FIG. 4, the image capturing device captures images that are two-dimensional (i.e., without depth information). The image capturing device captures 405 images of the subject as the subject takes a series of poses, which are to be mapped to a robot of a robotic system, causing the robot to perform a task.
  • An image processor (e.g., image processor 260) processes 410 the captured image(s) to localize one or more body parts of the subject. The image processor identifies the subject and the subject's body parts in the captured image. For example, the image processor identifies hands, fingers, arms, elbows, shoulders, legs, knees, a head, etc. of the subject. The image processor may use a machine learning model (e.g., a pre-trained deep learning model or convolutional neural network) to identify these body parts in each captured image. Additionally, the machine learning model localizes body parts and the dimensions between adjacent body parts or joints.
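  • By way of illustration only, one off-the-shelf way to obtain such 2D body-part locations is a pre-trained keypoint-detection network; the sketch below uses torchvision's Keypoint R-CNN (17 COCO keypoints) on a stand-in image. This is an illustrative substitute for, not necessarily the same as, the machine learning model used in step 410, and it assumes a torchvision version that accepts the weights="DEFAULT" argument.

      # Minimal sketch of 2D body-part localization with a pre-trained keypoint model.
      import torch
      import torchvision

      model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
      model.eval()

      image = torch.rand(3, 480, 640)          # stand-in for a captured RGB frame in [0, 1]
      with torch.no_grad():
          predictions = model([image])[0]      # one dict per input image

      if len(predictions["keypoints"]) > 0:
          person = predictions["keypoints"][0]         # (17, 3): x, y, visibility
          left_wrist, right_wrist = person[9], person[10]
          print("left wrist (px):", left_wrist[:2].tolist())
          print("right wrist (px):", right_wrist[:2].tolist())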
  • A skeletal model mapper (e.g., skeletal model mapper 265) maps 415 the localized body parts to a human body skeletal model. The skeletal model mapper projects the two-dimensional localized body parts to a three-dimensional skeleton model of the operator. In the embodiment of FIG. 4, the skeletal model mapper executes an optimization algorithm that maximizes the alignment between a 2D pixel location of each body part in the captured image and the 3D skeleton model. The 3D skeleton model represents an initial estimated pose of the operator. In the embodiment of FIG. 4, the 3D skeleton model may include several parameters, such as body part dimensions (e.g., limb lengths), joint angles between adjacent body parts (e.g., limbs), and other relevant pose information.
  • A pose estimation module (e.g., pose estimation module 215) generates 420 body pose information of the subject. In some embodiments, the body pose information of the subject is generated based on the skeletal model. In alternative embodiments, a machine learning model estimates the body pose information based on the captured image(s) or a processed version of the captured image(s) of the subject. The machine learning model is used to estimate and track poses of the subject for subsequently received captured images of the subject.
  • A robot mapping module (e.g., robot mapping module 315) maps 425 the body pose estimates to a configuration of a robot (e.g., robot 135). The robot mapping module maps the body pose estimates of the operator to the configuration of the robot. The robot mapping module may create a set of mapping parameters, which may include scaling coefficients, relationships of corresponding joints or segments, and other relevant information. In the embodiment of FIG. 4, the robot mapping module may use one or more control modes (e.g., direct mapping, indirect mapping, end-effector mapping) for mapping.
  • A robot kinematics module (e.g., robot kinematics module 320) generates 430 kinematic parameters of the robot (e.g., robot 135). In the embodiment of FIG. 4, the kinematic parameters correspond to a position and an orientation for each segment and/or joint of the robot. The kinematic parameters may include one or more of the following: a set of x-, y-, and z-coordinates with respect to the coordinate system (i.e., workspace) of the robot 135; roll, pitch, and yaw of one or more segments of the robot; joint angles between adjacent segments; a set of transformation coefficients between the body of the operator and the configuration of the robot. The robot kinematics module determines these kinematic parameters based on the mapping parameters from the robot mapping module that maps the 3D skeleton model of the operator to the configuration of the robot.
  • A robotic system controller (e.g., robotic system controller 145) sends 435 the generated kinematic parameters to one or more actuators of the robot (e.g., robot 135). In accordance with the generated kinematic parameters, the actuators ambulate the one or more segments and joints to a target pose (corresponding to the pose of the subject).
  • A feedback module (e.g., feedback module 325) detects 440 sensor feedback of the robot (e.g., robot 135). The feedback module monitors the feedback from sensors on the robot to ensure that the detected feedback stays within an acceptable range. In some instances, the feedback module may constrain a motion or a pose of the robot if the feedback module detects feedback that is outside of an acceptable range.
  • Various modifications or changes may be made to the method 400 illustrated in FIG. 4. For example, steps 410, 415, and 440 may be omitted. Also, the sequence of steps 430, 435, and 440 may be modified.
  • FIG. 5 illustrates a schematic block diagram of a training phase of the imitation learning engine 150, according to one embodiment. During the training phase, the imitation learning engine 150 implements a learning algorithm to learn how a robot can perform different tasks based on example demonstrations from human operators. The imitation learning engine 150 inputs into its model a large number of examples of robots executing a pose or performing a task based on the subject performing the tasks. The imitation learning engine 150 learns using these examples to determine appropriate movements for the robot to perform the same tasks. Accordingly, the imitation learning engine 150 stores a “label” for each task that includes the determined appropriate movements for each task.
  • In the embodiment of FIG. 5, the imitation learning engine inputs data from several examples of a human operator teleoperating a robot to perform a task. Each example includes a series of poses by the subject and by the robot that occurred over a period of time, t=0 to t=Z, where Z indicates the amount of time to complete the task. As illustrated in FIG. 5, an example includes a task label 505 associated with the task performed by the robot, captured images 510, object information 515, a robot state 520 of the robot before taking a pose, and kinematic parameters 525 associated with each robot state 520. The task label 505 indicates the task performed by the robot. The captured images 510 are one or more images captured of the local area surrounding the robot. The object information 515 includes data regarding objects located in the local area surrounding the robot. The robot state 520 is an initial configuration of the robot before taking the pose corresponding to the pose of the subject. The kinematic parameters 525 are the kinematic parameters associated with the configuration of the robot taking the pose corresponding to the pose of the subject.
  • The imitation learning engine 150 receives as input the task label 505, the captured images 510, the object information 515, and the robot state 520 before each pose, and then, for each pose in the sequence of poses to complete the task, outputs a prediction of the kinematic parameters to achieve each pose or robot motion trajectory. The imitation learning engine 150 performs error detection 530 and compares the predicted kinematic parameters to the actual kinematic parameters for each pose or robot motion trajectory. Based on a calculated difference 535, the imitation learning engine 150 may adjust the coefficients of its machine learning model to reduce the detected error. The imitation learning engine 150 may perform the training process multiple times for one or more task examples that it receives.
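  • By way of illustration only, the sketch below shows a minimal training step of the kind described for FIG. 5: a network predicts kinematic parameters from a flattened feature vector (task label, captured images, object information, robot state), the prediction is compared to the demonstrated kinematic parameters, and the model coefficients are adjusted to reduce the difference. The network architecture, feature sizes, and loss are hypothetical choices, not the specific design of the imitation learning engine 150.

      # Minimal sketch of one imitation-learning training step on demonstrated poses.
      import torch
      from torch import nn

      FEATURE_DIM = 256       # task label + image + object + robot-state features (assumed)
      NUM_KINEMATIC = 12      # e.g. joint angles / end-effector pose parameters (assumed)

      policy = nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_KINEMATIC))
      optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
      loss_fn = nn.MSELoss()

      def training_step(features: torch.Tensor, demonstrated_kinematics: torch.Tensor) -> float:
          """Predict kinematic parameters, compare with the teleoperated ones, update weights."""
          predicted = policy(features)
          loss = loss_fn(predicted, demonstrated_kinematics)  # the calculated difference 535
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()                                    # adjust coefficients to reduce error
          return loss.item()

      # One synthetic batch of 32 (input, demonstrated-kinematics) pairs.
      features = torch.randn(32, FEATURE_DIM)
      demonstrated = torch.randn(32, NUM_KINEMATIC)
      print(training_step(features, demonstrated))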
  • FIG. 6 illustrates a schematic block diagram of an operational phase of the imitation learning engine 150, according to one embodiment. During the operational phase, the imitation learning engine 150 determines the configuration of a robot at several time steps such that when executed in sequence enable the robot to perform a task. The imitation learning engine 150 analyzes a current configuration of a robot (e.g., at time=t) to determine a configuration of a robot at a next time step (e.g., time=t+1). The imitation learning engine 150 may be executed for one or more remotely located robots. As illustrated in FIG. 6, the input data associated with time=t includes a task label 605 associated with the task to be performed by the robot, captured images 610, object information 615, a robot state 620, and kinematic parameters 625. The task label 605 indicates the task to be performed by the robot. The captured images 610 are one or more images captured of the local area surrounding the robot. The object information 615 includes data regarding objects located in the local area surrounding the robot (e.g., objects that the robot will interact with or will avoid). The robot state 620 is the configuration of the robot (e.g., at a current time step, time=t). The kinematic parameters 625 are the kinematic parameters associated with the configuration of the robot (e.g., at a current time step, time=t).
  • Based on the input data, the imitation learning engine 150 may output kinematic parameters 630, a robot state 635, and object information 640 for the robot at the subsequent time step. These kinematic parameters 630 may include x-, y-, and z-coordinates; roll, pitch, and yaw; and joint angles for each segment and joint of the robot. The robot state 635 represents the subsequent configuration of the robot. The object information 640 may change from the previous time-step, for example, if the robot interacted with any objects in its environment or if the position or orientation of the robot changed with respect to the objects. The imitation learning engine 150 may perform this process for the next time step (e.g., time=t+2) using the kinematic parameters 630, the robot state 635, and the object information 640. The imitation learning engine 150 may repeat this process for each subsequent time step, enabling the robot to accomplish the task associated with the task label 605.
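  • By way of illustration only, the following sketch runs a trained policy in a closed loop as described for FIG. 6: at each time step the inputs are encoded, the next kinematic parameters are predicted, and the resulting state is fed back in for the following step. The encoder, feature layout, and dimensions are hypothetical, and `policy` stands in for the model trained in the sketch above.

      # Minimal sketch of the operational-phase rollout loop.
      import torch
      from torch import nn

      FEATURE_DIM, NUM_KINEMATIC = 256, 12
      policy = nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_KINEMATIC))

      def encode(task_label: int, image_feat: torch.Tensor, object_feat: torch.Tensor,
                 robot_state: torch.Tensor) -> torch.Tensor:
          """Placeholder encoder that packs the FIG. 6 inputs into one feature vector."""
          vec = torch.zeros(FEATURE_DIM)
          vec[0] = float(task_label)
          vec[1:1 + image_feat.numel()] = image_feat
          vec[65:65 + object_feat.numel()] = object_feat
          vec[129:129 + robot_state.numel()] = robot_state
          return vec

      robot_state = torch.zeros(NUM_KINEMATIC)                    # state at time t
      with torch.no_grad():
          for t in range(3):                                      # a few time steps of the task
              image_feat, object_feat = torch.randn(64), torch.randn(64)
              features = encode(task_label=7, image_feat=image_feat,
                                object_feat=object_feat, robot_state=robot_state)
              next_kinematics = policy(features.unsqueeze(0)).squeeze(0)
              robot_state = next_kinematics                       # becomes the state at t + 1
              print(f"t={t + 1}: first joint command {next_kinematics[0].item():.3f}")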
  • The foregoing description of the embodiments of the disclosure has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above disclosure.
  • Some portions of this description describe the embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are used to convey the substance of the work effectively. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
  • Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
  • Embodiments of the disclosure may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.

Claims (20)

1. A method for training comprising:
receiving, by one or more processors, teleoperator data corresponding to instructions for a robot to complete a first task;
receiving, by the one or more processors, sensor data corresponding to an environment surrounding the robot; and
training, by the one or more processors using the received teleoperator data and sensor data, a machine learning algorithm to predict future instructions for one or more robots to complete the first task.
2. The method of claim 1, further comprising:
capturing, by the robot, the sensor data; and
transmitting, by the robot, the sensor data to the teleoperator.
3. The method of claim 2, further comprising generating the teleoperator data using an operator system.
4. The method of claim 3, further comprising performing the instructions, by the robot, after receiving the generated teleoperator data from the operator system.
5. The method of claim 1, wherein the machine learning algorithm is a deep learning model and/or a neural network.
6. The method of claim 1, further comprising receiving an identification of the first task, wherein training the machine learning algorithm further includes training the machine learning algorithm to predict the future instructions using the received identification of the first task.
7. The method of claim 1, wherein the sensor data includes an image of the environment surrounding the robot.
8. The method of claim 1, wherein the sensor data comprises an image of an object to be manipulated by the robot to complete the first task, and wherein training the machine learning algorithm further includes training the machine learning algorithm to predict the future instructions using the image of the object.
9. The method of claim 1, further comprising:
receiving motion trajectory information corresponding to a sequence of movements performed by the robot to complete the first task in response to the instructions, wherein training the machine learning algorithm to predict the future instructions includes training the machine learning algorithm to predict a series of movements for the one or more robots to complete the first task.
10. The method of claim 1, wherein the sensor data includes an image of one or more objects to be manipulated by the robot and an environment surrounding the robot prior to completion of the first task.
11. The method of claim 1, wherein training the machine learning algorithm comprises:
calculating a difference between the predicted future instructions and the received instructions; and
adjusting, based on the difference, one or more coefficients of the machine learning algorithm to reduce the difference.
12. A system for training a machine learning algorithm comprising:
one or more computing devices storing an imitation learning engine, the imitation learning engine configured to:
receive teleoperator data corresponding to instructions performed by a robot to complete a first task;
receive sensor data corresponding to an environment surrounding the robot; and
train, using the received teleoperator data and sensor data, a machine learning algorithm to predict future instructions for one or more robots to complete the first task.
13. The system of claim 12, further comprising the robot, wherein the robot is configured to:
capture the sensor data; and
transmit the sensor data to an operator system.
14. The system of claim 13, wherein sensor data comprises an image including one or more objects to be manipulated by the robot to complete the first task.
15. The system of claim 13, further comprising the operator system, wherein the operator system is configured to generate the teleoperator data.
16. The system of claim 15, wherein the robot is further configured to perform the instructions after receiving the teleoperator data from the operator system.
17. The system of claim 12, wherein the machine learning algorithm is a deep learning model and/or a convolutional neural network.
18. The system of claim 12, wherein training the machine learning algorithm comprises:
calculating a difference between the predicted future instructions and the received instructions; and
adjusting, based on the difference, one or more coefficients of the machine learning algorithm to reduce the difference.
19. The system of claim 12, wherein the imitation learning engine is further configured to receive an identification of the first task and to train the machine learning algorithm to predict the future instructions using the received identification of the first task.
20. The system of claim 12, wherein the sensor data comprises an image of one or more objects to be manipulated by the robot to complete the first task, and wherein the imitation learning engine is configured to train the machine learning algorithm to predict the future instructions using the image of the one or more objects.
US17/146,885 2017-05-30 2021-01-12 Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose Abandoned US20210205986A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/146,885 US20210205986A1 (en) 2017-05-30 2021-01-12 Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201762512179P 2017-05-30 2017-05-30
US15/954,532 US10919152B1 (en) 2017-05-30 2018-04-16 Teleoperating of robots with tasks by mapping to human operator pose
US17/146,885 US20210205986A1 (en) 2017-05-30 2021-01-12 Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/954,532 Continuation US10919152B1 (en) 2017-05-30 2018-04-16 Teleoperating of robots with tasks by mapping to human operator pose

Publications (1)

Publication Number Publication Date
US20210205986A1 true US20210205986A1 (en) 2021-07-08

Family

ID=74570023

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/954,532 Active 2039-03-15 US10919152B1 (en) 2017-05-30 2018-04-16 Teleoperating of robots with tasks by mapping to human operator pose
US17/146,885 Abandoned US20210205986A1 (en) 2017-05-30 2021-01-12 Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/954,532 Active 2039-03-15 US10919152B1 (en) 2017-05-30 2018-04-16 Teleoperating of robots with tasks by mapping to human operator pose

Country Status (1)

Country Link
US (2) US10919152B1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210118166A1 (en) * 2019-10-18 2021-04-22 Nvidia Corporation Pose determination using one or more neural networks
US11685052B2 (en) * 2018-09-18 2023-06-27 Kinova Inc. Vision guided robot arm and method for operating the same
US11769110B1 (en) * 2019-02-01 2023-09-26 Amazon Technologies, Inc. Systems and methods for operator motion management
US11804076B2 (en) 2019-10-02 2023-10-31 University Of Iowa Research Foundation System and method for the autonomous identification of physical abuse
KR102386009B1 (en) * 2020-07-30 2022-04-13 네이버랩스 주식회사 Method for learning robot task and robot system using the same
CN112975993B (en) * 2021-02-22 2022-11-25 北京国腾联信科技有限公司 Robot teaching method, device, storage medium and equipment
CN112936282B (en) * 2021-03-08 2022-01-07 常州刘国钧高等职业技术学校 Method and system for improving motion sensing control accuracy of industrial robot
CN113199469B (en) * 2021-03-23 2022-07-08 中国人民解放军63919部队 Space arm system, control method for space arm system, and storage medium
WO2023000119A1 (en) * 2021-07-17 2023-01-26 华为技术有限公司 Gesture recognition method and apparatus, system, and vehicle
CA3230947A1 (en) * 2021-09-08 2023-03-16 Patrick McKinley JARVIS Wearable robot data collection system with human-machine operation interface
CN113900516A (en) * 2021-09-27 2022-01-07 阿里巴巴达摩院(杭州)科技有限公司 Data processing method and device, electronic equipment and storage medium
US20230109398A1 (en) * 2021-10-06 2023-04-06 Giant.Ai, Inc. Expedited robot teach-through initialization from previously trained system
WO2024054797A1 (en) * 2022-09-07 2024-03-14 Tutor Intelligence, Inc. Visual robotic task configuration system
CN116824014A (en) * 2023-06-29 2023-09-29 北京百度网讯科技有限公司 Data generation method and device for avatar, electronic equipment and medium

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6222465B1 (en) 1998-12-09 2001-04-24 Lucent Technologies Inc. Gesture-based computer interface
US7802193B1 (en) 2001-12-19 2010-09-21 Sandia Corporation Controlling motion using a human machine interface
EP1472052A2 (en) 2002-01-31 2004-11-03 Braintech Canada, Inc. Method and apparatus for single camera 3d vision guided robotics
US20030215130A1 (en) * 2002-02-12 2003-11-20 The University Of Tokyo Method of processing passive optical motion capture data
FR2839176A1 (en) * 2002-04-30 2003-10-31 Koninkl Philips Electronics Nv ROBOT ANIMATION SYSTEM COMPRISING A SET OF MOVING PARTS
US20060223637A1 (en) 2005-03-31 2006-10-05 Outland Research, Llc Video game system combining gaming simulation with remote robot control and remote robot feedback
JP4751192B2 (en) * 2005-12-12 2011-08-17 本田技研工業株式会社 Mobile robot
US8924021B2 (en) * 2006-04-27 2014-12-30 Honda Motor Co., Ltd. Control of robots from human motion descriptors
KR100995933B1 (en) * 2008-09-01 2010-11-22 한국과학기술연구원 A method for controlling motion of a robot based upon evolutionary computation and imitation learning
US8266536B2 (en) 2008-11-20 2012-09-11 Palo Alto Research Center Incorporated Physical-virtual environment interface
WO2010102288A2 (en) * 2009-03-06 2010-09-10 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for shader-lamps based physical avatars of real and virtual people
US20120041599A1 (en) 2010-08-11 2012-02-16 Townsend William T Teleoperator system with master controller device and multiple remote slave devices
US9162720B2 (en) * 2010-12-03 2015-10-20 Disney Enterprises, Inc. Robot action based on human demonstration
US8843236B2 (en) 2012-03-15 2014-09-23 GM Global Technology Operations LLC Method and system for training a robot using human-assisted task demonstration
WO2014093367A1 (en) * 2012-12-10 2014-06-19 Intuitive Surgical Operations, Inc. Collision avoidance during controlled movement of image capturing device and manipulatable device movable arms
US9579799B2 (en) 2014-04-30 2017-02-28 Coleman P. Parker Robotic control system using virtual reality input
US10500730B2 (en) 2015-09-04 2019-12-10 Kindred Systems Inc. Systems, devices, and methods for self-preservation of robotic apparatus
US9684305B2 (en) 2015-09-11 2017-06-20 Fuji Xerox Co., Ltd. System and method for mobile robot teleoperation
US11072067B2 (en) 2015-11-16 2021-07-27 Kindred Systems Inc. Systems, devices, and methods for distributed artificial neural network computation
US10471594B2 (en) 2015-12-01 2019-11-12 Kindred Systems Inc. Systems, devices, and methods for the distribution and collection of multimodal data associated with robots
US10180733B2 (en) 2015-12-22 2019-01-15 Kindred Systems Inc. Systems, devices, and methods for foot control of robots
US10434659B2 (en) 2016-03-02 2019-10-08 Kindred Systems Inc. Systems, devices, articles, and methods for user input
US10737377B2 (en) 2016-03-15 2020-08-11 Kindred Systems Inc. Systems, devices, articles, and methods for robots in workplaces

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130211592A1 (en) * 2012-02-15 2013-08-15 Samsung Electronics Co., Ltd. Tele-operation system and control method thereof
US20150331415A1 (en) * 2014-05-16 2015-11-19 Microsoft Corporation Robotic task demonstration interface
US20160243701A1 (en) * 2015-02-23 2016-08-25 Kindred Systems Inc. Facilitating device control
US20160257000A1 (en) * 2015-03-04 2016-09-08 The Johns Hopkins University Robot control, training and collaboration in an immersive virtual reality environment

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200387150A1 (en) * 2016-10-12 2020-12-10 Sisu Devices Llc Robotic programming and motion control
US11740624B2 (en) * 2017-08-17 2023-08-29 Sri International Advanced control system with multiple control paradigms
US20200218253A1 (en) * 2017-08-17 2020-07-09 Sri International Advanced control system with multiple control paradigms
US11548147B2 (en) * 2017-09-20 2023-01-10 Alibaba Group Holding Limited Method and device for robot interactions
US11402635B1 (en) * 2018-05-24 2022-08-02 Facebook Technologies, Llc Systems and methods for measuring visual refractive error
US11285607B2 (en) * 2018-07-13 2022-03-29 Massachusetts Institute Of Technology Systems and methods for distributed training and management of AI-powered robots using teleoperation via virtual spaces
US11931907B2 (en) 2018-07-13 2024-03-19 Massachusetts Institute Of Technology Systems and methods for distributed training and management of AI-powered robots using teleoperation via virtual spaces
US11529733B2 (en) * 2019-10-15 2022-12-20 Hefei University Of Technology Method and system for robot action imitation learning in three-dimensional space
US11331806B2 (en) * 2019-12-26 2022-05-17 Ubtech Robotics Corp Ltd Robot control method and apparatus and robot using the same
US20210200190A1 (en) * 2019-12-30 2021-07-01 Ubtech Robotics Corp Ltd Action imitation method and robot and computer readable storage medium using the same
US11940774B2 (en) * 2019-12-30 2024-03-26 Ubtech Robotics Corp Ltd Action imitation method and robot and computer readable storage medium using the same
US20210362330A1 (en) * 2020-05-21 2021-11-25 X Development Llc Skill template distribution for robotic demonstration learning
US11685047B2 (en) * 2020-05-21 2023-06-27 Intrinsic Innovation Llc Skill template distribution for robotic demonstration learning
US20230252776A1 (en) * 2020-12-18 2023-08-10 Strong Force Vcn Portfolio 2019, Llc Variable-Focus Dynamic Vision for Robotic System
US20230278201A1 (en) * 2022-03-04 2023-09-07 Sanctuary Cognitive Systems Corporation Robots, tele-operation systems, computer program products, and methods of operating the same

Also Published As

Publication number Publication date
US10919152B1 (en) 2021-02-16

Similar Documents

Publication Title
US20210205986A1 (en) Teleoperating Of Robots With Tasks By Mapping To Human Operator Pose
Krupke et al. Comparison of multimodal heading and pointing gestures for co-located mixed reality human-robot interaction
CN114080583B (en) Visual teaching and repetitive movement manipulation system
Du et al. Markerless human–robot interface for dual robot manipulators using Kinect sensor
Fritsche et al. First-person tele-operation of a humanoid robot
CN107030692B (en) Manipulator teleoperation method and system based on perception enhancement
Delmerico et al. Spatial computing and intuitive interaction: Bringing mixed reality and robotics together
Li et al. Survey on mapping human hand motion to robotic hands for teleoperation
CN113829343B (en) Real-time multitasking and multi-man-machine interaction system based on environment perception
CN110914022A (en) System and method for direct teaching of robots
CN113103230A (en) Human-computer interaction system and method based on remote operation of treatment robot
US11422625B2 (en) Proxy controller suit with optional dual range kinematics
Ben Abdallah et al. Kinect-based sliding mode control for Lynxmotion robotic arm
Li et al. Neural learning and Kalman filtering enhanced teaching by demonstration for a Baxter robot
Shahverdi et al. A simple and fast geometric kinematic solution for imitation of human arms by a NAO humanoid robot
Kofman et al. Robot-manipulator teleoperation by markerless vision-based hand-arm tracking
Chen et al. A human–robot interface for mobile manipulator
Lambrecht et al. Markerless gesture-based motion control and programming of industrial robots
Galbraith et al. A neural network-based exploratory learning and motor planning system for co-robots
Nguyen et al. Merging physical and social interaction for effective human-robot collaboration
Ovur et al. Naturalistic robot-to-human bimanual handover in complex environments through multi-sensor fusion
Liu et al. Virtual reality based tactile sensing enhancements for bilateral teleoperation system with in-hand manipulation
Sugiyama et al. A wearable visuo-inertial interface for humanoid robot control
Wang et al. Robot programming by demonstration with a monocular RGB camera
Pan et al. Robot teaching system based on hand-robot contact state detection and motion intention recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: QOOWA, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KALOUCHE, SIMON;REEL/FRAME:054920/0078

Effective date: 20180405

AS Assignment

Owner name: NIMBLE ROBOTICS, INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:QOOWA, INC.;REEL/FRAME:055009/0907

Effective date: 20180726

STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION