US12515352B2 - Visuotactile operators for proximity sensing and contact control - Google Patents
- Publication number: US12515352B2 (application number US18/103,825)
- Authority: US (United States)
- Prior art keywords
- image sensor
- flow
- bounding box
- determining
- contact
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J13/00—Controls for manipulators
- B25J13/08—Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
- B25J13/081—Touching devices, e.g. pressure-sensitive
- B25J13/082—Grasping-force detectors
- B25J13/083—Grasping-force detectors fitted with slippage detectors
- B25J13/084—Tactile sensors
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/02—Sensing devices
- B25J19/021—Optical sensing devices
- B25J19/023—Optical sensing devices including video camera means
- B25J9/00—Program-controlled manipulators
- B25J9/16—Program controls
- B25J9/1612—Program controls characterised by the hand, wrist, grip control
- B25J9/1694—Program controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/39—Robotics, robotics to robotics hand
- G05B2219/39507—Control of slip motion
- G05B2219/40—Robotics, robotics mapping to robotics vision
- G05B2219/40567—Purpose, workpiece slip sensing
- G05B2219/40575—Camera combined with tactile sensors, for 3-D
- G05B2219/40625—Tactile sensor
Definitions
- The disclosure relates to tactile and visual sensors and, more particularly, to a semitransparent tactile sensor capable of tactile and visual sensing, and to a method of sensing an interaction with an object using the semitransparent sensor.
- Robots may be useful in homes and factories for grasping objects.
- Current grasping solutions use visual-only feedback and do not integrate tactile sensing effectively.
- Even with tactile sensor technologies and complex robotic hands available, robotic manipulation systems largely operate with open-loop visual sensing and do not effectively process tactile sensory information. As a result, robotic systems are unable to react to errors or to grasp unknown objects.
- STS: See-Through-your-Skin sensors.
- Object slippage detection methods in the related art do not estimate the magnitude and direction of slip; they return only a binary slip value.
- Visual and tactile signals may be extracted that are accurate and useful for robotic grasp control, enabling robots to sense the location of objects prior to contact and to detect object slippage.
- Identifying the location of the object may include identifying a bounding box for the object in an image that is represented by the image sensor data.
- Identifying the bounding box may include predicting coordinates of the bounding box in the image.
- Identifying the bounding box may include identifying a centroid of the bounding box and determining a distance between the centroid and a center of the image sensor.
- Determining the slippage may include measuring a deformation of a surface of the image sensor when the image sensor is in contact with the object.
- Determining the slippage may include determining a marker flow, determining an object flow, and determining a slip field as a difference between the object flow and the marker flow.
- Determining the marker flow may include identifying movement of at least one marker, and determining the object flow may include determining a motion of the object in relation to the image sensor.
- The method may further include combining the marker flow and the object flow using a convolutional neural network architecture.
- An electronic device for identifying and manipulating objects includes: at least one memory storing instructions; and at least one processor configured to execute the instructions to: obtain, from an image sensor, image sensor data; identify, using the image sensor data, a location of an object; control a robotic element, which includes the image sensor, to move towards the location of the object; determine a slippage based on contact between the image sensor and the object; and control a movement of the robotic element based on the determined slippage.
- The at least one processor may be further configured to identify a bounding box for the object in an image that is represented by the image sensor data.
- The at least one processor may be further configured to predict coordinates of the bounding box in the image.
- The at least one processor may be further configured to identify a centroid of the bounding box and determine a distance between the centroid and a center of the image sensor.
- The at least one processor may be further configured to measure a deformation of a surface of the image sensor when the image sensor is in contact with the object.
- The at least one processor may be further configured to determine a marker flow, determine an object flow, and determine a slip field as a difference between the object flow and the marker flow.
- The at least one processor may be further configured to determine the marker flow by identifying movement of at least one marker, and to determine the object flow by determining a motion of the object in relation to the image sensor.
- The at least one processor may be further configured to combine the marker flow and the object flow using a convolutional neural network architecture.
- A non-transitory computer-readable storage medium stores instructions to be executed by at least one processor to perform a method for identifying and manipulating objects, the method including: obtaining, from an image sensor, image sensor data; identifying, using the image sensor data, a location of an object; controlling a robotic element, which includes the image sensor, to move towards the location of the object; determining a slippage based on contact between the image sensor and the object; and controlling a movement of the robotic element based on the determined slippage.
- Identifying the location of the object may include identifying a bounding box for the object in an image that is represented by the image sensor data.
- Identifying the bounding box may include predicting coordinates of the bounding box in the image.
- Identifying the bounding box may include identifying a centroid of the bounding box and determining a distance between the centroid and a center of the image sensor.
- FIG. 1 is a diagram illustrating a visuotactile sensor, according to an embodiment.
- FIG. 2A is a diagram illustrating object detection and localization, according to an embodiment.
- FIG. 2B is a diagram illustrating a structure of an encoder network and a decoder network, according to an embodiment.
- FIG. 3 is a block diagram illustrating a method of proximity sensing, according to an embodiment.
- FIG. 4 is a diagram illustrating proximity sensing results, according to an embodiment.
- FIG. 5 is a diagram illustrating a method for detecting an object, according to an embodiment.
- FIGS. 6A, 6B, and 6C illustrate examples of detecting slip, according to one or more embodiments.
- FIG. 7 is a flowchart illustrating a method for detecting slip, according to an embodiment.
- FIG. 8 is a diagram illustrating a method for estimating a time to contact, according to an embodiment.
- FIG. 9 is a flowchart illustrating a method of determining a time to contact with an object, according to an embodiment.
- FIG. 10 is a diagram illustrating a learned slip detector, according to an embodiment.
- FIG. 11 is a diagram illustrating a method of detecting slip flow using a learned slip flow detector, according to an embodiment.
- The term "component" is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.
- Computer vision algorithms may enable robots using optical tactile sensors to better sense and react to interactions with objects in their environment.
- The algorithms provide a unified solution that may use a single sensor to detect the location and identity of an object as a robot hand approaches it, and to characterize the contact state (stick or slip) once the robot makes contact with the object.
- Embodiments are not limited to this.
- A computer vision-based algorithm estimates the distance and position of the object relative to the sensor, allowing a robot to reliably approach objects prior to contact.
- The disclosure presents visuotactile operators for proximity sensing and contact control, and algorithms that use measurements from high-resolution optical tactile sensors. These enable a robot to sense an object before, during, and after contact.
- The disclosure describes algorithmic solutions that use a visuotactile sensor to estimate several object properties essential to robotic manipulation, including distance to contact and object slip. These algorithms enable a robot to approach an object with precision up to the moment of contact. During contact, the algorithm senses object slippage by detecting how tactile cues move relative to the sensor's contact surface. This feedback can improve robotic dexterity in tasks such as grasping household objects, assembling complex parts (e.g., electronics), and interacting naturally with humans.
- FIG. 1 is a diagram illustrating an operation of a visuotactile sensor, according to an embodiment.
- A function of the visuotactile sensor 101 is to obtain the identity and position of an object relative to the sensor.
- An object detection neural network may be trained to recognize and localize the target in the visual field of the STS sensor. The network may be built on existing object detection networks (e.g., YOLOv5), with weights fine-tuned on a limited number of hand-labeled images. Object detection may be used to infer two or more object properties.
- Image 104 is an enlarged view of the image returned by sensor 101.
- A bounding box 103 may be identified to determine the properties of object 102 (e.g., a soda can).
- The bounding box 103 of an object allows its centroid, and thus its position relative to the center of the sensor (e.g., dy, dz), to be determined.
- The area of the bounding box may be used to infer the proximity of the object to the sensor (e.g., dx).
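The centroid offsets and area described above reduce to a few lines of arithmetic. The sketch below is a hypothetical illustration, assuming the detector returns a box as pixel corners `(x_min, y_min, x_max, y_max)` and that dy and dz are measured in pixels from the image center (dz positive upward); `box_geometry` and `BoxGeometry` are illustrative names, not part of the patent.

```python
from typing import NamedTuple, Tuple


class BoxGeometry(NamedTuple):
    dy: float    # horizontal offset of the box centroid from the image center (pixels)
    dz: float    # vertical offset of the box centroid from the image center (pixels, up is positive)
    area: float  # box area in pixels^2, used as a proxy for proximity dx


def box_geometry(box: Tuple[float, float, float, float],
                 image_width: int, image_height: int) -> BoxGeometry:
    """Compute centroid offsets and area for a box given as (x_min, y_min, x_max, y_max).

    Image y grows downward, so its sign is flipped to make dz positive above the center.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    dy = cx - image_width / 2.0
    dz = image_height / 2.0 - cy
    area = float(max(0.0, x_max - x_min) * max(0.0, y_max - y_min))
    return BoxGeometry(dy, dz, area)


if __name__ == "__main__":
    # Box whose centroid is 20 px right of and 10 px above the center of a 200 x 200 image.
    print(box_geometry((100, 60, 140, 120), 200, 200))
```

The offsets place the object in the image plane, while the area feeds the proximity estimate for dx.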
- A model may be trained by collecting a dataset of (dx_i, A_i) pairs of distance and bounding-box area, obtained by running robot trajectories in which the robot gripper approaches the target object.
- A method may thus predict the object location (x, y, z) in three dimensions (3D) relative to the sensor using optical tactile sensors.
- The visual modality is used to capture the position (e.g., dx, dy, dz) of the object relative to the camera by segmenting the target object with a bounding box.
- FIG. 2 A is a diagram illustrating object detection and localization, according to an embodiment.
- A visuotactile sensor 201 provides an RGB image measurement 202, which is processed by an encoder network 203 into feature map predictions 204 and then by a decoder network 205.
- The network may be trained to return an output image 207, predict a bounding box 208, and identify the manipulated object 209.
- FIG. 2B illustrates an example convolutional neural network architecture 300 of the encoder network 203 and the decoder network 205.
- The convolutional neural network architecture 300 shown in FIG. 2B represents one possible implementation of the convolutional neural network used in the object detection and localization shown in FIG. 2A.
- The convolutional neural network architecture 300 generally operates to receive an input image (e.g., the RGB image measurement 202) and to produce the output image 207, including a bounding box 208 that identifies the detected object 209.
- The convolutional neural network architecture 300 is a type of deep artificial neural network, a class of networks often applied to image analysis.
- The convolutional neural network architecture 300 is formed using an encoder network 203 and a corresponding decoder network 205.
- The encoder network 203 is formed using multiple encoder layers, which include multiple convolutional layers 310a-310d and multiple pooling layers 312a-312d.
- Each of the convolutional layers 310a-310d represents a layer of convolutional neurons, which apply a convolution operation that emulates the response of individual neurons to visual stimuli.
- Each neuron typically applies some function to its input values (often by weighting different input values differently) to generate output values.
- Each of the pooling layers 312a-312d combines the output values of neuron clusters from one convolutional layer into input values for the next layer.
- The encoder network 203 is shown here as including four encoder layers having four convolutional layers 310a-310d and four pooling layers 312a-312d, although the encoder network 203 could include different numbers of encoder layers, convolutional layers, and pooling layers.
- Each of the convolutional layers 310a-310d can perform convolution with a filter bank (containing filters or kernels) to produce a set of feature maps.
- These feature maps can be batch normalized, and an element-wise rectified linear unit (ReLU) function can be applied to the normalized feature map values.
- The ReLU function ensures that none of its output values is negative by selecting, for each normalized feature map value, the greater of that value and zero.
- Each of the pooling layers 312a-312d can perform max-pooling with a 2 × 2 window and a stride of two (non-overlapping windows), so the output is sub-sampled by a factor of two. Max-pooling can be used to achieve translation invariance over small spatial shifts in the input image patch. Sub-sampling gives each pixel in the feature maps a large input image context (spatial window).
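For illustration only, the per-layer encoder operations described above (batch normalization, ReLU, and 2 × 2 max-pooling with stride two) can be written in a few lines of NumPy. This is a simplified sketch, not the patent's implementation: the function names are hypothetical, the learnable batch-normalization scale and shift are omitted, and the convolution itself is left out.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each channel of an (N, C, H, W) batch to zero mean and unit variance.

    The learnable scale and shift of a real batch-norm layer are omitted here.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    """Element-wise max(value, 0), so no output value is negative."""
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling with stride 2 on an (N, C, H, W) array.

    H and W are assumed even; each spatial dimension is halved.
    """
    n, c, h, w = x.shape
    # Element [.., i, a, j, b] of the reshaped array is x[.., 2i + a, 2j + b].
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))
```

For a 4 × 4 ramp from −8 to 7, ReLU zeroes the top half and pooling keeps the largest value in each 2 × 2 block, giving a 2 × 2 map.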
- The decoder network 205 is formed using multiple decoder layers, which include multiple upsampling layers 314a-314d and multiple convolutional layers 316a-316d.
- Each of the upsampling layers 314a-314d upsamples its input feature maps.
- Each of the convolutional layers 316a-316d represents a trainable convolutional layer that produces dense feature maps, which can be batch normalized.
- The decoder network 205 is shown here as including four decoder layers having four upsampling layers 314a-314d and four convolutional layers 316a-316d, although the decoder network 205 could include different numbers of decoder layers, upsampling layers, and convolutional layers.
- Each encoder layer in the encoder network 203 could have a corresponding decoder layer in the decoder network 205, so the encoder network 203 and the decoder network 205 could have equal numbers of layers.
- A convolutional layer 318 processes the feature maps that are output by the decoder network 205.
- The convolutional layer 318 could perform convolution operations to produce pixel-level blending map patches for the input image patches 302 independently, for instance converting the feature maps into the blending map patches 304.
- The blending map patches 304 are dense per-pixel representations of pixel quality measurements involving information about the degree of motion and well-exposedness.
- The convolutional neural network architecture 300 operates as follows.
- The initial layers in the encoder network 203 extract scene contents and spatially downsize the associated feature maps, which enables effective aggregation of information over large areas of the input image.
- The later layers in the encoder network 203 learn to merge the feature maps.
- The layers of the decoder network 205 simulate coarse-to-fine reconstruction of the downsized representations by gradually upsampling the feature maps and translating them into blending maps. This allows more reliable recovery of the details lost by the encoder network 203.
- The convolutional neural network architecture 300 shown in FIG. 2A can be easily tailored for use in different applications.
- The size of the input image can vary and take any suitable value, such as 360 × 480 pixels, 256 × 256 pixels, or 200 × 200 pixels.
- The kernel sizes within the convolutional layers 310a-310d and 316a-316d can vary and take any suitable values, such as 7 × 7, 5 × 5, or 3 × 3.
- The stride used within the convolutional layers 310a-310d and 316a-316d can vary and take any suitable value, such as one or two.
- The number of layers in the encoder network 203 and in the decoder network 205 can vary and take any suitable value, such as four to eight layers each. Any or all of these parameters of the convolutional neural network architecture 300 can be selected to fit an application's requirements on performance and computational cost.
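The interaction between the example input sizes and the number of layers can be checked with simple arithmetic. The helper below is a hypothetical sketch assuming one stride-2 pooling per encoder layer (flooring odd sizes, as common frameworks do by default) and one 2× upsampling per decoder layer. It shows, for instance, that a 360 × 480 input with four encoder layers bottoms out at 22 × 30 and is reconstructed as 352 × 480, so that configuration would need padding, cropping, or a different depth.

```python
def encoder_decoder_sizes(height, width, depth):
    """Trace spatial sizes through `depth` pooling layers and `depth` upsampling layers.

    Returns the list of (H, W) sizes and whether the decoder restores the input size.
    Pooling floors odd sizes, so restoration needs H and W divisible by 2**depth.
    """
    sizes = [(height, width)]
    h, w = height, width
    for _ in range(depth):      # encoder: each stride-2 pooling layer halves H and W
        h, w = h // 2, w // 2
        sizes.append((h, w))
    for _ in range(depth):      # decoder: each upsampling layer doubles H and W
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes, (h, w) == (height, width)


if __name__ == "__main__":
    for shape, depth in [((360, 480), 4), ((256, 256), 4), ((200, 200), 4), ((200, 200), 3)]:
        sizes, ok = encoder_decoder_sizes(*shape, depth)
        print(shape, depth, "bottleneck:", sizes[depth], "restored:", ok)
```

Under these assumptions, 256 × 256 works with four layers, while 200 × 200 needs three (200 is divisible by 8 but not by 16).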
- It may be possible to compress and accelerate the convolutional neural network architecture 300 for real-time applications in various ways. For example, parameter pruning and parameter sharing can be used to remove redundancy in the parameters. As another example, low-rank factorization can be used to estimate the informative parameters in learning-based models. As a third example, specially structured convolutional filters can be designed to transfer or compact the filters, reducing storage and computational complexity.
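Two of these compression ideas, parameter pruning and low-rank factorization, can be illustrated on a single weight matrix (a convolutional kernel bank can first be reshaped to `out_channels × (in_channels · k · k)`). This NumPy sketch uses magnitude pruning and a truncated SVD, two common choices that the patent does not specifically name; the function names are hypothetical.

```python
import numpy as np


def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)


def low_rank_factorize(weights, rank):
    """Approximate an m x n matrix W as A @ B, with A: m x rank and B: rank x n.

    Storing A and B costs rank * (m + n) values instead of m * n.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))  # a rank-8 matrix
    a, b = low_rank_factorize(w, 8)
    print("max reconstruction error:", np.abs(a @ b - w).max())
    print("parameters:", w.size, "->", a.size + b.size)
```

For this 64 × 128 matrix of rank 8, the factorization stores 1,536 values instead of 8,192 with essentially no reconstruction error; real layers are rarely exactly low-rank, so the rank becomes an accuracy/speed trade-off.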
- Although FIG. 2A illustrates one example of a convolutional neural network architecture 300, various changes may be made to FIG. 2A.
- The convolutional neural network architecture 300 shown here is purely convolutional.
- Other convolutional neural network architectures can be used, such as designs modeled on the U-Net, SegNet, FlowNet, and FlowNet2 architectures. These architectures support various connections between encoder layers and decoder layers (such as those described below), and they can be improved through dropout operations, regularization terms, or data augmentation to avoid overfitting.
- FIG. 3 is a block diagram illustrating a method of proximity sensing, according to an embodiment.
- The proximity of the object to the sensor (dx) is determined based on the area of the bounding box.
- The area of the bounding box may be mapped to the distance dx using a regression network (e.g., a log-linear model) trained with supervised learning, as illustrated in FIG. 4 and described below.
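A minimal sketch of the supervised area-to-distance regression, assuming the log-linear model takes the form log(dx) = a + b · log(A). That form is an assumption motivated by the pinhole-camera relation A ∝ 1/z² (which predicts b ≈ −0.5 when dx is close to camera depth); the patent does not give the model's exact form. The (dx_i, A_i) pairs in the example are synthetic, not measured data, and the function names are hypothetical.

```python
import numpy as np


def fit_area_to_distance(areas, distances):
    """Least-squares fit of log(dx) = a + b * log(A); returns (a, b)."""
    b, a = np.polyfit(np.log(areas), np.log(distances), deg=1)  # slope first, then intercept
    return a, b


def predict_distance(area, a, b):
    """Map a bounding-box area to an estimated distance dx with the fitted model."""
    return float(np.exp(a + b * np.log(area)))


if __name__ == "__main__":
    # Synthetic, noise-free pairs following A = k / dx**2 (hypothetical k, units mm and px^2).
    dx = np.linspace(10.0, 60.0, 20)
    areas = 5.0e6 / dx ** 2
    a, b = fit_area_to_distance(areas, dx)
    print("slope b:", b, "dx for A at 30 mm:", predict_distance(5.0e6 / 30.0 ** 2, a, b))
```

With real trajectories, the fitted slope would absorb lens distortion and the offset between the camera and the sensor surface, which is one reason to learn the mapping rather than derive it.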
- FIG. 4 is a diagram illustrating proximity sensing results, according to an embodiment.
- Proximal contact control may be applied to a complex Bead Maze robotic manipulation task.
- In this task, a robot may locate beads (e.g., a sun-shaped, cube-shaped, cloud-shaped, or moon-shaped bead) and move them along a curved path by combining proximal sensing and dense slip detection.
- Visual proximity sensing may be used to locate the beads and control the robot's approach.
- Dense slip measurements may then be used to control the robot to move in a coordinated manner with the slip vectors. This enables the robot to follow the curved Bead Maze path without any knowledge of the object or the shape of the trajectory.
- FIG. 4 illustrates proximal sensing results, in which the root mean square errors (RMSE) may be, e.g., 2.5 mm for the sun-shaped bead, 4.5 mm for the cube-shaped bead, 2.0 mm for the cloud-shaped bead, and 4.5 mm for the moon-shaped bead.
- FIG. 5 illustrates a method for detecting an object, according to an embodiment.
- An STS image 501 is provided to an object detection network 502.
- An object of interest (e.g., a targeted object) may be detected in the image.
- The bounding box of the object may be used by a proximity sensing algorithm 505 (e.g., FIG. 3) to estimate its location (e.g., dx, dy, dz) relative to the sensor.
- At operation 504, the robot arm may be moved to find an object.
- The robot arm is moved toward the object.
- It is determined whether the object is within a desired tolerance of the sensor.
- The robot grasps the object and modulates the sensor to switch to tactile sensing.
- The sensor may detect a dense object slippage field when the object is in contact with the sensor.
- An algorithm may be used that provides dense pixel-wise slip measurements (e.g., pixel displacements) describing how each section of the object moves relative to the visuotactile sensor at contact.
- A key step of the slip detection algorithm may be to track the object motion relative to the sensor's gel membrane.
- This quantity may not be directly observable from the raw tactile measurements, because the membrane's elastic deformation may not be known, and it may not be possible to determine whether the object moves in the image together with the membrane (e.g., sticking) or independently of it (e.g., slipping).
- Therefore, the motion of the membrane may be compared with the motion of the object behind the sensor (i.e., the object motion) to characterize the nature of the contact.
- An additional advantage of this method is that the difference between the object motion and the marker motion yields a dense pixel-wise description of the slip field, which can be used as a rich feedback signal for tactile manipulation policies.
- FIGS. 6A, 6B, and 6C illustrate examples of detecting slip, according to one or more embodiments.
- Object slippage may be quantified by comparing the object's motion (i.e., object flow) with the motion of the sensor's gel (i.e., marker flow).
- Object flow may be used to quantify the motion of the object relative to the sensor's camera.
- Marker flow captures the motion of the gel membrane relative to the camera.
- The slip flow identifies how the object moves relative to the sensor membrane and may be computed by pixel-wise subtraction of the marker flow field from the object flow field.
- An example of object flow, according to an embodiment, is illustrated in FIG. 6A.
- Object flow may be computed using the same optical flow method as for marker flow, but applied to the original tactile image frames using a window size that captures the spatial neighborhood of each marker. This provides a dense field of flow vectors at locations where an object is moving behind the sensor surface, i.e., object flow, as illustrated in FIG. 6A.
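To illustrate what a flow estimate over one window involves, the sketch below computes a single displacement vector for one window with the Lucas-Kanade least-squares method. This is a swapped-in technique chosen for brevity: the patent applies the same dense optical flow method used for marker flow (e.g., Farneback), and repeating this per window over a grid would give only a coarse field. The name `window_flow` is hypothetical.

```python
import numpy as np


def window_flow(prev, curr):
    """Estimate one (u, v) displacement for a window with Lucas-Kanade.

    Solves [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] [u, v]^T = -[sum IxIt, sum IyIt]^T,
    where u is the displacement along columns (x) and v along rows (y).
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    iy, ix = np.gradient(prev)  # np.gradient returns d/drow first, then d/dcol
    it = curr - prev
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    rhs = -np.array([np.sum(ix * it), np.sum(iy * it)])
    u, v = np.linalg.solve(a, rhs)
    return u, v


if __name__ == "__main__":
    # A smooth blob shifted 0.3 px to the right between two frames.
    y, x = np.mgrid[-15:16, -15:16].astype(float)
    blob = lambda cx: np.exp(-((x - cx) ** 2 + y ** 2) / (2 * 3.0 ** 2))
    print(window_flow(blob(0.0), blob(0.3)))
```

The linearization behind Lucas-Kanade holds only for small, sub-window motions, which is why the window size matters.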
- An example of marker flow, according to an embodiment, is illustrated in FIG. 6B.
- Tracking the markers allows the deformation of the sensor surface to be measured when the sensor is in contact with an object.
- To detect the markers, a classic color-based segmentation strategy based on adaptive thresholding may be used.
- Detected foreground regions may be filtered by area and by a measure of circularity, defined as area divided by perimeter squared.
- This preliminary image processing step may give reliable marker labels.
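The classical marker-detection step can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the patent's implementation: markers are assumed darker than the gel, a grayscale image stands in for color-based segmentation, the perimeter is approximated by counting boundary pixels, and the thresholds (`block`, `offset`, `min_area`, `max_area`, `min_circularity`) are hypothetical values that would need tuning for a real sensor. In practice, a library such as OpenCV would typically handle thresholding and contour analysis.

```python
import numpy as np

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def adaptive_threshold(img, block=15, offset=0.1):
    """Foreground = pixels darker than their local block x block mean by more than `offset`.

    `block` must be odd; the local mean is computed with an integral image.
    """
    half = block // 2
    padded = np.pad(img.astype(float), half, mode="edge")
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    h, w = img.shape
    box_sum = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
               - ii[block:block + h, :w] + ii[:h, :w])
    return img < box_sum / (block * block) - offset


def label_blobs(mask):
    """4-connected component labeling by flood fill; returns (labels, list of pixel arrays)."""
    labels = np.zeros(mask.shape, dtype=int)
    blobs = []
    h, w = mask.shape
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue
        idx = len(blobs) + 1
        labels[r0, c0] = idx
        stack, pixels = [(r0, c0)], []
        while stack:
            r, c = stack.pop()
            pixels.append((r, c))
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = idx
                    stack.append((rr, cc))
        blobs.append(np.array(pixels))
    return labels, blobs


def detect_markers(img, min_area=10, max_area=200, min_circularity=0.08):
    """Return (row, col) centroids of blobs that pass the area and circularity filters."""
    labels, blobs = label_blobs(adaptive_threshold(img))
    h, w = labels.shape
    centroids = []
    for idx, pixels in enumerate(blobs, start=1):
        area = len(pixels)
        # Perimeter approximated as the number of blob pixels with a 4-neighbor outside the blob.
        perimeter = sum(
            any(not (0 <= r + dr < h and 0 <= c + dc < w) or labels[r + dr, c + dc] != idx
                for dr, dc in NEIGHBORS)
            for r, c in pixels)
        circularity = area / perimeter ** 2  # area divided by perimeter squared
        if min_area <= area <= max_area and circularity >= min_circularity:
            centroids.append(tuple(pixels.mean(axis=0)))
    return centroids
```

On a synthetic image containing a dark disk and a one-pixel-wide dark line, both blobs pass the area filter, but only the disk passes the circularity filter.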
- Marker segmentation results produced by a trained U-Net segmentation model on image frames are shown, e.g., in FIG. 6B.
- The displacements of the embedded markers may be captured by applying frame-to-frame optical flow to the binary segmented marker images using an optical flow algorithm, e.g., the Farneback algorithm. This yields non-zero flow estimates at locations where markers are moving and provides a local estimate of the direction of marker motion.
- The flow obtained at the marker locations may be interpolated over the area of the sensor, as shown in FIG. 6B.
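The patent does not specify how the sparse marker flow is interpolated over the sensor area. As one hedged possibility, the sketch below uses inverse-distance weighting, a swapped-in technique (linear or radial-basis interpolation would also work). Marker positions and flow vectors are assumed to be available already, for example by sampling the frame-to-frame optical flow at marker centroids; `interpolate_marker_flow` is a hypothetical name.

```python
import numpy as np


def interpolate_marker_flow(points, flows, height, width, power=2.0, eps=1e-9):
    """Spread sparse per-marker flow vectors over the whole sensor image.

    points: (M, 2) marker positions as (row, col); flows: (M, 2) flow vectors.
    Returns an (H, W, 2) dense field using inverse-distance weighting.
    Memory grows as H * W * M, which is acceptable for small sensor images.
    """
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    rows, cols = np.mgrid[:height, :width]
    grid = np.stack([rows, cols], axis=-1).astype(float)                   # (H, W, 2)
    d2 = ((grid[:, :, None, :] - points[None, None, :, :]) ** 2).sum(-1)  # (H, W, M)
    weights = 1.0 / (d2 ** (power / 2.0) + eps)  # a marker's own pixel gets a huge weight
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hwm,mc->hwc", weights, flows)
```

At a marker's own pixel the field reproduces that marker's flow, and pixels equidistant from two markers receive the average of their flows.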
- The slip field may be defined as the difference between the object flow and the marker flow.
- A nonzero difference between the object flow and the marker flow may be caused by the object moving independently of the membrane, which suggests slip.
- The above computations may be restricted to regions that are estimated to be in contact.
- An example of the object flow, the marker flow, and their difference (a "slip" flow field) is illustrated in FIGS. 6A, 6B, and 6C, respectively.
- In this example, the slip field shows that the object (e.g., a Lego block) is slipping at its bottom right corner.
- The vectors describe the magnitude and direction of slippage for every region where the object is in contact, providing a richer representation than traditional binary slip detectors.
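The pixel-wise slip definition can be written directly. This NumPy sketch assumes both flow fields are already available as (H, W, 2) arrays on the same pixel grid, with channel 0 as the horizontal and channel 1 as the vertical component, and that a boolean contact mask is provided; the function name is hypothetical.

```python
import numpy as np


def slip_field(object_flow, marker_flow, contact_mask):
    """Dense slip = object flow - marker flow, restricted to the contact region.

    object_flow, marker_flow: (H, W, 2) flow fields; contact_mask: (H, W) boolean.
    Returns the slip field and its per-pixel magnitude and direction (radians).
    """
    slip = (object_flow - marker_flow) * contact_mask[..., None]  # zero outside contact
    magnitude = np.linalg.norm(slip, axis=-1)
    direction = np.arctan2(slip[..., 1], slip[..., 0])
    return slip, magnitude, direction


if __name__ == "__main__":
    # The membrane and object move together everywhere (stick) except the
    # bottom-right 2x2 corner, where the object also moves vertically (slip).
    marker = np.zeros((4, 4, 2))
    marker[..., 0] = 1.0
    obj = marker.copy()
    obj[2:, 2:] = [1.0, 1.0]
    _, mag, _ = slip_field(obj, marker, np.ones((4, 4), dtype=bool))
    print(mag)
```

Unlike a binary detector, the result localizes slip to the corner and reports its direction.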
- FIG. 7 illustrates a flowchart of a method for detecting slip, according to an embodiment.
- The visuotactile sensor 701 returns a tactile image that is processed by two parallel detectors: an object flow detector 702 and a marker flow detector 703.
- The marker flow detector 703, which identifies how the membrane deforms, may be used to detect whether contact is currently occurring, at a contact detector 706.
- A slip detector 704 uses a slip detection algorithm that subtracts the marker flow from the object flow to return a dense slip field 705, which describes how the object moves relative to the sensor.
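The patent does not detail how the contact detector 706 uses the marker flow. One simple possibility, shown here as a hedged sketch, is to threshold the magnitude of the membrane deformation: if a large enough part of the membrane moves, contact is declared, and the deformed pixels form a contact mask that can restrict the slip computation. The thresholds and the function name are assumptions.

```python
import numpy as np


def detect_contact(marker_flow, min_displacement=0.5, min_fraction=0.01):
    """Return (in_contact, contact_mask) from an (H, W, 2) marker flow field.

    A pixel counts as deformed when its marker displacement exceeds
    `min_displacement` pixels; contact is declared when at least `min_fraction`
    of the pixels are deformed. Both thresholds are hypothetical.
    """
    magnitude = np.linalg.norm(marker_flow, axis=-1)
    mask = magnitude > min_displacement
    return bool(mask.mean() >= min_fraction), mask
```

A real detector would likely also smooth the mask over time so that brief flow noise does not toggle the contact state.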
- FIG. 8 is a diagram illustrating a method for estimating a time to contact, according to an embodiment. Based on the identity of an object and an algorithm that identifies its bounding box, the area of the bounding box may be extracted. The time to contact (e.g., collision) of the object with the sensor (t_c) may be determined from the area of the bounding box, i.e., the area of the bounding box may be mapped to the time estimate. Using the object detection module described in FIG. 2A, the time to contact may be estimated according to FIG. 8 and equation (1). The bounding box provides an area, and the change in the bounding box area is estimated over a number of frames.
- The time to contact, i.e., the estimated time at which the sensor will collide with an object, may be estimated from a number of image-space properties. For example, the time to contact may be estimated under perspective projection and reasonable assumptions: the object may be modeled as a rectangular area of width w and height h at a constant depth z_0, and the size of the object, including its area at contact, is known. Given the estimated area A_t of the target at time t and the known area of the target at contact, A_touch, the time to contact may be estimated by equation (1).
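The sketch below illustrates one standard way to turn bounding-box areas into a time-to-contact estimate under the same kind of assumptions (perspective projection, known area at contact, constant approach speed). It is not necessarily the patent's equation (1). It uses the pinhole relation A ∝ 1/z², so 1/√A is proportional to depth, fits a line to 1/√A over the observed frames, and extrapolates to the time at which the area would reach A_touch. The function name and the example numbers are hypothetical.

```python
import numpy as np


def time_to_contact(times, areas, area_at_touch):
    """Estimate time to contact from bounding-box areas observed over several frames.

    Assumes a pinhole camera (A proportional to 1/z**2, so 1/sqrt(A) is proportional
    to depth z) and constant approach velocity. Returns np.inf if not approaching.
    """
    times = np.asarray(times, dtype=float)
    s = 1.0 / np.sqrt(np.asarray(areas, dtype=float))  # proportional to depth
    s_touch = 1.0 / np.sqrt(area_at_touch)
    slope, intercept = np.polyfit(times, s, deg=1)     # s(t) ~ slope * t + intercept
    if slope >= 0:                                     # depth not shrinking
        return np.inf
    t_contact = (s_touch - intercept) / slope          # when s reaches its contact value
    return max(0.0, t_contact - times[-1])


if __name__ == "__main__":
    # Synthetic approach: depth 100 -> 70 mm at 20 mm/s; contact at 20 mm depth.
    zs = np.array([100.0, 90.0, 80.0, 70.0])
    print(time_to_contact([0.0, 0.5, 1.0, 1.5], 1.0e6 / zs ** 2, 1.0e6 / 20.0 ** 2))
```

The unknown proportionality constant between 1/√A and depth cancels, which is why only image-space areas and A_touch are needed.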
- FIG. 9 is a flowchart illustrating a method of determining a time to contacting an object, according to an embodiment.
- The visuotactile sensor returns a visual image 901 that is processed by an object detection network 902. While an object of interest is not found (903-N), the robot arm may continue to move (904) until the object of interest is found. Once the object to be manipulated is found (903-Y), its bounding box is used to estimate the time to contact at operation 905. This measurement may be used to control the robot speed at operation 906, for example, by moving the robot with a velocity proportional to the remaining time to collision.
- The robot then grasps the object and modulates the sensor to switch to tactile sensing.
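Operation 906 can be sketched as a simple proportional rule. This is a hypothetical illustration: the gain, speed limits, and stop threshold are assumptions. The clamp keeps the robot from moving too fast when the estimate is large and from creeping indefinitely just before contact.

```python
def approach_speed(ttc, gain=0.5, v_max=0.1, v_min=0.005, stop_ttc=0.05):
    """Commanded approach speed (m/s) proportional to the remaining time to contact (s).

    Returns 0.0 once contact is imminent so control can hand over to tactile sensing.
    All gains and limits are hypothetical.
    """
    if ttc <= stop_ttc:
        return 0.0
    return min(v_max, max(v_min, gain * ttc))
```

Because the speed shrinks with the remaining time to contact, the robot approaches quickly from far away and decelerates smoothly as the object fills the camera's view.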
- FIG. 10 is a diagram illustrating a learned slip detector, according to an embodiment.
- A method includes detecting a dense object slippage field when an object is in contact with the sensor.
- A learning system may be used to combine the object flow and marker flow described with reference to FIGS. 6A, 6B, and 6C, rather than using a subtraction operator.
- The object flow may be provided to an object encoder, and the marker flow may be provided to a marker encoder.
- The outputs of the object encoder and the marker encoder may be provided to a convolutional neural network (CNN) (e.g., U-Net).
- The CNN takes the encoded object flow image and the encoded marker flow image as input and estimates a pixel-wise dense slip field, using a slip decoder to generate a slip flow image, as shown in FIG. 10.
- In this way, the difference operator may be learned by the CNN to generate the slip flow image.
- A learned slip detector may better identify difficult-to-track object features on texture-less objects, for example, by focusing on the object's boundaries rather than the entire contact region.
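Learning the difference operator can be illustrated with a deliberately tiny stand-in for the CNN: a single per-pixel linear layer (equivalent to a 1 × 1 convolution without bias) fitted by least squares. When trained on targets generated by plain subtraction, it recovers the weights [I, −I], i.e., the hand-designed slip detector. A real learned detector such as the U-Net-based one described above would add spatial context and nonlinearity so that it can go beyond subtraction. The synthetic data and function names are hypothetical.

```python
import numpy as np


def fit_pixelwise_combiner(object_flow, marker_flow, slip_target):
    """Fit a per-pixel linear map from [object_flow, marker_flow] (4 channels) to slip (2 channels).

    Equivalent to training a single 1x1 convolution without bias by least squares.
    Returns a (4, 2) weight matrix.
    """
    x = np.concatenate([object_flow, marker_flow], axis=-1).reshape(-1, 4)
    y = slip_target.reshape(-1, 2)
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def apply_combiner(object_flow, marker_flow, weights):
    """Apply the fitted per-pixel map to (H, W, 2) flow fields, giving an (H, W, 2) slip field."""
    return np.concatenate([object_flow, marker_flow], axis=-1) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    of = rng.normal(size=(8, 8, 2))
    mf = rng.normal(size=(8, 8, 2))
    print(np.round(fit_pixelwise_combiner(of, mf, of - mf), 6))
```

Fitting on subtraction targets is only a sanity check; the point of a learned detector is to be trained on labeled slip where plain subtraction fails, such as texture-less objects.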
- FIG. 11 is a diagram illustrating a method of detecting slip flow using a learned slip flow detector, according to an embodiment.
- A visuotactile sensor returns a tactile image 1101 that is processed by two parallel detectors: an object flow detector 1102 and a marker flow detector 1103.
- The marker flow detector 1103, which identifies how the membrane deforms, is used to detect whether contact is currently occurring at contact detector 1106.
- The learned slip detector 1104 uses a learned slip detection algorithm that takes the object flow and marker flow as input and returns the dense slip field 1105, identifying how the object moves relative to the sensor.
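The data flow of FIG. 11 can be sketched as a function that runs both flow detectors on one tactile image, uses the marker flow for the contact decision, and calls the slip detector only while in contact. This is a hypothetical sketch: the detectors are passed in as callables, the contact rule (mean marker displacement above a threshold) is an assumption because the patent does not say how contact detector 1106 decides, and the two detectors run sequentially here although the figure shows them in parallel.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical detector signatures: each flow detector maps a tactile image to an
# (H, W, 2) flow field; the slip detector maps (object_flow, marker_flow) to a slip field.
FlowDetector = Callable[[np.ndarray], np.ndarray]
SlipDetector = Callable[[np.ndarray, np.ndarray], np.ndarray]


def detect_contact(marker_flow: np.ndarray, threshold: float) -> bool:
    """Assumed rule: contact if the mean marker displacement exceeds threshold."""
    return float(np.linalg.norm(marker_flow, axis=-1).mean()) > threshold


def process_tactile_image(
    image: np.ndarray,
    object_flow_detector: FlowDetector,
    marker_flow_detector: FlowDetector,
    slip_detector: SlipDetector,
    contact_threshold: float,
) -> Tuple[bool, Optional[np.ndarray]]:
    """Return (in_contact, slip_field); slip_field is None when not in contact."""
    # FIG. 11 runs these two detectors in parallel; they run sequentially here.
    object_flow = object_flow_detector(image)
    marker_flow = marker_flow_detector(image)
    if not detect_contact(marker_flow, contact_threshold):
        return False, None
    return True, slip_detector(object_flow, marker_flow)
```

Passing the detectors in as callables keeps the pipeline independent of whether the slip detector is the learned network or a simple subtraction baseline.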
- The methods and features described above may be used to provide robots with sensing capabilities that allow them to grasp objects with greater dexterity and speed.
- The number of grasps per hour that a robot can perform may be a useful measure of the efficiency of a grasping system.
- The object detection and localization described above allows a robot to maintain a line of sight with the object (i.e., no occlusions) and to estimate the distance to contact. This enables grasping systems to approach objects faster by reducing the uncertainty in the location of the object and in the time to collision. This may help accelerate the deployment of robots in factories, where robots must compete with the effectiveness of human pickers.
- Slip-aware trajectory optimization: once an object is grasped by the robot, the robot can exploit Feature #2 of the patent (slip detection) to move the object to its target location as fast as possible while avoiding object slippage.
- The accuracy and resolution of the slip detection algorithm may allow the robot to reduce its acceleration in the direction of the slip vectors to prevent grasping failures.
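One way to read "reduce its acceleration in the direction of the slip vectors" is to attenuate the component of the commanded acceleration that lies along the mean slip direction, more strongly as the slip grows. The sketch below assumes that interpretation. The linear attenuation rule, the `gain` parameter, and the assumption that acceleration and slip are expressed in the same planar frame are all hypothetical, not taken from the patent.

```python
import numpy as np


def slip_aware_acceleration(
    accel: np.ndarray, slip_vectors: np.ndarray, gain: float, eps: float = 1e-9
) -> np.ndarray:
    """Attenuate the commanded acceleration along the mean slip direction.

    accel: (2,) commanded acceleration, assumed to be in the same planar frame
        as the slip vectors.
    slip_vectors: (N, 2) slip vectors sampled from the dense slip field.
    gain: hypothetical tuning parameter; the along-slip component is scaled by
        max(0, 1 - gain * |mean slip|).
    """
    accel = np.asarray(accel, dtype=float)
    mean_slip = np.asarray(slip_vectors, dtype=float).mean(axis=0)
    magnitude = float(np.linalg.norm(mean_slip))
    if magnitude < eps:
        return accel.copy()  # no net slip: leave the command unchanged
    direction = mean_slip / magnitude
    along = float(accel @ direction)  # signed component along the slip axis
    scale = max(0.0, 1.0 - gain * magnitude)
    return accel - (1.0 - scale) * along * direction
```

Averaging the slip vectors is a simplification: a rotational slip field can average to nearly zero, in which case this sketch leaves the acceleration unchanged even though the object is slipping.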
- Cables may be bundled together and must be disentangled during the grasping phase. It may therefore be useful to recognize the identity and location of a cable to be grasped dynamically, i.e., during the grasp phase.
- Object detection and localization enables fine grasping capabilities and may open opportunities for robotics and automation in factories operated by human workers.
- A dense slip detection algorithm provides a robot with valuable feedback on the geometry and location of the hole. This enables new sensing capabilities that may help robots accomplish tasks in large factories.
- Robots may be integrated into a number of environments, such as restaurants or customer service settings, where they may be expected to hand over objects to humans.
- For example, robot waiters in a restaurant may be tasked with setting and serving a table while interacting with humans.
- Object detection and localization may enable visual detection of possible robot grasps by integrating information from within the fingertips.
- A human who hands over an object to a robot may move the object in ways that are difficult to anticipate, which may complicate grasp planning.
- Using object detection according to embodiments of the present disclosure, the relative distance between an object and the robot may be inferred and used to retrieve the object effectively.
- High-resolution slip detection information may be used by a robot to determine when to let go of an object it is handing over to a human. For example, as the human secures a stable grasp on the object, the additional constraints on the object may cause the object to slip relative to the robot gripper. By detecting the magnitude and direction of such slip vectors, the robot may determine when it is safe to let go of the object.
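A release decision based on slip magnitude and direction could look like the monitor below. It is a hypothetical rule, not the patent's method: release once the mean slip reaches `min_slip`, points within `max_angle_deg` of an assumed pull direction (the direction in which the human is expected to draw the object away), and has done so for `hold_frames` consecutive frames to reject transient noise. All parameter names are illustrative.

```python
import math
from collections import deque
from typing import Deque, Tuple


class HandoverReleaseMonitor:
    """Hypothetical release rule for handing an object to a human."""

    def __init__(self, pull_dir: Tuple[float, float], min_slip: float,
                 max_angle_deg: float, hold_frames: int) -> None:
        norm = math.hypot(pull_dir[0], pull_dir[1])
        if norm == 0.0 or hold_frames < 1:
            raise ValueError("pull_dir must be non-zero and hold_frames >= 1")
        self.pull = (pull_dir[0] / norm, pull_dir[1] / norm)  # unit pull direction
        self.min_slip = min_slip
        self.cos_max_angle = math.cos(math.radians(max_angle_deg))
        self.history: Deque[bool] = deque(maxlen=hold_frames)

    def update(self, mean_slip: Tuple[float, float]) -> bool:
        """Feed one frame's mean slip vector; return True when it is safe to release."""
        magnitude = math.hypot(mean_slip[0], mean_slip[1])
        aligned = False
        if magnitude > 0.0 and magnitude >= self.min_slip:
            cos_angle = (mean_slip[0] * self.pull[0] + mean_slip[1] * self.pull[1]) / magnitude
            aligned = cos_angle >= self.cos_max_angle
        self.history.append(aligned)
        # Release only after hold_frames consecutive qualifying frames.
        return len(self.history) == self.history.maxlen and all(self.history)
```

Requiring several consecutive frames trades a short release delay for robustness: a single noisy slip estimate cannot make the robot drop the object.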
- The term "component" is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.
- The expression "at least one of a, b, and c" should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Orthopedic Medicine & Surgery (AREA)
- Image Analysis (AREA)
Claims (15)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/103,825 US12515352B2 (en) | 2022-02-24 | 2023-01-31 | Visuotactile operators for proximity sensing and contact control |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263313711P | 2022-02-24 | 2022-02-24 | |
| US18/103,825 US12515352B2 (en) | 2022-02-24 | 2023-01-31 | Visuotactile operators for proximity sensing and contact control |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230264367A1 (en) | 2023-08-24 |
| US12515352B2 (en) | 2026-01-06 |
Family ID: 87573502
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/103,825 Active 2043-11-16 US12515352B2 (en) | 2022-02-24 | 2023-01-31 | Visuotactile operators for proximity sensing and contact control |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12515352B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250140026A1 (en) * | 2023-10-31 | 2025-05-01 | Toyota Motor Engineering & Manufacturing North America, Inc. | Speed profile generation for vehicle range estimation |
| CN119681901B (en) * | 2025-01-27 | 2025-10-21 | 大连理工大学 | A robot grasping posture optimization method based on vision-tactile fusion |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200306980A1 (en) * | 2019-03-25 | 2020-10-01 | Dishcraft Robotics, Inc. | Automated Manipulation Of Transparent Vessels |
- 2023-01-31: US application US18/103,825 filed (patent US12515352B2, status Active)
Non-Patent Citations (24)
| Title |
|---|
| A. Yamaguchi and C. G. Atkeson, "Implementing tactile behaviors using FingerVision," 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), Birmingham, UK, 2017, pp. 241-248 (Year: 2017). * |
| Dong, Siyuan, et al. "Maintaining grasps within slipping bounds by monitoring incipient slip." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019 (Year: 2019). * |
| Dong, Siyuan, Wenzhen Yuan, and Edward H. Adelson. "Improved gelsight tactile sensor for measuring geometry and slip." 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017 (Year: 2017). * |
| Hogan, Francois R., et al. "Tactile dexterity: Manipulation primitives with tactile feedback." 2020 IEEE international conference on robotics and automation (ICRA). IEEE, 2020 (Year: 2020). * |
| Hogan, Francois Robert, et al. "Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor." arXiv e-prints ( 2020): arXiv-2011 (Year: 2011). * |
| Hogan, Francois Robert, et al. "Seeing Through your Skin: Recognizing Objects with a Novel Visuotactile Sensor." arXiv e-prints ( 2020): arXiv-2011. (Year: 2020). * |
| James, Jasper Wollaston, and Nathan F. Lepora. "Slip detection for grasp stabilization with a multifingered tactile robot hand." IEEE Transactions on Robotics 37.2 (2020): 506-519 (Year: 2020). * |
| Kehl, Wadim, et al. "Ssd-6d: Making rgb-based 3d detection and 6d pose estimation great again." Proceedings of the IEEE international conference on computer vision. 2017 (Year: 2017). * |
| Yamaguchi, Akihiko, and Christopher G. Atkeson. "Combining finger vision and optical tactile sensing: Reducing and handling errors while cutting vegetables." 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids). IEEE, 2016 (Year: 2016). * |
| Yamaguchi, Akihiko. "FingerVision with whiskers: Light touch detection with vision-based tactile sensors." 2021 Fifth IEEE International Conference on Robotic Computing (IRC). IEEE, 2021 (Year: 2021). * |
| Yuan, Wenzhen, et al. "Measurement of shear and slip with a GelSight tactile sensor." 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2015 (Year: 2015). * |
| Zhang, Yazhan, et al. "Fingervision tactile sensor design and slip detection using convolutional LSTM network." arXiv preprint arXiv:1810.02653 (2018) (Year: 2018). * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230264367A1 (en) | 2023-08-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11559885B2 (en) | Method and system for grasping an object | |
| JP7581320B2 (en) | Systems and methods for augmenting visual output from robotic devices | |
| US12236340B2 (en) | Computer-automated robot grasp depth estimation | |
| Calandra et al. | More than a feeling: Learning to grasp and regrasp using vision and touch | |
| Sunil et al. | Visuotactile affordances for cloth manipulation with local control | |
| Calandra et al. | The feeling of success: Does touch sensing help predict grasp outcomes? | |
| Romero et al. | Monocular real-time 3D articulated hand pose estimation | |
| Wu et al. | Pixel-attentive policy gradient for multi-fingered grasping in cluttered scenes | |
| Stria et al. | Garment perception and its folding using a dual-arm robot | |
| Danielczuk et al. | X-ray: Mechanical search for an occluded object by minimizing support of learned occupancy distributions | |
| Cretu et al. | Soft object deformation monitoring and learning for model-based robotic hand manipulation | |
| US12515352B2 (en) | Visuotactile operators for proximity sensing and contact control | |
| Park et al. | Self-training based augmented reality for robust 3D object registration and task assistance | |
| Chen et al. | Estimating fingertip forces, torques, and local curvatures from fingernail images | |
| EP1870210A1 (en) | Evaluating visual proto-objects for robot interaction | |
| US11922667B2 (en) | Object region identification device, object region identification method, and object region identification program | |
| Hosseini et al. | Improving the successful robotic grasp detection using convolutional neural networks | |
| Gao et al. | A real-time grasping detection network architecture for various grasping scenarios | |
| Qiu et al. | Robotic fabric flattening with wrinkle direction detection | |
| Wang et al. | GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter |
| Cao et al. | Fuzzy-depth objects grasping based on FSG algorithm and a soft robotic hand | |
| WO2023090274A1 (en) | Work recognition device, work recognition method, and work recognition program | |
| CN118809616B (en) | Motion planning method and device for robot, robot and storage medium | |
| Chen et al. | Robotic grasp control policy with target pre-detection based on deep Q-learning | |
| Yan et al. | Vision-touch fusion for predicting grasping stability using self attention and past visual images |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HOGAN, FRANCOIS; TREMBLAY, JEAN-FRANCOIS; BAGHI, BOBAK HAMED; AND OTHERS; SIGNING DATES FROM 20230117 TO 20230123; REEL/FRAME: 062550/0651 |
| FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | AWAITING TC RESP., ISSUE FEE NOT PAID |
| STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| STCF | Information on status: patent grant | PATENTED CASE |