CN112288165A - Method and computing system for performing container detection and object detection - Google Patents

Method and computing system for performing container detection and object detection

Info

Publication number
CN112288165A
CN112288165A (application CN202011186066.6A)
Authority
CN
China
Prior art keywords
container
spatial structure
pose
computing system
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011186066.6A
Other languages
Chinese (zh)
Other versions
CN112288165B (en)
Inventor
叶旭涛
村瀬和都
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mu Jing Co Ltd
Original Assignee
Mu Jing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/929,854 external-priority patent/US11130237B1/en
Application filed by Mu Jing Co Ltd filed Critical Mu Jing Co Ltd
Priority claimed from CN202011045515.5A external-priority patent/CN113361740A/en
Publication of CN112288165A publication Critical patent/CN112288165A/en
Application granted granted Critical
Publication of CN112288165B publication Critical patent/CN112288165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes

Abstract

Methods and computing systems for performing container detection and object detection are disclosed. A system and method for performing object detection are presented. The system receives spatial structure information associated with an object that is or has been in a camera field of view of a spatial structure sensing camera. The spatial structure information is generated by the spatial structure sensing camera and includes depth information of an environment in the camera field of view. The system determines a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container. The system also determines an object pose based on the container pose, where the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object.

Description

Method and computing system for performing container detection and object detection
The present application is a divisional application of patent application No. 202011045515.5, entitled "method and computing system for performing container detection and object detection", filed on September 29, 2020.
Cross reference to related applications
This application claims the benefit of U.S. provisional application No. 62/985,336, entitled "A ROBOTIC SYSTEM WITH OBJECT DETECTION MECHANISM," filed on March 5, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to computing systems and methods for container detection and object detection.
Background
As automation becomes more prevalent, robots are being used in more environments, such as warehousing and retail environments. For example, robots may be used to interact with goods or other objects in a warehouse. The movement of the robot may be fixed or may be based on input (such as information generated by sensors in the warehouse).
Disclosure of Invention
One aspect of the present disclosure relates to a computing system, method, and/or non-transitory computer-readable medium having instructions for performing object detection. The computing system may include a communication interface configured to communicate with a robot having a robotic arm with a spatial structure sensing camera disposed on the robotic arm, wherein the spatial structure sensing camera has a camera field of view. The computing system may further include at least one processing circuit configured to perform the method when an object within a container is or has been in the camera field of view while the container is in an open position. The method may involve receiving spatial structure information including depth information of an environment in the camera field of view, wherein the spatial structure information is generated by the spatial structure sensing camera; and determining a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container. The method may also involve determining an object pose based on the container pose, wherein the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object; and outputting a movement command for causing the robot to interact with the object, wherein the movement command is generated based on the object pose.
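For illustration, the paragraph above can be read as a four-step pipeline: receive spatial structure information, determine a container pose, derive an object pose from it, and output a movement command. The Python sketch below shows one plausible shape of that flow; every name in it (Pose, fit_plane, determine_container_pose, and so on) is a hypothetical placeholder rather than the disclosed implementation.

```python
# Minimal, illustrative sketch of the flow described above; all names are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class Pose:
    normal: np.ndarray  # unit vector describing the orientation of a surface
    depth: float        # depth (along the camera's depth axis) of a representative point

def fit_plane(points: np.ndarray) -> Pose:
    """Least-squares plane fit over an N x 3 array of points in the camera frame."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return Pose(normal=vt[-1], depth=float(centroid[2]))

def determine_container_pose(rim_points: np.ndarray) -> Pose:
    """Container pose estimated from points sampled on the container rim."""
    return fit_plane(rim_points)

def determine_object_pose(container_pose: Pose, wall_height: float,
                          object_height: float) -> Pose:
    """Object pose inferred from the container pose, assuming the object rests
    on the container surface, which lies wall_height below the rim."""
    return Pose(normal=container_pose.normal,
                depth=container_pose.depth + wall_height - object_height)

def movement_command(object_pose: Pose) -> dict:
    """Movement command generated from the object pose."""
    return {"type": "object_movement",
            "approach_axis": (-object_pose.normal).tolist(),
            "target_depth": object_pose.depth}
```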
Drawings
Fig. 1A-1E illustrate a computing system configured to receive and process spatial structure information and/or sensed object identifier information consistent with embodiments of the invention.
Fig. 2A-2C provide block diagrams illustrating computing systems configured to receive and process spatial structure information and/or sensed object identifier information consistent with embodiments of the invention.
Fig. 3A-3D illustrate an environment having multiple containers (e.g., drawers) and a robot for interacting with the containers based on spatial structure information generated by a spatial structure sensing camera, according to an embodiment of the invention.
FIG. 4 provides a flow chart illustrating a method of determining information about objects deployed within a container, according to an embodiment of the invention.
Fig. 5A to 5C illustrate a container and an object within the container according to an embodiment of the present invention.
Fig. 6A-6C illustrate spatial structure information describing a container or an object disposed within a container, according to an embodiment of the invention.
FIG. 6D illustrates a container and an object disposed within the container according to an embodiment of the invention.
FIG. 6E illustrates spatial structure information describing a container or an object disposed within a container, according to an embodiment of the invention.
Fig. 6F illustrates the relationship between the container surface and the container rim according to an embodiment of the present invention.
FIG. 7A depicts an environment having both a spatial structure sensing camera and an object identifier sensing device (or more specifically, a barcode sensing device) in accordance with an embodiment of the present invention.
Figs. 7B and 7C illustrate barcode positions being determined and used to determine one or more object positions, in accordance with an embodiment of the present invention.
FIG. 8 illustrates a bar code adjacent to an object being used to determine the location of the object in accordance with an embodiment of the present invention.
Fig. 9 illustrates spatial structure information and/or sensed object identifier information (or more particularly sensed barcode information) covering only a portion of a container according to an embodiment of the invention.
FIG. 10 illustrates a segmentation of a container associating different regions with different segments, according to an embodiment of the invention.
Fig. 11A-11C illustrate a container being moved from a closed position to an open position according to an embodiment of the present invention.
Detailed Description
One aspect of the present disclosure relates to facilitating robot interaction with the contents of a drawer or other container, such as merchandise or any other object disposed within the container (the term "or" may be used herein to refer to "and/or"). Robotic interaction may include, for example, a robotic arm grasping or otherwise picking up an object disposed within a container. The robotic interaction may occur, for example, at a warehouse, a retail space, or any other environment. In some cases, facilitating robotic interaction involves determining a pose of an object within the container, where the pose may refer to at least one of an orientation or a depth of the object relative to a camera or some other reference point such that the robot arm may be moved appropriately to retrieve or otherwise pick up the object.
Various embodiments relate to determining a pose of an object (also referred to as an object pose) by performing open container detection, in which information is determined about an open container within which the object may be disposed. These embodiments may provide a way to determine the pose of an object that is more robust and tolerant of imaging noise or other sources of measurement error. Imaging noise can affect, for example, a point cloud or other spatial structure information used to measure an object. Measurement errors introduced into the point cloud may result in erroneous determinations of, for example, the orientation and/or depth of the object. In some cases, errors of even a few millimeters or a few degrees may affect the robot interaction, which in some cases may rely on millimeter or better accuracy in determining the relative position between the manipulator and the object. Because measurement errors of an object may prevent or hinder such accuracy, an aspect of the present disclosure relates to taking measurements of the container in which the object is disposed, and using such measurements to infer or otherwise determine the pose or other information about the object.
In some cases, imaging noise may also affect direct measurements of a portion of the container, such as the surface on which the object is disposed (also referred to as the container surface). One aspect of the present disclosure relates to compensating for measurement errors affecting the container surface by measuring another portion of the container, such as the rim of the container (also referred to as the container rim). In these cases, the container rim may occupy less space affected by imaging noise and, therefore, may yield more reliable or trustworthy measurements. Measurements of the container rim may be used to infer or otherwise determine the pose or other information about the container surface. Such a determination may be based on, for example, a known spacing separating the container surface and the container rim.
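As a concrete illustration of this compensation, if the container surface is assumed to be parallel to the rim and offset from it by a known height h, a surface point and orientation can be inferred from a rim measurement. The helper below is a hypothetical sketch under that assumption; its name and signature are not part of the disclosure.

```python
import numpy as np

def surface_from_rim(rim_normal: np.ndarray, rim_point: np.ndarray, h: float):
    """Infer a point on the container surface from a measured rim point,
    assuming the surface is parallel to the rim and offset by the known
    height h along the rim's unit normal (pointing away from the camera)."""
    n = rim_normal / np.linalg.norm(rim_normal)
    return n, rim_point + h * n  # the surface shares the rim's orientation
```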
In some cases, measurements made with respect to the container may be used to perform motion planning, such as motion planning involved in retrieving objects from the container. For example, measurements of the container rim may provide information about where the side walls of the container are located. When the robot retrieves the object from the container, the object movement path may be planned to avoid collisions between the sidewall of the container and the robot or the object. In some cases, measurements of the container may be used to virtually divide the container into different segments, as discussed in more detail below.
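One simple way to realize this virtual division is to split the container's interior footprint, expressed in the container's own plane, into a regular grid. The sketch below is only an assumption about how such segments might be represented, not the segmentation discussed later with respect to FIG. 10.

```python
import numpy as np

def segment_container(x_min: float, x_max: float, y_min: float, y_max: float,
                      rows: int, cols: int):
    """Divide the container's interior footprint into rows x cols segments,
    returning each segment's bounds as (x0, x1, y0, y1) in the container plane."""
    xs = np.linspace(x_min, x_max, cols + 1)
    ys = np.linspace(y_min, y_max, rows + 1)
    return [(float(xs[c]), float(xs[c + 1]), float(ys[r]), float(ys[r + 1]))
            for r in range(rows) for c in range(cols)]
```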
In some cases, facilitating the robot's interaction with the object may rely on using information about the object identifier (if any) deployed on the object. The object identifier may include visual indicia such as a bar code, logo or symbol (e.g., alphanumeric symbols), or other visual pattern identifying the object. In some cases, the object identifier may be printed on a surface of the object. In some cases, the object identifier may be printed on a sticker or other layer of material that is affixed to or otherwise disposed on the object. If the object is a box containing one or more items, the object identifier may identify the one or more items, or more generally, the contents of the box. The information about the object identifier used to facilitate the robotic interaction may include, for example, the location of the object identifier (also referred to as the object identifier location) or information encoded into the object identifier, such as information encoded into a barcode. In some cases, the object identifier location may be used to narrow down the portion of the point cloud or other spatial structure information that should be searched to detect a particular object. For example, if the object identifier is a barcode, such a search may be limited to a portion of the point cloud that corresponds to an area surrounding the barcode location. Such an embodiment may facilitate a more focused and efficient search for objects. In some cases, if the object size is encoded into a barcode or other object identifier, this information may be used to search for the object in a point cloud or other spatial structure information, or to plan how the robot may grab or otherwise interact with the object.
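To make the narrowing idea concrete, the sketch below crops a point cloud to a window around a sensed object identifier location before the object search runs. The function name and the square window shape are illustrative assumptions only.

```python
import numpy as np

def crop_cloud_around_identifier(cloud: np.ndarray, identifier_xy: np.ndarray,
                                 half_width: float) -> np.ndarray:
    """Keep only points whose X/Y coordinates fall within a square window
    centred on the sensed object identifier (e.g., barcode) location, so that
    the object search runs on a smaller portion of the point cloud."""
    dx = np.abs(cloud[:, 0] - identifier_xy[0])
    dy = np.abs(cloud[:, 1] - identifier_xy[1])
    return cloud[(dx <= half_width) & (dy <= half_width)]
```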
Fig. 1A shows a system 100 for processing spatial structure information for object detection, as discussed in more detail below. In the embodiment of fig. 1A, system 100 may include a computing system 101 and a spatial structure sensing camera 151 (also referred to as spatial structure sensing device 151). In this example, spatial structure sensing camera 151 may be configured to generate spatial structure information (also referred to as spatial information or spatial structure data) that includes depth information about the environment in which spatial structure sensing camera 151 is located, or more specifically, about the environment in the field of view of camera 151 (also referred to as camera field of view). The computing system 101 in fig. 1A may be configured to receive and process spatial structure information. For example, the computing system 101 may be configured to use the depth information in the spatial structure information to distinguish between different structures in the camera field of view or, more generally, to identify one or more structures in the camera field of view. In this example, the depth information may be used to determine an estimate of how the one or more structures are spatially arranged in a three-dimensional (3D) space.
In one example, spatial structure sensing camera 151 may be located in a warehouse, retail space (e.g., a store), or other location. In such examples, the warehouse or retail space may include various merchandise or other objects. The spatial structure sensing camera 151 may be used to sense information about objects and/or about structures containing objects, such as drawers or other types of containers. As described above, the spatial structure sensing camera 151 may be configured to generate spatial structure information, which may describe, for example, how the structure of a piece of merchandise and/or the structure of a container are arranged in a 3D space. In such examples, computing system 101 may be configured to receive and process spatial structure information from spatial structure sensing camera 151. Computing system 101 may be located at the same site or may be remotely located. For example, computing system 101 may be part of a cloud computing platform hosted in a data center remote from a warehouse or retail space, and may communicate with spatial structure sensing camera 151 via a network connection.
In an embodiment, system 100 may be a robotic manipulation system for interacting with various objects in the environment of spatial structure sensing camera 151. For example, fig. 1B illustrates a robot operating system 100A, which may be an embodiment of the system 100 of fig. 1A. The robot operating system 100A may include a computing system 101, a spatial structure sensing camera 151, and a robot 161. In an embodiment, robot 161 may be used to interact with one or more objects in the environment of spatial structure sensing camera 151, such as interacting with merchandise or other objects in a warehouse. For example, the robot 161 may be configured to pick items from a drawer or other container and move the items from the container to another location (e.g., a conveyor belt outside the drawer).
In an embodiment, the computing system 101 of fig. 1A and 1B may form or be part of a robot control system (also referred to as a robot controller), which is part of the robot operating system 100A. The robot control system may be a system configured to generate movement commands or other commands, for example, for robot 161. In such embodiments, computing system 101 may be configured to generate such commands based on spatial structure information generated, for example, by spatial structure sensing camera 151. In embodiments, the computing system 101 may form or be part of a vision system. The vision system may be a system that generates, for example, visual information describing the environment in which the robot 161 is located, or more specifically, the environment in which the spatial structure sensing camera 151 is located. The visual information may comprise spatial structure information, which may also be referred to as 3D information or 3D imaging information, as it may indicate how the structures are laid out or otherwise arranged in 3D space. In some cases, the robot 161 may include a robotic arm having a manipulator or other end effector forming one end of the robotic arm, and the spatial structure information may be used by the computing system 101 to control placement of the robotic arm. In some cases, if computing system 101 forms a vision system, the vision system may be part of the robotic control system described above, or may be separate from the robotic control system. If the vision system is separate from the robot control system, the vision system may be configured to output information about the environment in which the robot 161 is located. The robot control system in such an example may receive such information and control the movement of the robot 161 based on the information.
In an embodiment, the system 100 may include an object identifier sensing device 152, such as a barcode sensing device (also referred to as a barcode reader). More specifically, fig. 1C depicts a system 100B (which is an embodiment of system 100/100 a), the system 100B including a computing system 101, a spatial structure sensing camera 151, a robot 161, and further including an object identifier sensing device 152. In some cases, the object identifier sensing device 152 may be configured to detect an object identifier disposed on or near the object. As described above, the object identifier may be a visual indicia identifying the object. If the object is a box or other object for containing merchandise or some other item, in an embodiment, the object identifier may identify the item or other content of the box. As also described above, in some examples, the object identifier may be a barcode. In some cases, the barcode may have a spatial pattern, such as a series of dark stripes or a dark square array (e.g., a QR code), or any other barcode in the field of view of the object identifier sensing device 152 (e.g., a barcode sensing device). For example, a barcode may be deployed on a piece of merchandise or other object in a warehouse. The object identifier sensing device 152 may be configured to sense information about the object identifier. This information (which may also be referred to as sensed object identifier information) may include information encoded into the object identifier, the location of the object identifier (also referred to as the object identifier location), or any other information about the object identifier. If the object identifier is a bar code, the information encoded into the bar code may include, for example, a Stock Keeping Unit (SKU) code or a Universal Product Code (UPC).
In embodiments, object identifier sensing device 152 and/or spatial structure sensing camera 151 may be attached to a fixed mounting point, such as a fixed mounting point within a warehouse or retail space. In an embodiment, the spatial structure sensing camera 151 and/or the object identifier sensing device 152 may be attached to a robotic arm of the robot 161. In a more specific example, the object identifier sensing device 152 and/or the spatial structure sensing camera 151 may be attached to or disposed on (or near) a robot hand or other end effector forming one end of the robotic arm. In such examples, object identifier sensing device 152 and spatial structure sensing camera 151 may be referred to as an on-hand object identifier sensing device (e.g., an on-hand barcode reader) and an on-hand spatial structure sensing camera, respectively. In some cases, the computing system 101 may be configured to control the robot 161 to move the on-hand spatial structure sensing camera and/or the on-hand object identifier sensing device to an optimal position for sensing the environment of the robot 161, as discussed in more detail below.
In an embodiment, if the computing system 101 is part of a robot control system, the computing system 101 may be configured to generate one or more movement commands for controlling movement of the robot 161, as discussed in more detail below. These movement commands may include, for example, object movement commands, sensor movement commands, and container movement commands. The sensor movement commands may be used to move the spatial structure sensing camera 151 and/or the object identifier sensing device 152. The container movement command may be used to move a container containing goods or other objects, such as a movement command to open or close the container. The object movement commands may be used to move goods or other objects in a warehouse or other location, or more specifically, objects disposed in a container.
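The three command categories above could be summarized, for example, by a small data structure such as the one below; the enum and field names are hypothetical and serve only to illustrate the distinction between object, sensor, and container movement commands.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class CommandType(Enum):
    OBJECT_MOVEMENT = auto()     # move merchandise or another object in a container
    SENSOR_MOVEMENT = auto()     # reposition the spatial structure sensing camera or identifier sensing device
    CONTAINER_MOVEMENT = auto()  # e.g., open or close a drawer

@dataclass
class MoveCommand:
    kind: CommandType
    parameters: dict = field(default_factory=dict)  # target pose, trajectory, gripper action, etc.
```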
In an embodiment, the components of system 100 may be configured to communicate via a network and/or a storage device. More specifically, fig. 1D depicts a system 100C, the system 100C being an embodiment of the system 100/100a/100B of fig. 1A-1C. System 100C includes computing system 101, spatial structure sensing camera 151, robot 161, object identifier sensing device 152, and further includes network 199 and data storage device 198 (or any other type of non-transitory computer-readable medium) separate from computing system 101. In some cases, storage device 198 may be configured to store information generated by object identifier sensing device 152, spatial structure sensing camera 151, and/or robot 161 and make the stored information available to computing system 101. In such examples, computing system 101 may be configured to access the stored information by retrieving (or more generally, receiving) the information from data storage 198.
In FIG. 1D, storage 198 may include any type of non-transitory computer-readable medium (or media), which may also be referred to as a non-transitory computer-readable storage device. Such non-transitory computer-readable media or storage devices may be configured to store and provide access to stored information (also referred to as stored data). Examples of a non-transitory computer-readable medium or storage device may include, but are not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination thereof, such as, for example, a computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a solid state drive, a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), and/or a memory stick.
In embodiments, network 199 may facilitate communication between computing system 101, spatial structure sensing camera 151, object identifier sensing device 152, and/or robot 161. For example, computing system 101 and/or storage 198 may receive information (e.g., spatial structure information or sensed object identifier information) generated by spatial structure sensing camera 151 and/or object identifier sensing device 152 via network 199. In such an example, the computing system 101 may be configured to provide one or more commands (e.g., movement commands) to the robot 161 via the network 199. Consistent with embodiments herein, network 199 may provide a single network connection or a series of network connections to allow computing system 101 to receive information and/or output commands.
In fig. 1D, network 199 may be connected via a wired or wireless link. The wired link may include a Digital Subscriber Line (DSL), coaxial cable, or fiber optic line. The wireless link may include Bluetooth, Bluetooth Low Energy (BLE), ANT/ANT+, ZigBee, Z-Wave, Thread, Wi-Fi, Worldwide Interoperability for Microwave Access (WiMAX), mobile WiMAX, WiMAX-Advanced, NFC, SigFox, LoRa, Random Phase Multiple Access (RPMA), Weightless-N/P/W, an infrared channel, or a satellite band. The wireless link may also include any cellular network standard for communication between mobile devices, including standards compliant with 2G, 3G, 4G, or 5G. The wireless standard may use various channel access methods, such as FDMA, TDMA, CDMA, OFDM, or SDMA. In some embodiments, different types of information may be sent via different links and standards. In other embodiments, the same type of information may be sent via different links and standards. Network communications may be via any suitable protocol, including, for example, http, tcp/ip, udp, ethernet, ATM, etc.
Network 199 may be any type and/or form of network. The geographic extent of the network may vary widely, and network 199 may be a Body Area Network (BAN), a Personal Area Network (PAN), a Local Area Network (LAN) (e.g., an intranet), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), or the internet. The topology of network 199 may be of any form, and may include, for example, any of the following: point-to-point, bus, star, ring, mesh, or tree. Network 199 may have any such network topology known to one of ordinary skill in the art that is capable of supporting the operations described herein. The network 199 may utilize different technologies and protocol layers or protocol stacks including, for example, an ethernet protocol, internet protocol suite (TCP/IP), ATM (asynchronous transfer mode) technology, SONET (synchronous optical network) protocol, or SDH (synchronous digital hierarchy) protocol. The TCP/IP internet protocol suite may include an application layer, a transport layer, an internet layer (including, for example, IPv4 and IPv6), or a link layer. Network 199 may be a broadcast network, a telecommunications network, a data communications network, or a computer network.
In embodiments, computing system 101, spatial structure sensing camera 151, object identifier sensing device 152, and/or robot 161 may be able to communicate with each other via a direct connection rather than a network connection. For example, in such embodiments, the computing system 101 may be configured to receive information from the spatial structure sensing camera 151 and/or the object identifier sensing device 152 via a dedicated wired communication interface, such as an RS-232 interface, a Universal Serial Bus (USB) interface, and/or via a local computer bus, such as a Peripheral Component Interconnect (PCI) bus.
In embodiments, the spatial structure information generated by spatial structure sensing camera 151 may refer to any type of information that describes how structures are laid out or otherwise deployed in a space, such as a three-dimensional (3D) space. More specifically, the spatial structure information may describe a 3D layout of the structure, or a 3D pose or deployment of the structure in a 3D space. The structure may belong to a container in the environment or field of view of the spatial structure sensing camera 151, for example, or to an object disposed within the container. In some cases, the spatial structure information may indicate how the structure is oriented in 3D space. In some cases, the spatial structure information may include depth information indicating one or more respective depth values for one or more locations on the structure, or more particularly on one or more surfaces of the structure. The depth value for a particular location may be relative to the spatial structure sensing camera 151, or relative to some other frame of reference (e.g., a ceiling or wall of a warehouse or retail space). In some cases, the depth values may be measured along an axis orthogonal to an imaginary plane in which the spatial structure sensing camera 151 is located. For example, if the spatial structure sensing camera 151 has an image sensor, the imaginary plane may be an image plane defined by the image sensor. In an embodiment, the spatial structure information may be used to determine a contour of the structure, or more generally, a boundary of the structure. The contour may be, for example, a contour of the container or a portion of the container or an object in the container. For example, spatial structure information may be used to detect one or more locations where there is a significant discontinuity in depth values, where such locations may indicate a boundary (e.g., edge) of a structure. In some cases, the boundaries of the structure may be used to determine its shape or size.
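As an illustration of detecting boundaries from depth discontinuities, the sketch below flags locations of a depth grid whose depth value jumps relative to a neighbouring location by more than a threshold; the threshold value and the simple row/column neighbourhood are assumptions, not the disclosed method.

```python
import numpy as np

def boundary_mask(depth: np.ndarray, jump: float) -> np.ndarray:
    """Mark grid locations where depth changes abruptly versus the previous
    row or column; such discontinuities can indicate a structure's boundary
    (e.g., an edge of the container or of an object)."""
    d_rows = np.abs(np.diff(depth, axis=0, prepend=depth[:1, :]))
    d_cols = np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    return (d_rows > jump) | (d_cols > jump)
```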
In some cases, the spatial structure information may include or form a depth map. The depth map may be a bitmap having a plurality of pixels that represent or otherwise correspond to various locations in the camera field of view, such as locations on one or more structures in the camera field of view. In such a case, some or all of the pixels may each have a respective depth value that indicates a depth represented by or otherwise corresponding to the respective location of the pixel. In some cases, the depth map may include 2D image information that describes the 2D appearance of one or more structures in the camera field of view. For example, the depth map may comprise a 2D image. In such an example, each of the pixels of the depth map may also include a color intensity value or a grayscale intensity value that indicates an amount of visible light reflected from a location represented by or otherwise corresponding to the pixel.
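A minimal way to picture such a depth map, in which each pixel carries a depth value and optionally a grayscale intensity, is a structured array; the resolution and field names below are arbitrary examples, not values from the disclosure.

```python
import numpy as np

# Each pixel stores a depth value and, optionally, a grayscale intensity
# describing the 2D appearance at the same location.
pixel = np.dtype([("depth", np.float32), ("intensity", np.uint8)])
depth_map = np.zeros((480, 640), dtype=pixel)  # example 480 x 640 resolution

depth_map["depth"][240, 320] = 1.25     # depth (e.g., in metres) at row 240, column 320
depth_map["intensity"][240, 320] = 187  # reflected-light intensity at the same pixel
```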
In embodiments, the spatial structure information may be or include a point cloud. The point cloud may identify a plurality of points that describe one or more structures, such as the structure of the container and/or the structure of objects in the container. In some cases, the plurality of points may be respective locations on one or more surfaces of one or more structures. In some cases, the point cloud may include a plurality of coordinates (e.g., 3D coordinates) that identify or otherwise describe the plurality of points. For example, the point cloud may include a series of cartesian or polar coordinates (or other data values) that specify respective locations or other features of the one or more structures. The respective coordinates may be expressed with respect to a frame of reference (e.g., a coordinate system) of the spatial structure sensing camera 151 or with respect to some other frame of reference. In some cases, the respective coordinates are discrete and spaced apart from each other, but may be understood to represent one or more continuous surfaces of one or more structures. In embodiments, the point cloud may be generated (e.g., by computing system 101) from a depth map or other information.
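For example, a point cloud can be generated from a depth map by back-projecting each pixel through a pinhole camera model; the intrinsic parameters below (fx, fy, cx, cy) are assumed to be known for the camera, and this sketch is one common way of doing the conversion rather than the specific method of the disclosure.

```python
import numpy as np

def depth_map_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                             cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map into an N x 3 point cloud in the camera frame
    using a pinhole model; each depth value is the Z coordinate measured along
    the axis orthogonal to the image plane."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```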
In some embodiments, the spatial structure information may also be stored according to any suitable format, such as polygonal or triangular mesh models, non-uniform rational basis spline models, CAD models, parameterization of primitives (e.g., rectangles may be defined according to center and spread in the x, y, and z directions, cylinders may be defined by center, height, upper and lower radii, etc.).
As described above, spatial structure information is captured or otherwise generated via spatial structure sensing camera 151. In an embodiment, the spatial structure sensing camera 151 may be or comprise a 3D camera or any other 3D image sensing device. The 3D camera may be a depth sensing camera, such as a time of flight (TOF) camera or a structured light camera, or any other type of 3D camera. In some cases, the 3D camera may include an image sensor, such as a Charge Coupled Device (CCD) sensor and/or a Complementary Metal Oxide Semiconductor (CMOS) sensor. In an embodiment, the spatial structure sensing camera 151 may include a laser, a LIDAR device, an infrared device, a light/dark sensor, a motion sensor, a microwave detector, an ultrasound detector, a RADAR detector, or any other device configured to capture spatial structure information.
As described above, the object identifier sensing device 152 may be configured to sense an object identifier and generate sensed object identifier information, such as sensed barcode information describing a barcode. The sensed object identifier information may describe, for example, the location of the object identifier (e.g., barcode location), information encoded to the object identifier, or some other object identifier information. If the object identifier sensing device 152 is a barcode sensing device, the barcode sensing device may, in some cases, include a laser or photodiode configured to emit light or other signals toward an area of the barcode (such as the area occupied by dark stripes or dark squares of the barcode), and may include a sensor configured to measure the amount of light or other signals reflected from the area. In some cases, as shown in fig. 1E, the object identifier sensing device 152 may include a 2D camera 153. The 2D camera 153 may include, for example, a grayscale camera or a color camera. The 2D camera 153 may be configured to capture or otherwise generate 2D imaging information that describes or otherwise represents a visual appearance of the environment in the field of view of the 2D camera 153, including the appearance of a barcode or any other object identifier (if any) on an object in the field of view. Such a 2D camera 153 may include, for example, an image sensor, such as a Charge Coupled Device (CCD) sensor and/or a Complementary Metal Oxide Semiconductor (CMOS) sensor. In some cases, the 2D image information may include a plurality of pixels forming a 2D image. Each pixel of the 2D image information may represent, for example, an intensity or other property of light reflected from a location corresponding to the pixel. In some cases, the 2D camera 153 may include processing circuitry configured to detect a barcode or other object identifier within the 2D image and generate sensed object identifier information based on the object identifier. In some cases, if the spatial structure information includes a depth map with 2D image information, the 2D image information may be generated by the 2D camera 153.
In an embodiment, the spatial structure sensing camera 151 and the object identifier sensing device 152 may be integrated into a single device. For example, they may be surrounded by a single housing and may have a fixed relative position and relative orientation. In some cases, they may share a single communication interface and/or a single power supply. In an embodiment, the spatial structure sensing camera 151 and the object identifier sensing device 152 may be two separate devices that are mounted or otherwise attached to the robot 161, for example to a robotic arm of the robot 161, as discussed in more detail below.
As described above, the spatial structure information and/or the sensed object identifier information may be processed by the computing system 101. In embodiments, the computing system 101 may include or be configured as a server (e.g., having one or more server blades, processors, etc.), a personal computer (e.g., a desktop computer, a laptop computer, etc.), a smartphone, a tablet computing device, and/or any other computing system. In embodiments, any or all of the functionality of computing system 101 may be performed as part of a cloud computing platform. Computing system 101 may be a single computing device (e.g., a desktop computer), or may include multiple computing devices.
Fig. 2A provides a block diagram illustrating an embodiment of computing system 101. The computing system 101 includes at least one processing circuit 110 and a non-transitory computer-readable medium (or media) 120. In an embodiment, the processing circuitry 110 includes one or more processors, one or more processing cores, a programmable logic controller ("PLC"), an application specific integrated circuit ("ASIC"), a programmable gate array ("PGA"), a field programmable gate array ("FPGA"), any combination thereof, or any other processing circuitry. In an embodiment, the non-transitory computer-readable medium 120 may be a storage device, such as an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof, such as, for example, a computer disk, a hard disk, a Solid State Drive (SSD), a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, any combination thereof, or any other storage device. In some cases, a non-transitory computer-readable medium may include multiple storage devices. In some cases, the non-transitory computer-readable medium 120 is configured to store spatial structure information generated by the spatial structure sensing camera 151 and/or sensed object identifier information generated by the object identifier sensing device 152. In some cases, the non-transitory computer-readable medium 120 also stores computer-readable program instructions that, when executed by the processing circuit 110, cause the processing circuit 110 to perform one or more methods (methodologies) described herein, such as the operations described with reference to fig. 4.
Fig. 2B depicts a computing system 101A, the computing system 101A being an embodiment of the computing system 101 and including a communication interface 130. The communication interface 130 may be configured to receive spatial structure information generated by the spatial structure sensing camera 151 and/or sensed object identifier information (e.g., sensed barcode information) generated by the object identifier sensing device 152, for example via the storage device 198 and/or network 199 of fig. 1D, or via a more direct connection from the spatial structure sensing camera 151 or from the object identifier sensing device 152. In an embodiment, the communication interface 130 may be configured to communicate with the robot 161 of fig. 1B. If the computing system 101 is not part of a robot control system, the communication interface 130 may be configured to communicate with the robot control system. The communication interface 130 may include, for example, communication circuitry configured to perform communications via wired or wireless protocols. By way of example, the communication circuit may include an RS-232 port controller, a USB controller, an Ethernet controller, a Bluetooth controller, a PCI bus controller, any other communication circuit, or a combination thereof.
In an embodiment, the processing circuit 110 may be programmed by one or more computer readable program instructions stored on the non-transitory computer readable medium 120. For example, fig. 2C illustrates a computing system 101B, which computing system 101B is an embodiment of the computing system 101/101A, wherein the processing circuitry 110 is programmed by one or more modules, including a container detection module 202, an object detection module 204, and/or a motion planning module 206, which will be discussed in more detail below.
In embodiments, the container detection module 202 may be configured to detect containers (such as drawers), or more specifically, determine information about how drawers are arranged in 3D space, such as their orientation and/or depth. As discussed in more detail below, the container detection module 202 may be configured to make such a determination using at least spatial structure information. In some cases, the container detection module 202 may determine how particular portions of a drawer (such as its bottom interior surface) are arranged in the 3D space. In some implementations, the container detection module 202 may be configured to determine how the bottom interior surface is arranged in 3D space based on how another portion of the drawer (such as its rim) is arranged in 3D space.
In an embodiment, the object detection module 204 may be configured to detect an object (such as a piece of merchandise disposed on a bottom interior surface of a drawer) within a container, or more specifically, determine how the object is arranged in a 3D space, such as an orientation and/or depth of the object. In some cases, the object detection module 204 may make such a determination based on information generated by the container detection module 202 regarding how drawers or other containers are arranged in the 3D space. In some cases, the object detection module 204 may be configured to identify the object and/or the size of the object using sensed object identifier information (e.g., sensed barcode information), as discussed in more detail below.
In an embodiment, the motion planning module 206 may be configured to determine robot motions for interacting with the container, for interacting with objects within the container, and/or for moving the spatial structure sensing camera 151 and/or the object identifier sensing device 152. For example, robotic movement may be part of a robotic operation to grasp or otherwise pick up objects from a container and move the objects elsewhere. The robot motion may be determined based on, for example, information generated by the object detection module 204 regarding how the objects are arranged in the 3D space and/or generated by the container detection module 202 regarding how the containers are deployed in the 3D space. It is to be understood that the functionality of the modules discussed herein is representative and not limiting.
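The division of labour among the three modules could be organized roughly as sketched below; the class and method names are illustrative placeholders, not the actual module interfaces of the disclosed system.

```python
class ContainerDetectionModule:
    def detect(self, spatial_structure_info):
        """Determine how a drawer or other container is arranged in 3D space,
        e.g., the orientation and depth of its rim and bottom interior surface."""
        raise NotImplementedError

class ObjectDetectionModule:
    def detect(self, container_pose, spatial_structure_info, identifier_info=None):
        """Determine how an object inside the container is arranged in 3D space,
        optionally narrowing the search using sensed object identifier information."""
        raise NotImplementedError

class MotionPlanningModule:
    def plan(self, container_pose, object_pose):
        """Determine robot motion for interacting with the container or with the
        object within it, or for moving the cameras and sensing devices."""
        raise NotImplementedError
```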
In various embodiments, the terms "computer-readable instructions" and "computer-readable program instructions" are used to describe software instructions or computer code that are configured to perform various tasks and operations. In various embodiments, the term "module" broadly refers to a collection of software instructions or code configured to cause the processing circuit 110 to perform one or more functional tasks. When a processing circuit or other hardware component is executing a module or computer readable instructions, the module and computer readable instructions may be described as performing various operations or tasks. In some cases, the modules and computer readable instructions may implement methods for performing container detection and planning robotic interactions based on container detection.
Fig. 3A-3C illustrate environments in which methods for container detection and/or robotic interaction may occur. More specifically, fig. 3A depicts a system 300 (which may be an embodiment of system 100/100A/100B/100C of fig. 1A-1E), the system 300 including a computing system 101, a robot 361, and a spatial structure sensing camera 351 (which may be an embodiment of spatial structure sensing camera 151). In an embodiment, robot 361 may include a base 362 and a robotic arm 363. The base 362 may be used to mount the robot 361, and the robotic arm 363 may be used to interact with the environment of the robot 361. In an embodiment, the robotic arm 363 may include a plurality of arm portions that are movable relative to each other. For example, fig. 3A shows arm portions 363A, 363B, 363C, and 363D that are rotatable and/or extendable relative to each other. For example, the robotic arm 363 may include one or more motors or other actuators configured to rotate the arm portion 363A relative to the base 362, the arm portion 363B relative to the arm portion 363A, and the arm portion 363C relative to the arm portion 363B. In this example, the arm portions 363A, 363B, and 363C may be links of the robotic arm 363, and the arm portion 363D may be an end effector, such as a robot hand. In some cases, the robot hand may include a gripper configured to grip or otherwise pick up an object to enable the robotic arm 363 to interact with the object.
In the embodiment of fig. 3A, the spatial structure sensing camera 351 is mounted on or otherwise attached to the robot 361, or more specifically, to the robotic arm 363 of the robot 361, at a location on or near the end effector 363D. The spatial structure sensing camera 351 may be part of an imaging system. In some cases, the imaging system may also include an object identifier sensing device, as will be discussed below with respect to fig. 7A. In some cases, an object identifier sensing device may also be attached to the robotic arm 363, such as at a location on or near the end effector 363D.
As shown in FIG. 3A, the system 300 may also include one or more containers 384, such as containers 384A-384L. In some cases, the containers 384A-384L may be located in a warehouse or retail space and may be used to contain items such as merchandise or other objects. In the example of FIG. 3A, the containers 384A-384L may be housed in a cabinet 380, and the cabinet 380 may provide a housing 381 that arranges the containers 384A-384L in a stacked arrangement. In this example, the containers 384A-384L may each be a drawer movable between a closed position and an open position. Each of containers 384A-384L may be attached to cabinet 380 via one or more links. For example, fig. 3A shows a pair of guide rails 382A, 383A that are attached to an inner surface of the housing 381 and allow the container 384A to slide between an open position and a closed position. In some cases, at least one of the containers 384A-384L may hold one or more objects. For example, the container 384A may include objects 371, 373, which may be merchandise in a warehouse or retail space. As an example, the objects 371, 373 may each be a box that is or contains an item, such as a commodity to be shipped or sold.
As described above, the spatial structure sensing camera 351 (and/or the object identifier sensing device) may be an on-hand camera device mounted or otherwise placed on the robotic arm 363. Such placement may allow flexibility in the location and/or orientation of the spatial structure sensing camera 351 (and/or the object identifier sensing device). More specifically, fig. 3A to 3C illustrate an embodiment in which the robotic arm 363 is able to move the spatial structure sensing camera 351 (and/or the object identifier sensing device) to various positions and/or orientations, instead of mounting the spatial structure sensing camera 351 (and/or the object identifier sensing device) at a fixed mounting point. For example, such embodiments allow the robotic arm 363 to adjust the distance between the sensed object and the spatial structure sensing camera 351 (and/or object identifier sensing device) in order to adjust the focus and/or resolution level in the resulting spatial structure information (and/or sensed object identifier information).
FIG. 3B shows a situation in which the containers 384A-384L of FIG. 3A (such as container 384A) are each in a closed position. When the container 384A is in the closed position, its contents (e.g., objects 371, 373) may not be accessible by the robot 361, or more specifically, by the robotic arm 363. For example, when the container 384A is in the closed position, an interior surface (e.g., a bottom interior surface) of the container 384A may not be substantially exposed to the environment outside of the housing 381 of the cabinet 380. Further, the contents of the container 384A (e.g., objects 371, 373) may be hidden from view. More specifically, they may be blocked from the camera field of view 353 of the spatial structure sensing camera 351 (and/or the field of view of the object identifier sensing device). In such an example, an exterior portion of one or more containers (such as the handle of container 384A) may be within the camera field of view 353. As discussed in more detail below, in some embodiments, the robotic arm 363 may be configured to grasp and pull, for example, a handle of the container 384A in order to slide the container 384A (e.g., via the rails 382A, 383A of fig. 3A) to an open position.
Fig. 3C shows the container 384A in an open position. When the container 384A is in the open position, its contents are accessible by the robot 361, or more specifically by the robotic arm 363. For example, the container 384A may be slid via the guide rails 382A, 383A to a position where some or all of the bottom interior surface of the container 384A is exposed to the environment outside of the housing 381 of the cabinet 380. In such a case, at least a portion of the bottom interior surface may be in the camera field of view 353 of the spatial structure sensing camera 351 (and/or in the field of view of the object identifier sensing device). Further, when the container 384A is in the open position, the contents of the container 384A (such as the objects 371, 373 disposed on the bottom interior surface of the container 384A) may also be within the camera field of view 353. As described above, the spatial structure sensing camera 351 may be configured to generate spatial structure information describing the container 384A and/or the objects 371, 373 contained therein. The spatial structure information may be used to detect the poses of the objects 371, 373 and facilitate interaction between the robotic arm 363 and the objects 371, 373, such as an interaction in which the end effector 363D of the robotic arm 363 picks up the objects 371, 373 and removes them from the container 384A.
Fig. 3D provides a view of the container 384A of fig. 3A-3C. As shown in fig. 3D, in an embodiment, container 384A may have a surface 384A-1 (also referred to as container surface 384A-1) on which one or more objects (such as objects 371, 373) are disposed within container 384A. For example, container surface 384A-1 can be a bottom interior surface of container 384A. In an embodiment, container 384A may have a rim 384A-2 (also referred to as container rim 384A-2) that is offset from container surface 384A-1. Container rim 384A-2 can be formed from one or more side walls 384A-3, 384A-4, and 384A-5, which can share a common height h (as shown in FIG. 3D) or have different respective heights. In this example, container rim 384A-2 can include a top surface of one or more side walls 384A-3, 384A-4, and 384A-5. Container rim 384A-2 and container surface 384A-1 may be separated by a distance equal to or based on height h, which may be a known value that computing system 101 is configured to receive or determine. In some cases, computing system 101 of fig. 3A-3C may determine information describing container surface 384A-1 and/or container rim 384A-2 and use this information to determine additional information describing one or more objects (e.g., 371, 373) disposed within container 384A, as discussed in more detail below. As further depicted in fig. 3D, in some cases, container 384A may include a handle 384A-6. In some embodiments, the computing system 101 may be configured to cause the robotic arm 363 (of fig. 3A-3C) to move the container 384A to the open position by pulling on or otherwise interacting with the handle 384A-6.
Fig. 4 depicts a flow diagram of a method 400 for facilitating robot interaction with objects contained in a container. The method 400 may involve determining information describing how the objects are arranged in space so that the robot may move in an appropriate manner to, for example, grasp the objects. In one example, the information may describe a pose of the object (also referred to as an object pose), which may describe at least one of an orientation or a depth value of the object (e.g., relative to the spatial structure sensing camera 351 of fig. 3A-3C). As discussed in more detail below, the method 400 may determine how objects are arranged in space based on information describing how containers (e.g., 384A) are arranged in space. In embodiments, the method 400 may be performed by the computing system 101 or more specifically by the processing circuitry 110. In some cases, the method 400 may be performed when the processing circuit 110 executes instructions stored on the non-transitory computer-readable medium 120 of fig. 2A-2C.
In an embodiment, the method 400 may begin with the container (e.g., 384A) in the closed position, and may involve the computing system 101 controlling a robot (e.g., 361) to move the container to the open position, as discussed in more detail below. Such movement may involve, for example, the container 384A sliding along the guide rails 382A, 383A shown in fig. 3A and 3C. As discussed with respect to fig. 3A and 3C, in an embodiment, the rails 382A, 383A can be attached to an inside surface of the housing 381 in which the container 384A is received, and can allow the container 384A to slide in and out of the housing 381.
In an embodiment, the method 400 may begin with the container (e.g., 384A) in or already in an open position, such as shown in fig. 3C and 5A. Similar to the example in fig. 3C, container 384A in fig. 5A contains objects 371 and 373, which objects 371 and 373 are disposed within container 384A, and more particularly on container surface 384A-1 of container 384A. Because the container 384A is in an open position, objects 371, 373 within the container 384A may be within the camera field of view 353 of the spatial structure sensing camera 351.
In an embodiment, the method 400 may include step 402, where the computing system 101 receives spatial structure information generated by a spatial structure sensing camera (e.g., 351 of fig. 5A). The spatial structure information may include depth information of the environment in the camera field of view 353. More specifically, the spatial structure information may describe how various structures in the camera field of view 353 are spatially arranged (i.e., how they are arranged in space). The various structures may include, for example, a container 384A and objects 371, 373 disposed within the container 384A.
In embodiments, the spatial structure information may be used to detect the tilt, or more specifically, the amount of tilt and/or tilt orientation, of one or more structures. More specifically, the container 384A may have one or more linkages, such as the rails 382A, 383A that attach the container 384A to the housing 381. As the container 384A moves toward the open position, it may tilt downward relative to the one or more linkages and relative to the housing 381. For example, when the container 384A slides along the guide rails 382A, 383A in fig. 5A from a closed position to an open position, the weight of the container 384A may cause it to tilt downward relative to the guide rails 382A, 383A. An example of the tilt is shown in fig. 5B. More specifically, fig. 5B depicts an axis 582 that represents the orientation of the guide rails 382A, 383A of fig. 5A, and more specifically is parallel to the guide rails 382A, 383A. The figure also depicts another axis 582P that is perpendicular to axis 582. In some cases, the axis 582P may be parallel to a vertical wall of the housing 381 of the cabinet 380. Fig. 5B also depicts an axis 584 and an axis 584P, both of which may represent the orientation of the container 384A. More specifically, axis 584 may be parallel to container 384A, or more specifically, parallel to container surface 384A-1. The axis 584P may be a normal axis of the container surface 384A-1, and may be perpendicular to the axis 584. When the container 384A is in the closed position, the axis 584 associated with the container 384A can be parallel to the axis 582 associated with the guide rails 382A, 383A. Further, axis 584P may be parallel to axis 582P. As described above, as container 384A slides from the closed position to the open position, container 384A may tilt downward, causing the axis 584 associated with container 384A to deviate from axis 582, and causing the normal axis 584P to deviate from axis 582P, as shown in fig. 5B. In other words, the axis 584 may become oblique to the axis 582, and the axis 584P may become oblique to the axis 582P.
In an embodiment, tilting of the container 384A may cause the container 384A and any objects (e.g., objects 371 or 373) within the container 384A to drift in depth and/or orientation relative to, for example, the robotic arm 363 and/or the spatial structure sensing camera 351 of fig. 3A-3C. For example, if the container 384A were not tilted downward when it is in the open position, the objects 371, 373 would have a first depth value and a first orientation relative to the spatial structure sensing camera 351 and/or relative to the robotic arm 363. The tilting of the container 384A may cause the objects 371, 373 to instead have a second depth value and a second orientation relative to the spatial structure sensing camera 351 and/or relative to the robotic arm 363. In some cases, the second depth value may be only a few millimeters greater than the first depth value, and the second orientation may differ from the first orientation by only a few degrees or a fraction of a degree, but such differences may be sufficient to affect the ability of the robotic arm 363 to grasp or otherwise interact with the objects 371, 373 in a precise manner, particularly if the computing system 101 assumes that the objects 371, 373 are arranged in space according to the first depth value or the first orientation. Furthermore, the computing system 101 may need to determine how the objects 371, 373 are arranged in space with an accuracy of a millimeter or better to ensure proper interaction between the robot 361 of fig. 3A-3C and the objects 371, 373.
In some cases, the amount or effect of tilting may be difficult to predict with millimeter accuracy because the container (e.g., 384A) may have multiple degrees of freedom as it moves from the closed position to the open position. Accordingly, one aspect of the present application relates to using spatial structure information, such as that received in step 402 of method 400, to determine how an object (e.g., 371/373) is arranged in space, in order to facilitate controlling the robotic arm 363 to properly interact with the object (e.g., 371/373).
As described above, the spatial structure information of step 402 may include depth information of the environment in the camera field of view (e.g., 353). The depth information may include one or more depth values, each of which may indicate a depth of a particular location in the camera field of view 353 relative to a spatial structure sensing camera (e.g., 351) or relative to some other reference point or frame of reference. In some cases, the location associated with the depth value may be a location on the surface of a structure in the camera field of view 353. For example, FIG. 5C illustrates depth information that includes the depth values d_{object A, position 1}, d_{rim, position 1}, d_{surface, position 1}, and d_{floor, position 1}. In this example, the depth value d_{object A, position 1} may indicate a depth value of a position on the object 371 relative to the spatial structure sensing camera 351, or more specifically relative to an image plane 354 formed by an image sensor or other sensor of the spatial structure sensing camera 351. More specifically, the depth value d_{object A, position 1} may indicate a distance between a location on the object 371 (e.g., on a surface of the object 371) and the image plane 354. This distance may be measured along an axis perpendicular to the image plane 354. In an embodiment, the depth information may include one or more corresponding depth values for one or more portions of the container (such as container surface 384A-1 and container rim 384A-2). For example, the depth value d_{surface, position 1} may indicate a depth value of container surface 384A-1 (or more specifically, of a location on container surface 384A-1). The depth value d_{rim, position 1} may indicate a depth value of container rim 384A-2 (or more specifically, of a location on container rim 384A-2) relative to the image plane 354. In addition, the depth value d_{floor, position 1} may indicate a depth value of the floor or other surface (or more specifically, of a location on the floor) on which the housing 381 of the cabinet 380 is disposed.
Fig. 6A depicts a representation of the spatial structure information received in step 402. In this example, the spatial structure information may include or may identify a plurality of depth values for a plurality of locations in the camera field of view (e.g., 353), respectively. More specifically, the figure illustrates various sets 610-660 of locations (also referred to as points) for which the spatial structure information identifies respective depth values. The set of locations 610 (identified as striped hexagons) may correspond to the floor or other surface on which the enclosure 381 of fig. 5C is disposed. For example, set 610 may include location 610_1, which may correspond to the depth value d_{floor, position 1} of fig. 5C. The set of locations 620 (identified as white circles) may belong to the container surface 384A-1 (e.g., a bottom interior surface) of the container 384A. For example, set 620 may include location 620_1, which may correspond to the depth value d_{surface, position 1} of fig. 5C. The set of locations 630 (identified as dark circles) may belong to container rim 384A-2. For example, set 630 may include location 630_1, which may correspond to the depth value d_{rim, position 1} of fig. 5C. The set of locations 640 (identified as dark ovals) may belong to the handle 384A-6 of the container 384A. Further, the set of locations 650 (identified as shaded rectangles) may belong to the object 371 of fig. 5C. For example, set 650 may include location 650_1, which may correspond to the depth value d_{object A, position 1} of fig. 5C. Additionally, the set of locations 660 (identified as white rectangles) may belong to the object 373 of fig. 5C.
In an embodiment, the spatial structure information may comprise a depth map and/or a point cloud. The point cloud may include, for example, respective coordinates of locations on one or more structures in the camera field of view (e.g., 353) of the spatial structure sensing camera (e.g., 351). For example, the point cloud may include 3D coordinates, such as [x y z] coordinates in a frame of reference (e.g., coordinate system) of the spatial structure sensing camera or in some other frame of reference. In such an example, the coordinates of a location may indicate a depth value of the location. For example, the depth value of the location may be equal to or based on the z-component of its coordinates.
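By way of a purely illustrative sketch (not part of this disclosure), the relationship between point-cloud coordinates and depth values described above could be represented as follows; the array contents, units, and variable names are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical point cloud: each row is an [x, y, z] coordinate in the
# spatial structure sensing camera's frame of reference (units: meters).
point_cloud = np.array([
    [0.10, 0.05, 0.600],   # e.g., a location on the container rim
    [0.12, 0.07, 0.650],   # e.g., a location on the container surface
    [0.11, 0.06, 0.620],   # e.g., a location on an object's top surface
])

# The depth value of each location may be taken as (or based on) the
# z-component of its 3D coordinate.
depth_values = point_cloud[:, 2]
print(depth_values)  # approximately [0.6, 0.65, 0.62]
```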
In embodiments, the spatial structure information may be affected by or include measurement errors or other errors. For example, location 650_1 on the object 371 may have an actual depth value equal to d_{object A, position 1}, but the point cloud or other spatial structure information may indicate that location 650_1 has an [x y z] coordinate in which z = d_{object A, position 1} + ε_{object A, position 1}, where ε_{object A, position 1} refers to the error associated with location 650_1. In this case, the spatial structure information may erroneously indicate that location 650_1 has the depth value d_{object A, position 1} + ε_{object A, position 1}. This error may be due to, for example, imaging noise or some other error source. The error may be based on various factors. In some cases, the object 371 or other structure may have a shape that interferes with the principle of operation of the spatial structure sensing camera (e.g., 351). In some cases, the object 371 may be formed of a material (e.g., a transparent or translucent material) that interferes with the principle of operation of the spatial structure sensing camera. In some cases, light or other signals may reflect from another object 373 (of fig. 5C) or from an inner surface of the container 384A, and such reflected signals may act as imaging noise that interferes with the ability of the spatial structure sensing camera (e.g., 351) to accurately measure the depth values of the object 371.
Fig. 6B provides an example of locations (represented by shaded triangles) corresponding to portions of the spatial structure information that are substantially affected by noise or other error sources. In the example of FIG. 6B, locations 620_2 to 620_5 of location set 620 (corresponding to container surface 384A-1) may be substantially affected by noise, and the spatial structure information corresponding to these locations may include a significant amount of error. Further, locations 650_1 to 650_3 of the set of locations 650 (corresponding to the object 371) and locations 660_1 to 660_3 of the set of locations 660 (corresponding to the object 373) may be substantially affected by noise, and the spatial structure information corresponding to these locations may also include a significant amount of error. In such examples, directly using spatial structure information of location set 650 or location set 660 to determine how object 371 or object 373 is arranged in space may lead to inaccurate or unreliable results, as a significant portion of these locations may be affected by noise or other error sources. Accordingly, an aspect of the present disclosure relates to determining how containers (e.g., 384A) are arranged in space, and determining how objects (e.g., 371/373) within the containers (e.g., 384A) are arranged in space based on how the containers are arranged in space.
Returning to FIG. 4, in embodiments, method 400 may include step 404, in which computing system 101 determines a container pose based on the spatial structure information. In some implementations, step 404 may be performed by the container detection module 202 of fig. 2C. In embodiments, a container pose may refer to a pose of a container, such as container 384A, and may be used to describe at least one of an orientation of the container (e.g., 384A) or a depth value of at least a portion of the container (e.g., 384A). In some cases, the portion of the container (e.g., 384A) may refer to a component of the container, such as a container rim (e.g., 384A-2) or a container surface (e.g., 384A-1). In some cases, the portion of the container (e.g., 384A) may refer to a region on the container, or more generally, a location (e.g., 620_1 or 630_1 of FIG. 6A) on a surface of the container, such as the surface of the container on which the container contents are disposed (e.g., 384A-1) or a surface of the container rim (e.g., 384A-2).
In some cases, the container pose may describe the disposition of the container (e.g., 384A), or more generally how the container (e.g., 384A) is arranged in 3D space. For example, the container pose may describe an orientation of the container (e.g., 384A), which may describe the amount (if any) by which the container (e.g., 384A) or a portion thereof is tilted downward. As described above, a container pose may also describe a depth value that may indicate, for example, how far the container (e.g., 384A) or a portion thereof is from a spatial structure sensing camera (e.g., 351 of fig. 3A-3C) or from the robotic arm (e.g., 363) or other portion of the robot (e.g., 361).
In some cases, the container pose may describe both the orientation of the container and a depth value of a location (e.g., 620_1 or 630_1) on the container (e.g., 384A). The depth value may be equal to or indicate, for example, a component of the 3D coordinate of the location. For example, the 3D coordinate may be an [x y z] coordinate having a 2D component or 2D coordinate (e.g., an [x y] coordinate) and a depth component (e.g., a z-component or z-coordinate). The z-component or z-coordinate may be equal to or based on a depth value of the location relative to the spatial structure sensing camera (e.g., 351) or some other frame of reference. In such an example, the container pose may describe both the orientation of the container and the 3D coordinates of the location on the container.
In an embodiment, the container pose determined in step 404 may be a container surface pose, which may be a pose of a container surface (e.g., 384A-1). The container surface may be a bottom interior surface of the container (e.g., 384A), or another surface on which objects or other contents of the container are disposed within the container. The container surface pose may describe, for example, at least one of: an orientation of the container surface (e.g., 384A-1) or a depth value of at least one location (e.g., 620_1) on the container surface (e.g., 384A-1).
In embodiments, determining the container surface pose may involve directly using a portion of the spatial structure information corresponding to locations on the container surface (e.g., 384A-1). For example, computing system 101 in an embodiment may determine the container surface pose directly based on the set of locations 620, which includes locations 620_1 to 620_n of FIG. 6B. The corresponding spatial structure information may include, for example, respective depth values for locations 620_1 to 620_n. In some cases, computing system 101 may be configured to identify locations 620_1-620_n as belonging to the container surface (e.g., 384A-1), or more generally, to identify locations 620_1-620_n as belonging to a common layer, in order to distinguish these locations from locations representing other layers in the camera field of view (e.g., 353). For example, computing system 101 may be configured to identify locations 620_1 to 620_n as having depth values that are substantially continuous, with no significant discontinuity between them. In some cases, computing system 101 in this embodiment may determine the container surface pose by determining a plane that best fits all or some of the locations 620_1 to 620_n. The computing system 101 may determine that the orientation of the container surface (e.g., 384A-1) is equal to or based on a characteristic of the plane, such as its slope or normal vector. In some cases, computing system 101 may estimate or otherwise determine depth values for locations on the container surface (e.g., 384A-1) based directly on the spatial structure information. For example, if the spatial structure information provides 3D coordinates (such as [x y z] coordinates) for a location, the depth value may be equal to or based on the z-component of the 3D coordinate. In some cases, computing system 101 may use the plane to estimate a depth value for a location on the container surface (e.g., 384A-1), because the location may fall on or substantially near the plane. For example, if computing system 101 receives a 2D component (e.g., an [x y] component) of a location on the container surface (e.g., 384A-1), it may be configured to determine the 3D coordinate that belongs to the plane and that also has this 2D component. In such an example, the 3D coordinate on the plane may indicate or approximate the location on the container surface (e.g., 384A-1). Thus, the z-component of the 3D coordinate on the plane may be equal to or approximate the depth value of the location on the container surface (e.g., 384A-1).
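As a purely illustrative sketch of one conventional way such a best-fit plane could be computed (a least-squares fit via singular value decomposition), the following Python code is provided; the point values and function names are assumptions for illustration and are not part of the disclosed embodiments.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit. Returns (centroid, unit normal) of the
    plane that best fits an N x 3 array of [x, y, z] locations."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centered points is the direction of least variance, i.e. the normal.
    _, _, vh = np.linalg.svd(points - centroid)
    normal = vh[-1]
    return centroid, normal / np.linalg.norm(normal)

# Hypothetical locations 620_1..620_n on the container surface (meters).
surface_points = np.array([
    [0.00, 0.00, 0.650],
    [0.10, 0.00, 0.652],
    [0.00, 0.10, 0.651],
    [0.10, 0.10, 0.653],
])
centroid, normal = fit_plane(surface_points)
# The orientation of the container surface pose may be taken as (or be
# based on) the plane's normal vector.
print(centroid, normal)
```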
In embodiments, determining the container surface pose may involve indirectly using spatial structure information corresponding to another portion of the container (e.g., 384A), such as spatial structure information corresponding to the container rim (e.g., 384A-2). More specifically, embodiments of step 404 may involve determining a container rim pose and determining the container surface pose based on the container rim pose. The container rim pose may describe at least one of an orientation of the container rim (e.g., 384A-2) or a depth value of at least one location (e.g., 630_1) on the container rim (e.g., 384A-2).
In some cases, determining the container surface pose based on the container rim pose may provide a determination that is more robust against noise or other error sources. More specifically, noise may affect not only locations (e.g., 650_1 to 650_3) on an object (e.g., 371) disposed within a container (e.g., 384A), but also locations on the container surface (e.g., 384A-1) on which the object is disposed. Thus, depth information or other spatial structure information corresponding to these locations may be unreliable. For example, FIG. 6C illustrates a situation in which locations 620_2, 620_3, 620_4, 620_5, 620_6, 620_7, 620_8, 620_9, 620_10, ..., 620_k on container surface 384A-1 (which may be a subset of all locations 620_1 to 620_n identified by the spatial structure information) are affected by imaging noise, which may introduce errors into the corresponding portion of the spatial structure information for these locations, and more specifically into the depth information for these locations. Although FIG. 6B also shows an example with noise (at locations 620_2 to 620_6), FIG. 6C shows an example of a noisier environment at container surface 384A-1. In the example of FIG. 6C, the locations affected by noise (620_2 to 620_k) may be a large percentage of all locations for which spatial structure information is available (620_1 to 620_n). The noise may originate, for example, from signals reflected from the container surface (e.g., 384A-1) or from objects (e.g., 371/373) on the container surface (e.g., 384A-1), where the reflected signals may interfere with each other and with direct measurement of depth values for locations on the container surface (e.g., 384A-1).
In some cases, crowded containers (i.e., containers that include many objects that may obscure the container surface) may also interfere with direct measurement of depth values for locations on the container surface (e.g., 384A-1). For example, FIG. 6D depicts a situation in which many objects (such as objects 371-375) are disposed on container surface 384A-1. The objects 371-375 may cover a substantial portion of container surface 384A-1. More specifically, fig. 6E shows spatial structure information for the example depicted in fig. 6D. As shown in FIG. 6E, the objects 371-375 may cover or otherwise occupy regions 652-692 on container surface 384A-1. Although portions of container surface 384A-1 are not covered by the objects 371-375, these portions may still be affected by noise, thus limiting the ability to make accurate depth measurements directly on container surface 384A-1 using the spatial structure sensing camera 351.
Accordingly, one aspect of the present disclosure relates to indirectly determining information about the container surface (e.g., 384A-1) using spatial structure information corresponding to another portion of the container, such as the container rim (e.g., 384A-2). In embodiments, as described above, the computing system 101 may determine a container rim pose and use the container rim pose to determine the container surface pose. In some cases, the container rim (e.g., 384A-2) is less affected by noise or measurement error sources. More specifically, these measurement error sources may affect direct measurements of the container surface (e.g., 384A-1) or of objects disposed on the container surface (e.g., 384A-1). However, the container rim (e.g., 384A-2) may be offset from the container surface (e.g., 384A-1) by the one or more sidewalls of the container (e.g., sidewalls 384A-3 through 384A-5 in fig. 3D) having the height h, which may cause the container rim (e.g., 384A-2) to be positioned entirely above the objects (e.g., 371, 373). Thus, the container rim (e.g., 384A-2) may be much less affected by the measurement error sources, and a direct depth measurement of the container rim (e.g., 384A-2) may be more accurate than a direct depth measurement of the container surface (e.g., 384A-1).
In an embodiment, a direct measurement of the container rim (e.g., 384A-2) may include spatial structure information corresponding to locations on the container rim (e.g., 384A-2), and the container rim pose may be determined based on such spatial structure information. For example, as shown in FIG. 6C, the spatial structure information may include depth information indicating respective depth values for locations 630_1, 630_2, 630_3, 630_4, ..., 630_n on the container rim (e.g., 384A-2). In some cases, computing system 101 may be configured to identify locations 630_1-630_n as having no significant discontinuity in depth between them, and thus as belonging to a common layer separate from other layers in the camera field of view (e.g., 353), thereby distinguishing locations 630_1-630_n from locations that represent another component of the container (e.g., from locations 620_1 to 620_n). In embodiments, computing system 101 may be configured to identify locations 630_1-630_n as belonging to the container rim (e.g., 384A-2) by determining an estimated region in which the container rim (e.g., 384A-2) should be located and searching for locations (e.g., 630_1-630_n) in the estimated region. For example, computing system 101 may access predetermined or otherwise known information about the structure of the container (e.g., 384A) and the structure of the cabinet (e.g., 380) or housing (e.g., 381) in which the container (e.g., 384A) is located. The information may identify, for example, the size (e.g., dimensions), physical configuration, shape, and/or geometry of the container (e.g., 384A) or the cabinet 380. The computing system 101 may be configured to determine the estimated region based on this information. For example, the computing system may estimate that the container rim (e.g., 384A-2) should have a depth value of about 600 mm, with an error range of about 10 mm. Computing system 101 may then search for the container rim (e.g., 384A-2) in the estimated region, which occupies space having depth values in the range of 590 mm to 610 mm.
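The depth-range search described above could be sketched, purely for illustration, as follows; the 600 mm estimate and 10 mm tolerance mirror the example in the text, while the array layout and function name are assumptions and not part of the disclosed embodiments.

```python
import numpy as np

def select_rim_candidates(point_cloud, estimated_depth, tolerance):
    """Return the subset of [x, y, z] locations whose depth (z) falls
    within the estimated region where the container rim should be."""
    z = point_cloud[:, 2]
    mask = np.abs(z - estimated_depth) <= tolerance
    return point_cloud[mask]

# Hypothetical point cloud (meters): rim near 0.600 m, surface near 0.650 m.
point_cloud = np.array([
    [0.00, 0.00, 0.601],
    [0.30, 0.00, 0.599],
    [0.10, 0.10, 0.648],
    [0.20, 0.15, 0.652],
])
rim_points = select_rim_candidates(point_cloud, estimated_depth=0.600,
                                   tolerance=0.010)
print(rim_points)  # only the two locations near 0.600 m remain
```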
As described above, the container rim pose may indicate at least one of an orientation of the container rim (e.g., 384A-2) or a depth value of at least one location on the container rim (e.g., 384A-2). In embodiments, computing system 101 may determine the container rim pose based on the differences, if any, between the respective depth values of some or all of the locations (e.g., 630_1-630_n) on the container rim (e.g., 384A-2). For example, if the respective depth values of locations 630_1-630_n are the same or substantially the same, then computing system 101 may determine that container rim 384A-2 has a substantially flat orientation relative to spatial structure sensing camera 351 or another frame of reference. If the respective depth values change as a function of location, the computing system 101 may determine a slope that represents the change. The orientation of the container rim pose may be equal to or based on the slope.
In an embodiment, computing system 101 may determine the container rim pose by determining a plane that best fits some or all of the locations (e.g., 630_1 to 630_n) on the container rim (e.g., 384A-2) for which spatial structure information is provided. For example, computing system 101 may determine, for each of the locations (e.g., 630_1 to 630_n) or a subset of the locations, a 3D coordinate (e.g., [x_n y_n z_n]), wherein the 3D coordinate may include a depth component (e.g., z_n) derived from the spatial structure information. The computing system 101 may determine the plane that best fits the corresponding 3D coordinates. For example, the plane may be represented by the equation a(x - x_0) + b(y - y_0) + c(z - z_0) = 0, where [x_0, y_0, z_0] may be the 3D coordinate of one of the locations (e.g., 630_1) on the container rim (e.g., 384A-2), and [x y z] may represent the 3D coordinates of some or all of the remaining locations (e.g., 630_2 to 630_n) on the container rim (e.g., 384A-2). The computing system 101 may generate a set of simultaneous equations based on the coordinates and solve them for the coefficients a, b, c that best satisfy the simultaneous equations. In such an example, computing system 101 may determine that the orientation of the container rim pose is equal to or based on a characteristic of the plane, such as its slope or normal vector (e.g., a vector parallel to <a b c>).
As described above, the computing system 101 may determine the container surface pose based on the container rim pose. In some cases, such a determination may be based on a predetermined distance between the container rim (e.g., 384A-2) and the container surface (e.g., 384A-1). The predetermined distance may be, for example, the height h of one or more side walls (e.g., 384A-3 through 384A-5 of fig. 3D) that form the container rim (e.g., 384A-2). In some cases, the predetermined distance may be a known value stored on a non-transitory computer-readable medium (e.g., 120) accessible to computing system 101.
As further described above, the container surface pose may describe at least one of an orientation of the container surface (e.g., 384A-1) or a depth value of a location on the container surface (e.g., 384A-1). In some cases, computing system 101 may determine the orientation of the container surface pose based on the orientation of the container rim pose. More specifically, computing system 101 may determine that the orientation of the container surface (e.g., 384A-1) is equal to or based on the orientation of the container rim (e.g., 384A-2). Such a determination may be based on an assumption that the container rim (e.g., 384A-2) is parallel to the container surface (e.g., 384A-1).
As an example, as described above, computing system 101 may determine a first plane defining an orientation of the container rim pose and use the first plane to determine a second plane defining an orientation of the container surface pose. For example, FIG. 6F illustrates a first plane 684A-2 that may be based on the locations (e.g., 630_1 to 630_n) on container rim 384A-2 and that may define the orientation of container rim 384A-2. Computing system 101 may be configured to determine a second plane 684A-1 based on the first plane 684A-2, where the second plane 684A-1 may define an orientation of container surface 384A-1. In some cases, first plane 684A-2 and second plane 684A-1 may be parallel to each other and offset by the predetermined distance h. For example, as described above, if the first plane 684A-2 is defined by the equation a(x - x_0) + b(y - y_0) + c(z - z_0) = 0, then computing system 101 may determine that the second plane 684A-1 is defined by the equation a(x - x_0) + b(y - y_0) + c(z - z_0 - h) = 0.
In an embodiment, computing system 101 may use the second plane 684A-1 to determine or represent respective depth values for locations on container surface 384A-1, because the locations on container surface 384A-1 may fall on plane 684A-1 or be substantially close to plane 684A-1. As an example, if computing system 101 receives a 2D component or 2D coordinate of a location on container surface 384A-1 (e.g., an [x y] coordinate), it may determine the 3D coordinate on plane 684A-1 that corresponds to this 2D component, and determine a depth value for this location based on the depth component (e.g., z-component) of this 3D coordinate. More specifically, computing system 101 may determine the 3D coordinate [x y z] that satisfies the equation a(x - x_0) + b(y - y_0) + c(z - z_0 - h) = 0, where x and y may belong to the received 2D component, and where z may be the depth component of the 3D coordinate. Thus, information determined from container rim 384A-2 can be used to make a reliable determination of the orientation and/or depth of the container surface 384A-1.
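A minimal sketch of solving the offset plane equation for the depth component, using hypothetical coefficients and a hypothetical sidewall height, is given below; the numeric values and function name are illustrative assumptions only.

```python
def depth_on_container_surface(a, b, c, x0, y0, z0, h, x, y):
    """Solve a*(x - x0) + b*(y - y0) + c*(z - z0 - h) = 0 for z, i.e.
    the depth component of the 3D coordinate on the container-surface
    plane that shares the 2D component [x, y]."""
    if c == 0:
        raise ValueError("plane is parallel to the depth axis")
    return z0 + h - (a * (x - x0) + b * (y - y0)) / c

# Hypothetical rim-plane coefficients and a known sidewall height h (meters).
a, b, c = 0.01, -0.02, 1.0
x0, y0, z0 = 0.0, 0.0, 0.600
h = 0.05

z = depth_on_container_surface(a, b, c, x0, y0, z0, h, x=0.10, y=0.05)
print(z)  # approximately 0.650 m, a depth value on the container surface
```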
Returning to FIG. 4, in an embodiment, method 400 may include step 406, where computing system 101 determines an object pose based on the container pose. This step may be performed by, for example, the object detection module 204 of fig. 2C. In this embodiment, the object pose may describe at least one of an orientation of an object (e.g., 371/373 of fig. 3A-3C) disposed within the container (e.g., 384A) or a depth value of at least a portion of the object. In some cases, the object may be a target object that requires robotic interaction. A portion of an object may refer to, for example, a location on a surface (e.g., top surface) of the object and/or a physical feature of the object, such as a corner or edge of the top surface.
In an embodiment, the container pose used to determine the pose of the object may be a container surface pose. For example, the computing system 101 may determine that the orientation of the object pose is equal to or based on the orientation of the container surface pose. More specifically, the computing system 101 may determine that the orientation of the object (e.g., 371) is equal to the orientation of the container surface (e.g., 384A-1).
In some cases, computing system 101 may determine the orientation of the object (e.g., 371/373) based on the orientation of the container surface (e.g., 384A-1). For example, the computing system 101 may determine that the orientation of the object (e.g., 371/373) is equal to the orientation of the container surface (e.g., 384A-1) on which the object is disposed. Such a determination may be based on an assumption that the object, such as a merchandise box, is sitting flush on the container surface (e.g., 384A-1).
In some cases, computing system 101 may determine a depth value for a location on the object (also referred to as an object location) based on a depth value for a corresponding location on the container surface (e.g., 384A-1). The corresponding location may be where the object sits on the container surface (e.g., 384A-1), or more generally may have the same 2D component or 2D coordinate as the object location. In some cases, the determination may be based on a given size of the object (e.g., 371), such as a given height h_{object}, which may be stored in a non-transitory computer-readable medium (e.g., 120) accessible to computing system 101. If the object location is on a top surface of the object (e.g., 371), then the computing system 101 can determine that the object location and the corresponding location on the container surface (e.g., 384A-1) are separated by the given height h_{object}. For example, if the location on the container surface (e.g., 384A-1) has the 3D coordinate [x y z_{surface}], then computing system 101 may determine that the object location has the 3D coordinate [x y z_{surface} - h_{object}]. In such an example, computing system 101 may initially determine the 2D component of the object location and use the 2D component to determine the 3D coordinate [x y z_{surface}], which may be determined, for example, based on solving the equation of plane 684A-1 or, more generally, based on the container surface pose, as described above. Then, the computing system 101 may determine that the depth value of the object location is equal to or based on z_{surface} - h_{object}. Such a technique provides a robust way of accurately determining the orientation and/or depth value of an object (e.g., 371), even in environments with a large amount of imaging noise. Such imaging noise may prevent the spatial structure sensing camera (e.g., 351) from directly measuring the depth of the object in an accurate manner. However, the computing system 101 may make an indirect measurement by using the spatial structure information of the container to determine the container surface pose and to determine the depth value of the corresponding location on the container surface. The depth value of the object may then be inferred based on the depth value of the corresponding location on the container surface.
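The inference just described can be sketched in a few lines, assuming (for illustration only) that the object sits flush on the container surface, that depth increases away from the camera, and that the object height is known; the numbers and function name are not part of this disclosure.

```python
def object_top_depth(surface_depth, object_height):
    """Depth value of a location on the object's top surface, inferred
    from the depth of the corresponding location on the container
    surface and the object's known height.  Assumes depth increases
    away from the camera, so the object top is closer (smaller depth)."""
    return surface_depth - object_height

# Hypothetical values (meters): container surface at 0.650 m depth,
# object (e.g., a merchandise box) 0.12 m tall.
print(object_top_depth(0.650, 0.12))  # ~0.53 m
```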
In some cases, step 406 may be performed in an environment that includes an object identifier sensing device, such as the barcode sensing device 352 of fig. 7A. Barcode sensing device 352 (which may be an embodiment of object identifier sensing device 152) may be mounted to a robotic arm (e.g., 363) or to a fixed mounting point. The barcode sensing device 352 may have a field of view 355 (also referred to as a reader field of view) and may be configured to sense a barcode or some other object identifier (if any) disposed on an object (e.g., 371/373) disposed within a container (e.g., 384A). For example, fig. 7B provides an example in which a barcode 711 is disposed on the object 371 and a barcode 713 is disposed on the object 373. As described above, the object identifier sensing device may be configured to generate sensed object identifier information. In the example of fig. 7B, barcode sensing device 352 may be configured to generate sensed barcode information, which may describe the location of barcode 711/713, information encoded into barcode 711/713, or any other information about barcode 711/713. The information encoded into barcode 711/713 may describe the object 371/373 on which the barcode is disposed, such as the identity of the object 371/373 or the size of the object 371/373.
In some cases, the computing system 101 may be configured to determine whether information encoded into a barcode (e.g., 711/713) or other object identifier matches object identification information received by the computing system 101. For example, the computing system 101 may receive object identification information, such as a Stock Keeping Unit (SKU) number and/or a Universal Product Code (UPC), that identifies a particular item, such as a good, for retrieval by a robot (e.g., 361). In such a case, the computing system 101 may be configured to determine whether any of the objects in the container (e.g., 384A) have a barcode (e.g., 711/713) or other object identifier disposed thereon that matches the object identification information, or more specifically, whether the information encoded into the barcode (e.g., 711) or other object identifier matches the object identification information. If there is a barcode (e.g., 711) or other object identifier whose encoded information matches the object identification information, the computing system 101 may use that barcode to, for example, determine the 2D components of one or more locations associated with the object (e.g., 371) on which the barcode (e.g., 711) or other object identifier is disposed.
In embodiments, the computing system 101 may be configured to determine an object identifier location, such as a barcode location. The barcode location may describe a 2D location of a barcode (e.g., 711/713), such as a barcode that matches the object identification information. In some cases, the 2D location of the object identifier may be represented by a 2D coordinate (also referred to as a 2D object identifier coordinate). If the object identifier is a barcode, the 2D object identifier coordinate may be a 2D barcode coordinate. For example, the computing system 101 may be configured to determine a 2D barcode coordinate [x_{barcode A}, y_{barcode A}] representing the location of barcode 711 in fig. 7B, and to determine a 2D barcode coordinate [x_{barcode B}, y_{barcode B}] representing the location of barcode 713. In some cases, the 2D barcode coordinates may be generated by barcode sensing device 352 of fig. 7A or by some other object identifier sensing device. As described above, the barcode location may be determined for a barcode that matches the object identification information discussed above. For example, if the computing system 101 determines that the information encoded into the barcode 711 disposed on the object 371 matches the received SKU number, the computing system 101 may determine the 2D barcode coordinate [x_{object A, barcode}, y_{object A, barcode}] of the location of the barcode 711.
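The matching of sensed barcode information against received object identification information could be sketched as follows; the dictionary layout, the decoded strings, and the coordinates are all assumed data structures for illustration, not the format actually produced by any barcode sensing device.

```python
# Hypothetical sensed barcode information: a decoded string plus the 2D
# barcode coordinate reported by the barcode sensing device.
sensed_barcodes = [
    {"decoded": "SKU-004213", "coord_2d": (0.11, 0.06)},  # e.g., barcode 711
    {"decoded": "SKU-007789", "coord_2d": (0.24, 0.18)},  # e.g., barcode 713
]

def find_matching_barcode(sensed_barcodes, object_identification_info):
    """Return the 2D coordinate of the barcode whose encoded information
    matches the received object identification information, or None."""
    for barcode in sensed_barcodes:
        if barcode["decoded"] == object_identification_info:
            return barcode["coord_2d"]
    return None

print(find_matching_barcode(sensed_barcodes, "SKU-004213"))  # (0.11, 0.06)
```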
In embodiments, the computing system 101 may use the information encoded into the barcode (e.g., 711) or other object identifier to determine a size or other information about the object (e.g., 371) on which the barcode is disposed. For example, the barcode 711 or other object identifier may encode the height h_{object} of the object 371, the length or width of the object 371, or any other information about the size of the object 371 (also referred to as the object size). The height h_{object} may be used to determine a depth value for a location on the object 371, as described above.
In embodiments, the computing system 101 may be configured to use sensed object identifier information associated with an object identifier (e.g., sensed barcode information associated with the barcode 711) to determine one or more 2D locations of the object (e.g., 371) on which the object identifier is disposed, or more particularly, to determine one or more 2D coordinates (also referred to as 2D object coordinates) of one or more respective locations on the object (e.g., 371). The 2D object coordinates may be combined with the depth values discussed above to plan the robot's interaction with the object. In some cases, the 2D object coordinates may approximate a contour of the object (e.g., 371), such as a 2D boundary of the top surface of the object.
In an embodiment, the 2D object coordinates of an object location may be determined based on the spatial structure information. For example, the spatial structure information may be a point cloud representing a plurality of locations on one or more surfaces sensed from the environment in the camera field of view (e.g., 353). For example, the locations represented by the point cloud may be those shown in FIGS. 6A-6C and 7C, such as the various location sets 610-660 on the container surface (e.g., 384A-1), the objects (371, 373), and the container rim (e.g., 384A-2). In such embodiments, the computing system 101 may be configured to search the point cloud using sensed object identifier information (such as sensed barcode information) to determine at least one set of one or more 2D object coordinates. The 2D object coordinates may be, for example, [x y] coordinates representing positions of a respective object. For example, the 2D object coordinates may be the respective 2D coordinates of some or all of the object locations 650_1 to 650_4 on the object 371 in FIG. 7C, or of some or all of the object locations 660_1 to 660_5 on the object 373. The 2D object coordinates may be combined with corresponding depth values or orientations of the objects to generate movement commands for interacting with the objects (e.g., 371/373), as discussed in more detail below.
In an embodiment, the 2D object coordinates of an object location may be determined based on the spatial structure information and based on the location of the object identifier (or more specifically, its 2D object identifier coordinate). If the object identifier is a barcode, the 2D object coordinates may be determined based on the spatial structure information and the location of the barcode. More specifically, the location of the barcode (e.g., the 2D barcode coordinate) may be used to reduce the portion of the spatial structure information in which to search for 2D object coordinates. More specifically, the search may be limited to a portion of the spatial structure information that corresponds to a region surrounding the location of the object identifier (or more specifically, the location of the barcode). For example, to search for a 2D location on the object 371, the computing system 101 may determine the region 721 of fig. 7C, which surrounds the barcode location of the barcode 711 of fig. 7B. That is, region 721 surrounds the 2D barcode coordinate [x_{object A, barcode}, y_{object A, barcode}]. In some cases, the region 721 may be a 2D region or a 3D region.
In embodiments, the computing system 101 may be configured to search a portion of the spatial structure information corresponding to region 721 to identify object locations corresponding to the object 371 or, more specifically, to search for locations that fall on the object 371. Thus, rather than searching all of the locations represented by the spatial structure information depicted in fig. 7C, computing system 101 may search a subset of the locations represented by the spatial structure information. More specifically, the subset of locations may be those locations in region 721. The computing system 101 may be configured to search for a location on the object 371 by, for example, identifying a location whose depth differs sufficiently from that of surrounding locations on the container surface 384A-1. In some cases, the computing system 101 may determine a 2D object coordinate for a location on the object. For example, if the spatial structure information provides an [x y z] coordinate for a location 650_1 that falls on the object 371, then computing system 101 may determine [x y] as the 2D object coordinate of that object location. Although in this example the spatial structure information may also provide the z-component of the location, as described above, the z-component may be unreliable due to noise. More specifically, the z-component may have sufficient accuracy for the computing system 101 to distinguish between locations on the object 371 and locations on the surrounding container surface 384A-1, but may lack sufficient accuracy to plan the robot's interaction with the object 371. Thus, as discussed further above, the computing system 101 may use the container pose to determine a depth value for the object location 650_1 (or more generally, for an object location having the 2D object coordinate or 2D component [x y]). In some cases, the 2D object coordinate of an object location such as 650_1 may be combined with the corresponding depth value determined for that location to form a more reliable 3D coordinate of the object location.
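The region-limited search could be sketched as follows: only points whose 2D component falls within a region around the 2D barcode coordinate are examined, and points are treated as object locations when their depth differs sufficiently from the expected container-surface depth. The region half-width, depth margin, and data values below are illustrative assumptions only.

```python
import numpy as np

def find_object_points(point_cloud, barcode_xy, half_width,
                       surface_depth, depth_margin):
    """Search only the subset of the point cloud whose [x, y] component
    lies within a square region centered on the 2D barcode coordinate,
    and keep points whose depth differs from the container-surface
    depth by more than depth_margin (i.e., points on the object)."""
    xy = point_cloud[:, :2]
    in_region = np.all(np.abs(xy - barcode_xy) <= half_width, axis=1)
    on_object = np.abs(point_cloud[:, 2] - surface_depth) > depth_margin
    return point_cloud[in_region & on_object]

# Hypothetical data (meters): surface at 0.650 m, object top near 0.530 m.
point_cloud = np.array([
    [0.10, 0.05, 0.531],   # on the object, inside the region
    [0.12, 0.07, 0.529],   # on the object, inside the region
    [0.30, 0.30, 0.651],   # container surface, outside the region
    [0.11, 0.04, 0.649],   # container surface, inside the region
])
barcode_xy = np.array([0.11, 0.06])
print(find_object_points(point_cloud, barcode_xy, half_width=0.08,
                         surface_depth=0.650, depth_margin=0.02))
```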
FIG. 7C also depicts region 723, which surrounds the 2D object identifier coordinate of barcode 713 (or more specifically, its 2D barcode coordinate [x_{object B, barcode}, y_{object B, barcode}]). The computing system 101 may be configured to search the region 723 to determine an object location of the object 373. In an embodiment, the region 721/723 may have a given fixed size. In an embodiment, the region 721/723 may have a size that is based on the object size of the object 371/373.
In an embodiment, an object location of an object (e.g., 371) may be determined based on the object size of the object, which may be encoded, for example, into a barcode (e.g., 711) or other object identifier disposed on the object. For example, the object size may indicate the length and width of the object (e.g., 371). The computing system 101 may be configured to estimate 2D coordinates representing, for example, edges or other boundaries of the object (e.g., 371) based on the object size. As an example, the computing system 101 may estimate, based on the length or width of the object, that a particular edge of the object is some distance away from the barcode location or other object identifier location. The computing system 101 may use that distance to determine a 2D coordinate indicating where the particular edge is located.
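A sketch of estimating boundary coordinates from an object size encoded in the barcode is given below; it assumes, purely for illustration, that the barcode sits at the center of the object's top face and that the face is axis-aligned, neither of which is required by the approach described above.

```python
def estimate_object_corners(barcode_xy, length, width):
    """Estimate 2D coordinates of the four corners of an object's top
    face from the barcode location, assuming the barcode is centered on
    the face and the face is axis-aligned (illustrative assumptions)."""
    bx, by = barcode_xy
    half_l, half_w = length / 2.0, width / 2.0
    return [
        (bx - half_l, by - half_w),
        (bx + half_l, by - half_w),
        (bx + half_l, by + half_w),
        (bx - half_l, by + half_w),
    ]

# Hypothetical object size decoded from the barcode: 0.20 m x 0.14 m.
print(estimate_object_corners((0.11, 0.06), length=0.20, width=0.14))
```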
In an embodiment, 2D object identifier coordinates (such as 2D barcode coordinates of a barcode location) may be determined based on information sensed by an object identifier sensing device (e.g., barcode sensing device 352 of fig. 7A). For example, a barcode sensing device (e.g., 352) may generate [ x y ] coordinates as 2D barcode coordinates and transmit [ x y ] coordinates to computing system 101. If necessary, computing system 101 may be configured to convert the [ x y ] coordinates from being represented in the coordinate system of the object identifier sensing device (e.g., barcode sensing device 352) to being represented in another coordinate system, such as the coordinate system of the spatial structure sensing camera (e.g., 351). As described above, in some cases, the object identifier sensing device (e.g., barcode sensing device 352) may include a 2D camera (e.g., 153 of fig. 1E). In such a case, the object identifier sensing device (e.g., barcode sensing device 352) may be configured to capture a 2D image. For example, fig. 7B may represent a 2D image, the 2D image representing a field of view (e.g., 355) of the barcode sensing device 352. The object identifier sensing device (e.g., barcode sensing device 352) and/or computing system 101 may be configured to detect a barcode (e.g., 711/713) or other object identifier from the 2D image and determine 2D object identifier coordinates based on where the object identifier (e.g., barcode 711/713) appears in the 2D image.
In an embodiment, if a 2D image is generated, it may be used to determine 2D object coordinates. For example, if computing system 101 receives a 2D image, it may be configured to detect an edge or other boundary of an object (e.g., 371) appearing in the 2D image and determine 2D object coordinates representing the object based on where the edge appears in the 2D image. In some cases, the computing system 101 may be configured to limit its search for edges or other boundaries to only a portion of the 2D image. In such a case, the portion of the 2D image in which the search is performed may be based on an object identifier location, such as a barcode location of a barcode (e.g., 711) disposed on the object, or based on a location in the 2D image at which the barcode appears.
In embodiments, the computing system 101 may be configured to estimate the 2D location of an object based on object identifier locations, such as the barcode locations of neighboring barcodes. Such a neighboring barcode is not disposed on the object itself, but may be disposed on a neighboring object. For example, FIG. 8 illustrates a case in which an object 377 is disposed within the container 384A and no barcode is disposed on the object. In this example, the computing system 101 may be configured to triangulate or otherwise determine the 2D location of the object 377 using the 2D barcode locations of the barcodes 711, 713, 716 disposed on the neighboring objects 371, 373, 376, respectively. For example, the computing system 101 may be configured to determine a region whose boundaries are defined by the 2D barcode coordinates [x_{object A, barcode}, y_{object A, barcode}], [x_{object B, barcode}, y_{object B, barcode}], and [x_{object C, barcode}, y_{object C, barcode}] of the barcodes 711, 713, 716, respectively, and to search for the object 377 in that region. More specifically, the computing system 101 may search for locations falling on the object 377 in a portion of the spatial structure information corresponding to that region.
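As a purely illustrative sketch, the region bounded by the neighboring barcode coordinates could be computed as an axis-aligned 2D extent, as below; the coordinates are assumed values, and the actual determination may instead use triangulation or other geometry.

```python
def bounding_region(barcode_coords):
    """Axis-aligned 2D region bounded by the barcode coordinates of
    neighboring objects; an unlabeled object (e.g., object 377) may then
    be searched for within this region of the spatial structure
    information."""
    xs = [c[0] for c in barcode_coords]
    ys = [c[1] for c in barcode_coords]
    return (min(xs), min(ys)), (max(xs), max(ys))

# Hypothetical 2D barcode coordinates for barcodes 711, 713, and 716.
neighbors = [(0.11, 0.06), (0.24, 0.18), (0.05, 0.21)]
print(bounding_region(neighbors))  # ((0.05, 0.06), (0.24, 0.21))
```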
In an embodiment, if a barcode sensing device (or some other object identifier sensing device) and/or the spatial structure sensing camera is attached to the robotic arm (e.g., 363 of fig. 3A), the computing system 101 may be configured to control the placement of the device/camera (e.g., 352/351) by causing the robotic arm to move. For example, the computing system 101 may be configured to generate and output a sensor movement command to cause the robotic arm 363 to move the object identifier sensing device (e.g., barcode sensing device 352) and/or the spatial structure sensing camera (e.g., 351) to a desired position and/or orientation. The sensor movement command may, for example, cause the device (e.g., 352/351) to be moved to a location within a desired proximity. In some cases, the desired proximity may be based on a focal length of the object identifier sensing device. More specifically, the sensor movement command may cause the object identifier sensing device to be moved close enough to the objects in the container (e.g., 384A) that any barcode (e.g., 711) on the objects will be within the focal distance of the object identifier sensing device. In an embodiment, the spatial structure information received in step 402 and the sensed barcode information or other object identifier information may be generated after the device (e.g., 352/351) has been moved as a result of the sensor movement command.
In embodiments, the sensor movement command may cause the spatial structure sensing camera and/or the barcode sensing device (or any other object identifier sensing device) to be moved within a proximity such that the spatial structure information and/or sensed barcode information represents or covers only a portion of the container surface (e.g., 384A-1). For example, FIG. 9 illustrates a situation in which only a portion of the container surface (e.g., 384A-1), or more generally a portion of one side of the container (e.g., 384A), is captured by the spatial structure sensing camera 351 and/or the barcode sensing device 352. That is, at such a proximity, only a portion of the container surface (e.g., 384A-1) may be in the camera field of view (e.g., 353) or the reader field of view (e.g., 355). It may not be necessary to capture information for the entire container surface (e.g., 384A-1) or the entire container (e.g., 384A). Rather, capturing only a portion of the container (e.g., 384A) may allow the computing system 101 to focus on a particular portion of the container (e.g., 384A), such as its right half, and more particularly, on detecting objects in this portion. In some cases, computing system 101 may limit how many times the spatial structure sensing camera (e.g., 351) and/or the object identifier sensing device (e.g., barcode sensing device 352) is moved, or how many locations the camera/device (e.g., 351/352) is moved to, in order to capture information about a particular container (e.g., 384A). For example, the camera/device (e.g., 351/352) may be moved only once, to a single location, to capture a snapshot of a particular container.
In embodiments, the computing system 101 may be configured to perform segmentation of a particular container (e.g., 384A) by associating different regions on the container surface (e.g., 384A-1) with different segments. For example, FIG. 10 depicts a case in which container surface 384A-1 may be virtually divided into segments 1001 through 1006. In this case, the computing system 101 may be configured to receive a container segment identifier associated with an object. In one example, computing system 101 may receive a container segment identifier that identifies segment 1006 or, more specifically, indicates a desired robot interaction with an object (e.g., 371) disposed within segment 1006. In an embodiment, computing system 101 may be configured to determine locations on the container surface (e.g., 384A-1) associated with the container segment identifier. In some cases, determining these locations may include determining their depth values, which may involve using at least one of the container rim pose or the container surface pose, as described above.
In an embodiment, the method 400 may include step 408, in which the computing system 101 outputs a movement command for causing the robot to interact with the object (e.g., 371/373), such as the robotic arm (e.g., 363) grasping or otherwise picking up the object. Such a movement command may also be referred to as an object movement command. In some cases, step 408 may be performed by the motion planning module 206 of fig. 2C, which may be configured to, for example, generate object movement commands, the sensor movement commands described above, and container movement commands (discussed below). The object movement command may be generated by the computing system 101 based on, for example, the object pose (such as the orientation or depth value of the object (e.g., 371)) determined in step 406. For example, the object movement command may be determined so as to cause an end effector on the robotic arm (e.g., 363) to be moved into range for manipulating or otherwise interacting with the object (e.g., 371), and to an orientation that matches the orientation of the object (e.g., 371). In embodiments, the movement command may cause a rotation or other actuation, for example, to place the end effector in such a position and/or orientation. In some cases, the movement command may be generated based on the 2D object coordinates and their corresponding depth values provided by the object pose. For example, the object movement command may be generated to bring the end effector within a desired proximity of the 2D object coordinates, which allows the end effector to manipulate or otherwise interact with the object (e.g., 371).
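A highly simplified sketch of turning an object pose into an object movement command is shown below; the command structure, field names, approach offset, and alignment strategy are assumptions made only for illustration and do not represent the actual robot interface of the disclosed system.

```python
def build_object_move_command(object_xy, object_depth, object_normal,
                              approach_offset=0.05):
    """Build an illustrative object movement command: position the end
    effector near the object's 2D coordinate, offset along the object's
    normal vector, with the end effector oriented to match the object's
    orientation."""
    x, y = object_xy
    nx, ny, nz = object_normal
    target_position = (x - approach_offset * nx,
                       y - approach_offset * ny,
                       object_depth - approach_offset * nz)
    return {
        "target_position": target_position,   # where to move the end effector
        "align_normal": object_normal,        # orientation to match the object
        "action": "grasp",
    }

# Hypothetical object pose: 2D coordinate, depth value, and normal vector.
command = build_object_move_command((0.11, 0.06), 0.53, (0.0, 0.0, 1.0))
print(command)
```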
In an embodiment, the object movement command may be determined so as to avoid a collision event. A collision event may represent a collision between the object being moved (e.g., 371) and, for example, a container sidewall (e.g., 384A-5 of fig. 3D) or other container boundary forming the container rim (e.g., 384A-2). In some cases, the computing system 101 may be configured to determine an object movement path that avoids such a collision event. The object movement path may be a movement path of the object (e.g., 371) being moved. The object movement command may be generated based on the object movement path. In some cases, accurate determination of the object pose in step 406 may facilitate such collision avoidance.
As described above, in some embodiments, the method 400 may begin with the container (e.g., 384A) already in an open position (such as shown in fig. 3C). In an embodiment, the method 400 may begin with the container (e.g., 384A) in a closed position (such as shown in fig. 3A and 11A). In such embodiments, the method 400 may include the step of the computing system 101 controlling the robotic arm (e.g., 363) to move the container (e.g., 384A) to the open position. Such a container opening step may occur prior to receiving the spatial structure information in step 402.
For example, FIGS. 11A and 11B illustrate the container 384A in a closed position. In such a case, the spatial structure sensing camera 351 may be configured to generate spatial structure information describing an outer surface 384A-7 of the container 384A or, more particularly, describing locations on the outer surface 384A-7. In this example, that spatial structure information may be different from the spatial structure information of step 402, which relates to a condition in which the container is open. The computing system 101 may be configured to determine, based on the spatial structure information describing the outer surface 384A-7, one or more locations representing a handle 384A-6 of the container 384A. Further, the computing system 101 may be configured to generate and output a container movement command to cause the robotic arm 363 to move the container 384A from the closed position to the open position. The container movement command may be generated based on the one or more locations representing the handle 384A-6 (also referred to as container handle positions). More specifically, as shown in FIGS. 11B and 11C, the container movement command may cause a robot hand 363D or other end effector of the robotic arm 363 to pull on the handle 384A-6 so as to slide the container 384A to the open position. After the container 384A is in the open position, in some cases, the spatial structure sensing camera 351 may be moved to another position (e.g., via a sensor movement command) to capture spatial structure information about the container surface and the objects disposed thereon, after which an object pose may be determined and the objects may be moved based on the object pose (e.g., via an object movement command).
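The sketch below illustrates one way, under stated assumptions, that container handle positions could be estimated from spatial structure information describing the outer surface 384A-7, and how a straight-line pull target for the container movement command could be derived from them. It assumes the handle protrudes toward the camera from an otherwise flat outer surface and that depth increases away from the camera; the detector and all names are hypothetical.

```python
# Minimal sketch (hypothetical detector): estimating handle positions on the
# closed drawer's outer surface and a straight-line pull to slide the drawer open.
import numpy as np

def find_handle_and_pull(outer_points, pull_distance=0.30, protrusion=0.01):
    """outer_points: (N, 3) sensed points on/near the outer surface; z is depth."""
    surface_depth = np.median(outer_points[:, 2])        # dominant outer-surface depth
    # Handle points protrude toward the camera, i.e. have a noticeably smaller depth.
    handle_mask = outer_points[:, 2] < surface_depth - protrusion
    if not handle_mask.any():
        return None, None
    handle_center = outer_points[handle_mask].mean(axis=0)
    # Pull straight along -z (away from the cabinet, toward the camera).
    pull_target = handle_center + np.array([0.0, 0.0, -pull_distance])
    return handle_center, pull_target

# Example with synthetic points: flat outer surface at depth 1.0 m, handle at 0.97 m
rng = np.random.default_rng(0)
surface = np.c_[rng.uniform(0.0, 0.6, 500), rng.uniform(0.0, 0.3, 500), np.full(500, 1.0)]
handle = np.c_[rng.uniform(0.25, 0.35, 50), rng.uniform(0.14, 0.16, 50), np.full(50, 0.97)]
center, target = find_handle_and_pull(np.vstack([surface, handle]))
print(np.round(center, 3), np.round(target, 3))
```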
Additional discussion of various embodiments:
Embodiment 1 relates to a computing system comprising a communication interface and at least one processing circuit. The communication interface is configured to communicate with a robot having a robotic arm with a spatial structure sensing camera disposed on the robotic arm, wherein the spatial structure sensing camera has a camera field of view. The at least one processing circuit is configured to perform the following operations when an object within a container is or has been in the camera field of view while the container is in an open position: receiving spatial structure information including depth information of an environment in the camera field of view, wherein the spatial structure information is generated by the spatial structure sensing camera; determining a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container; determining an object pose based on the container pose, wherein the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object; and outputting a movement command for causing the robot to interact with the object, wherein the movement command is generated based on the object pose.
Embodiment 2 includes the computing system of embodiment 1. In this embodiment, the at least one processing circuit is configured to determine the container pose as a container surface pose describing at least one of: an orientation of a surface of a container on which the object is disposed, or a depth value of at least one location on the surface of the container.
Embodiment 3 includes the computing system of embodiment 2, wherein, when the container is a drawer having a container rim offset from the container surface, the at least one processing circuit is configured to determine a container rim pose describing at least one of: an orientation of the container rim or a depth value of at least one location on the container rim, wherein the container rim pose is determined based on the spatial structure information, and wherein the container surface pose is determined based on the container rim pose and based on an established distance between the container rim and the container surface (see the sketch following this list of embodiments).
Embodiment 4 includes the computing system of embodiment 2 or 3. In this embodiment, the at least one processing circuit is configured to: receiving a container segment identifier associated with the object, wherein the container segment identifier is used to identify a segment of the container surface, and determining a location associated with the container segment identifier, wherein the determination is based on the container surface pose.
Embodiment 5 includes the computing system of embodiment 3 or 4. In this embodiment, the at least one processing circuit is configured to determine an object movement path that avoids a collision event, wherein the collision event represents a collision between the object and a container boundary that forms the container rim, and wherein the movement command is generated based on the object movement path.
Embodiment 6 includes the computing system of any of embodiments 1-5. In this embodiment, when the object identifier is disposed on the object, the at least one processing circuit is configured to: determining an object identifier location describing a 2D location of the object identifier; and determining a set of one or more object positions representing one or more positions of the object based on the object identifier position and the spatial structure information, wherein the movement command is generated based on the set of one or more object positions.
Embodiment 7 includes the computing system of embodiment 6. In this embodiment, the at least one processing circuit is configured to determine an area surrounding the object identifier location, and to determine the set of one or more object locations by searching a portion of the spatial structure information, the portion of the spatial structure information corresponding to the determined area surrounding the object identifier location.
Embodiment 8 includes the computing system of embodiment 7. In this embodiment, the spatial structure information comprises a point cloud representing a plurality of locations on one or more surfaces sensed from the environment in the camera field of view, wherein the portion of the spatial structure information from which the set of one or more object locations is searched comprises a subset of the plurality of locations within the determined area surrounding the object identifier location.
Embodiment 9 includes the computing system of any of embodiments 6-8. In this embodiment, the at least one processing circuit is configured to: determining at least one 2D object identifier coordinate, the at least one 2D object identifier coordinate being a 2D coordinate for representing a location of the object identifier; and determining, based on the at least one 2D object identifier coordinate, a set of one or more 2D object coordinates, wherein the one or more 2D object coordinates are one or more respective 2D coordinates for representing the one or more object locations, and wherein the movement command is generated based on the set of one or more 2D object coordinates and based on an orientation and a depth value of the object.
Embodiment 10 includes the computing system of any of embodiments 6-9. In this embodiment, the at least one processing circuit is configured to determine an object size based on information encoded in the object identifier, wherein the one or more object locations represent a boundary of the object and are determined based on the object size.
Embodiment 11 includes the computing system of any of embodiments 6-10. In this embodiment, the at least one processing circuit is configured to receive object identification information associated with the object and determine whether information encoded in the object identifier matches the object identification information.
Embodiment 12 includes the computing system of any of embodiments 6-11. In this embodiment, when an object identifier sensing device is disposed on the robotic arm, the at least one processing circuit is configured to determine the object identifier position based on information sensed by the object identifier sensing device.
Embodiment 13 includes the computing system of embodiment 12. In this embodiment, the movement command is an object movement command for causing the robot to move the object, wherein the at least one processing circuit is configured to output a sensor movement command to cause the robotic arm to move the object identifier sensing device into a given proximity to the container, wherein the object identifier position is determined after outputting the sensor movement command.
Embodiment 14 includes the computing system of embodiment 13. In this embodiment, the sensor movement command is also for causing the robotic arm to move the spatial structure sensing camera to within the given proximity to the container, wherein the spatial structure information is generated when the spatial structure sensing camera is within the given proximity to the container and represents a portion of a container surface on which the object is disposed.
Embodiment 15 includes the computing system of embodiment 13 or 14. In this embodiment, when the container is in the closed position and includes a handle, the at least one processing circuit is configured to: receiving additional spatial structure information describing a position on an outer surface of the container, determining one or more handle positions representing the handle based on the additional spatial structure information, and outputting a container movement command for the robotic arm to move the container from the closed position to the open position, wherein the container movement command is generated based on the one or more handle positions, and wherein the sensor movement command and the object movement command are output after the container movement command.
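As referenced in embodiment 3 above, the following minimal sketch illustrates one assumed formulation of deriving the container surface pose from a container rim pose and an established rim-to-surface distance, with the rim pose obtained by fitting a plane to rim points from the spatial structure information. The plane-fitting approach and all names are illustrative assumptions rather than the patent's algorithm.

```python
# Minimal sketch: container rim pose from a plane fit over rim points, then the
# container surface pose from the rim pose plus an established offset. Useful
# when the drawer surface itself is largely occluded by the objects it holds.
import numpy as np

def fit_plane(points):
    """Least-squares plane fit; returns (unit normal, centroid)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                      # direction of least variance
    if normal[2] < 0:                    # use a consistent sign convention
        normal = -normal
    return normal, centroid

def surface_pose_from_rim(rim_points, rim_to_surface_distance):
    """Derive the container surface pose from the rim pose and a known offset.

    The surface plane is assumed parallel to the rim plane and deeper (larger
    depth value) by `rim_to_surface_distance`.
    """
    normal, rim_point = fit_plane(rim_points)             # rim orientation + a rim location
    rim_depth = rim_point[2]
    surface_depth = rim_depth + rim_to_surface_distance   # shift along the depth axis
    return normal, rim_depth, surface_depth

# Example: noisy rim points at depth ~0.78 m, rim sitting 0.12 m above the surface
rng = np.random.default_rng(1)
rim_points = np.c_[rng.uniform(0.0, 0.6, 200),
                   rng.uniform(0.0, 0.4, 200),
                   0.78 + rng.normal(0.0, 0.001, 200)]
orientation, rim_depth, surface_depth = surface_pose_from_rim(rim_points, 0.12)
print(np.round(orientation, 3), round(rim_depth, 3), round(surface_depth, 3))
```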
It will be apparent to those of ordinary skill in the relevant art that other suitable adaptations and modifications of the methods and applications described herein may be made without departing from the scope of any of the embodiments. The embodiments described above are illustrative examples, and the invention should not be construed as limited to these particular embodiments. It should be understood that the various embodiments disclosed herein may be combined in combinations other than those specifically presented in the description and the accompanying drawings. It should also be understood that, depending on the example, certain acts or events of any of the processes or methods described herein may be performed in a different order, may be added, may be merged, or may be omitted altogether (e.g., not all described acts or events may be necessary to carry out the methods or processes). In addition, although certain features of the embodiments herein are described as being performed by a single component, module, or unit for purposes of clarity, it should be understood that the features and functions described herein may be performed by any combination of components, units, or modules. Accordingly, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (20)

1. A computing system, comprising:
a communication interface configured to communicate with a robot having a robotic arm with a spatial structure sensing camera disposed on the robotic arm, wherein the spatial structure sensing camera has a camera field of view;
at least one processing circuit configured to perform the following operations when an object within a container is or has been in the camera field of view when the container is in an open position:
receiving spatial structure information including depth information of an environment in the camera field of view, wherein the spatial structure information is generated by the spatial structure sensing camera;
determining a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container;
determining an object pose based on the container pose, wherein the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object;
outputting a movement command for causing the robot to interact with the object, wherein the movement command is generated based on the object pose.
2. The computing system of claim 1, wherein the at least one processing circuit is configured to determine the container pose as a container surface pose describing at least one of: an orientation of a surface of a container on which the object is disposed, or a depth value of at least one location on the surface of the container.
3. The computing system of claim 2, wherein when the container is a drawer having a container rim offset from the container surface, the at least one processing circuit is configured to determine a container rim pose describing at least one of: an orientation of the container rim or a depth value of at least one location on the container rim, wherein the container rim pose is determined based on the spatial structure information, and
wherein the container surface pose is determined based on the container rim pose and based on an established distance between the container rim and the container surface.
4. The computing system of claim 3, wherein the at least one processing circuit is configured to:
receiving a container segment identifier associated with the object, wherein the container segment identifier is used to identify a segment of the container surface, and
determining a location associated with the container segment identifier, wherein the determination is based on the container surface pose.
5. The computing system of claim 3, wherein the at least one processing circuit is configured to determine an object movement path that avoids a collision event, wherein the collision event represents a collision between the object and a container boundary that forms the container rim, and wherein the movement command is generated based on the object movement path.
6. The computing system of claim 1, wherein, when an object identifier is disposed on the object, the at least one processing circuit is configured to:
determining an object identifier location describing a 2D location of the object identifier; and
determining a set of one or more object positions based on the object identifier position and the spatial structure information, the one or more object positions being one or more positions representing the object,
wherein the movement command is generated based on the set of one or more object positions.
7. The computing system of claim 6, wherein the at least one processing circuit is configured to determine an area surrounding the object identifier location, and to determine the set of one or more object locations by searching a portion of the spatial structure information corresponding to the determined area surrounding the object identifier location.
8. The computing system of claim 7, wherein the spatial structure information comprises a point cloud representing a plurality of locations on one or more surfaces sensed from the environment in the camera field of view, wherein the portion of the spatial structure information from which the set of one or more object locations is searched comprises a subset of the plurality of locations that are within the determined area surrounding the object identifier location.
9. The computing system of claim 6, wherein the at least one processing circuit is configured to:
determining at least one 2D object identifier coordinate, the at least one 2D object identifier coordinate being a 2D coordinate for representing a location of the object identifier; and
determining, based on the at least one 2D object identifier coordinate, a set of one or more 2D object coordinates, wherein the one or more 2D object coordinates are one or more respective 2D coordinates for representing the one or more object locations, wherein the movement command is generated based on the set of one or more 2D object coordinates and based on an orientation and a depth value of the object.
10. The computing system of claim 6, wherein the at least one processing circuit is configured to determine an object size based on information encoded in the object identifier, wherein the one or more object locations represent a boundary of the object and are determined based on the object size.
11. The computing system of claim 6, wherein the at least one processing circuit is configured to receive object identification information associated with the object, and to determine whether information encoded in the object identifier matches the object identification information.
12. The computing system of claim 6, wherein when an object identifier sensing device is disposed on the robotic arm, the at least one processing circuit is configured to determine the object identifier location based on information sensed by the object identifier sensing device.
13. The computing system of claim 12, wherein the movement command is an object movement command for causing the robot to move the object,
wherein the at least one processing circuit is configured to output a sensor movement command to cause the robotic arm to move the object identifier sensing device into a given proximity to the container,
wherein the object identifier position is determined after outputting the sensor movement command.
14. The computing system of claim 13, wherein the sensor movement command is further for causing the robotic arm to move the spatial structure sensing camera to within the given proximity to the container,
wherein the spatial structure information is generated when the spatial structure sensing camera is within the given proximity to the container and represents a portion of a container surface on which the object is disposed.
15. The computing system of claim 13, wherein when the container is in a closed position and includes a handle, the at least one processing circuit is configured to:
receiving additional spatial structure information describing a location on an exterior surface of the container,
determining one or more handle positions for representing the handle based on the additional spatial structure information, and
outputting a container movement command for causing the robotic arm to move the container from the closed position to the open position, wherein the container movement command is generated based on the one or more handle positions, and wherein the sensor movement command and the object movement command are output subsequent to the container movement command.
16. A non-transitory computer-readable medium having instructions that, when executed by at least one processing circuit of a computing system, cause the at least one processing circuit to:
receiving spatial structure information, wherein the computing system is configured to communicate with a robot having a robotic arm with a spatial structure sensing camera having a camera field of view disposed thereon, wherein the spatial structure information is generated by the spatial structure sensing camera, wherein the spatial structure information includes depth information of an environment in the camera field of view and is generated when an object within a container is or has been in the camera field of view when the container is in an open position;
determining a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container;
determining an object pose based on the container pose, wherein the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object; and
outputting a movement command for causing the robot to interact with the object, wherein the movement command is generated based on the object pose.
17. The non-transitory computer-readable medium of claim 16, wherein the instructions, when executed by the at least one processing circuit, cause the at least one processing circuit to determine the container pose as a container surface pose describing at least one of: an orientation of a surface of a container on which the object is disposed, or a depth value of at least one location on the surface of the container.
18. The non-transitory computer-readable medium of claim 17, wherein the instructions, when executed by the at least one processing circuit, and when the container is a drawer having a container rim offset from the container surface, cause the at least one processing circuit to determine a container rim pose describing at least one of: an orientation of the container rim or a depth value of at least one location on the container rim, wherein the container rim pose is determined based on the spatial structure information, and
wherein the container surface pose is determined based on the container rim pose and based on an established distance between the container rim and the container surface.
19. A method for object detection, comprising:
receiving, by a computing system, spatial structure information, wherein the computing system is configured to communicate with a robot having a robotic arm with a spatial structure sensing camera having a camera field of view disposed on the robotic arm, wherein the spatial structure information is generated by the spatial structure sensing camera, wherein the spatial structure information includes depth information of an environment in the camera field of view and is generated when an object within a container is or has been in the camera field of view when the container is in an open position;
determining a container pose based on the spatial structure information, wherein the container pose is used to describe at least one of an orientation of the container or a depth value of at least a portion of the container;
determining an object pose based on the container pose, wherein the object pose is used to describe at least one of an orientation of the object or a depth value of at least a portion of the object; and
outputting a movement command for causing the robot to interact with the object, wherein the movement command is generated based on the object pose.
20. The method of claim 19, wherein determining the container pose comprises determining a container surface pose describing at least one of: an orientation of a surface of a container on which the object is disposed, or a depth value of at least one location on the surface of the container.
CN202011186066.6A 2020-03-05 2020-09-29 Method and computing system for performing container detection and object detection Active CN112288165B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202062985336P 2020-03-05 2020-03-05
US62/985,336 2020-03-05
US16/929,854 2020-07-15
US16/929,854 US11130237B1 (en) 2020-03-05 2020-07-15 Method and computing system for performing container detection and object detection
CN202011045515.5A CN113361740A (en) 2020-03-05 2020-09-29 Method and computing system for performing container detection and object detection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202011045515.5A Division CN113361740A (en) 2020-03-05 2020-09-29 Method and computing system for performing container detection and object detection

Publications (2)

Publication Number Publication Date
CN112288165A true CN112288165A (en) 2021-01-29
CN112288165B CN112288165B (en) 2022-02-18

Family

ID=74352357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186066.6A Active CN112288165B (en) 2020-03-05 2020-09-29 Method and computing system for performing container detection and object detection

Country Status (1)

Country Link
CN (1) CN112288165B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106927179A (en) * 2017-01-16 2017-07-07 黄金刚 A kind of intelligent storage robot, system and control method, charging method
CN108622589A (en) * 2017-03-17 2018-10-09 J·波拉特 System and method for automating overhead storage
CN108698224A (en) * 2017-08-23 2018-10-23 深圳蓝胖子机器人有限公司 The method of robot store items, the system and robot of control robot store items
CN108408315A (en) * 2018-04-26 2018-08-17 北京极智嘉科技有限公司 Haulage equipment
CN108806115A (en) * 2018-08-21 2018-11-13 北京极智嘉科技有限公司 Nobody self-service operating system, method and operation door
CN110116406A (en) * 2019-01-25 2019-08-13 目竞株式会社 The robot system of scan mechanism with enhancing

Similar Documents

Publication Publication Date Title
WO2021177458A1 (en) Method and computing system for performing container detection and object detection
KR102650492B1 (en) A robotic system with automated package registration mechanism and methods of operating the same
US11717971B2 (en) Method and computing system for performing motion planning based on image information generated by a camera
JP7433609B2 (en) Method and computational system for object identification
JP5558585B2 (en) Work picking device
US20230260071A1 (en) Multicamera image processing
CN113361740A (en) Method and computing system for performing container detection and object detection
CN112288165B (en) Method and computing system for performing container detection and object detection
US20220355474A1 (en) Method and computing system for performing robot motion planning and repository detection
CN113043282B (en) Method and system for object detection or robot interactive planning
US11900652B2 (en) Method and computing system for generating a safety volume list for object detection
CN112734783B (en) Method and computing system for processing candidate edges
JP7021620B2 (en) Manipulators and mobile robots
US20230286140A1 (en) Systems and methods for robotic system with object handling
WO2022054497A1 (en) Filling rate measurement method, information processing device, and program
US20220375097A1 (en) Robotic system for object size detection
WO2021246476A1 (en) Method and computing system for performing or facilitating physical edge detection
CN115703238A (en) System and method for robotic body placement
CN116061192A (en) System and method for a robotic system with object handling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant