US12228388B2 - Systems and methods for volumetric sizing - Google Patents
Systems and methods for volumetric sizing
- Publication number
- US12228388B2
- Authority
- US
- United States
- Prior art keywords
- pixels
- computer system
- ground plane
- executable
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01B—MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
- G01B11/00—Measuring arrangements characterised by the use of optical techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Definitions
- aspects of embodiments of the present invention relate to systems and methods for estimating physical dimensions of objects.
- Measuring or estimating the dimensions of objects, including the volumes of objects, is a common task in fields such as resource planning and logistics. For example, when loading boxes into one or more trucks, estimates of the sizes and shapes of the boxes can help in the efficient distribution of the boxes among the different trucks to reduce or minimize empty space in the trucks. As another example, freight or shipping companies may bill their customers in accordance with the dimensions (and mass or weight) of the packages to be shipped.
- mail order retailers may be interested in identifying the correctly sized box for shipping various retail goods. While many of these goods may be cuboidal in shape (e.g., because they are sold in boxes), many other goods (such as a bottle of laundry detergent or a gardening trowel) may have irregular shapes. To reduce shipping costs, these mail order retailers may desire to find the minimal sized box that will contain the items to be shipped as part of a particular customer's order.
- aspects of embodiments of the present invention relate to systems and methods for automatically estimating the dimensions of an object, including the volume of an object.
- a system includes: a depth camera system configured to capture color information and depth information of a scene; a processor configured to control the depth camera system; a memory storing instructions that, when executed by the processor, cause the processor to: control the depth camera system to capture at least a frame of the scene, the frame including a color image and a depth image arranged in a plurality of pixels; detect an object in the frame; determine a ground plane in the frame, the object resting on the ground plane; compute a rectangular outline bounding a projection of a plurality of pixels of the object onto the ground plane; compute a height of the object above the ground plane; and output computed dimensions of the object in accordance with a length and a width of the rectangular outline and the height.
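The rectangular outline recited above can be found as a minimum-area rectangle enclosing the projection of the object's pixels onto the ground plane. The patent does not name an algorithm; the sketch below (all names are illustrative) uses the standard fact that the minimum-area enclosing rectangle shares an orientation with some edge of the convex hull, so only hull-edge orientations need to be tested:

```python
import math

def convex_hull(pts):
    # Andrew's monotone chain convex hull of 2D points.
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rectangle(pts):
    # Try each hull-edge orientation; rotate the hull so that edge is
    # horizontal and measure the axis-aligned bounding box.
    hull = convex_hull(pts)
    best = None
    n = len(hull)
    for i in range(n):
        x0, y0 = hull[i]
        x1, y1 = hull[(i + 1) % n]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [c*x - s*y for x, y in hull]
        ys = [s*x + c*y for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w*h < best[0]:
            best = (w*h, w, h)
    _, w, h = best
    return max(w, h), min(w, h)  # (length, width) of the outline
```

The returned length and width, together with the separately computed height above the ground plane, give the three output dimensions.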
- the memory may further store instructions that, when executed by the processor, cause the processor to segment the object from the scene by: identifying one or more initial pixels of the object; and performing an iterative flood fill operation, starting with the initial pixels of the object, each iteration of the flood fill operation including adding a plurality of neighboring pixels of the frame to the pixels of the object when distances between the neighboring pixels and the pixels of the object are within a threshold distance.
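A minimal sketch of such an iterative flood fill is shown below. The function name is illustrative, and comparing only the depth difference between 4-connected neighbors is a simplifying assumption (the claim more generally compares distances between neighboring pixels and the pixels of the object):

```python
from collections import deque

def flood_fill_segment(depth, seed, thresh):
    """Segment an object from a depth image by iterative flood fill.

    depth  -- 2D list of depth values (None where depth is invalid)
    seed   -- (row, col) of an initial pixel known to lie on the object
    thresh -- maximum depth difference between neighboring object pixels
    """
    rows, cols = len(depth), len(depth[0])
    segment = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Each iteration adds neighbors within the threshold distance.
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in segment:
                if depth[nr][nc] is not None and abs(depth[nr][nc] - depth[r][c]) <= thresh:
                    segment.add((nr, nc))
                    frontier.append((nr, nc))
    return segment
```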
- the system may further include a display device coupled to the processor, wherein the memory may further include instructions that, when executed by the processor, cause the processor to: control the display device to display a view of the color image of the scene captured by the depth camera; and overlay a reticle on the view in the display device, and wherein the one or more initial pixels of the object may correspond to the pixels under the reticle.
- the system may further include a trigger, wherein the processor may be configured to control the depth camera system to capture the frame in response to detecting an activation of the trigger, and wherein the one or more initial pixels of the object may correspond to the pixels under the reticle when the trigger is activated.
- the memory may further store instructions that, when executed by the processor, cause the processor to segment the object from the scene by: defining a graph, wherein each vertex of the graph corresponds to a vertical projection of the pixels of the scene onto the ground plane and wherein two vertices are connected by an edge if their distance is smaller than a threshold; detecting connected components of the vertical projection of the pixels; and identifying the largest connected component of the graph as the pixels of the object.
- the system may further include a display device coupled to the processor, wherein the instructions configured to output the computed dimensions may include instructions to display the computed dimensions on the display device.
- the system may further include an inertial measurement unit rigidly connected to the depth camera system and configured to detect an orientation of the depth camera system.
- the memory may further include instructions that, when executed by the processor, cause the processor to determine the ground plane by: identifying, when capturing the frame, an orientation of the depth camera system based on data from the inertial measurement unit; identifying a plurality of bottom pixels of the frame based on the orientation; computing a partial plane from the bottom pixels of the frame; and extending the partial plane in the depth image to define the ground plane.
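Computing the partial plane from the bottom-strip pixels can be done with an ordinary least-squares fit; once the plane equation is known, "extending" it is simply evaluating the same equation across the whole depth image. The sketch below (names illustrative; least squares is an assumption, as the claim does not specify the fitting method) fits z = a·x + b·y + c via the 3x3 normal equations:

```python
def fit_ground_plane(points):
    """Least-squares fit of a plane z = a*x + b*y + c to 3D points
    taken from a strip at the bottom of the frame. Returns (a, b, c)."""
    n = len(points)
    sx = sum(p[0] for p in points); sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0]*p[0] for p in points); syy = sum(p[1]*p[1] for p in points)
    sxy = sum(p[0]*p[1] for p in points)
    sxz = sum(p[0]*p[2] for p in points); syz = sum(p[1]*p[2] for p in points)
    # Normal equations A @ [a, b, c] = rhs, solved by Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    def det(m):
        return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
                - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
                + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))
    d = det(A)
    def replaced(col):
        m = [row[:] for row in A]
        for i in range(3):
            m[i][col] = rhs[i]
        return m
    return tuple(det(replaced(c)) / d for c in range(3))
```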
- a width of a strip of the bottom pixels is calculated in accordance with noise characteristics of the depth camera system.
- the memory may further store instructions that, when executed by the processor, cause the processor to further determine the computed dimensions in accordance with a box mode, the object including two vertical faces, the instructions corresponding to the box mode including instructions that, when executed by the processor, cause the processor to: identify a corner of the object, wherein the corner of the object is located at an intersection of two lines formed in the projection of the pixels of the vertical faces of the object onto the ground plane; compute the height of the object above the ground plane by computing heights of top edges of the two vertical faces; and compute dimensions of vertical planes of the object.
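In box mode, the corner lies at the intersection of the two lines formed by projecting the vertical faces onto the ground plane. Once each line has been fitted (e.g., to the projected face pixels), the intersection is a small linear solve; the sketch below assumes lines in point-plus-direction form and an illustrative function name:

```python
def line_intersection(p1, d1, p2, d2):
    """Intersect two 2D lines given as point + direction, as fitted to
    the ground-plane projections of the two visible vertical faces.
    The intersection is the projected corner of the box."""
    # Solve p1 + t*d1 = p2 + s*d2 for t.
    denom = d1[0]*d2[1] - d1[1]*d2[0]
    if abs(denom) < 1e-12:
        return None  # parallel directions: no corner
    t = ((p2[0]-p1[0])*d2[1] - (p2[1]-p1[1])*d2[0]) / denom
    return (p1[0] + t*d1[0], p1[1] + t*d1[1])
```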
- the memory may further store instructions that, when executed by the processor, cause the processor to compute the heights of the top edges of the two vertical faces by: dividing the ground plane into a plurality of cells; selecting a plurality of cells including the lines; computing a maximum height of each cell based on the pixels of the object in each of the cells; and computing a height of the box based on the maximum heights of the cells.
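The cell-based height estimate above might be sketched as follows. Taking the per-cell maximum suppresses pixels that fall below the top edge; the claim only says the box height is computed "based on the maximum heights of the cells", so the choice of the median as the final aggregation is an assumption, as are the names:

```python
def box_height_from_cells(points, cell_size):
    """points: (x, y, h) tuples with (x, y) on the ground plane and h
    the height above it, restricted to pixels near the top edges of the
    two vertical faces. The ground plane is divided into square cells;
    the per-cell maximum rejects low pixels and the median of the cell
    maxima rejects isolated high outliers."""
    cells = {}
    for x, y, h in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key] = max(cells.get(key, float("-inf")), h)
    maxima = sorted(cells.values())
    return maxima[len(maxima) // 2]  # median of the per-cell maxima
```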
- the memory may further store instructions that, when executed by the processor, cause the processor to activate or deactivate the box mode based on a user interface switch.
- the memory may further store instructions that, when executed by the processor, cause the processor to activate the box mode in response to detecting that the object includes two vertical planes arranged at right angles to the ground plane and at right angles to each other.
- the depth camera system may include: a color camera; a plurality of infrared cameras; and an infrared illuminator configured to emit light in a wavelength interval that is detectable by the plurality of infrared cameras.
- the memory may store instructions that, when executed by the processor, cause the processor to capture the frame of the scene by: controlling the color camera and the plurality of infrared cameras to concurrently capture images while controlling the infrared illuminator to emit light; computing a disparity map from the images captured by the infrared cameras; calculating the depth image of the frame from the disparity map; and mapping the image captured by the color camera onto the disparity map as the color image of the frame, wherein the images are captured from substantially the same pose with respect to the scene.
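For a rectified stereo pair, the step of calculating the depth image from the disparity map follows the standard pinhole relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the two infrared cameras, and d the disparity in pixels. A minimal sketch (function name illustrative):

```python
def depth_from_disparity(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (meters) via
    Z = f * B / d for a rectified stereo pair. Zero or negative
    disparity means no match, so depth is unknown (None)."""
    return [[focal_px * baseline_m / d if d > 0 else None for d in row]
            for row in disparity]
```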
- the color camera, the infrared cameras, and the infrared illuminator may be fixed on a stationary frame, and the color camera and the infrared cameras may have fields of view directed at a scale.
- the color camera, the infrared cameras, and the infrared illuminator may be mounted on a handheld scanning device.
- the computed dimensions of the object may correspond to dimensions of a box tightly fitting the object.
- a method for computing dimensions of an object in a scene includes: controlling, by a processor, a depth camera system to capture at least a frame of the scene, the frame including a color image and a depth image arranged in a plurality of pixels; detecting, by the processor, an object in the frame; determining, by the processor, a ground plane in the frame, the object resting on the ground plane; computing, by the processor, a rectangular outline bounding a projection of a plurality of pixels of the object onto the ground plane; computing, by the processor, a height of the object above the ground plane; and outputting, by the processor, computed dimensions of the object in accordance with a length and a width of the rectangular outline and the height.
- the method may further include segmenting the object from the scene by: identifying one or more initial pixels of the object; and performing an iterative flood fill operation, starting with the initial pixels of the object, each iteration of the flood fill operation including adding a plurality of neighboring pixels of the frame to the pixels of the object when distances between the neighboring pixels and the pixels of the object are within a threshold distance.
- the method may further include: controlling a display device coupled to the processor to display a view of the color image of the scene captured by the depth camera; and overlaying a reticle on the view in the display device, and wherein the one or more initial pixels of the object correspond to the pixels under the reticle.
- the method may further include controlling the depth camera system to capture the frame in response to detecting an activation of a trigger coupled to the processor, and wherein the one or more initial pixels of the object correspond to the pixels under the reticle when the trigger is activated.
- the method may further include segmenting the object from the scene by: defining a graph, wherein each vertex of the graph corresponds to a vertical projection of the pixels of the scene onto the ground plane and wherein two vertices are connected by an edge if their distance is smaller than a threshold; detecting connected components of the vertical projection of the pixels; and identifying the largest connected component of the graph as the pixels of the object.
- the method may further include displaying the computed dimensions on a display device coupled to the processor.
- the method may further include determining the ground plane by: identifying, when capturing the frame, an orientation of the depth camera system based on data from an inertial measurement unit rigidly connected to the depth camera system; identifying a plurality of bottom pixels of the frame based on the orientation; computing a partial plane from the bottom pixels of the frame; and extending the partial plane in the depth image to define the ground plane.
- a width of a strip of the bottom pixels may be calculated in accordance with noise characteristics of the depth camera system.
- the method may further include determining the computed dimensions in accordance with a box mode, the object including two vertical faces, by: identifying a corner of the object, wherein the corner of the object is located at an intersection of two lines formed in the projection of the pixels of the vertical faces of the object onto the ground plane; computing the height of the object above the ground plane by computing heights of top edges of the two vertical faces; and computing dimensions of vertical planes of the object.
- the method may further include computing the heights of the top edges of the two vertical faces by: dividing the ground plane into a plurality of cells; selecting a plurality of cells including the lines; computing a maximum height of each cell based on the pixels of the object in each of the cells; and computing a height of the box based on the maximum heights of the cells.
- the method may further include activating or deactivating the box mode based on a user interface switch.
- the method may further include activating the box mode in response to detecting that the object includes two vertical planes arranged at right angles to the ground plane and at right angles to each other.
- the depth camera system may include: a color camera; a plurality of infrared cameras; and an infrared illuminator configured to emit light in a wavelength interval that is detectable by the plurality of infrared cameras.
- the method may further include: controlling the color camera and the plurality of infrared cameras to concurrently capture images while controlling the infrared illuminator to emit light; computing a disparity map from the images captured by the infrared cameras; calculating the depth image of the frame from the disparity map; and mapping the image captured by the color camera onto the disparity map as the color image of the frame, wherein the images are captured from substantially the same pose with respect to the scene.
- the color camera, the infrared cameras, and the infrared illuminator may be fixed on a stationary frame, and the color camera and the infrared cameras may have fields of view directed at a scale.
- the color camera, the infrared cameras, and the infrared illuminator may be mounted on a handheld scanning device.
- the computed dimensions of the object may correspond to dimensions of a box tightly fitting the object.
- FIG. 1 A is a schematic depiction of the measurement of an object by a system according to one embodiment of the present invention.
- FIGS. 1 B and 1 C are schematic depictions of user interfaces of a system according to one embodiment of the present invention when measuring the dimensions of a box-like object ( FIG. 1 B ) and a non-box-like (or arbitrary) object ( FIG. 1 C ).
- FIG. 2 is a block diagram of a stereo depth camera system according to one embodiment of the present invention.
- FIG. 3 is a flowchart of a method for measuring dimensions of an object according to one embodiment of the present invention.
- FIG. 4 A is a depiction of a depth map of a scene depicting a bottle of laundry detergent on a table.
- FIG. 4 B is an orthogonal view of the depth map shown in FIG. 4 A with the ground plane aligned perpendicular to the optical axis of the virtual camera.
- FIG. 4 C depicts the vertically projected points of the object 10 in white and the rest of the image in black, with a red rectangle on the ground plane that contains all the vertical projections of the object's surface points according to one embodiment of the present invention.
- FIG. 4 D is a color image of the scene including a bottle as depicted in the depth map of FIG. 4 A , with a bounding box computed in accordance with embodiments of the present invention overlaid on the view of the bottle.
- FIG. 5 A is a schematic illustration of noise in a depth sensing system according to one embodiment of the present invention.
- FIG. 5 B is a schematic illustration of interactions between objects in a scene and noise in a depth sensing system according to one embodiment of the present invention.
- FIG. 5 C is a flowchart of a method for computing a virtual ground plane according to one embodiment of the present invention.
- FIG. 6 is a flowchart of a method for measuring dimensions of a box-like object in accordance with one embodiment of the present invention.
- FIG. 7 A is a color photograph of a scene containing a box in the foreground and some clutter in the background.
- FIG. 7 B is a depth map of the scene, where the box in the foreground is shown in red, indicating that it is closer to the depth camera system and with the background clutter in blue, indicating that the clutter is farther from the depth camera system.
- FIG. 7 C is an example of the projection of the visible points of the box shown in FIG. 7 B onto the ground plane when viewed from “above” (e.g., along the direction of gravity).
- FIG. 7 D is a pictorial representation of a method for estimating the extent of the vertical surfaces according to one embodiment of the present invention.
- FIGS. 8 A, 8 B, and 8 C are histograms of colors computed from the RGB image for possible candidates for the extents of the vertical sides (thin green lines of FIG. 7 D ) according to one embodiment of the present invention.
- aspects of embodiments of the present invention relate to systems and methods for automatically estimating physical dimensions of objects in a scene. Some aspects of embodiments of the present invention relate to “contactless” measurements of physical objects, wherein a depth camera captures one or more depth images of an object and the dimensions of an object (e.g., length, width, height, and volume), or a bounding box thereof are estimated from the one or more depth images.
- FIG. 1 A is a schematic depiction of the measurement of an object by a system according to one embodiment of the present invention.
- a depth camera system 100 captures images of an object 10 .
- the object 10 may be, for example, a substantially cuboidal object (e.g., a rectangular cardboard box), as shown in FIG. 1 A , or may have a more arbitrary shape (e.g., a bottle of laundry detergent or a gardening trowel).
- the depth camera system 100 may include a display device 122 for displaying the measurements captured by the depth camera system 100 .
- the display device 122 may be physically separate from the cameras of the depth camera system 100 , such as in the case of a separate reporting or monitoring system.
- FIGS. 1 B and 1 C are schematic depictions of user interfaces of a system according to one embodiment of the present invention when measuring the dimensions of a box-like object ( FIG. 1 B ) and a non-box-like (or arbitrary) object ( FIG. 1 C ).
- the display device 122 of a system 100 displays a two-dimensional (2D) view 210 of an object 10 (a rectangular prism or a “box” in FIG. 1 B or a box with a long handle extending therefrom in FIG. 1 C ) being measured by the system.
- the view 210 may also include a reticle or crosshairs 212 .
- the system computes a three-dimensional (3D) bounding box 220 around the object 10 having a length (L), a width (W), and a height (H). Accordingly, the dimensions 230 of the object 10 (e.g., a minimal bounding box around the object 10 ) can be automatically computed and displayed to a user on the display device 122 .
- the system 100 may be in communication with an electronic scale or electronic balance that the object 10 is resting on, and the measured mass or weight 240 of the object 10 may also be shown on the display 122 of the system 100 .
- the weight or mass of the object may have been previously measured and stored in a memory (e.g., in a database) and retrieved for display on the display device 122 .
- Measuring the dimensions of a cuboidal or box-shaped object is of particular interest in fields such as shipping and logistics, where boxes of various sizes are encountered much more frequently than other shapes. Furthermore, the geometrically regular shape of a cuboidal object allows for optimizations to be made based on assumptions of characteristics of the object 10 . These optimizations will be described in more detail below.
- the depth camera system 100 is able to acquire color information (e.g., information about the colors of the surface of an object or its surface “texture”) and geometric information (e.g., information about the size and shape of an object), such as an RGB-D (red, green, blue, and depth) camera.
- the term “RGB-D camera” is used herein to refer to such a system that can acquire color and geometric information, without loss of generality.
- an RGB-D camera takes “pictures” of a scene by means of central optical projection. Whereas regular cameras can only measure the color of the light reflected by any visible point on the surface of an object, RGB-D cameras can also measure the distance (“depth”) to the same points on the surface. By measuring the depth of a surface point visible at a pixel p, an RGB-D camera is able to compute the full 3-D location of this point. This is because a pixel p characterizes the single line of sight to the surface point; the depth along a line of sight determines the location where the line of sight intersects the surface point. The line of sight through pixel p can be computed from the camera's intrinsic parameters, which can be calibrated using standard procedures.
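The back-projection described above is the standard pinhole computation: the pixel coordinates fix the line of sight through the camera center, and the measured depth fixes where along that line the surface point lies. A minimal sketch using the usual intrinsic parameters (focal lengths fx, fy and principal point cx, cy; the function name is illustrative):

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Recover the 3D point (in the camera frame) seen at pixel (u, v)
    with measured depth Z. The pixel determines the line of sight; the
    depth determines the position along it."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)
```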
- the RGB-D camera can produce a “depth map” (or “point cloud”) from the disparity maps generated from the individual images captured by each of the 2-D cameras of the RGB-D camera.
- a depth map or depth image includes a set of 3-D locations (which may be defined with respect to the camera's reference frame) of the surface points of the scene that are visible from the depth camera.
- Each pixel in the depth map may be associated with a color (e.g., represented by a triplet of red (R), green (G), and blue (B) values) as captured for the particular pixel by the color camera.
- the scanning system 100 is implemented in a handheld device.
- the term “handheld device” refers to a device that can be comfortably held and manipulated with one or two hands, such as a smartphone, a tablet computer, or a purpose-specific scanner similar in size and shape to a portable barcode scanner with an attached display (or, alternatively, a smartphone with an attached handle and trigger).
- the scanning system 100 is implemented as a stationary device, such as one or more depth cameras rigidly mounted to a frame or other support structure and arranged to image objects on a conveyor belt or at a scanning station (e.g., a weighing location), and processing of the images captured by the one or more depth cameras may be performed by a processor and memory that are connected to the one or more depth cameras over a communication network (e.g., a local area network).
- aspects of embodiments of the present invention relate to systems and methods to compute the dimensions of a bounding box or minimal bounding box that would encompass an object.
- This may be thought of as a box that could be used to package the object, where the dimensions of the box are computed from observations of the object taken with a depth camera system 100 .
- the dimensions of the box minimize some particular characteristic, such as the volume, area, or perimeter of the bounding box, such that the box encompasses the entirety of the object.
- Some systems and methods in accordance with embodiments of the present invention automatically compute the size of a box (e.g., a rectangular cuboid) lying on the ground, that can tightly contain an object in a scene captured by an RGB-D camera.
- some applications of embodiments of the present invention are in the measurement of box-shaped objects. Measuring the sizes of boxes is a frequent task in the context of logistics, where, for example, users may be interested in determining the total amount of space needed to contain a particular given set of boxes.
- embodiments of the present invention can obtain very reliable results by combining color information with geometric information.
- the contact-less approach (e.g., a computer vision-based approach using visible and invisible light) of embodiments of the present invention reduces the amount of time needed to measure each object, thereby improving, for example, logistics processes by increasing the throughput of boxes during physical measurement operations.
- the range cameras 100 include at least two standard two-dimensional cameras that have overlapping fields of view.
- these two-dimensional (2-D) cameras may each include a digital image sensor such as a complementary metal oxide semiconductor (CMOS) image sensor or a charge coupled device (CCD) image sensor and an optical system (e.g., one or more lenses) configured to focus light onto the image sensor.
- the optical axes of the optical systems of the 2-D cameras may be substantially parallel such that the two cameras image substantially the same scene, albeit from slightly different perspectives. Accordingly, due to parallax, portions of a scene that are farther from the cameras will appear in substantially the same place in the images captured by the two cameras, whereas portions of a scene that are closer to the cameras will appear in different positions.
- a range image or depth image captured by a range camera 100 can be represented as a “cloud” of 3-D points, which can be used to describe the portion of the surface of the object (as well as other surfaces within the field of view of the depth camera).
- FIG. 2 is a block diagram of a stereo depth camera system according to one embodiment of the present invention.
- the depth camera system 100 shown in FIG. 2 includes a first camera 102 , a second camera 104 , a projection source 106 (or illumination source or active projection system), and a host processor 108 and memory 110 , wherein the host processor may be, for example, a graphics processing unit (GPU), a more general purpose processor (CPU), an appropriately configured field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
- the first camera 102 and the second camera 104 may be rigidly attached, e.g., on a frame, such that their relative positions and orientations are substantially fixed.
- the first camera 102 and the second camera 104 may be referred to together as a “depth camera.”
- the first camera 102 and the second camera 104 include corresponding image sensors 102 a and 104 a , and may also include corresponding image signal processors (ISP) 102 b and 104 b .
- the various components may communicate with one another over a system bus 112 .
- the depth camera system 100 may include additional components such as a network adapter 116 to communicate with other devices, an inertial measurement unit (IMU) 118 such as a gyroscope to detect acceleration of the depth camera 100 (e.g., detecting the direction of gravity to determine orientation), and persistent memory 120 such as NAND flash memory for storing data collected and processed by the depth camera system 100 .
- the IMU 118 may be of the type commonly found in many modern smartphones.
- the image capture system may also include other communication components, such as a universal serial bus (USB) interface controller.
- the depth camera system 100 further includes a display device 122 and one or more user input devices 124 (e.g., a touch sensitive panel of the display device 122 and/or one or more physical buttons or triggers).
- FIG. 2 depicts a depth camera 100 as including two cameras 102 and 104 coupled to a host processor 108 , memory 110 , network adapter 116 , IMU 118 , and persistent memory 120 , but embodiments of the present invention are not limited thereto.
- the three depth cameras 100 shown in FIG. 6 may each merely include cameras 102 and 104 , projection source 106 , and a communication component (e.g., a USB connection or a network adapter 116 ), and processing the two-dimensional images captured by the cameras 102 and 104 of the three depth cameras 100 may be performed by a shared processor or shared collection of processors in communication with the depth cameras 100 using their respective communication components or network adapters 116 .
- the image sensors 102 a and 104 a of the cameras 102 and 104 are RGB-IR image sensors.
- Image sensors that are capable of detecting visible light (e.g., red-green-blue, or RGB) and invisible light (e.g., infrared or IR) information may be, for example, charged coupled device (CCD) or complementary metal oxide semiconductor (CMOS) sensors.
- a conventional RGB camera sensor includes pixels arranged in a “Bayer layout” or “RGBG layout,” which is 50% green, 25% red, and 25% blue.
- Band pass filters are placed in front of individual photodiodes (e.g., between the photodiode and the optics associated with the camera) for each of the green, red, and blue wavelengths in accordance with the Bayer layout.
- a conventional RGB camera sensor also includes an infrared (IR) filter or IR cut-off filter (formed, e.g., as part of the lens or as a coating on the entire image sensor chip) which further blocks signals in an IR portion of electromagnetic spectrum.
- An RGB-IR sensor is substantially similar to a conventional RGB sensor, but may include different color filters.
- one of the green filters in every group of four photodiodes is replaced with an IR band-pass filter (or micro filter) to create a layout that is 25% green, 25% red, 25% blue, and 25% infrared, where the infrared pixels are intermingled among the visible light pixels.
- the IR cut-off filter may be omitted from the RGB-IR sensor, the IR cut-off filter may be located only over the pixels that detect red, green, and blue light, or the IR filter can be designed to pass visible light as well as light in a particular wavelength interval (e.g., 840-860 nm).
- An image sensor capable of capturing light in multiple portions or bands or spectral bands of the electromagnetic spectrum (e.g., red, blue, green, and infrared light) will be referred to herein as a “multi-channel” image sensor.
- the image sensors 102 a and 104 a are conventional visible light sensors.
- the system includes one or more visible light cameras (e.g., RGB cameras) and, separately, one or more invisible light cameras (e.g., infrared cameras, where an IR band-pass filter is located over all of the pixels).
- the image sensors 102 a and 104 a are infrared (IR) light sensors.
- the depth camera 100 may include a third camera 105 including a color image sensor 105 a (e.g., an image sensor configured to detect visible light in the red, green, and blue wavelengths, such as an image sensor arranged in a Bayer layout or RGBG layout) and an image signal processor 105 b.
- the color image data collected by the depth cameras 100 may supplement the color image data captured by the color cameras 150 .
- the color cameras 150 may be omitted from the system.
- a stereoscopic depth camera system includes at least two cameras that are spaced apart from each other and rigidly mounted to a shared structure such as a rigid frame.
- the cameras are oriented in substantially the same direction (e.g., the optical axes of the cameras may be substantially parallel) and have overlapping fields of view.
- These individual cameras can be implemented using, for example, a complementary metal oxide semiconductor (CMOS) or a charge coupled device (CCD) image sensor with an optical system (e.g., including one or more lenses) configured to direct or focus light onto the image sensor.
- the optical system can determine the field of view of the camera, e.g., based on whether the optical system implements a "wide angle" lens, a "telephoto" lens, or something in between.
- the image acquisition system of the depth camera system may be referred to as having at least two cameras, which may be referred to as a “master” camera and one or more “slave” cameras.
- the estimated depth or disparity maps are computed from the point of view of the master camera, but any of the cameras may be used as the master camera.
- terms such as master/slave, left/right, above/below, first/second, and CAM 1 /CAM 2 are used interchangeably unless noted.
- any one of the cameras may be master or a slave camera, and considerations for a camera on a left side with respect to a camera on its right may also apply, by symmetry, in the other direction.
- a depth camera system may include three cameras.
- two of the cameras may be invisible light (infrared) cameras and the third camera may be a visible light (e.g., a red/blue/green color camera) camera. All three cameras may be optically registered (e.g., calibrated) with respect to one another.
- a depth camera system including three cameras is described in U.S. patent application Ser. No.
- Such a three camera system may also include an infrared illuminator configured to emit light in a wavelength interval that is detectable by the infrared cameras (e.g., 840-860 nm).
- the depth camera system determines the pixel location of the feature in each of the images captured by the cameras.
- the distance between the features in the two images is referred to as the disparity, which is inversely related to the distance or depth of the object. (This is the same effect observed when comparing how much an object "shifts" when viewing the object with one eye at a time: the size of the shift depends on how far the object is from the viewer's eyes, where closer objects shift more, farther objects shift less, and objects in the distance may have little to no detectable shift.)
- Techniques for computing depth using disparity are described, for example, in R. Szeliski, "Computer Vision: Algorithms and Applications", Springer, 2010, pp. 467 et seq.
- the magnitude of the disparity between the master and slave cameras depends on physical characteristics of the depth camera system, such as the pixel resolution of the cameras, the distance between the cameras, and the fields of view of the cameras. Therefore, to generate accurate depth measurements, the depth camera system (or depth perceptive depth camera system) is calibrated based on these physical characteristics.
- the cameras may be arranged such that horizontal rows of the pixels of the image sensors of the cameras are substantially parallel.
- Image rectification techniques can be used to accommodate distortions to the images due to the shapes of the lenses of the cameras and variations of the orientations of the cameras.
- camera calibration information can provide information to rectify input images so that epipolar lines of the equivalent camera system are aligned with the scanlines of the rectified image.
- a 3-D point in the scene projects onto the same scanline index in the master and in the slave image.
- Let u_m and u_s be the coordinates on the scanline of the image of the same 3-D point p in the master and slave equivalent cameras, respectively, where in each camera these coordinates refer to an axis system centered at the principal point (the intersection of the optical axis with the focal plane) and with horizontal axis parallel to the scanlines of the rectified image.
- the difference u_s − u_m is called disparity and denoted by d; it is inversely proportional to the orthogonal distance of the 3-D point with respect to the rectified cameras (that is, the length of the orthogonal projection of the point onto the optical axis of either camera).
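- For a rectified pair, the inverse proportionality between disparity and depth described above is commonly written as Z = f·B/d, where f is the focal length in pixels and B is the baseline between the cameras. A minimal sketch under this standard pinhole model (the specific calibration values below are illustrative, not taken from the text):

```python
def depth_from_disparity(d_pixels, focal_px, baseline_m):
    """Orthogonal distance Z of a 3-D point from rectified stereo disparity.

    For a rectified pair with focal length f (pixels) and baseline B
    (meters), disparity d (pixels) relates to depth by Z = f * B / d:
    disparity is inversely proportional to depth, as described above.
    """
    if d_pixels <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / d_pixels

# Illustrative numbers: f = 700 px, B = 0.05 m. A 10 px disparity gives
# Z = 3.5 m; halving the distance doubles the disparity.
z = depth_from_disparity(10.0, 700.0, 0.05)
```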
- Block matching is a commonly used stereoscopic algorithm. Given a pixel in the master camera image, the algorithm computes the costs to match this pixel to any other pixel in the slave camera image. This cost function is defined as the dissimilarity between the image content within a small window surrounding the pixel in the master image and the pixel in the slave image. The optimal disparity at a point is finally estimated as the argument of the minimum matching cost. This procedure is commonly referred to as Winner-Takes-All (WTA). These techniques are described in more detail, for example, in R. Szeliski.
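- The WTA block-matching procedure described above can be sketched as follows. A sum of absolute differences (SAD) is used here as the dissimilarity measure, which is one common choice; the text does not prescribe a particular cost function, and the window size and disparity range are illustrative:

```python
import numpy as np

def wta_disparity(master, slave, row, col, window=3, max_disp=16):
    """Winner-Takes-All block matching for one master pixel on one scanline.

    Cost is the SAD between a small window around (row, col) in the master
    image and the same-size window shifted by each candidate disparity in
    the slave image; the disparity with the minimum cost wins. The caller
    is assumed to keep the window inside the image bounds.
    """
    h = window // 2
    ref = master[row - h:row + h + 1, col - h:col + h + 1].astype(np.float64)
    costs = []
    for d in range(max_disp):
        c = col - d  # candidate matching column in the slave image
        if c - h < 0:
            costs.append(np.inf)  # candidate window falls off the image
            continue
        cand = slave[row - h:row + h + 1, c - h:c + h + 1].astype(np.float64)
        costs.append(np.abs(ref - cand).sum())
    # Winner-Takes-All: the disparity minimizing the matching cost.
    return int(np.argmin(costs))
```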
- the projection source 106 may be configured to emit visible light (e.g., light within the spectrum visible to humans and/or other animals) or invisible light (e.g., infrared light) toward the scene imaged by the cameras 102 and 104 .
- the projection source may have an optical axis substantially parallel to the optical axes of the cameras 102 and 104 and may be configured to emit light in the direction of the fields of view of the cameras 102 and 104 .
- the projection source 106 may include multiple separate illuminators, each having an optical axis spaced apart from the optical axis (or axes) of the other illuminator (or illuminators) and spaced apart from the optical axes of the cameras 102 and 104 .
- An invisible light projection source may be better suited for situations where the subjects are people (such as in a videoconferencing system) because invisible light would not interfere with the subject's ability to see, whereas a visible light projection source may shine uncomfortably into the subject's eyes or may undesirably affect the experience by adding patterns to the scene.
- Examples of systems that include invisible light projection sources are described, for example, in U.S. patent application Ser. No. 14/788,078 “Systems and Methods for Multi-Channel Imaging Based on Multiple Exposure Settings,” filed in the United States Patent and Trademark Office on Jun. 30, 2015, the entire disclosure of which is herein incorporated by reference.
- Active projection sources can also be classified as projecting static patterns, e.g., patterns that do not change over time, and dynamic patterns, e.g., patterns that do change over time.
- one aspect of the pattern is the illumination level of the projected pattern. This may be relevant because it can influence the depth dynamic range of the depth camera system. For example, if the optical illumination is at a high level, then depth measurements can be made of distant objects (e.g., to overcome the diminishing of the optical illumination over the distance to the object, by a factor proportional to the inverse square of the distance) and under bright ambient light conditions. However, a high optical illumination level may cause saturation of parts of the scene that are close-up. On the other hand, a low optical illumination level can allow the measurement of close objects, but not distant objects.
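- The trade-off described above follows from the inverse-square falloff of the projected illumination with distance. A minimal illustration of the scaling only (constant factors such as surface reflectivity and projector optics are omitted):

```python
def received_irradiance(emitted_power, distance_m):
    """Illustrative inverse-square falloff of projected illumination.

    The returned value falls off as 1 / distance^2, which is why a level
    bright enough for distant surfaces may saturate close-up ones, while a
    low level measures close objects but not distant ones. Units and
    constant factors are omitted; only the scaling is illustrated.
    """
    return emitted_power / (distance_m ** 2)

# Doubling the distance quarters the received illumination.
near = received_irradiance(1.0, 1.0)
far = received_irradiance(1.0, 2.0)
```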
- Depth computations may fail in some regions due to multiple factors, including: the mechanism used to compute depth (triangulation, with or without an active illuminator, or time of flight); the geometry of the scene (such as the angle between each surface element and the associated line of sight, or the presence of partial occlusion which may impede view by either sensor in a stereo system); and the reflectivity characteristics of the surface (such as the presence of a specular component which may hinder stereo matching or reflect away light from a projector, or a very low albedo causing insufficient light reflected by the surface). For those pixels of the depth image where depth computation fails or is unreliable, only color information may be available.
- Although embodiments of the present invention are described herein with respect to stereo depth camera systems, embodiments of the present invention are not limited thereto and may also be used with other depth camera systems such as structured light cameras, time of flight cameras, and LIDAR cameras.
- An approximation to the minimum volume bounding box can be computed in linear time (Barequet, G., & Har-Peled, S. (2001). Efficiently approximating the minimum-volume bounding box of a point set in three dimensions. Journal of Algorithms, 38(1), 91-109.) using an appropriate "coreset" (a small set of points with approximately the same bounding box as the original point set; see Agarwal, P. K., Har-Peled, S., & Varadarajan, K. R. (2005). Geometric approximation via coresets. Combinatorial and Computational Geometry, 52, 1-30). Both algorithms require prior computation of the convex hull of the point set (Chang, C.
- Bounding boxes can be split into box trees (see, e.g., Gottschalk, S., Lin, M. C., & Manocha, D. (1996). OBBTree: A hierarchical structure for rapid interference detection. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (pp. 171-180). ACM.) to generate tight-fitting parameterizable models, which can be useful in applications such as robot grasping (Huebner, K., Ruthotto, S., & Kragic, D. (2008). Minimum volume bounding box decomposition for shape approximation in robot grasping. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on (pp. 1628-1633). IEEE).
- Bounding boxes fitted around individual objects, as computed from RGB-D data, have also been used to study support and stability of a scene (see, e.g., Jia, Z., Gallagher, A., Saxena, A., & Chen, T. (2013). 3D-based reasoning with blocks, support, and stability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ).
- aspects of embodiments of the present invention make the assumption that the object or box to be measured lies on the ground, and determine the size of an enclosing cuboid (e.g., rectangular prism) that itself has one face that lies on the ground (e.g., has one face that is parallel to and in contact with the ground).
- embodiments of the present invention are equally applicable in situations in which the object or box lies on an elevated horizontal surface, such as a table, an elevated weighing scale, the bed of a truck, and the like.
- embodiments of the present invention do not require the ground or ground plane to be completely horizontal (e.g., perpendicular to the direction of gravity), but may also be applicable in circumstances where the ground or ground plane is slightly tilted with respect to the horizontal plane.
- embodiments of the present invention speed up computation considerably with respect to comparative techniques. This allows embodiments of the present invention to provide rapid measurements of the dimensions of an object (e.g., on the order of seconds or less, rather than minutes), thereby providing easy usability in dynamic working conditions such as a warehouse or shipping center.
- aspects of embodiments of the present invention relate to use of color information in addition to depth information. Color information is useful in situations in which depth cannot be computed reliably over the whole surface of the object.
- modules for computing the dimensions of a box enclosing an object seen by an RGB-D camera.
- the first module operates on generic objects (e.g., without making assumptions about the shape of the object).
- the second module is specialized for objects that have a cuboidal (e.g., box) shape.
- Both modules return the parameters (e.g., length, width, and height) of a tight (e.g., minimal) box lying on the ground that encloses the object imaged by the RGB-D camera.
- Objects can typically be characterized by both specific surface colors (e.g., different colors on different portions of the surface of the object) and geometry (although these may be subject to variation between different instances of the same object, such as variations in the surface shape of a soft handbag or duffel bag based on the locations and depth of folds in the material). This type of information can be used to estimate the size and dimensions of the objects themselves, as described in more detail below.
- the color and geometry of an object can be obtained using specialized hardware such as an RGB-D camera of a depth camera system 100 , as described above.
- An RGB-D camera includes one or more color cameras (e.g., color camera 105 ), which acquire the color information of a scene imaged by the one or more color cameras and by one or more depth cameras (e.g., cameras 102 and 104 ), which acquire the geometry information (e.g., using infrared light).
- the RGB-D camera includes one or more color cameras and one or more Infra-Red (IR) cameras, which, coupled with an IR structured-light illuminator (e.g., projection source 106 ), constitute the depth camera.
- the color camera and the depth camera can be synchronized and geometrically calibrated, allowing it to capture sequences of frames that are constituted by color images and corresponding depth maps, which can be geometrically aligned (e.g., each pixel or location of a depth map can be correlated with a corresponding color from a color image, thereby allowing capture of the surface colors of the scene).
- the combination of a depth map and a color image captured at substantially the same time as the depth map may be referred to as a “frame” of data.
- a color image with a depth map may be called an RGB-D frame, which contains color (RGB) and depth (D) information, as if both were acquired by a single camera with a single shutter and a single vantage point (even though the individual cameras 102 , 104 , and 105 are physically located in slightly different locations).
- the depth camera system 100 may include an Inertial Measurement Unit (IMU) 118 , which includes an accelerometer (e.g., a 3-axis accelerometer) that is synchronized with the RGB-D camera at either a software level or at a hardware level and that can be optionally calibrated with the RGB-D camera in terms of their relative spatial locations (e.g., the IMU 118 may be rigidly connected to the cameras 102 , 104 , and 105 ). Accordingly, the IMU 118 can provide information about the acceleration and/or orientation of the depth camera system 100 , and thereby provide information about the orientation of the depth camera system 100 relative to the captured depth frames. For example, the IMU 118 can be used to identify which direction in the captured depth frame is "down" (in the direction of gravity).
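- As a sketch of how a static accelerometer reading determines the "down" direction: normalizing the measured triplet gives a unit vector along the gravity axis, orthogonal to the ground plane. This assumes the reading is already expressed in the depth camera's coordinate frame; sign and axis conventions vary by device, so the vector may need to be negated depending on the sensor's convention:

```python
import numpy as np

def down_direction(accel_triplet):
    """Unit vector along the gravity axis from a static 3-axis accelerometer.

    When the IMU is static, the accelerometer triplet lies along the gravity
    axis, so normalizing it yields a unit vector orthogonal to the ground
    plane. Depending on the device's sign convention the raw reading may
    point opposite gravity, in which case the result should be negated.
    """
    a = np.asarray(accel_triplet, dtype=np.float64)
    norm = np.linalg.norm(a)
    if norm == 0:
        raise ValueError("zero accelerometer reading; is the IMU static?")
    return a / norm
```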
- the various operations according to embodiments of the present invention may be performed using one or more computing devices configured to receive the depth frames captured by the depth camera system 100 .
- all of the operations are performed in a single computing device (e.g., the host processor 108 and the memory 110 of the depth camera system 100 ).
- the computed RGB-D frames from the depth camera system are analyzed by a processor and memory of a separate computing device or a separate processor and memory physically coupled to the depth camera system.
- various operations may be implemented using one or more of general-purpose or specific-purpose processing units such as a general purpose central processing unit (CPU), a graphical processing unit (GPU), a field programmable gate array (FPGA), and/or an application specific integrated circuit (ASIC), which may store data in memory (e.g., dynamic memory and/or static memory) and receive and/or transmit data through input/output (I/O) interfaces (e.g., universal serial bus or USB, serial) and networking interfaces (e.g., wireless local area networks such as IEEE 802.11b/g/n/ac WiFi, wired local area networks such as IEEE 802.3 Ethernet, 3G/4G cellular connectivity, and Bluetooth®) to execute a set of instructions in order to perform volumetric box fitting in accordance with embodiments of the present invention.
- an electronic scale may provide measurements of the weight of the object
- a barcode decoding system may provide an identifier (e.g., a Universal Product Code or UPC) of the object in order to allow metadata about the object to be retrieved from a database or other data store.
- the barcode decoding system may use an image of a barcode captured by a color camera of the depth cameras system (e.g., applying image rectification to a barcode appearing in a portion of the color image).
- FIG. 3 is a flowchart of a method for measuring dimensions of an object according to one embodiment of the present invention.
- the process begins with a depth map of a scene including an object and proceeds with segmenting 310 the object from the scene, detecting 330 the ground plane that the object is resting on, detecting 350 a rectangular outline of the object projected onto the ground plane, computing 370 a height of the object above the ground plane, and outputting 390 the computed dimensions of the bounding box surrounding the object.
- the depth map of the scene may be captured using a depth camera system 100 as described above (e.g., an RGB-D camera).
- the operations will be described herein as being performed by the host processor 108 of the depth camera system 100 , but embodiments of the present invention are not limited thereto and, in some embodiments, various operations may be performed by one or more other computing devices such as a CPU, a GPU, an FPGA, and/or an ASIC, where the one or more other computing devices may be integrated into the same physical device as the depth camera system 100 (e.g., enclosed in the same housing and/or located on the same circuit board) and/or separate from the depth camera system 100 (e.g., in communication with the depth camera system through one or more of the I/O interfaces and/or the network interfaces 116 ).
- a scene 8 captured by a depth camera system 100 may include an object of interest 10 in the foreground along with clutter 12 in the background.
- the depth camera system 100 is controlled to capture a depth frame when a trigger (e.g., a software button shown on a display device or a physical trigger button) is activated.
- Embodiments of the present invention will be described below primarily in the context of analyzing a depth map corresponding to a single view of the object. Computing dimensions of an object from a single view increases the usability of a hand-held scanning device.
- a hand-held scanning device may be more adaptable to different situations and may be more cost effective than a stationary scanning device fixed to a particular location.
- embodiments of the present invention are not limited thereto and may also be applied in circumstances where multiple views of the object (from multiple different poses with respect to the object) are combined to generate a 3-D model of the object from multiple sides (e.g., a “point cloud” representing the scene including the object).
- the processor 108 segments the object from the scene.
- the object is separated or “segmented” from the other objects in the scene (e.g., the pixels corresponding to the clutter 12 may be ignored in the following operations or deleted from the captured depth map).
- the object may be resting on a ground (or horizontal surface) 14 .
- the portion of the 3-D model corresponding to the object 10 is identified by selecting the points of the point cloud (or vertices of the 3-D model) or the pixels of the RGB-D frame that are closest to the viewpoint of the depth camera system (in some embodiments, this determination may also be weighted in accordance with how close the points are to the center of the image, in order to remove nearby clutter at the edges of the image). This is based on the assumption that the object of interest 10 will generally be the object in the scene that is closest to the camera (e.g., in the foreground).
- a reticle 250 (or crosshairs) may be shown in the view, and the pixels under the crosshairs are selected as initial points corresponding to the object of interest 10 .
- the reticle 250 can improve usability of the system by providing the user with a visual cue for specifying which particular portions of the view correspond to the object of interest 10 , rather than relying on a heuristic by the system.
- a “flood fill” operation may be performed to select the remaining portions of the object that are visible in the scene. This is similar to a flood fill operation in 2-D graphics, where an initial pixel may be selected and neighboring pixels that are within a threshold distance in color space (e.g., similarly colored pixels) are added to the set of selected pixels, and the process iteratively adds neighboring pixels that satisfy the condition, until no more pixels can be added to the selection.
- the 3-D flood fill operation begins by identifying initial points of the object, and then adding pixels that are close enough to be considered “continuous” and adjacent to currently selected pixels in 3-D space.
- the corner of the box may be identified as the initial point of the object, in view of being the closest to the camera and closest to the center of the image. Points or pixels near the corner of the box closest to the camera will be close to (and considered "continuous" with) the point corresponding to the corner of the box.
- pixels along the top, front, and side surface of the box will be considered “continuous” and close to their adjacent pixels in the scene.
- the 3-D position of points of the clutter 12 behind the box 10 will be “discontinuous” with the top surface of the box, because there will be a large change in the range (e.g., distance from the depth camera system 100 ) when transitioning from the top surface of the box 10 to a surface of the clutter 12 .
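- The 3-D flood fill described above can be sketched as a breadth-first traversal over the depth map, where a neighboring pixel is added only if its range is "continuous" with the current selection. The continuity threshold value below is an illustrative assumption, not a value specified by the text:

```python
from collections import deque

import numpy as np

def flood_fill_3d(depth, seed, continuity_threshold=0.05):
    """Segment an object by 3-D flood fill over a depth map (a sketch).

    Starting from a seed pixel (e.g., the reticle location or the point
    closest to the camera), 4-connected neighbors whose range differs from
    the current pixel by less than `continuity_threshold` (in the depth
    map's units, e.g., meters) are considered "continuous" and added
    iteratively, until no more pixels qualify. Background clutter separated
    from the object by a large jump in range is excluded.
    """
    rows, cols = depth.shape
    selected = np.zeros((rows, cols), dtype=bool)
    selected[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not selected[nr, nc]:
                if abs(depth[nr, nc] - depth[r, c]) < continuity_threshold:
                    selected[nr, nc] = True
                    queue.append((nr, nc))
    return selected
```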
- FIG. 4 A is a depiction of a depth map of a scene depicting a bottle of laundry detergent on a table.
- blue pixels represent longer distances
- green and yellow pixels represent mid-range distances
- red pixels depict shorter distances.
- the bottle shown in FIG. 4 A can be segmented from the background based on discontinuity between the edges of the bottle in red and the adjacent pixels (corresponding to the table) in yellow and green.
- the processor 108 detects a ground plane of the scene.
- the ground plane is assumed to be the substantially planar ground surface 14 of the scene on which the object of interest 10 is resting.
- computing the ground plane uses data from a 3-axis accelerometer (or IMU 118 ) of the depth camera system 100 , which is geometrically calibrated with the depth camera system 100 .
- When the IMU 118 is kept in a static position, it produces a triplet of numbers that represents the direction of the gravity vector (orthogonal to the ground plane). This automatically determines the orientation of the ground plane.
- the actual location of the ground plane can then be estimated from the captured 3-D depth map.
- the processor is controlled to select the closest plane to the camera that is consistent with the expected orientation of the ground plane determined by the IMU 118 , such that all 3-D points measured from the depth camera system 100 are above this selected closest plane.
- the points or pixels of the scene corresponding to the ground plane can be detected by following the pixels corresponding to the object downward (e.g., based on the “down” direction as detected by the IMU 118 ), and identifying all of the pixels that are at the same height (e.g., along a plane corresponding to the points on pixels around the base of the object 10 ), within a threshold value.
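- One way to sketch the plane-selection step described above: with the plane's orientation fixed by the IMU-derived gravity vector, finding the plane such that all measured 3-D points lie above it reduces to taking the maximum signed distance of the points along the "down" direction. The function names here are illustrative, not from the text:

```python
import numpy as np

def ground_plane_offset(points, down):
    """Locate the ground plane given its IMU-derived orientation (a sketch).

    `down` is the gravity vector and `points` is an (N, 3) array of 3-D
    points from the depth map. The ground plane is modeled as the plane
    orthogonal to `down` with the largest offset along `down` such that
    every measured point lies on or above it.
    """
    points = np.asarray(points, dtype=np.float64)
    down = np.asarray(down, dtype=np.float64)
    down = down / np.linalg.norm(down)
    # Signed coordinate of each point along "down"; the ground plane is the
    # farthest-down plane that still has all points above it.
    return (points @ down).max()

def height_above_ground(point, down, offset):
    """Height of a 3-D point above the ground plane found above."""
    down = np.asarray(down, dtype=np.float64)
    down = down / np.linalg.norm(down)
    return offset - np.asarray(point, dtype=np.float64) @ down
```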
- Some aspects of embodiments of the present invention relate to calculating a virtual ground plane based on an idealized ground plane, as estimated from the captured depth map.
- depth camera systems such as the RGB-D camera system described above are subject to noise (e.g., errors), where the magnitude of the noise increases super-linearly with distance from the sensor (e.g., the noise may increase with the square of the distance from the sensor).
- FIG. 5 A is a schematic illustration of noise in a depth sensing system according to one embodiment of the present invention.
- a depth camera system 100 images a scene that includes a ground plane 14 .
- the dotted lines 510 depict the variance in the computed or estimated positions of the ground plane 14 : as the distance between the depth camera system 100 and surfaces in the scene increases, the amount of noise 520 in the computed positions of the surfaces also increases. This may correspond to the estimated positions of the ground plane being above or below the actual height of the ground plane 14 .
- FIG. 5 B is a schematic illustration of interactions between objects in a scene and noise in a depth sensing system according to one embodiment of the present invention.
- FIG. 5 B is substantially similar to FIG. 5 A , but further includes a cross-section of an object 10 in the scene.
- the height 530 of the object 10 is taken to be the distance between the top of the object 10 and the ground plane.
- noise 510 in the measured position 540 of the ground plane can cause errors in the measured height 550 of the object 10 (between the top of the object 10 and the measured position of the ground plane) to be different from the actual height 530 .
- note that the depiction in FIG. 5 B ignores the noise in the measured location of the top of the object 10 ; this additional noise causes further inaccuracies in calculating the height of the object 10 .
- some aspects of embodiments of the present invention relate to systems and methods for defining a virtual ground plane that is more accurate than the ground plane extracted directly from a depth image.
- the noise in the depth map increases with distance from the depth camera system 100 .
- the points or pixels at the bottom of the depth frame generally correspond to the ground plane 14 that the object 10 is resting on. See, for example, FIG. 4 A , in which the orange pixels at the bottom of the image are part of the ground (e.g., the table) that the bottle is resting on.
- FIG. 5 C is a flowchart of a method 330 for computing a virtual ground plane according to one embodiment of the present invention.
- the processor analyzes the input depth map of the scene (e.g., with the object of interest 10 segmented) and identifies an orientation of the depth map to determine which direction corresponds to the direction of gravity (informally, the "down" direction). As noted above, the orientation information may be recorded from the IMU 118 at the time that the depth map is captured.
- the “bottom” pixels or points of the depth map are identified, where “bottom” refers to the portion of the image in the “down” direction identified in operation 331 .
- the bottom of the depth map is assumed to correspond to the closest part of the ground plane 14 , which extends away from the depth camera and “up” in the depth map (e.g., toward the top of the image).
- the “down” direction corresponds to the direction perpendicular to the ground plane an parallel to the vertical axis of the bottle, and the portion of the depth map corresponding to the “bottom” pixels or points are the orange strip at the lower edge of the image.
- the processor controls the width of the strip of bottom pixels that are identified in operation 333 based on known noise characteristics of the depth camera system 100 (e.g., noise as a function of distance or range of a pixel).
- the noise characteristics of the depth camera system 100 may include parameters that are stored in the memory of the depth camera system 100 and previously computed by measuring differences between depth maps captured by the depth camera system 100 (or substantially equivalent depth camera systems) and/or parameters computed based on, for example, theoretical predictions of noise in the camera image sensors (e.g., image sensors 102 a , 104 a , and 105 a ), characteristics of the pattern emitted by the projection source 106 , the image resolutions of the image sensors, and constraints of the disparity matching technique.
- pixels from the bottom edge of the depth map, up until the pixels represent distances at which the expected noise exceeds the acceptable error threshold, are selected as part of the ground plane (subtracting the points or pixels corresponding to the segmented object, if any such pixels were included in this process).
- the processor uses the bottom points or pixels, which are assumed to lie on the same ground plane 14 that is supporting the object 10 , to define a partial ground plane or partial plane.
- linear regression is applied to the selected bottom points (or depth pixels) along two directions (e.g., two horizontal directions perpendicular to the direction of gravity) to define a virtual ground plane (or an “ideal” virtual ground plane) in accordance with a linear function.
- outlier points or pixels are removed from the bottom points or pixels before computing the plane.
- the virtual ground plane defined by the selected ones of the bottom pixels of the depth map is extended to the region under the object of interest 10 .
- aspects of embodiments of the present invention relate to defining a virtual ground plane based on portions of the captured depth map (or 3-D model) that exhibit lower noise (e.g., a portion of the ground 14 that is closer to the depth camera system 100 ). Based on the assumption that the ground 14 is substantially planar or flat between the low noise portion of the ground 14 closest to the depth camera system 100 and the parts of the ground 14 at the object 10 , this virtual ground plane can be extended to the region under the object 10 . This increases the accuracy of the measurements of the dimensions of the object in later operations 350 and 370 , as described in more detail below.
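The linear-regression plane fit described above can be sketched as an ordinary least-squares fit over the selected low-noise bottom points. The sketch below is illustrative only: the function names and the y-up coordinate convention are assumptions, not taken from the patent, and a real implementation would first remove outlier points as noted above.

```python
import numpy as np

def fit_virtual_ground_plane(points):
    """Least-squares fit of a plane y = a*x + b*z + c to the selected
    low-noise "bottom" points (an (N, 3) array of x, y, z coordinates).

    Returns the plane coefficients (a, b, c)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Design matrix for the linear model y = a*x + b*z + c.
    A = np.column_stack([x, z, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def height_above_plane(point, coeffs):
    """Signed height of a 3-D point above the fitted virtual ground
    plane; useful later when measuring the object's height."""
    a, b, c = coeffs
    x, y, z = point
    return y - (a * x + b * z + c)
```

Because the fitted plane is defined everywhere on the x-z domain, it extends naturally to the (noisier, partly occluded) region under the object.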
- the processor detects a rectangular outline of the object on the ground plane.
- FIG. 4 B is an orthogonal view of the depth map shown in FIG. 4 A with the ground plane aligned perpendicular to the optical axis of the virtual camera.
- the large region of lighter red represents the portion of the ground plane that was visible to the depth camera system 100 in FIG. 4 A .
- the darker red portion of FIG. 4 B corresponds to the portions of the ground plane that were occluded by the bottle when the depth map was captured.
- the brighter colored portions of the depth map near the center of FIG. 4 B correspond to the bottle (these portions are depicted in yellow and blue because this particular orthogonal view is taken from “underneath” the ground plane), and these brighter colored portions represent the projection of the points of the object 10 onto the virtual ground plane.
- This process is equivalent to “smashing” all of the points of the depth map corresponding to the object 10 down to the ground plane (e.g., assuming that the ground plane extends along the x-z axes of the 3-D model at the y coordinate of zero (0), this is equivalent to setting the y coordinates of all of the points of the object 10 to zero (0)).
- FIG. 4 C depicts the vertically projected points of the object 10 in white and the rest of the image in black, with a red rectangle on the ground plane that contains all the vertical projections of the object's surface points according to one embodiment of the present invention.
- the processor computes the connected components of a graph defined on the ground plane, where the vertical projections of measured 3-D points of the surfaces in the scene, including the surfaces of the object, form the vertices of the graph, and two such vertices are connected by an edge if their distance is smaller than a threshold.
- some embodiments keep the largest connected component, under the assumption that the object of interest occupies a larger portion of the image than other visible surfaces, thereby providing an alternative and/or additional technique for segmenting the object 10 from the clutter 12 in the scene (e.g., in addition to the segmentation performed in operation 310 ).
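The connected-component grouping described above can be sketched with a simple union-find over the projected points, linking any pair of points closer than the threshold. The O(n²) pairwise check below is for illustration only (names are assumptions); a real implementation would use a spatial index to find neighbors.

```python
import numpy as np

def connected_components(points_2d, threshold):
    """Group 2-D projected points into connected components, where two
    points are connected by an edge if their Euclidean distance is below
    `threshold`.  Returns a list of index lists, largest component first."""
    n = len(points_2d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    pts = np.asarray(points_2d, dtype=float)
    for i in range(n):
        # Link every later point closer than the threshold to point i.
        d = np.linalg.norm(pts[i + 1:] - pts[i], axis=1)
        for j in np.nonzero(d < threshold)[0]:
            union(i, i + 1 + j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Keeping `connected_components(...)[0]` then corresponds to retaining the largest component, i.e., the presumed object of interest.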
- an enclosing box for an object can be determined by determining a rectangle on the ground plane 14 that contains all the vertical projections of the object's surface points and extending the rectangle vertically to the top of the object.
- the enclosing box is a minimum volume enclosing box or minimum bounding box—in other words, the smallest box that encloses all of the points of the object, where “smallest” may refer to volume, area, or perimeter of the box, in accordance with particular application requirements (e.g., minimizing area to reduce the amount of packing material consumed versus minimizing volume to reduce the amount of space used to store or transport the object).
- the minimum volume enclosing box can be computed by first determining, in operation 350 , the minimum area rectangle enclosing the points of the object 10 projected onto the virtual ground plane.
- a two-dimensional rotating calipers approach is used to compute the minimum area rectangle in linear time.
- the processor determines the height of this box in operation 370 as being equal to the maximum distance of any surface point of the object to the virtual ground plane.
- the minimum area rectangle can be computed in time linear in the number of enclosed points using standard rotating caliper methods. It is also possible to compute (again in linear time) the minimum surface enclosing box, by finding the minimum perimeter enclosing rectangle on the ground plane.
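A sketch of the minimum-area enclosing rectangle computation: a classical result is that the optimal rectangle is flush with some edge of the convex hull of the projected points, so one can rotate the hull once per hull edge and keep the smallest axis-aligned bounding box. The true rotating-calipers method achieves linear time; the per-edge variant below is slightly slower but simpler to follow. Function names are illustrative assumptions.

```python
import numpy as np

def _cross(o, a, b):
    """2-D cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone-chain convex hull of 2-D points."""
    pts = sorted(set(map(tuple, pts)))
    if len(pts) <= 2:
        return list(pts)
    def build(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    return build(pts)[:-1] + build(reversed(pts))[:-1]

def min_area_rect(pts):
    """Minimum-area enclosing rectangle of 2-D (ground-projected) points.
    Each hull edge is rotated onto the x-axis in turn and the axis-aligned
    bounding box is measured.  Returns (area, (side_a, side_b))."""
    hull = np.asarray(convex_hull(pts), dtype=float)
    best_area, best_dims = np.inf, None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        theta = np.arctan2(edge[1], edge[0])
        c, s = np.cos(-theta), np.sin(-theta)
        rot = hull @ np.array([[c, -s], [s, c]]).T  # rotate edge onto x-axis
        w = rot[:, 0].max() - rot[:, 0].min()
        h = rot[:, 1].max() - rot[:, 1].min()
        if w * h < best_area:
            best_area, best_dims = w * h, (w, h)
    return best_area, best_dims
```

The enclosing box then follows by extruding this rectangle up to the maximum point height above the virtual ground plane (operation 370).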
- aspects of embodiments of the present invention are able to compute a three-dimensional bounding box of an object in linear time with respect to the number of points, as opposed to the cubic time of the comparative techniques described above, thereby also enabling faster response (e.g., real-time or substantially real-time computation of three-dimensional bounding boxes).
- the dimensions of a box enclosing the object 10 are computed in operation 350 and the height is computed in operation 370 .
- the processor outputs the computed dimensions as shown, for example, as dimensions 230 in FIGS. 1 B and 1 C and as the outline 220 of a bounding box overlaid on a color image view of a scene as shown in FIGS. 1 B and 1 C .
- FIG. 4 D is a color image of the scene depicted in the depth map of FIG. 4 A with a bounding box computed in accordance with embodiments of the present invention overlaid on the view of the bottle.
- a volume of a bounding box of an object is estimated from a single view of the object from an RGB-D camera. While this single view is useful and convenient to use in circumstances such as handheld scanning systems, this single view (or other incomplete collections of views) can only represent the portions of the surface of the object that are visible to the depth camera or depth cameras. In such circumstances, in some embodiments of the present invention, some assumptions are made about the invisible or occluded portions of the object when estimating its volume. For example, some embodiments of the present invention assume the shape of the object is approximately symmetric, such that the occluded surface is similar (in reverse) to the visible surface. This assumption generally holds for objects that are shaped as boxes.
- no prior assumption on the shape of the invisible or occluded surfaces of the object is made.
- an appropriate criterion is used to fit a bounding box to the set of points that are the projection onto the ground plane of visible surfaces.
- an embodiment of the present invention fits a rectangular bounding box such that the sum of the distances of the points projected onto the ground plane to the closest point in the bounding box is minimized.
- While embodiments of the present invention are described herein for application on depth maps obtained from a single view, the same techniques can be applied to data collected from multiple overlapping views, acquired by a single camera capturing multiple depth images over multiple poses with respect to the object or by multiple cameras having different poses with respect to the object (in a single shot or multiple shots).
- techniques such as Iterated Closest Point (ICP) can be used to simultaneously register two or more depth maps and to compute the relative poses (position and orientations) of the cameras with respect to one of the cameras' frames of reference.
- the resulting 3-D model (e.g., a point cloud and/or a 3-D model including vertices defining polygons specifying surfaces of objects in a scene) can then be processed with substantially similar techniques to compute the volumes of objects.
- Some aspects of embodiments of the present invention relate to estimating the volumes of box-shaped or cuboidal objects that have one face lying on the ground in a “box mode.”
- Computing the volume of boxes generally follows a process similar to that described above with respect to “arbitrary” objects, such as the embodiments shown in FIGS. 3 and 5 C , with additional modifications based on the assumption of box-shaped objects.
- some embodiments of the present invention exploit geometrical characteristics of box-shaped objects such as the fact that, from any viewpoint of a box, at most three of its faces are visible, at most two of which are vertical. (See, e.g., FIG. 7 A , which depicts a color image or photograph of a box, three sides of which are visible.)
- FIG. 6 is a flowchart of a method 600 for measuring dimensions of a box-like object in accordance with one embodiment of the present invention.
- the general structure of estimating dimensions of a box is similar to that for an arbitrary object. Accordingly, for the sake of clarity, operations that are substantially the same or substantially similar to those described with respect to FIGS. 3 and 5 C will not be repeated in detail.
- the object of interest 10 is segmented from a scene in a received depth map.
- FIG. 7 A is a color photograph of a scene containing a box in the foreground and some clutter in the background.
- FIG. 7 B is a depth map of the scene, where the box in the foreground is shown in red, indicating that it is closer to the depth camera system 100 and with the background clutter in blue, indicating that the clutter is farther from the depth camera system.
- the top corner of the box is shown in dark red, indicating that it is the point of the box that is closest to the depth camera system 100 .
- Portions of the top of the box are shown in dark blue in FIG. 7 B , indicating failures of the depth reconstruction process in those regions.
- the processor detects or computes the visible ground plane in the depth map in a manner similar to that described in operation 330 .
- the processor vertically projects the visible surface points of the segmented object down to the ground plane (“smashing” them onto the ground plane).
- the vertical projection of points from the vertical faces would form two segments at a square angle.
- Other points (from the top face, if visible, as well as from other visible surfaces in the scene) will project onto the ground plane as well.
- While points from the top face are expected to have sparse density, points from other surfaces (in particular, other vertical box faces) may also appear as line segments when projected onto the ground plane. If other box corners are visible, these additional segments will also form square angles with each other.
- FIG. 7 C is an example of the projection of the visible points of the box shown in FIG. 7 B onto the ground plane when viewed from “above” (e.g., along the direction of gravity). As seen in FIG. 7 C , the vertical surfaces of the box form two lines arranged at right angles to one another.
- the processor identifies segments of points that intersect at square angles, as they are likely to characterize a box corner. In some embodiments of the present invention, this operation is performed using random sample consensus (RANSAC).
- One embodiment identifies individual lines one by one, where points that support a line (“inliers”) are removed before computing the next line.
- the processor then builds a graph from the lines thus found, where the nodes of the graph represent lines found by RANSAC, and two nodes in the graph are linked by an edge if the lines they represent form an approximately square angle (e.g., an approximately 90° angle). Each node i in the graph also stores the number of inliers, I(i), supported by the associated line. Then, the processor finds the two nodes, i and j, connected by an edge, with the highest value of I(i)+I(j).
- the two lines found in this way represent the traces of two planes, orthogonal to the ground plane, that contain the two visible vertical faces of the box. These two lines intersect at the trace of the corner joining the two visible vertical faces (e.g., the vertical edge closest to the camera in FIGS. 7 A and 7 B ). Some embodiments only consider the two joined semi-lines, obtained from the original two lines by removing all points that are closer to the camera than the intersection point.
- embodiments of the present invention characterize the height of the box and the extents of the two visible vertical faces.
- the box height determines the location of the top face of the box, which is parallel to the ground plane. Note that the top face of the box, if it is seen at all, is often seen at a large slant angle, which can make depth computation of the top face less reliable. Furthermore, depending on the viewing angle, only a small portion of the top face may be visible. However, the entirety of the top edges of the two visible vertical faces is visible, and thus the points corresponding to the edges can be used to compute the box's height. Unfortunately, depth measurements at the edges of the top face are generally noisy and unreliable. Accordingly, some aspects of embodiments of the present invention relate to techniques for the processor to compute a robust estimate of the height of a box in operation 670 based on a depth map of a scene.
- the process of computing the height of the box in operation 670 generally relates to computing the height of the box at multiple locations along the top edges of the box and applying statistical techniques to determine the most likely actual height of the box.
- the processor 108 defines a grid in the virtual ground plane, computed in operation 630 , that the box rests on.
- the processor selects only the cells of the grid that contain either of the two semi-lines (e.g., the bright portions of the lines shown in FIG. 7 C ), representing the traces of the semi-planes containing the visible vertical faces described above.
- for each selected cell i, the processor stores the largest distance h(i) to the virtual ground plane among all measured 3-D points that project orthogonally onto that cell. Note that an individual cell in either semi-line may collect points coming not only from the top edge of the box, but also from different locations on the vertical faces (at different heights on the face).
- the cells may fail to include points from the top edges of the box.
- the recorded value h(i) will be smaller than the actual height of the box. In some circumstances, this can be overcome by computing the maximum value of h(i) among all cells in the semi-lines (e.g., max_i h(i)).
- this strategy may fail when points from “spurious” measurements of surfaces (e.g., points from background clutter or another box that is stacked on top of the box under consideration, or from another nearby box on the ground) project onto cells in either semi-line.
- the processor computes the median or another high percentile of the values {h(i)}.
- the processor computes the mode of the distribution of the values {h(i)}.
- the processor computes a histogram of the values {h(i)}, with equal-size bins uniformly distributed between min_i h(i) and max_i h(i).
- the bin B_j = [B_j,min , B_j,max ] with the maximum associated count is then selected, and the center point (B_j,min + B_j,max)/2 of bin B_j is computed as the estimate of the height of the box.
- the processor refines the estimation by considering all values of h(i) that fall within [B_j,min , B_j,max ] and by computing a new histogram of these values, with bins defined between B_j,min and B_j,max . This operation can be repeated recursively until a minimum pre-set bin size is reached, or until the variation of the box height estimate between two iterations is below a certain threshold (e.g., until B_j,max − B_j,min is less than a threshold value).
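The robust height estimate above can be sketched as follows: histogram the per-cell values {h(i)}, keep the most populated bin, and recursively re-bin inside it until the bin is narrower than a preset size. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def estimate_box_height(h_values, n_bins=20, min_bin_size=1e-3):
    """Robust box-height estimate from per-cell maximum heights {h(i)}.

    Repeatedly histograms the values, keeps the most populated bin, and
    narrows the search range to that bin until it is narrower than
    `min_bin_size`.  Returns the center of the final bin, which is robust
    to spurious low (face interior) and high (clutter) height samples."""
    h = np.asarray(h_values, dtype=float)
    lo, hi = h.min(), h.max()
    while hi - lo > min_bin_size:
        counts, edges = np.histogram(h, bins=n_bins, range=(lo, hi))
        j = np.argmax(counts)
        lo, hi = edges[j], edges[j + 1]
        h = h[(h >= lo) & (h <= hi)]
        if len(h) == 0:  # guard; cannot normally happen
            break
    return 0.5 * (lo + hi)
```

Here the recursion bottoms out on bin width; the patent also allows stopping when the estimate changes by less than a threshold between iterations.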
- the processor computes, in operation 680 , two planar regions, P 1 and P 2 , which are built from two vertical half planes that intersect the ground plane at the semi-lines considered above, and are limited from below by the ground plane, and from above by the plane containing the top face of the box (e.g., the plane parallel to the ground plane and at the height computed in operation 670 ).
- some aspects of embodiments of the present invention relate to reliably computing the location of the outer edges of the vertical faces in operation 680 , even when depth data is unreliable.
- the processor projects the two vertical planar regions P 1 and P 2 , described above, onto the camera's focal plane using the known intrinsic parameters of the camera, which may be previously computed offline and stored in memory.
- the projection on the focal plane of planar regions P 1 and P 2 is defined by one segment (corresponding to the corner where P 1 and P 2 are joined) and by two pairs of half lines (l 1,1 , l 1,2 ) and (l 2,1 , l 2,2 ), respectively, which are the projections onto the focal plane of the top and of the bottom edges of P 1 and P 2 .
- a spatially ordered sequence of regularly spaced pixels {p 1 , p 2 , . . . } is determined on either the top or the bottom half line for P 1 (where p 1 is the closest pixel to the intersection of the two half lines).
- for each pixel p i , the processor computes the vertical plane (orthogonal to the ground plane) that contains both p i and the optical center of the camera. This plane intersects the camera's focal plane in an image line through p i .
- the processor considers the segment of pixels S i intersected by this line between l 1,1 and l 1,2 and, for each such pixel, determines the measured 3-D surface point projecting onto it. The processor then checks whether these points are consistent with the planar surface P 1 . More precisely, the processor counts the number n i,1 of such points that are within a certain distance d from P 1 , as well as the number n i,2 of points that are further than d from P 1 . This operation is repeated for the pixels {p 1 , p 2 , . . . }. Then, starting from p 1 , the processor considers the pixels p 2 , p 3 , . . . until one such pixel, p j , is found with an associated value n j,1 that is lower than a threshold value.
- the processor safely concludes that all points projecting onto the segments S 1 , S 2 , . . . , S j−1 are consistent with the hypothesis that they belong to the vertical face represented by P 1 .
- the processor then continues to visit the pixels p j+1 , p j+2 , . . . , until one pixel, p k , is found (if any) with an associated value n k,2 larger than another threshold. If one such pixel is found, the processor safely concludes that all points projecting onto the segments S k , S k+1 , . . . are not consistent with the hypothesis that they belong to the vertical face represented by P 1 , and therefore do not belong to the surface of the box.
- the processor determines which segment S m , with j ≤ m < k, is the projection of the outer edge of the box's face represented by P 1 . To this end, the processor considers the color content of the RGB-D frame in the quadrilateral region bounded by l 1,1 , l 1,2 , S j , and S k . This is expected to work because the image content is usually relatively uniform on the surface of the box, yet different from the content of the background.
- for each segment S m within this region, three histograms of the color values are computed (one histogram per color channel) over an appropriate number of bins.
- each segment S m is evaluated in turn while moving outward, and the associated color histograms are compared, using a standard histogram distance operator (for example, the sum of the squared differences over bins), with the weighted sum of the color histograms from the previously visited segments {S l , j ≤ l < m}, where the weight assigned to the histograms for the segment S l is a decreasing function of the distance m − l.
- the processor stores the sum of these histogram distances over the three color channels, D m , for each segment S m . After this operation is completed, the processor finds the index m (with j ≤ m < k) with the largest associated value D m . The segment S n associated with the largest index n such that D n ≥ K·D m is chosen to identify the outer edge of the vertical face represented by P 1 . The same sequence of operations is repeated for the half plane P 2 to compute the outer edge of the vertical face represented by P 2 .
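The color-histogram comparison used to locate the outer edge can be sketched as below. Two simplifications relative to the scheme above (both assumptions of this sketch, along with all names): the decreasing weights over previously visited segments are realized as an exponentially decaying running average, and the segment with the largest histogram distance is returned directly rather than the K-scaled largest index.

```python
import numpy as np

def histogram_distance(h1, h2):
    """Sum of squared differences over bins, summed across channels."""
    return float(((h1 - h2) ** 2).sum())

def segment_histograms(pixels, n_bins=8):
    """Normalized per-channel histograms of an (N, 3) array of RGB
    pixels sampled along one candidate segment S_m."""
    return np.stack([
        np.histogram(pixels[:, c], bins=n_bins, range=(0, 256))[0] / len(pixels)
        for c in range(3)
    ])

def find_outer_edge(segments, decay=0.5):
    """Walk the candidate segments outward; score each segment by the
    distance between its histograms and a decaying weighted average of
    the histograms of previously visited segments.  Returns the index of
    the segment with the largest score, i.e., the likely box/background
    boundary."""
    hists = [segment_histograms(s) for s in segments]
    scores = [0.0]
    running = hists[0].copy()
    for m in range(1, len(hists)):
        scores.append(histogram_distance(hists[m], running))
        # Exponentially decaying weight on older segments.
        running = decay * running + (1 - decay) * hists[m]
    return int(np.argmax(scores))
```

The score spikes at the first segment whose color statistics diverge from the accumulated box appearance, matching the "significant histogram variation" criterion illustrated in FIGS. 8 A to 8 C.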
- FIG. 7 D is a pictorial representation of a method for estimating the vertical surfaces extent according to one embodiment of the present invention.
- the green-red-black lines are the vanishing lines obtained by estimating the height of the box and the orientation of the vertical sides of the box, encoding the compatibility of the vertical edge with the acquired depth (green is compatible, red is not compatible, and black is uncertain); the thin green lines extending along the vertical faces of the box are some of the possible candidates for the extent of the vertical sides of the box; and the thick green lines are the estimated extents of the vertical sides of the box.
- FIGS. 8 A, 8 B, and 8 C are histograms of colors computed from the RGB image for possible candidates of the vertical sides' extents (thin green lines of FIG. 7 D ).
- FIGS. 8 A and 8 B show color histograms for candidates internal to the box, and FIG. 8 C shows a histogram for a candidate external to the box.
- the extent of the box (vertical thick green lines of FIG. 7 D ) is estimated at the location of a significant histogram variation.
- box-mode is activated or deactivated based on a switch or toggle, which may be a physical switch or a software switch in a user interface of the system (e.g., displayed on the display device).
- Some aspects of embodiments of the present invention relate to automatically switching between box mode and arbitrary mode based on whether the object is box-like.
- the object may be assumed to be box-shaped and the processor may automatically compute the dimensions of the object in “box-mode” as described, for example, above with respect to FIG. 6 .
- the processor may automatically compute the dimensions of the object in an “arbitrary object” mode as described, for example, above with respect to FIG. 3 .
- aspects of embodiments of the present invention relate to systems and methods for automatically and quickly estimating the dimensions of a box tightly fitting an arbitrary object and/or estimating the dimensions of a box-shaped object.
Abstract
Description
Claims (35)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/330,248 US12228388B2 (en) | 2018-01-05 | 2023-06-06 | Systems and methods for volumetric sizing |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862613957P | 2018-01-05 | 2018-01-05 | |
| US16/240,691 US11341350B2 (en) | 2018-01-05 | 2019-01-04 | Systems and methods for volumetric sizing |
| US17/726,998 US11709046B2 (en) | 2018-01-05 | 2022-04-22 | Systems and methods for volumetric sizing |
| US18/330,248 US12228388B2 (en) | 2018-01-05 | 2023-06-06 | Systems and methods for volumetric sizing |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/726,998 Continuation US11709046B2 (en) | 2018-01-05 | 2022-04-22 | Systems and methods for volumetric sizing |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230349686A1 US20230349686A1 (en) | 2023-11-02 |
| US12228388B2 true US12228388B2 (en) | 2025-02-18 |
Family
ID=67140881
Family Applications (3)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/240,691 Active 2039-01-06 US11341350B2 (en) | 2018-01-05 | 2019-01-04 | Systems and methods for volumetric sizing |
| US17/726,998 Active US11709046B2 (en) | 2018-01-05 | 2022-04-22 | Systems and methods for volumetric sizing |
| US18/330,248 Active US12228388B2 (en) | 2018-01-05 | 2023-06-06 | Systems and methods for volumetric sizing |
Family Applications Before (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/240,691 Active 2039-01-06 US11341350B2 (en) | 2018-01-05 | 2019-01-04 | Systems and methods for volumetric sizing |
| US17/726,998 Active US11709046B2 (en) | 2018-01-05 | 2022-04-22 | Systems and methods for volumetric sizing |
Country Status (3)
| Country | Link |
|---|---|
| US (3) | US11341350B2 (en) |
| CA (1) | CA3125730C (en) |
| WO (1) | WO2019136315A2 (en) |
| CN120813918A (en) | 2023-01-30 | 2025-10-17 | 苹果公司 | Devices, methods, and graphical user interfaces for displaying multiple sets of controls in response to gaze and/or gesture input |
| CN116188457B (en) * | 2023-04-18 | 2023-07-21 | 天津恒宇医疗科技有限公司 | Processing method and processing system of coronary angiography skeleton map |
| CN121187445A (en) | 2023-06-04 | 2025-12-23 | 苹果公司 | Method for managing overlapping windows and applying visual effects |
| US20250078302A1 (en) * | 2023-08-29 | 2025-03-06 | Wipro Limited | Method and system for automatically determining dimensions of a carton box |
| WO2025155278A1 (en) * | 2024-01-15 | 2025-07-24 | Siemens Aktiengesellschaft | System and method for estimation of object planar dimensions for autonomous handling of objects |
| CN120190144B (en) * | 2025-05-25 | 2025-08-15 | 矸石云(山西)科技有限公司 | Gangue treatment method, system, electronic equipment and storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU2015101099A6 (en) * | 2015-08-10 | 2016-03-10 | Wisetech Global Limited | Volumetric estimation methods, devices, & systems |
| US10404970B2 (en) * | 2015-11-16 | 2019-09-03 | Intel Corporation | Disparity search range compression |
| US11047672B2 (en) * | 2017-03-28 | 2021-06-29 | Hand Held Products, Inc. | System for optically dimensioning |
Application events

2019
- 2019-01-04: CA application CA3125730A, granted as CA3125730C, active
- 2019-01-04: US application US16/240,691, granted as US11341350B2, active
- 2019-01-04: WO application PCT/US2019/012434, published as WO2019136315A2, not active (ceased)

2022
- 2022-04-22: US application US17/726,998, granted as US11709046B2, active

2023
- 2023-06-06: US application US18/330,248, granted as US12228388B2, active
Patent Citations (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030053671A1 (en) * | 2001-05-10 | 2003-03-20 | Piet Dewaele | Retrospective correction of inhomogeneities in radiographs |
| US20120200862A1 (en) * | 2011-02-08 | 2012-08-09 | Quantronix, Inc. | Object dimensioning system and related methods |
| US20130100286A1 (en) | 2011-10-21 | 2013-04-25 | Mesa Engineering, Inc. | System and method for predicting vehicle location |
| US20140159925A1 (en) | 2012-03-02 | 2014-06-12 | Leddartech Inc. | System and method for multipurpose traffic detection and characterization |
| US20140104416A1 (en) * | 2012-10-16 | 2014-04-17 | Hand Held Products, Inc. | Dimensioning system |
| US20170284799A1 (en) * | 2013-01-07 | 2017-10-05 | Wexenergy Innovations Llc | System and method of measuring distances related to an object utilizing ancillary objects |
| US9392262B2 (en) | 2014-03-07 | 2016-07-12 | Aquifi, Inc. | System and method for 3D reconstruction using multiple multi-channel cameras |
| US20150294511A1 (en) | 2014-04-09 | 2015-10-15 | Imagination Technologies Limited | Virtual Camera for 3-D Modeling Applications |
| US9516295B2 (en) | 2014-06-30 | 2016-12-06 | Aquifi, Inc. | Systems and methods for multi-channel imaging based on multiple exposure settings |
| US9826216B1 (en) | 2014-11-03 | 2017-11-21 | Aquifi, Inc. | Systems and methods for compact space-time stereo three-dimensional depth sensing |
| US20160189426A1 (en) | 2014-12-30 | 2016-06-30 | Mike Thomas | Virtual representations of real-world objects |
| US20160196659A1 (en) * | 2015-01-05 | 2016-07-07 | Qualcomm Incorporated | 3d object segmentation |
| US20170251143A1 (en) | 2016-02-29 | 2017-08-31 | Aquifi, Inc. | System and method for assisted 3d scanning |
| US20170264880A1 (en) * | 2016-03-14 | 2017-09-14 | Symbol Technologies, Llc | Device and method of dimensioning using digital images and depth data |
| WO2019136315A2 (en) | 2018-01-05 | 2019-07-11 | Aquifi, Inc. | Systems and methods for volumetric sizing |
| US20190213389A1 (en) | 2018-01-05 | 2019-07-11 | Aquifi, Inc. | Systems and methods for volumetric sizing |
| US20220327847A1 (en) | 2018-01-05 | 2022-10-13 | Packsize Llc | Systems and Methods for Volumetric Sizing |
Non-Patent Citations (12)
| Title |
|---|
| Agarwal, Pankaj et al., Geometric Approximation via Coresets, Combinatorial and computational geometry, 2005, pp. 1-30, vol. 52, MSRI Publications, Berkeley, California. |
| Barequet, Gill et al., Efficiently Approximating the Minimum-Volume Bounding Box of a Point Set in Three Dimensions, Journal of Algorithms, Jun. 30, 2001, pp. 1-17, 38(1), Elsevier, Amsterdam, Netherlands. |
| Chang, Chia-Tche et al., Fast oriented bounding box optimization on the rotation group SO(3, R), ACM Transactions on Graphics, 2011, pp. 1-17, vol. 30, No. 5, Article 122, Association for Computing Machinery, New York, New York. |
| Gottschalk, S. et al., OBBTree: A Hierarchical Structure for Rapid Interference Detection, Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 1996, pp. 171-180, Association for Computing Machinery, New York, New York. |
| Huebner, Kai et al., Minimum Volume Bounding Box Decomposition for Shape Approximation in Robot Grasping, IEEE International Conference on Robotics and Automation, 2008, pp. 1628-1633, IEEE, New York, New York. |
| International Search Report and Written Opinion received for PCT Patent Application No. PCT/US19/12434, mailed on Apr. 19, 2019, 11 pages. |
| Jia, Zhaoyin et al., 3D-Based Reasoning with Blocks, Support, and Stability, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1-8, IEEE, New York, New York. |
| Non-Final Office Action received for U.S. Appl. No. 17/726,998, mailed on Oct. 12, 2022, 10 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/726,998, mailed on Mar. 9, 2023, 9 pages. |
| O'Rourke, Joseph, Finding Minimal Enclosing Boxes, Reprinted from International Journal of Computer and Information Sciences, 1985, pp. 183-199, vol. 14, No. 3, Plenum Publishing Corporation, Belgium. |
| Szeliski, R., "Computer Vision: Algorithms and Applications", Springer, 2010, pp. 467-503. |
| Toussaint, Godfried, Solving Geometric Problems with the Rotating Calipers, Proceedings of IEEE MELECON'83, May 1983, pp. 1-8, Athens, Greece. |
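Several of the non-patent citations above (O'Rourke; Toussaint; Barequet et al.; Chang et al.) concern minimum-volume or minimum-area enclosing boxes, the geometric core of volumetric sizing. As an illustrative sketch only (not the patent's claimed method), the 2-D analogue can be computed exactly using the classical result that a minimum-area enclosing rectangle always has one side collinear with an edge of the convex hull; the function names below are hypothetical:

```python
import math

def convex_hull(points):
    # Andrew's monotone chain; returns the hull in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    # For each hull edge, rotate the hull so that edge is horizontal and
    # measure the axis-aligned bounding box; the smallest such box is the
    # minimum-area enclosing rectangle (a side is collinear with a hull edge).
    hull = convex_hull(points)
    n = len(hull)
    best = None
    for i in range(n):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [c * x - s * y for x, y in hull]
        ys = [s * x + c * y for x, y in hull]
        width = max(xs) - min(xs)
        height = max(ys) - min(ys)
        area = width * height
        if best is None or area < best[0]:
            best = (area, width, height, theta)
    return best  # (area, width, height, rotation angle in radians)
```

An axis-aligned unit square yields area 1.0, and a unit diamond (square rotated 45 degrees) yields area 2.0, since its tight rectangle is sqrt(2) by sqrt(2). The cited 3-D methods generalize this idea; O'Rourke's exact algorithm is O(n^3), which motivates the approximation schemes of Barequet et al. and Chang et al.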
Also Published As
| Publication number | Publication date |
|---|---|
| WO2019136315A3 (en) | 2020-04-16 |
| US11341350B2 (en) | 2022-05-24 |
| WO2019136315A2 (en) | 2019-07-11 |
| US11709046B2 (en) | 2023-07-25 |
| US20230349686A1 (en) | 2023-11-02 |
| US20190213389A1 (en) | 2019-07-11 |
| CA3125730A1 (en) | 2019-07-11 |
| CA3125730C (en) | 2023-10-24 |
| US20220327847A1 (en) | 2022-10-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12228388B2 (en) | Systems and methods for volumetric sizing | |
| US11798152B2 (en) | Systems and methods for object dimensioning based on partial visual information | |
| EP3422955B1 (en) | System and method for assisted 3d scanning | |
| US11481915B2 (en) | Systems and methods for three-dimensional data acquisition and processing under timing constraints | |
| US20230063197A1 (en) | System and method for identifying items | |
| US10096131B2 (en) | Dimensional acquisition of packages | |
| US20200380229A1 (en) | Systems and methods for text and barcode reading under perspective distortion | |
| US11568511B2 (en) | System and method for sensing and computing of perceptual data in industrial environments | |
| US20160189419A1 (en) | Systems and methods for generating data indicative of a three-dimensional representation of a scene | |
| CN106575438A (en) | Combination of stereoscopic and structured light processing | |
| CN107525466A (en) | Automatic mode switching in Volume Dimensioner | |
| US11017548B2 (en) | Methods, systems, and apparatuses for computing dimensions of an object using range images | |
| JP2025178438A (en) | Measurement method, information processing device and program | |
| CN110260801A (en) | Method and apparatus for measuring volume of material | |
| US20250292519A1 (en) | Augmented reality feedback in user interfaces for dimensioning objects | |
| CN115427755A (en) | Filling rate measuring method, information processing device, and program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner: PACKSIZE INTERNATIONAL, LLC, UTAH. Assignment of assignors interest; assignor: AQUIFI, INC.; reel/frame: 063880/0816; effective date: 20211029. Owner: PACKSIZE, LLC, UTAH. Assignment of assignors interest; assignor: PACKSIZE INTERNATIONAL, LLC; reel/frame: 063880/0895; effective date: 20211123. Owner: AQUIFI, INC., CALIFORNIA. Assignment of assignors interest; assignors: PERUCH, FRANCESCO; PASQUALOTTO, GIULIANO; MURALI, GIRIDHAR; and others; signing dates from 20190102 to 20190107; reel/frame: 063880/0725 |
| | FEPP | Fee payment procedure | ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| | STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED; RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | FEPP | Fee payment procedure | ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | AS | Assignment | Owner: JPMORGAN CHASE BANK, N.A., ILLINOIS. Security interest; assignor: PACKSIZE LLC; reel/frame: 068730/0393; effective date: 20240819 |
| | STPP | Information on status: patent application and granting procedure in general | NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | PATENTED CASE |
| | AS | Assignment | Owner: JPMORGAN CHASE BANK, N.A., ILLINOIS. Security interest; assignor: PACKSIZE LLC; reel/frame: 071282/0082; effective date: 20250515 |