WO2016068869A1 - Three dimensional object recognition - Google Patents

Three dimensional object recognition

Info

Publication number
WO2016068869A1
WO2016068869A1 (PCT/US2014/062580)
Authority
WO
WIPO (PCT)
Prior art keywords
data
point cloud
depth
dimensional
image
Prior art date
Application number
PCT/US2014/062580
Other languages
French (fr)
Inventor
Divya SHARMA
Kar-Han Tan
Daniel R Tretter
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to EP14904836.5A (EP3213292A4)
Priority to US15/518,412 (US20170308736A1)
Priority to CN201480083119.8A (CN107077735A)
Priority to PCT/US2014/062580 (WO2016068869A1)
Priority to TW104131293A (TWI566204B)
Publication of WO2016068869A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G06V 20/653 Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A method and system for recognizing a three dimensional object on a base are disclosed. A three dimensional image of the object is received as a three-dimensional point cloud having depth data and color data. The base is removed from the three dimensional image, and the three-dimensional point cloud with the base removed is converted into a two-dimensional point cloud representing the object. The two-dimensional point cloud is segmented to determine object boundaries of a detected object. The depth data is applied to determine the height of the detected object, and the color data is applied to match the detected object to reference object data.

Description

THREE DIMENSIONAL OBJECT RECOGNITION
Background
[0001] A visual sensor captures visual data associated with an image of an object in a field of view. Such data can include data regarding the color of the object, data regarding the depth of the object, and other data regarding the image. A cluster of visual sensors can be applied in certain applications. Visual data captured by the sensors can be combined and processed to perform a task of an application.
Brief Description of the Drawings
[0002] Figure 1 is a block diagram illustrating an example system of the present disclosure.
[0003] Figure 2 is a schematic diagram of an example of the system of Figure 1.
[0004] Figure 3 is a block diagram illustrating an example method that can be performed with the system of Figure 1.
[0005] Figure 4 is a block diagram illustrating an example system constructed in accordance with the system of Figure 1.
[0006] Figure 5 is a block diagram illustrating an example computer system that can be used to implement the system of Figure 1 and perform the methods of Figures 3 and 4.
Detailed Description
[0007] In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific examples in which the disclosure may be practiced. It is to be understood that other examples may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. It is to be understood that features of the various examples described herein may be combined, in part or whole, with each other, unless specifically noted otherwise.
[0008] The following disclosure relates to an improved method and system to segment and recognize objects in a three dimensional image. Figure 1 illustrates an example method 100 that can be applied as a user application or system to robustly and accurately recognize objects in a 3D image. A 3D scanner 102 is used to generate one or more images of one or more real objects 104 placed in the field of view. In one example, the 3D scanner can include color sensors and depth sensors, each generating an image of an object. In the case of multiple sensors, the images from each of the sensors are calibrated and then merged together to form a corrected 3D image to be stored as a point cloud. A point cloud is a set of data points in some coordinate system stored as a data file. In a 3D coordinate system, these points are usually defined by x, y, and z coordinates and are often intended to represent the external surface of the real object 104. The 3D scanner 102 measures a large number of points on an object's surface and outputs the point cloud as a data file having spatial information of the object. The point cloud represents the set of points that the device has measured. Segmentation 106 applies algorithms to the point cloud to detect the boundaries of the object or objects in the image. Recognition 108 includes matching the features of the segmented objects to a set of known features, such as by comparing the data regarding the segmented object to predefined data in a tangible storage medium such as a computer memory.
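For a concrete sense of the data structure, the following sketch holds such a point cloud as a NumPy array. The file name and the "x y z r g b" column layout are assumptions for illustration only, not part of the disclosure.

```python
import numpy as np

# A point cloud held as an N x 6 array: one measured surface point per row.
# "scan.xyzrgb" and its "x y z r g b" column layout are hypothetical.
points = np.loadtxt("scan.xyzrgb")
xyz = points[:, 0:3]   # spatial coordinates (meters) of points on the object surface
rgb = points[:, 3:6]   # per-point color samples from the color sensor

print(f"{xyz.shape[0]} points, bounds {xyz.min(axis=0)} to {xyz.max(axis=0)}")
```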
[0009] Figure 2 illustrates a particular example system 200 applying method 100, where like parts of Figure 1 have like reference numerals in Figure 2. System 200 includes sensor cluster module 202 used to scan the objects 104 and input data into a computer 204 running an object detection application. In the example, the computer 204 includes a display 206 to render images and/or interfaces of the object detection application. The sensor cluster module 202 includes a field of view 208. The objects 104 are placed on a generally planar surface, such as a tabletop, within the field of view 208 of the sensor cluster module 202. Optionally, the system 200 can include a generally planar platform 210 within the field of view 208 that receives the object 104. In one example, the platform 210 is stationary, but it is contemplated that the platform 210 can include a turntable that can rotate the object 104 about an axis with respect to the sensor cluster module 202. System 200 shows an example where objects 104 are placed on a generally planar surface in a field of view 208 of an overhead sensor cluster module 202.
[0010] An object 104 placed within the field of view 208 can be scanned and input one or more times. A turntable on platform 210 can rotate the object 104 about the z-axis with respect to the sensor cluster module 202 when multiple views of the object 104 are input. In some examples, multiple sensor cluster modules 202 can be used, or the sensor cluster module 202 can provide a scan of the object and projection of the image without having to move the object 104 and while the object is in any or most orientations with respect to the sensor cluster module 202.
[0011] Sensor cluster module 202 can include a set of heterogeneous visual sensors to capture visual data of an object in a field of view 208. In one example, the module 202 includes one or more depth sensors and one or more color sensors. A depth sensor is a visual sensor used to capture depth data of the object. In one example, depth generally refers to the distance of the object from the depth sensor. Depth data can be developed for each pixel of each depth sensor, and the depth data is used to create a 3D representation of the object. Generally, a depth sensor is relatively robust against effects due to a change in light, shadow, color, or a dynamic background. A color sensor is a visual sensor used to collect color data in a visible color space, such as a red-green-blue (RGB) color space or other color space, which can be used to detect the colors of the object 104. In one example, a depth sensor and a color sensor can be included in a depth camera and a color camera, respectively. In another example, the depth sensor and color sensor can be combined in a color/depth camera. Generally, the depth sensor and color sensor have overlapping fields of view, indicated in the example as field of view 208. In one example, a sensor cluster module 202 can include multiple sets of spaced-apart heterogeneous visual sensors that can capture depth and color data from various different angles of the object 104.
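Registering the two heterogeneous sensors typically amounts to back-projecting the depth image into 3D and re-projecting the points into the color camera. The sketch below illustrates one common way to do this under a pinhole camera model; the intrinsic parameters (fx, fy, cx, cy) and the extrinsic rotation R and translation t are assumed to come from a prior calibration step and are not specified by the disclosure.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters, nonzero where valid) to 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx        # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def project_to_color(points, R, t, fx_c, fy_c, cx_c, cy_c):
    """Map depth-camera points into the color camera and project to pixels."""
    p = points @ R.T + t             # extrinsic rotation R and translation t
    u = fx_c * p[:, 0] / p[:, 2] + cx_c
    v = fy_c * p[:, 1] / p[:, 2] + cy_c
    return u, v                      # color-image coordinates for each 3D point
```

Sampling the color image at the returned (u, v) coordinates attaches an RGB value to each measured 3D point, which is the calibrated, merged point cloud described above.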
[0012] In one example, the sensor cluster module 202 can capture the depth and color data as a snapshot scan to create a 3D image frame. An image frame refers to a collection of visual data at a particular point in time. In another example, the sensor cluster module can capture the depth and color data as a continuous scan, i.e., a series of image frames over the course of time. In one example, a continuous scan can include image frames staggered over the course of time at periodic or aperiodic intervals. For example, the sensor cluster module 202 can be used to detect the object and then later to detect the location and orientation of the object.
[0013] The 3D images are stored as point cloud data files in a computer memory located either locally or remotely from the sensor cluster module 202 or computer 204. A user application, such as an object recognition application having tools such as point cloud libraries, can access the data files. Point cloud libraries with object recognition applications typically include 3D object recognition algorithms applied to 3D point clouds. The complexity of applying these algorithms increases exponentially as the size, or number of data points, of the point cloud increases. Accordingly, 3D object recognition algorithms applied to large data files become slow and inefficient. Further, the 3D object recognition algorithms are not well suited for 3D scanners having visual sensors of different resolutions. In such circumstances, a developer must tune the algorithms through a complicated process in order to recognize objects scanned with sensors of different resolutions. Still further, these algorithms are built around random sampling of the data in the point cloud and data fitting, and are not particularly accurate. For example, multiple applications of the 3D object recognition algorithms often do not generate the same result.
[0014] Figure 3 illustrates an example of a robust and efficient method 300 to quickly segment and recognize objects 104 placed on a generally planar base in the field of view 208 of a sensor cluster module 202. The texture of the objects 104, stored as two-dimensional data, is analyzed to recognize the objects. Segmentation and recognition can be performed in real time without the inefficiencies of bloated 3D point cloud processing. Processing in the 2D space allows for the use of more sophisticated and accurate feature recognition algorithms. Merging this information with 3D cues improves the accuracy and robustness of segmentation and recognition. In one example, method 300 can be implemented as a set of machine readable instructions on a computer readable medium.
[0015] A 3D image of an object 104 is received at 302. When an image taken with the color sensor and an image taken with the depth sensor are used to create the 3D image, the image information for each sensor is often calibrated to create an accurate 3D point cloud of the object 104 including coordinates such as (x, y, z). This point cloud includes 3D images of the objects as well as the generally planar base on which the objects are placed. In some examples, the received 3D image may include unwanted outlier data that can be removed with tools such as a pass-through filter. Many, if not all, of the points that do not fall within the permissible depth range from the camera are removed.
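A pass-through filter of the kind mentioned here can be as simple as a per-axis range test over the cloud; the depth limits below are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def pass_through(xyz, z_min=0.2, z_max=1.5):
    """Drop outlier points outside the permissible depth range from the camera."""
    keep = (xyz[:, 2] >= z_min) & (xyz[:, 2] <= z_max)
    return xyz[keep]
```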
[0016] The base, or generally planar surface, on which the object 104 is placed, is removed from the point cloud at 304. In one example, a plane fitting technique is used to remove the base from the point cloud. One such plane fitting technique can be found in tools applying RANSAC (Random sample consensus), which is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers. In this case, the outliers can be the images of the objects 104 and the inliers can be the image of the planar base. Accordingly, depending on the sophistication of the plane fitting tool, the base on which the object is placed can deviate from a true plane. In typical cases, plane-fitting tools are able to detect the base if it is generally planar to the naked eye. Other plane-fitting techniques can be used.
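The following is a minimal sketch of such an iterative RANSAC plane fit in NumPy; the iteration count and inlier distance threshold are illustrative assumptions. The inliers it finds are the base, and the remaining points (the RANSAC outliers) are the objects 104.

```python
import numpy as np

def ransac_plane(xyz, n_iters=500, dist_thresh=0.005):
    """Fit a plane to the dominant (base) surface; return plane and inlier mask."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(xyz), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = xyz[rng.choice(len(xyz), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # skip degenerate (collinear) samples
            continue
        normal = normal / norm
        d = -normal @ p0
        inliers = np.abs(xyz @ normal + d) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

filtered = pass_through(xyz)                 # xyz as in the earlier sketches
plane, base_mask = ransac_plane(filtered)    # inliers: the generally planar base
objects_xyz = filtered[~base_mask]           # outliers: the objects 104
```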
[0017] In this example, the 3D data from the point cloud is used to remove the planar surface from the image. The point cloud with the base removed can be used as a mask to detect the object 104 in the image. The mask includes data points representing the object 104. Once the base has been subtracted from the image, the 3D point cloud is projected onto a 2D plane having depth information but using much less storage space than the 3D point cloud.
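One way to realize this projection is to rasterize the base-free points onto a grid aligned with the base plane, keeping the maximum height per cell so the 2D image still carries depth cues. The grid resolution below is an assumption, and the sketch assumes the cloud has been rotated so the base plane lies at z = 0.

```python
import numpy as np

def height_map_from_points(xyz, resolution=0.002):
    """Rasterize base-free points into a 2D height map (one cell = resolution m)."""
    xy_min = xyz[:, :2].min(axis=0)
    cells = ((xyz[:, :2] - xy_min) / resolution).astype(int)
    h, w = cells[:, 1].max() + 1, cells[:, 0].max() + 1
    hmap = np.zeros((h, w), dtype=np.float32)
    # Keep the tallest point per cell; nonzero cells form the object mask.
    np.maximum.at(hmap, (cells[:, 1], cells[:, 0]), xyz[:, 2])
    return hmap

height_map = height_map_from_points(objects_xyz)   # from the RANSAC sketch above
```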
[0018] The 2D data developed at 304 is suitable for segmentation at 306 with more sophisticated techniques than those typically used on a 3D point cloud. In one example, the 2D planar image of the object is subjected to a contour analysis for segmentation. An example of contour analysis is a topological structural analysis of digitized binary images using a border-following technique, which is available in OpenCV, distributed under a form of permissive free software license. OpenCV, or Open Source Computer Vision, is a cross-platform library of programming functions generally directed at real-time computer vision. Another technique is the Moore-neighbor tracing algorithm, which finds the boundary of an object in the processed 2D image data. Segmentation 306 can also distinguish multiple objects in the 2D image data from each other. Each segmented object image is given a label, which may differ from the labels of other objects in the 2D image data, and the label is a representation of the object in 3D space. A label mask is generated containing all the objects assigned a label. Further processing can be applied to remove unexpected or ghost contours, if any appear in the 2D image data.
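Continuing the sketches above, the border-following contour analysis and label mask generation could look as follows with OpenCV; the minimum contour area used to suppress ghost contours is an illustrative assumption.

```python
import cv2
import numpy as np

# Binarize the projected 2D data; height_map comes from the projection sketch.
binary = (height_map > 0).astype(np.uint8) * 255

# Border-following contour analysis, as implemented by OpenCV's findContours.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Label mask: each detected object is filled with its own integer label.
label_mask = np.zeros(binary.shape, dtype=np.int32)
for label, contour in enumerate(contours, start=1):
    if cv2.contourArea(contour) < 50:    # assumed area limit to drop ghost contours
        continue
    cv2.drawContours(label_mask, [contour], -1, color=label, thickness=cv2.FILLED)
```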
[0019] The label mask can be applied to recognize the object 104 at 308. In one example, corrected depth data is used to find the object's height, orientation, or other characteristics of a 3D object. In this way, additional characteristics can be determined from the 2D image data, without processing or clustering the 3D point cloud, to refine and improve the segmentation from the color sensor.
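A sketch of recovering such per-object 3D characteristics from the 2D data, using the label mask and height map from the earlier sketches; the use of a minimum-area rectangle for in-plane orientation is one choice among several and is an assumption here.

```python
import cv2
import numpy as np

# Per-label 3D cues recovered without clustering the 3D point cloud.
for label in range(1, int(label_mask.max()) + 1):
    region = label_mask == label
    if not region.any():
        continue
    height = height_map[region].max()               # object height above the base
    ys, xs = np.nonzero(region)
    pts = np.column_stack([xs, ys]).astype(np.float32)
    (cx, cy), (w, h), angle = cv2.minAreaRect(pts)  # in-plane orientation estimate
    print(f"object {label}: height {height:.3f} m, orientation {angle:.1f} deg")
```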
[0020] The color data corresponding to each label is extracted and used in feature matching for object recognition. In one example, the color data can be compared to data regarding known objects, which can be retrieved from a storage device, to determine a match. Color data can correspond with intensity data, and several sophisticated algorithms are available to match objects based on features derived from the intensity data. Accordingly, the recognition is more robust than with randomized algorithms.
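Such intensity-based feature matching can be sketched with ORB descriptors, one of the feature algorithms available in OpenCV; the reference image and the match thresholds are assumptions for illustration, not choices made by the disclosure.

```python
import cv2

def matches_reference(object_gray, reference_gray, min_matches=10):
    """Decide whether an extracted object crop matches a known reference image."""
    orb = cv2.ORB_create()                 # ORB: one available intensity-feature choice
    _, d1 = orb.detectAndCompute(object_gray, None)
    _, d2 = orb.detectAndCompute(reference_gray, None)
    if d1 is None or d2 is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    good = [m for m in matcher.match(d1, d2) if m.distance < 50]  # assumed cutoff
    return len(good) >= min_matches
```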
[0021] Figure 4 illustrates an example system 400 for applying method 300. In one example, the system 400 includes the sensor cluster module 202 to generate color and depth images of the object 104 or objects on a base, such as a generally planar surface. The images from the sensors are provided to a calibration module 402 to generate a 3D point cloud to be stored as a data file in a tangible computer memory device 404. A conversion module 406 receives the 3D data file and applies conversion tools 408, such as RANSAC, to remove the base from the 3D data file and create 2D image data of the object with an approximate segmentation providing a label for each segmented object, along with other 3D characteristics such as height, which can be stored as a data file in the memory 404.
[0022] A segmentation module 410 can receive the data file of the 2D representation of the object and apply segmentation tools 412 to determine the boundaries of the object image. As described above, the segmentation tools 412 can include contour analysis on the 2D image data, which is faster and more accurate than techniques that determine object images in 3D representations. The segmented object images can be given a label that represents the object in a 3D space.
[0023] A recognition module 414 can also receive the data file of the 2D image data. The recognition module 414 can apply recognition tools 416 to the data file of the 2D image data to determine the height, orientation, and other characteristics of the object 104. The color data in the 2D image that corresponds to each label is extracted and used in feature matching for recognizing the object. In one example, the color data can be compared to data regarding known objects, which can be retrieved from a storage device, to determine a match.
[0024] No current, generally available solution that merges depth data and color data performs faster and more accurate 3D object segmentation and recognition than that described above. Example method 300 and system 400 provide a real-time implementation that delivers faster, more accurate results while consuming less memory for segmenting and recognizing 3D data than processing a full 3D point cloud.
[0025] Figure 5 illustrates an example computer system that can be employed in an operating environment and used to host or run a computer application implementing an example method 300 as included on one or more computer readable storage media storing computer executable instructions for controlling the computer system, such as a computing device, to perform a process. In one example, the computer system of Figure 5 can be used to implement the modules and their associated tools set forth in system 400.
[0026] The exemplary computer system of Figure 5 includes a computing device, such as computing device 500. Computing device 500 typically includes one or more processors 502 and memory 504. The processors 502 may include two or more processing cores on a chip or two or more processor chips. In some examples, the computing device 500 can also have one or more additional processing or specialized processors (not shown), such as a graphics processor for general-purpose computing on graphics processor units, to perform processing functions offloaded from the processor 502. Memory 504 may be arranged in a hierarchy and may include one or more levels of cache. Memory 504 may be volatile (such as random access memory (RAM)), nonvolatile (such as read only memory (ROM), flash memory, etc.), or some combination of the two. The computing device 500 can take one or more of several forms. Such forms include a tablet, a personal computer, a workstation, a server, a handheld device, a consumer electronic device (such as a video game console or a digital video recorder), or other, and can be a stand-alone device or configured as part of a computer network, computer cluster, cloud services infrastructure, or other.
[0027] Computing device 500 may also include additional storage 508. Storage 508 may be removable and/or non-removable and can include magnetic or optical disks or solid-state memory, or flash storage devices. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any suitable method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. A propagating signal by itself does not qualify as storage media.
[0028] Computing device 500 often includes one or more input and/or output connections, such as USB connections, display ports, proprietary connections, and others to connect to various devices to receive and/or provide inputs and outputs. Input devices 510 may include devices such as keyboard, pointing device (e.g., mouse), pen, voice input device, touch input device, or other. Output devices 512 may include devices such as a display, speakers, printer, or the like. Computing device 500 often includes one or more communication connections 514 that allow computing device 500 to communicate with other computers/applications 516. Example communication connections can include, but are not limited to, an Ethernet interface, a wireless interface, a bus interface, a storage area network interface, a proprietary interface. The communication connections can be used to couple the computing device 500 to a computer network 518, which is a collection of computing devices and possibly other devices interconnected by communications channels that facilitate communications and allows sharing of resources and information among interconnected devices. Examples of computer networks include a local area network, a wide area network, the Internet, or other network.
[0029] Computing device 500 can be configured to run an operating system software program and one or more computer applications, which make up a system platform. A computer application configured to execute on the computing device 500 is typically provided as a set of instructions written in a programming language. A computer application configured to execute on the computing device 500 includes at least one computing process (or computing task), which is an executing program. Each computing process provides the computing resources to execute the program.
[0030] Although specific examples have been illustrated and described herein, a variety of alternate and/or equivalent implementations may be substituted for the specific examples shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific examples discussed herein. Therefore, it is intended that this disclosure be limited only by the claims and the equivalents thereof.

Claims

1. A processor-implemented method for recognizing a three dimensional object on a base, comprising:
receiving a three dimensional image of the object as a three dimensional point cloud having spatial information of the object;
removing the base from the three dimensional point cloud to generate a two dimensional image representing the object;
segmenting the two dimensional image to determine object boundaries; and
applying color data from the object to refine segmentation and match the detected object to reference object data.
2. The method of claim 1 comprising calibrating the color data and depth data to generate the three dimensional image of the object.
3. The method of claim 1 wherein removing the base includes applying an iterative process to estimate parameters of a model from a set of observed data that contains outliers that represent the object.
4. The method of claim 1 wherein the base is generally planar.
5. The method of claim 1 wherein the two-dimensional point cloud includes a mask including data representing the object.
6. The method of claim 1 wherein the segmenting includes distinguishing multiple objects in the point cloud from each other.
7. The method of claim 1 wherein the segmenting includes attaching a label to the detected object.
8. The method of claim wherein applying depth data includes determining the orientation of the detected object.
9. A computer readable medium for storing computer executable instructions for controlling a computing device having a processor and memory to perform a method for recognizing a three dimensional object on a base, the method comprising:
receiving a three dimensional image of the object as a three dimensional point cloud as a data file in the memory, the three dimensional point cloud having depth data;
removing, with the processor, the base from the three dimensional point cloud to generate a two dimensional image in the memory representing the object;
segmenting, with the processor, the two dimensional image to determine object boundaries;
applying, with the processor, the depth data to determine height of the object; and
applying, with the processor, color data from the image to match the object to reference object data.
10. The computer readable medium of claim 9 wherein removing the base is performed with a plane fitting technique.
11. The computer readable medium of claim 9 wherein the segmenting is performed with a contour analysis algorithm.
12. A system for recognizing a three dimensional object on a base, comprising: a module for receiving a first data file representing a three dimensional image of the object as a three dimensional point cloud having depth data;
a conversion module operating on a processor and configured to remove the base from the three dimensional point cloud into a second data file representing a two dimensional image of the object to be stored in a memory device;
a segmenting module to determine object boundaries in the two dimensional image; and a detection module operating on the processor and configured to apply the depth data to determine height of the object, and configured to apply color data from the image to match the object to reference object data.
13. The system of claim 12 comprising a color sensor configured to generate a color image having color data and a depth sensor configured to generate a depth image having depth data.
14. The system of claim 13 wherein the color sensor and depth sensor are configured as a color/depth camera.
15. The system of claim 13 wherein the color/depth camera includes a field of view and comprising a turntable configured as the base and disposed in the field of view.
PCT/US2014/062580 2014-10-28 2014-10-28 Three dimensional object recognition WO2016068869A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP14904836.5A EP3213292A4 (en) 2014-10-28 2014-10-28 Three dimensional object recognition
US15/518,412 US20170308736A1 (en) 2014-10-28 2014-10-28 Three dimensional object recognition
CN201480083119.8A CN107077735A (en) 2014-10-28 2014-10-28 Three dimensional object is recognized
PCT/US2014/062580 WO2016068869A1 (en) 2014-10-28 2014-10-28 Three dimensional object recognition
TW104131293A TWI566204B (en) 2014-10-28 2015-09-22 Three dimensional object recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2014/062580 WO2016068869A1 (en) 2014-10-28 2014-10-28 Three dimensional object recognition

Publications (1)

Publication Number Publication Date
WO2016068869A1 true WO2016068869A1 (en) 2016-05-06

Family

ID=55857986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/062580 WO2016068869A1 (en) 2014-10-28 2014-10-28 Three dimensional object recognition

Country Status (5)

Country Link
US (1) US20170308736A1 (en)
EP (1) EP3213292A4 (en)
CN (1) CN107077735A (en)
TW (1) TWI566204B (en)
WO (1) WO2016068869A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590836A (en) * 2017-09-14 2018-01-16 斯坦德机器人(深圳)有限公司 A kind of charging pile Dynamic Recognition based on Kinect and localization method and system
CN107679458A (en) * 2017-09-07 2018-02-09 中国地质大学(武汉) The extracting method of roadmarking in a kind of road color laser point cloud based on K Means
WO2018199958A1 (en) * 2017-04-27 2018-11-01 Hewlett-Packard Development Company, L.P. Object recognition
WO2019052318A1 (en) * 2017-09-13 2019-03-21 杭州海康威视数字技术股份有限公司 Method, apparatus and system for monitoring elevator car
WO2020043041A1 (en) * 2018-08-27 2020-03-05 腾讯科技(深圳)有限公司 Method and device for point cloud data partitioning, storage medium, and electronic device
CN113128515A (en) * 2021-04-29 2021-07-16 西北农林科技大学 Online fruit and vegetable recognition system and method based on RGB-D vision

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025642B (en) * 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 Vehicle's contour detection method and device based on point cloud data
JP6837498B2 (en) * 2016-06-03 2021-03-03 ウトゥク・ビュユクシャヒンUtku BUYUKSAHIN Systems and methods for capturing and generating 3D images
US10841561B2 (en) * 2017-03-24 2020-11-17 Test Research, Inc. Apparatus and method for three-dimensional inspection
US10937182B2 (en) * 2017-05-31 2021-03-02 Google Llc Non-rigid alignment for volumetric performance capture
US10438371B2 (en) * 2017-09-22 2019-10-08 Zoox, Inc. Three-dimensional bounding box from two-dimensional image and point cloud data
US10558844B2 (en) * 2017-12-18 2020-02-11 Datalogic Ip Tech S.R.L. Lightweight 3D vision camera with intelligent segmentation engine for machine vision and auto identification
CN108345892B (en) * 2018-01-03 2022-02-22 深圳大学 Method, device and equipment for detecting significance of stereo image and storage medium
US10671835B2 (en) 2018-03-05 2020-06-02 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Object recognition
US11618438B2 (en) * 2018-03-26 2023-04-04 International Business Machines Corporation Three-dimensional object localization for obstacle avoidance using one-shot convolutional neural network
CN108647607A (en) * 2018-04-28 2018-10-12 国网湖南省电力有限公司 Objects recognition method for project of transmitting and converting electricity
CN109034418B (en) * 2018-07-26 2021-05-28 国家电网公司 Operation site information transmission method and system
CN109344750B (en) * 2018-09-20 2021-10-22 浙江工业大学 Complex structure three-dimensional object identification method based on structure descriptor
EP3861752A1 (en) * 2018-10-05 2021-08-11 InterDigital VC Holdings, Inc. A method and device for encoding and reconstructing missing points of a point cloud
CN110119721B (en) * 2019-05-17 2021-04-20 百度在线网络技术(北京)有限公司 Method and apparatus for processing information
JP7313998B2 (en) * 2019-09-18 2023-07-25 株式会社トプコン Survey data processing device, survey data processing method and program for survey data processing
CN111028238B (en) * 2019-12-17 2023-06-02 湖南大学 Robot vision-based three-dimensional segmentation method and system for complex special-shaped curved surface
WO2021134795A1 (en) * 2020-01-03 2021-07-08 Byton Limited Handwriting recognition of hand motion without physical media
US11074708B1 (en) * 2020-01-06 2021-07-27 Hand Held Products, Inc. Dark parcel dimensioning
CN113052797B (en) * 2021-03-08 2024-01-05 江苏师范大学 BGA solder ball three-dimensional detection method based on depth image processing
CN113219903B (en) * 2021-05-07 2022-08-19 东北大学 Billet optimal shearing control method and device based on depth vision
CN114638846A (en) * 2022-03-08 2022-06-17 北京京东乾石科技有限公司 Pickup pose information determination method, pickup pose information determination device, pickup pose information determination equipment and computer readable medium
TWI845450B (en) * 2023-11-24 2024-06-11 國立臺北科技大學 3d object outline data establishment system based on robotic arm and method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060285755A1 (en) * 2005-06-16 2006-12-21 Strider Labs, Inc. System and method for recognition in 2D images using 3D class models
KR20110044392A (en) * 2009-10-23 2011-04-29 삼성전자주식회사 Image processing apparatus and method
US20110273442A1 (en) * 2010-05-07 2011-11-10 Mvtec Software Gmbh Recognition and pose determination of 3d objects in 3d scenes
US20120114251A1 (en) * 2004-08-19 2012-05-10 Apple Inc. 3D Object Recognition
JP4940706B2 (en) * 2006-03-01 2012-05-30 トヨタ自動車株式会社 Object detection device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS4940706B1 (en) * 1969-09-03 1974-11-05
KR100707206B1 (en) * 2005-04-11 2007-04-13 삼성전자주식회사 Depth Image-based Representation method for 3D objects, Modeling method and apparatus using it, and Rendering method and apparatus using the same
TWI450216B (en) * 2008-08-08 2014-08-21 Hon Hai Prec Ind Co Ltd Computer system and method for extracting boundary elements
KR101619076B1 (en) * 2009-08-25 2016-05-10 삼성전자 주식회사 Method of detecting and tracking moving object for mobile platform
EP2569721A4 (en) * 2010-05-14 2013-11-27 Datalogic Adc Inc Systems and methods for object recognition using a large database
TWI433529B (en) * 2010-09-21 2014-04-01 Huper Lab Co Ltd Method for intensifying 3d objects identification
JP2014508954A (en) * 2011-03-22 2014-04-10 アナロジック コーポレイション Composite object segmentation method and system {COMPUNDOBJECTSEPARATION}
KR101907081B1 (en) * 2011-08-22 2018-10-11 삼성전자주식회사 Method for separating object in three dimension point clouds
WO2013182232A1 (en) * 2012-06-06 2013-12-12 Siemens Aktiengesellschaft Method for image-based alteration recognition
CN103207994B (en) * 2013-04-28 2016-06-22 重庆大学 A kind of motion object kind identification method based on multi-project mode key morphological characteristic
TWM478301U (en) * 2013-11-11 2014-05-11 Taiwan Teama Technology Co Ltd 3D scanning system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120114251A1 (en) * 2004-08-19 2012-05-10 Apple Inc. 3D Object Recognition
US20060285755A1 (en) * 2005-06-16 2006-12-21 Strider Labs, Inc. System and method for recognition in 2D images using 3D class models
JP4940706B2 (en) * 2006-03-01 2012-05-30 トヨタ自動車株式会社 Object detection device
KR20110044392A (en) * 2009-10-23 2011-04-29 삼성전자주식회사 Image processing apparatus and method
US20110273442A1 (en) * 2010-05-07 2011-11-10 Mvtec Software Gmbh Recognition and pose determination of 3d objects in 3d scenes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3213292A4 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018199958A1 (en) * 2017-04-27 2018-11-01 Hewlett-Packard Development Company, L.P. Object recognition
CN110546651A (en) * 2017-04-27 2019-12-06 惠普发展公司,有限责任合伙企业 Object recognition
CN110546651B (en) * 2017-04-27 2023-04-04 惠普发展公司,有限责任合伙企业 Method, system and computer readable medium for identifying objects
US11030436B2 (en) 2017-04-27 2021-06-08 Hewlett-Packard Development Company, L.P. Object recognition
CN107679458B (en) * 2017-09-07 2020-09-29 中国地质大学(武汉) Method for extracting road marking lines in road color laser point cloud based on K-Means
CN107679458A (en) * 2017-09-07 2018-02-09 中国地质大学(武汉) The extracting method of roadmarking in a kind of road color laser point cloud based on K Means
WO2019052318A1 (en) * 2017-09-13 2019-03-21 杭州海康威视数字技术股份有限公司 Method, apparatus and system for monitoring elevator car
CN107590836B (en) * 2017-09-14 2020-05-22 斯坦德机器人(深圳)有限公司 Kinect-based charging pile dynamic identification and positioning method and system
CN107590836A (en) * 2017-09-14 2018-01-16 斯坦德机器人(深圳)有限公司 A kind of charging pile Dynamic Recognition based on Kinect and localization method and system
US11282210B2 (en) 2018-08-27 2022-03-22 Tencent Technology (Shenzhen) Company Limited Method and apparatus for segmenting point cloud data, storage medium, and electronic device
WO2020043041A1 (en) * 2018-08-27 2020-03-05 腾讯科技(深圳)有限公司 Method and device for point cloud data partitioning, storage medium, and electronic device
CN113128515A (en) * 2021-04-29 2021-07-16 西北农林科技大学 Online fruit and vegetable recognition system and method based on RGB-D vision
CN113128515B (en) * 2021-04-29 2024-05-31 西北农林科技大学 Online fruit and vegetable identification system and method based on RGB-D vision

Also Published As

Publication number Publication date
CN107077735A (en) 2017-08-18
TWI566204B (en) 2017-01-11
TW201629909A (en) 2016-08-16
EP3213292A1 (en) 2017-09-06
US20170308736A1 (en) 2017-10-26
EP3213292A4 (en) 2018-06-13

Similar Documents

Publication Publication Date Title
US20170308736A1 (en) Three dimensional object recognition
CN111127422B (en) Image labeling method, device, system and host
CN107388960B (en) A kind of method and device of determining object volume
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
TWI395145B (en) Hand gesture recognition system and method
US8989455B2 (en) Enhanced face detection using depth information
US10223839B2 (en) Virtual changes to a real object
CN111178250A (en) Object identification positioning method and device and terminal equipment
Takimoto et al. 3D reconstruction and multiple point cloud registration using a low precision RGB-D sensor
JP6899189B2 (en) Systems and methods for efficiently scoring probes in images with a vision system
Song et al. DOE-based structured-light method for accurate 3D sensing
KR20130044099A (en) Method of image processing and device thereof
CN107272899B (en) VR (virtual reality) interaction method and device based on dynamic gestures and electronic equipment
US11816857B2 (en) Methods and apparatus for generating point cloud histograms
CN116134482A (en) Method and device for recognizing surface features in three-dimensional images
CN116958145A (en) Image processing method and device, visual detection system and electronic equipment
Zhao et al. Region-based saliency estimation for 3D shape analysis and understanding
Sert A new modified neutrosophic set segmentation approach
Sulaiman et al. DEFECT INSPECTION SYSTEM FOR SHAPE-BASED MATCHING USING TWO CAMERAS.
CN110458177B (en) Method for acquiring image depth information, image processing device and storage medium
JP5620741B2 (en) Information processing apparatus, information processing method, and program
JP5217917B2 (en) Object detection and tracking device, object detection and tracking method, and object detection and tracking program
JP6127958B2 (en) Information processing apparatus, information processing method, and program
KR101357581B1 (en) A Method of Detecting Human Skin Region Utilizing Depth Information
Fathi et al. Machine vision-based infrastructure as-built documentation using edge points

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14904836

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15518412

Country of ref document: US

REEP Request for entry into the european phase

Ref document number: 2014904836

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2014904836

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE