US9501725B2 - Interactive and automatic 3-D object scanning method for the purpose of database creation - Google Patents
- Publication number
- US9501725B2 (applications US14/302,056; US201414302056A)
- Authority
- US
- United States
- Prior art keywords
- points
- interest
- key
- scene
- key frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G06K9/78
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06K9/6228
- G06T7/0046
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- the present disclosure relates generally to image recognition, and in particular, to the creation of object representation base information which may be used to assist in identifying objects.
- Devices such as digital cameras, phones with embedded cameras, or other camera or sensor devices may be used to identify and track objects in three-dimensional environments. This may be used to create augmented reality displays, where information on objects recognized by a system is presented to a user observing a display of the system. Such information may be presented on an overlay of the real environment in a device's display. Information from a database of objects may then be used to identify objects in the environment observed by a device.
- Mobile devices with embedded digital cameras, in particular, may have limited storage and processing power, especially in comparison to powerful fixed-installation server systems.
- One way of reducing the processing and bandwidth load of a system implementing such object detection/tracking is to store a local database of object information that may be used to identify objects in the environment.
- This database information may essentially be considered assistance information to help a device identify objects using templates that are stored in a database.
- Images captured by the device are compared with object representations in a database to determine whether there is an object match and, if so, what the current pose of the camera is relative to the identified object.
- A responsive action may then be initiated, or additional information related to the object may be presented in a device display in conjunction with the image containing the identified object.
- One embodiment of such an existing system uses combined geometric/texture models of the object of interest. These models are sometimes known at the object production stage (CAD models), but in most cases they are unavailable.
- Another known method is to use a laser-based or IR-based scanning system to simultaneously estimate the geometry and collect images of an object.
- Such scanning systems are typically expensive, and are often texture-challenged due to physical limitations of the sensors used. Thus, in general, the models are either unavailable or inaccurate to the point where they affect detection performance.
- Systems and methods for creating three-dimensional object representations for use in computer vision as described herein may provide improvements and simplification in the way object representations are currently obtained for use in detection and tracking systems.
- One embodiment may be a method of capturing compact representations of three-dimensional objects suitable for offline object detection comprising: capturing, using a camera module of a device, a plurality of images of a scene, wherein each of the plurality of images of the scene includes an image of at least a portion of an object; identifying a first image of the plurality of images as a first key frame and a first position of the device associated with the first image, wherein the first image is captured by the device from the first position; identifying a second image of the plurality of images as a second key frame and a second position of the device associated with the second image, wherein the second image is captured by the device from the second position, and wherein the second position is different from the first position; identifying a first plurality of points of interest from the first key frame, wherein the first plurality of points of interest identify features from the scene; identifying a second plurality of points of interest from the second key frame, wherein the second plurality of points of interest identify features from the scene; matching the first plurality of points of interest with the second plurality of points of interest; identifying, based at least in part on the matching, key points associated with the object; and storing the key points as an object representation in an object detection database.
- Additional embodiments may further operate where identifying key points associated with the object comprises: filtering the first plurality of points of interest and the second plurality of points of interest to identify points of interest associated with the object.
- Additional embodiments may further operate where filtering the first plurality of points of interest and the second plurality of points of interest comprises one or more of: deleting points of interest with a mean distance to a threshold number of the nearest points of interest that is less than a threshold distance; deleting the points of interest that are not matched with points of interest from other key frames; and deleting the key points outside of a defined volume of the scene.
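The three pruning filters above can be sketched in numpy. This is a minimal illustration, not the patented implementation: the direction of the mean-distance test (here, dropping isolated points whose mean distance to their k nearest neighbors exceeds a threshold, a common outlier-rejection reading) and all threshold values are assumptions.

```python
import numpy as np

def prune_key_points(points, matched_mask, volume_min, volume_max,
                     k=5, max_mean_dist=0.05):
    """Filter candidate 3-D key points down to those likely on the object.

    points       : (N, 3) array of candidate key point coordinates
    matched_mask : (N,) bool, True if the point was matched across key frames
    volume_min/max : corners of the axis-aligned scan volume
    k, max_mean_dist : parameters of the isolation filter (assumed values)
    """
    points = np.asarray(points, dtype=float)

    # Filter 1: drop points of interest never matched with points of
    # interest from other key frames (encoded in matched_mask).
    keep = np.asarray(matched_mask, dtype=bool).copy()

    # Filter 2: drop points outside the defined volume of the scene.
    keep &= np.all((points >= volume_min) & (points <= volume_max), axis=1)

    # Filter 3: drop isolated points whose mean distance to the k nearest
    # surviving points is too large (assumed reading of the distance test).
    idx = np.flatnonzero(keep)
    if idx.size > k:
        sub = points[idx]
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        d.sort(axis=1)                        # column 0 is self-distance (0)
        mean_dist = d[:, 1:k + 1].mean(axis=1)
        keep[idx[mean_dist > max_mean_dist]] = False

    return points[keep]
```

In practice the three filters may be applied in any order, and the volume filter presumes the scan volume was registered to the target's coordinate system.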
- Additional embodiments may further operate where the scene further comprises a planar target or where matching the first plurality of points of interest and the second plurality of points of interest comprises: identifying the first position of the device from a first location of the planar target in the first image; identifying the second position of the device from a second location of the planar target in the second image; determining a relative position between the first position of the device and the second position of the device; matching the first plurality of points of interest and the second plurality of points of interest based on the relative position between the first position and the second position; and determining and recording a position of each key point in a coordinate system.
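The relative-position step above can be illustrated in isolation. Assuming each key frame's pose (R, t) in the target's coordinate system has already been recovered from the planar target (e.g. via homography decomposition or a PnP solver, not shown here), the relative transform between the two device positions follows from composition:

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Relative transform taking camera-1 coordinates to camera-2 coordinates.

    Each pose (R, t) maps a point X in target (world) coordinates into camera
    coordinates: x_cam = R @ X + t. The relative pose satisfies
    x_cam2 = R_rel @ x_cam1 + t_rel for every scene point.
    """
    R_rel = R2 @ R1.T
    t_rel = t2 - R_rel @ t1
    return R_rel, t_rel
```

With the relative pose known, a point of interest in the first key frame constrains its match in the second to a single epipolar line, which is what makes target-based matching cheaper than unconstrained search.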
- Each key point comprises key point location information and a key point descriptor, the descriptor comprising information derived from the appearance of the pixel area around the key point.
- The key point descriptor may comprise a gradient or other information associated with a key point and the pixels surrounding the key point.
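As one illustration of a gradient-based descriptor, the sketch below builds a toy orientation histogram over the pixel area around a point of interest. This is a stand-in only: real systems would use SIFT/ORB-style descriptors, and the patch size and bin count here are arbitrary choices, not values from the patent.

```python
import numpy as np

def patch_descriptor(image, x, y, patch=8, bins=8):
    """Toy gradient-orientation descriptor for the pixel area around (x, y).

    Collects a histogram of gradient directions in a (2*patch)^2 window,
    weighted by gradient magnitude, then L2-normalizes it.
    """
    p = image[y - patch:y + patch, x - patch:x + patch].astype(float)
    gy, gx = np.gradient(p)                      # per-pixel image gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist
```

Because the descriptor is derived from appearance, the same 3-D key point typically carries several descriptors, one per key frame in which its point of interest was observed.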
- Additional embodiments may further operate where identifying the first image as the first key frame comprises a user selection.
- Additional embodiments may further operate where identifying the first image as the first key frame comprises an automatic selection by the device.
- Additional embodiments may further operate where identifying the second image as the second key frame comprises: identifying a key point density within the second image; identifying a spatial relationship between the second position and the first position; determining that a key frame at the second position would provide data with a data value above a threshold value for use in the object representation; and selecting the second image as the second key frame.
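The second-key-frame selection above can be sketched as a simple acceptance test. This is an illustrative stand-in, not the patent's method: the thresholds, the baseline-distance test for the spatial relationship, and the point count as a proxy for the "data value" are all assumptions.

```python
import numpy as np

def accept_key_frame(new_pos, key_frame_positions, new_point_count,
                     min_baseline=0.15, min_points=30):
    """Accept a candidate key frame only if it observes enough points of
    interest AND is far enough from every existing key frame position,
    so that it contributes data above a usefulness threshold."""
    if new_point_count < min_points:
        return False
    for pos in key_frame_positions:
        if np.linalg.norm(np.asarray(new_pos) - np.asarray(pos)) < min_baseline:
            return False
    return True
```

A fuller heuristic would also weigh viewing-angle diversity, since two frames taken from the same side of the object add little triangulation baseline even when spatially separated.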
- An alternative embodiment may be a system for capturing compact representations of three-dimensional objects suitable for offline object detection comprising: a camera module of a device that captures a plurality of images of a scene, wherein each of the plurality of images of the scene includes an image of at least a portion of an object; one or more processors that (1) identifies a first image of the plurality of images as a first key frame and a first position of the device associated with the first image, wherein the first image is captured by the device from the first position; (2) identifies a second image of the plurality of images as a second key frame and a second position of the device associated with the second image, wherein the second image is captured by the device from the second position, and wherein the second position is different from the first position; (3) identifies a first plurality of points of interest from the first key frame, wherein the first plurality of points of interest identify features from the scene; (4) identifies a second plurality of points of interest from the second key frame, wherein the second plurality of points of interest identify features from the scene; (5) matches the first plurality of points of interest with the second plurality of points of interest; and (6) identifies, based at least in part on the matching, key points associated with the object; and a memory that stores the key points as an object representation in an object detection database.
- Such an embodiment may further function where the device further comprises: a display coupled to the camera module, wherein the display outputs an image of at least a portion of the key points as the camera module of the device captures at least a portion of the plurality of images of the scene.
- Such an embodiment may further function where the display further outputs a video image of the scene with the key points overlaid on the object, where the device further comprises a motion sensor, wherein the second position of the device is identified by the one or more processors using information from the motion sensor, or where the device further comprises: a user input module, wherein identifying the first image as the first key frame comprises a user selection received at the user input module of the device.
- Such an embodiment may further function where the device further comprises: an antenna; and a wireless transceiver; wherein the one or more processors are coupled to the device via a network, the antenna, and the wireless transceiver.
- Another embodiment may be a non-transitory computer-readable medium comprising instructions that, when executed by a processor coupled to the non-transitory computer-readable medium, cause a device to: capture, using a camera module of the device, a plurality of images of a scene, wherein each of the plurality of images of the scene includes an image of at least a portion of an object; identify a first image of the plurality of images as a first key frame and a first position of the device associated with the first image, wherein the first image is captured by the device from the first position; identify a second image of the plurality of images as a second key frame and a second position of the device associated with the second image, wherein the second image is captured by the device from the second position, and wherein the second position is different from the first position; identify a first plurality of points of interest from the first key frame, wherein the first plurality of points of interest identify features from the scene; identify a second plurality of points of interest from the second key frame, wherein the second plurality of points of interest identify features from the scene; match the first plurality of points of interest with the second plurality of points of interest; identify, based at least in part on the matching, key points associated with the object; and store the key points as an object representation in an object detection database.
- Examples of such an embodiment may further operate where the instructions, when executed by the processor, further cause the device to: filter the first plurality of points of interest and the second plurality of points of interest to identify points of interest associated with the object as part of identifying key points associated with the object.
- Examples of such an embodiment may further operate where the instructions, when executed by the processor, further cause the device to: delete points of interest with a mean distance to a threshold number of other points of interest that is less than a threshold distance, and delete the points of interest that are not matched with points of interest from other key frames, as part of the filtering of the first plurality of points of interest and the second plurality of points of interest to identify points of interest associated with the object.
- Examples of such an embodiment may further operate where the instructions, when executed by the processor, further cause the device to: delete the key points outside of a defined volume of the object as part of the filtering the first plurality of points of interest and the second plurality of points of interest to identify points of interest associated with the object.
- Examples of such an embodiment may further operate where each key point of the key points associated with the object as the object representation in the object detection database comprises coordinate information, brightness information, and surrounding pixel pattern information.
- FIG. 1 illustrates aspects of one embodiment including an object to be scanned into a database;
- FIG. 2 illustrates aspects of a method of scanning an object to create an object representation for a database according to one embodiment;
- FIG. 3A illustrates aspects of one potential embodiment including unfiltered points of interest from one device position;
- FIG. 3B illustrates aspects of one embodiment including a histogram of points of interest;
- FIG. 3C illustrates aspects of one potential embodiment including filtered points of interest;
- FIG. 3D illustrates aspects of one embodiment including filtered points of interest;
- FIG. 3E illustrates aspects of one embodiment including 3-D key points that make up an object representation for storage in a database;
- FIG. 4 illustrates aspects of one embodiment related to triangulation;
- FIG. 5 is one embodiment of a device for use with various embodiments described herein;
- FIG. 6 is one embodiment of a computing device for use with various embodiments described herein; and
- FIG. 7 is one embodiment of a network system which may connect devices and databases in various embodiments described herein.
- Embodiments described herein relate to systems and methods for scanning objects to create object representations, where each object representation is created to optimize object recognition by a device.
- Embodiments described herein may create compact object representation which may be stored in a database and used later to match objects seen in an image captured at a device with previously scanned objects. This may be distinguished from other embodiments where a compact representation of an object is created and used to track the object, but is not stored for future object identification.
- Compact representations may compress a large number of video or picture images into a relatively small number of key points with associated descriptive data. In one example, several megabytes of video data may be processed to produce a compact object model with 1000 key points and descriptive information about those key points, such as gradient information of the surrounding area viewed from different angles.
- An extractor of salient key points may process such video data by first selecting a subset of the images in the video data as key frames.
- The key frames may then be processed by selecting points of interest of high contrast or high curvature within the key frames.
- The points of interest may then further be ordered by repetition across key frames, their proximity to other points of interest, or other image-level or geometric point of interest values.
- Such processing, which takes a sequence of images as input and produces a compact object consisting of salient key points and their descriptions, is done in a manner not known in the prior art.
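The contrast-based point-of-interest selection described above might be sketched as follows. This toy detector scores pixels by local gradient magnitude and is only a stand-in for a real corner or blob detector (FAST, Harris, and the like); the parameter values are illustrative.

```python
import numpy as np

def detect_points_of_interest(frame, max_points=50, border=2):
    """Pick high-contrast pixels in a grayscale frame as candidate
    points of interest, strongest responses first."""
    gy, gx = np.gradient(frame.astype(float))
    score = np.hypot(gx, gy)                 # local contrast measure

    # Suppress the image border, where gradients are unreliable.
    score[:border], score[-border:] = 0, 0
    score[:, :border], score[:, -border:] = 0, 0

    # Rank all pixels by score, descending, and keep the top responses.
    order = np.argsort(score, axis=None)[::-1]
    ys, xs = np.unravel_index(order, score.shape)
    return list(zip(xs[:max_points], ys[:max_points]))
```

A production detector would also apply non-maximum suppression so the selected points spread across the object rather than clustering on the single strongest edge.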
- Certain embodiments may use aspects of SLAM (Simultaneous Localization and Mapping) or PTAM (Parallel Tracking and Mapping) systems as a means of separating images into key frames and establishing geometric relationships between the points of interest observed across images and key frames, and may then additionally provide point of interest segmentation and pruning so as to arrive at compact objects from sets of key points in a manner not known in the prior art.
- Such systems thus provide efficient creation of object representations suitable for creating databases of compact object information for arbitrary objects in a manner not previously known.
- Object representations may be stored on a device that is not connected to a network, and may be used to recognize objects in images captured by the device.
- An object to be scanned may be placed on a table next to a known scene.
- The known scene may be given by a known planar object (planar target), a known three-dimensional object (3-D target), or a combination of the two.
- The target's position and orientation are known to a mobile device that is to scan the object. This is achieved by detection and tracking of the previously known target.
- The mobile device may be, for example, a phone with a camera, a processor, and available memory storage space.
- A mobile device may be a camera acquiring a video sequence that may be post-processed on a separate processing unit offline.
- A mobile device may also be a camera connected to a personal computer or an alternative processing device.
- A user may enter a command to begin a scanning process, at which point the mobile scanning camera may begin capturing images of a scene including the object.
- The device may analyze the images in real time, in a time period delayed by up to several seconds, or may simply store the images for later analysis. As a user moves the device to different positions around and above the object, images from different distances, angles, and elevations will be captured such that different views of the object are stored. In some embodiments, for example for a device implementing real time or near real time analysis, the device may provide directions or recommendations for movement of the device to capture images from preferred positions. The compact representation may then be accessed later to identify the object in an image or a video stream.
- Alternatively, a scanning device with a camera may be mounted in a fixed position, while the object of interest is rotated and moved so as to reveal as much of its surface as possible from various viewing angles.
- This scanning device may be a phone, a video recorder, a digital camera, or any other such device that may include a camera and other modules according to the particular implementation.
- The object may again be accompanied by a known target in order to facilitate associating the extracted points of interest with a known coordinate system.
- The entire system, from camera holder to the podium for object scanning, may be perfectly calibrated so that the camera position with respect to the object is known at any moment.
- Certain images may then be selected as key image frames. Such key frames may simply be taken periodically, may be selected after analysis by a processor, or may be selected manually by a user. Once a plurality of key frames has been selected, points of interest within the key frames are identified, and an analysis may be done to identify a relative location of the camera at the time each key frame was captured.
- The device position analysis may use image data for the known target, or data from a position module integrated as part of, or coupled to, the camera.
- The position module may be any suitable module, such as an accelerometer or gyroscope; a calibrated image acquisition system (such as a robotic arm with an object holder and rotating table); or any combination of such means for tracking the movement and position of the camera with respect to a fixed coordinate system as images are captured.
- The position of the camera during the capture of each key frame may then be used to match two-dimensional points of interest from different key frames to create three-dimensional (3-D) key points.
- Key points or points of interest from the key frames may be filtered to remove those unlikely to be associated with the object being scanned. This leaves a compact set of key points that describe the object. These remaining key points may be stored as an object representation in a database. Later, when an augmented reality or object identification application is executed by an object identification device, the images captured by a camera of the object identification device may be analyzed using the compact key point object representations in the database to identify particular objects present in the camera view and their poses with respect to the camera of the object identification device.
- This object identification device may be the same scanning device that initially created the object representation, or may be a different device.
- The object representation is a collection of key points in a particular coordinate system which are associated for later use in identifying the object or other objects with a similar shape and size.
- The object representation may include not only coordinate locations for key points, but also color information or any other such information that may be useful for object identification.
- a database of multiple object representations, each of which contains key points for a previously scanned object, may then be accessed while a user is interacting with a scene in the detection mode in order to identify the object or similar objects using the object representations as stored in the database.
- FIG. 1 illustrates an aspect of one embodiment.
- FIG. 1 includes device 110 , object 120 , target 130 , and scene 100 .
- Device 110 is shown in a first position 116 and in a second position 118 .
- Scene 100 may be a specifically defined area or volume which device 110 has identified as a boundary for key points. Alternatively, scene 100 may simply be the limits of the area for which images are captured as device 110 moves to different positions capturing images as part of the scanning process for creating a compact representation of object 120 to store in a database.
- Device 110 may be any device capable of capturing an image with a coupled processor and storage for compact object representation. As described above, in one embodiment, device 110 may be a phone with an embedded camera. Device 110 may alternatively be a dedicated augmented reality device, a head mounted device with a camera module, a camera with a port for transferring data to a separate computing module, or any such device capable of capturing images of an object and identifying key data. Any of the above examples of device 110 may create image data, key frame data, key point data, compact object representation, or any combination thereof, which may be stored at a local or a remote database. In certain embodiments this data may then be transferred to another device for use in tracking objects, detecting objects, or both. In alternative embodiments, the local object representation may be used on the local device just after creation of the local object representation for tracking of the object.
- Device 110 includes at least one sensor for capturing image data. Examples of such sensors include monocular cameras, stereo cameras, and RGBD sensors. As shown in FIG. 1 , as part of a scanning process, device 110 will capture at least two images, from different positions, which may be used as key frames. FIG. 1 shows field of view 112 for a first image 122 which is captured when the device 110 is at the first position 116 . Also shown is field of view 114 for a second image 124 which is captured when the device 110 is at the second position 118 . In order to function as a key frame, each image must include at least a portion of object 120 . A remaining portion of object 120 may be occluded by another object, or may be outside the field of view for the particular position of the camera.
- The position of a device refers to the spatial location and orientation of the device, including the spatial location and orientation of any sensors on the device and the relationship between those sensors and the device. Position may also be referred to as pose, especially for a handheld device being moved through various positions and orientations by a user.
- The position information thus captures the location and field of view information for a camera of a device with respect to a coordinate system in which the object is seen as static.
- Object 120 may be any object with object point of interest features able to be captured by a camera of device 110 .
- Object 120 may be sufficiently large that only a portion of the object may be captured by a user close to object 120.
- Object 120 may also be of any small size, as long as the camera of device 110 has sufficient resolution and sensitivity to capture point of interest information for the object.
- An acceptable object may thus be considered one that has points of interest identifiable from images. In processing of key frames, these points of interest may be identified as two-dimensional aspects of 3-D key points. Key points are identifying points that enable the efficient identification of an object. Points near areas of high contrast and high curvature are one example of key points.
- The term "key point" refers to a point in a three-dimensional coordinate system that, in conjunction with other key points, may be used to identify an object.
- Single key frames may contain a two-dimensional projection of a plurality of points of interest that are associated with these key points. These two-dimensional aspects are referred to herein as “points of interest.”
- As these points of interest are identified in multiple key frames from different camera poses or different device positions, the three-dimensional position of each key point may be derived from the two-dimensional point of interest information and the device position information.
- A key frame will include two-dimensional information about a key point.
- The two-dimensional location of a point of interest within a key frame, in conjunction with associated points of interest from other key frames, enables identification of a point on the object in three dimensions as a 3-D key point.
- The two-dimensional appearance of a point on the object associated with a key point as a point of interest, together with its surroundings within a key frame, may then be used to form a descriptor of this key point associated with the key frame.
- Key points may have multiple possible positions in a 3-D coordinate system.
- Statistical averages or processing of points of interest from multiple key frames may be used to identify the 3-D key point location from the two-dimensional information of multiple key frames in conjunction with the position information of the device when each frame was captured. Examples of points of interest and key points may be seen in FIGS. 3A-3E , and will be described in more detail below.
- Target 130 is shown as an arrow, but may be any patterned or unpatterned shape which may be used to determine the orientation of device 110 based on image data. Orientation of the camera may be given by three angles of the camera optical axis with respect to a coordinate system, such as the world coordinate system or the target-centered coordinates. Device 110 's position provides another three values: the x, y, and z coordinates of the camera lens in the world coordinate system. Together, they form the camera's six degrees of freedom.
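The six degrees of freedom described above may be sketched as a simple data structure; the field names here are illustrative assumptions rather than anything defined in the specification:

```python
from dataclasses import dataclass

@dataclass
class CameraPose:
    """Six-degree-of-freedom camera pose: three position values for the
    camera lens and three orientation angles of the optical axis, both
    expressed in the world (or target-centered) coordinate system."""
    x: float      # lens position in the world coordinate system
    y: float
    z: float
    yaw: float    # orientation angles of the optical axis, in radians
    pitch: float
    roll: float

    def degrees_of_freedom(self) -> int:
        # three position values plus three orientation angles
        return 6
```

A pose like this would be recorded for every key frame so that two-dimensional points of interest can later be related to 3-D key points.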
- target 130 may be, for example, a piece of paper with edges and distinguishable from the surrounding portion of scene 100 .
- target 130 may be a known patterned surface on which the object 120 is placed.
- a volume target may be used, a planar target may be used, or no target may be used.
- target 130 may enable matching of points of interest from different images that is more efficient for object representation creation, as described in detail below. This matching may be more efficient for database creation than typical SLAM key point matching.
- Typical SLAM systems establish correspondences between key frames by calculating small transformations between consecutive images, and following the transformations across multiple images between key frames. This process is processor-intensive, latency-sensitive, and suited to a real time analysis of an environment where this information may have other uses. In an environment where processing power is limited and the goal is creation of compact object representations for a database, this process is inefficient for establishing correspondences between key frames. Further, in certain embodiments, this information may not be available to track the transformations of points of interest across images between key frames.
- an automated object segmentation algorithm may be used to distinguish objects in various key frames.
- a user input may identify an object, and the object as identified by the user in one or more frames may then be tracked in other frames based on the user input identifying the volume in 3-D where the object resides.
- any combination of different object identification methods may be used.
- FIG. 2 describes one method that may be used in conjunction with various embodiments.
- a camera module of a device such as device 110 is used to capture a plurality of images of a scene, where each of the plurality of images includes at least a portion of a first object.
- a user may move the device around the object being scanned in order to capture information from as many positions as possible to provide more information for the creation of a compact object representation.
- a pre-programmed robotic arm may move the camera to enable capture of multiple different images of the scene including the object being scanned.
- the device may interactively provide feedback to a user regarding the quality of the images scanned and how useful the images are in creating an object representation for a database from the plurality of images.
- a display of the device may show scene 100 with object 120 and target 130 .
- the display may also include text and image indications related to the number and quality of key points or points of interest identified during scanning of object 120 .
- S 204 a may be repeated periodically after identification of key frames or key points to update the feedback provided to the user.
- extracted key points or points of interest may be visualized directly on the object and/or the rest of the scene depending on whether segmentation is implemented prior to the display step or after the display step.
- only extracted key points which have been observed as points of interest in a threshold number of frames may be displayed, with the threshold set as a rough indication of the number of reliable points of interest observed as part of the scanning process for an object being scanned.
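The observation-count threshold described above may be sketched as follows; the function name and data layout are illustrative assumptions, not elements of the specification:

```python
def reliable_key_points(observation_counts, threshold):
    """Keep only key points observed as points of interest in at least
    `threshold` frames; a rough indication of which points of interest
    are reliable during the scanning process.

    observation_counts: dict mapping a key point id to the number of
    frames in which a corresponding point of interest was observed.
    """
    return {kp for kp, count in observation_counts.items()
            if count >= threshold}
```

For display purposes, only the key points returned by such a filter would be visualized on the object during scanning.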
- a system may identify criteria for automatic selection of key frames and/or for automatic selection of points of interest from key frames. Additional details regarding such selection are described in detail herein, but may include criteria such as the angle and location of a nearest selected key frame, an image quality, a density of points of interest on the object to be scanned, a similarity of appearances of the points of interest, or other similar such criteria.
- automatic key frame selection criteria may be altered by a user.
- the key frame selection may be done entirely as per request from an underlying SLAM system, with the automatic selection of key frames part of the structure of the SLAM system.
- Such automatic key frame selection criteria may be disabled in favor of manual selection by a user.
- a user may explicitly select specific frames to be identified as key frames. This selection may occur either on-line, in which case the user selects key frames by live interaction with the scanning system, or off-line, in which case the user has the ability to override the selection of key frames determined by the system.
- a system may provide automated feedback for when a sufficient diversity of key frames has been achieved to create an adequate object representation.
- the feedback may be provided by simply displaying the key points on the object which have thus been selected for object representation, and/or by displaying the selected key frame count and location coupled with orientation. By inspecting the density of selected points of interest and/or the key frames, the user may then infer the likely quality of such representation and decide when sufficient information has been captured.
- the feedback may be provided in a more explicit manner by interactively displaying a measure of the representation quality. This measure may be based on a real time analysis or on a user-selected setting to verify a sufficient diversity of views of the object.
- the system may check for occluded sides of the object, or partial capture of certain elements of an object. It may also check noise levels of the key frames to ensure that excessive motion or blur has not corrupted the key frame information.
- the capturing system builds the object representation on the fly, and uses this representation to attempt to detect the object in real time.
- the successful detection instances may be visualized by displaying near the real object an augmentation, the size and position of which depends on the computed camera pose at the time of detection.
- a user may then determine from the visual feedback when sufficient information has been captured because the user may observe that the augmentation is stable from various views. Note that the quality of object representation may not be uniform from all views, and this can also be efficiently captured by the interactive system as described herein.
- the selection criteria may be used to identify a first image of the plurality of images as a first key frame, and in S 208 , a second image of the plurality of images captured from a different location may be identified as a second key frame.
- the position of the device when the frame was recorded may be known. Any number of methods may be used to determine the device position. Accelerometers or any of various other displacement measurement methods may be used to determine a current position of the device for each key frame.
- the camera may be placed on a gripping device which has been perfectly calibrated with respect to the object coordinate system in a way such that the camera location information at each key frame is automatically known.
- the camera location information may also be inferred by tracking the Target 130 at any given time, or may be determined by the underlying SLAM system in certain embodiments, or by any device tracking methods or systems such as parallel tracking and mapping (PTAM) systems. Any combination of the mentioned camera localization systems is also possible.
- This position information will include not only x, y, and z position information, but also angle information about the direction the lens of the camera is facing and the field of view of the camera, or other such information. This position information may also be referred to as the camera pose.
- points of interest from each key frame are identified. Such points of interest identify features of each frame, such as areas of high contrast.
- the points of interest from each image are matched. Because the points of interest from each image are taken from a different position, this enables three-dimensional information to be associated with each point. The greater number of key frames used, the greater the amount of three-dimensional information created in the matching of points of interest.
- correspondences between two-dimensional points of interest from particular frames are established. This correspondence enables determination of a three-dimensional coordinate for the key point based on the plurality of two-dimensional identifications of points of interest associated with the key points from different key frames.
- a three-dimensional key point may include information from the two-dimensional point of interest aspects associated with a particular key point identified in as few as two key frames, or in many hundreds or thousands of key frames.
- the information may be filtered or averaged using a variety of different means in order to increase the accuracy of a single key point location for use in the final object representation to be stored in a database for later use.
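One simple means of combining such information, a plain average of the candidate 3-D locations triangulated from different key-frame pairs, may be sketched as follows (the function name and data layout are illustrative assumptions):

```python
def merge_key_point(candidates):
    """Average several noisy 3-D position estimates of the same key point,
    each derived from a different pair of key frames, into a single key
    point location for the final object representation.

    candidates: non-empty list of (x, y, z) tuples.
    """
    n = len(candidates)
    return tuple(sum(c[i] for c in candidates) / n for i in range(3))
```

In practice more robust estimators (e.g. outlier rejection before averaging) may be substituted for the plain mean.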
- key points associated with the object are identified. This step may include various components, including object segmentation, filtering outliers based on proximity to the nearest neighbors' key points, filtering by the number of observations, or other such filters. In certain embodiments, this may be done by separating information from a known target or known background in the scene to identify key points associated with the object being scanned. In other embodiments, other information may be used to segment the object representation from the background scene representation. Specific embodiments of such segmentation are described further below.
- FIGS. 3A, 3B, 3C, 3D, and 3E then describe further details of points of interest identified from 2D images that may be used to derive three-dimensional (3-D) key points which make up compact object representations for a database along with additional descriptive information.
- In FIGS. 3A and 3C , because the points of interest are viewed looking down from the top, the points of interest appear as points around the sides of the mug. Point of interest 306 a is also shown. Because point of interest 306 a is relatively isolated, it is likely that point of interest 306 a is not part of the object being scanned, and is unlikely to provide valuable information at a later point in time for image recognition when the compact object representations including points of interest 301 and 302 are retrieved from a database and used for object identification.
- each point of interest indicated in FIGS. 3A, 3C, and 3D may have associated information about the brightness, color, or pattern of pixels surrounding the point of interest.
- the associated brightness, color, or pattern of pixels may be incorporated into the compact object representation in a way which may be useful for later object detection. It is the combination of 3-D key point descriptions and their relative geometric location that creates a unique signature for each object suitable for detection.
- the resulting key points which make up an object representation stored in a database need to be invariant to a number of geometric transformations resulting from the changing position/orientation of a camera during query time, yet discriminative enough to avoid generating many false matches to features from different objects.
- a sufficient amount of detail may be derived for the key points which make up the object representation 310 shown in FIG. 3E .
- FIGS. 3A, 3C, and 3D show points of interest from key frames taken from one position at a given angle.
- FIG. 3A shows a top view of points of interest prior to filtering.
- FIG. 3C shows a top view of points of interest post filtering.
- FIG. 3D shows a side view of points of interest post filtering.
- the points of interest from each view are combined to create three dimensional key points 305 which make up object representation 310 of FIG. 3E .
- points of interest 301 and points of interest 304 will be combined to create three dimensional key points 305 , which are derived from the matched points of interest. While points of interest from two key frames with different positions are shown in FIGS. 3A, 3C, and 3D ,
- any number of key frames from different views may contribute points of interest which are used to derive the key points that make up the final object representation. Further, it will be apparent that key frames from each position may contribute to only a portion of the total number of three dimensional key points 305 . This may be because a certain surface on an object is occluded from one view, or may be filtered or noisy in certain key frames from which the key points are derived.
- a single image taken from a single position such as image 122 taken from position 116 in FIG. 1 is essentially a two-dimensional projection from a scene captured by the image.
- Points of interest identified from such an image are associated with a detail descriptor describing the area around those points of interest in the two-dimensional projection captured by the image.
- a single point of interest may be associated with numerous planar descriptors, as points of interest associated with a single 3-D key point are typically visible from multiple key frames. Though these planar descriptors will, in general, look different even for very close yet different viewing angles, in practice, the descriptors corresponding to close viewing angles are relatively similar, and may be collapsed into a single descriptor that may be associated with a 3-D key point. Thus, regardless of how many key frames contain points of interest associated with a single 3-D key point, this 3-D key point will be associated with at most a handful of entries in the compact object representation.
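The collapsing of descriptors from close viewing angles into at most a handful of entries may be sketched as a greedy clustering by viewing angle; the angle-based grouping, data layout, and names here are illustrative assumptions, not the patented method:

```python
def collapse_descriptors(descriptors, angle_eps):
    """Collapse the per-key-frame descriptors of one 3-D key point.
    Descriptors whose viewing angles fall within `angle_eps` of an
    existing cluster's mean angle are merged (averaged) into it, so
    the key point keeps only a handful of representative descriptors.

    descriptors: list of (view_angle, descriptor_vector) pairs.
    Returns the list of averaged descriptor vectors, one per cluster.
    """
    clusters = []  # each entry: [angle_sum, vector_sum, count]
    for angle, vec in descriptors:
        for c in clusters:
            if abs(angle - c[0] / c[2]) <= angle_eps:
                c[0] += angle
                c[1] = [a + b for a, b in zip(c[1], vec)]
                c[2] += 1
                break
        else:
            clusters.append([angle, list(vec), 1])
    return [[v / c[2] for v in c[1]] for c in clusters]
```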
- a second image captured from a different angle will similarly capture information that is a two-dimensional projection of a three-dimensional object.
- the two images together include three-dimensional information about a single point collected from multiple two-dimensional projections, like shown in FIG. 4 .
- Correlating points of interest from one key frame with points of interest from another key frame thus identifies three-dimensional information which may be used to derive key points when key frames are taken from different angles.
- Merged points of interest thus not only identify the three-dimensional location of the key point in a standardized set of coordinates, but also may be associated with three-dimensional descriptive data about the volume surrounding the key point.
- a system may establish correspondences between sets of two-dimensional points of interest across key frames in order to identify three-dimensional location of the key points of interest along with three-dimensional descriptive data. While certain types of filtering, such as boundary filtering, may be performed on the sets of two-dimensional points of interest from a single key frame, segmentation to identify an object may then be done on the correlated key points and not on sets of two-dimensional points of interest. In embodiments which function with this filtering, this eliminates repetitive segmentation/filtering on what may be large numbers of two-dimensional points of interest from key frames. This also enables use of all information about a 3-D key point location in space and the key point's relation to other key points, rather than only using two-dimensional information. Filtering on a single three-dimensional merged set of key points for an object representation may provide the same filtering as filtering on many sets of two-dimensional data.
- filtering such as boundary filtering
- a two-minute scan of an object at a standard frame rate in moderate background clutter may produce approximately 15000 distinct points of interest, out of which only approximately 1000-1500 key points may be derived which belong to the object, and further only 750-1000 key points may be suitable for object detection.
- FIG. 3A shows points of interest of a coffee mug which survived a first stage of segmentation—that by the three-dimensional location. Namely, in practical systems it is beneficial to define a bounding box of three-dimensional coordinates of the object with respect to a known target. At the first stage of object point of interest segmentation and filtering, all the collected points of interest that do not reside within this bounding box may be discarded.
- an initial number of around 15000 distinct points of interest may be reduced to about 2000 key points during this step, such that an object representation such as object representation 310 may only use a fraction of the total points of interest that were in the key frames from which the object representation was derived.
- coordinates for a scene with points of interest represented by FIGS. 3A and 3C may be tied to the middle of the target.
- a bounding volume may then be identified for key points belonging to the object. Some portion of the approximately 15000 points of interest may be identified as outside the bounding box, and may be filtered out and eliminated.
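The bounding-box filtering described above may be sketched as follows; the names and data layout are illustrative assumptions:

```python
def inside_bounding_box(points, box_min, box_max):
    """First stage of segmentation: discard any point whose triangulated
    3-D coordinates fall outside a bounding box defined with respect to
    the known target.

    points: list of (x, y, z) tuples.
    box_min, box_max: opposite corners of the bounding box.
    """
    return [p for p in points
            if all(box_min[i] <= p[i] <= box_max[i] for i in range(3))]
```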
- a system may assume a certain density for points of interest belonging to an object. Segmentation to identify the object may be performed by filtering based on a threshold distance to a given number of nearest neighbors.
- FIG. 3B shows a histogram of estimated point of interest distances in three dimensions for an object such as in FIG. 3A .
- a filtering threshold 308 may be used to identify which points of interest to filter. Because the point of interest 302 is in a dense area, it will be grouped with the points of interest to the left of the filtering threshold 308 in FIG. 3B . Point of interest 306 a , however, is clearly not in a dense area of points of interest, and will be to the right of filtering threshold 308 in FIG. 3B . Thus, in FIG. 3C , filtered point of interest 306 b is not shown, as it would be deleted by the filtering process when points of interest to the right of filtering threshold 308 are deleted from the compact object representation.
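The density-based filtering around threshold 308 may be sketched as a distance-to-k-th-nearest-neighbor test; the parameter choices and names are illustrative assumptions:

```python
import math

def density_filter(points, k, threshold):
    """Keep a 3-D point only if the distance to its k-th nearest neighbor
    is below `threshold`. Isolated points (such as point of interest
    306 a) fall in the sparse tail of the distance histogram and are
    removed; points in dense areas (such as 302) are kept."""
    def dist(a, b):
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))
    kept = []
    for i, p in enumerate(points):
        dists = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        if len(dists) >= k and dists[k - 1] < threshold:
            kept.append(p)
    return kept
```

The brute-force pairwise distances here are only for clarity; a spatial index would be used for the roughly 15000 points mentioned above.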
- a system performing segmentation may identify a dominant plane in a scene.
- a reference to the dominant plane may be used to define the scene and further assist in creating correspondence between points of interest from different images.
- the coordinate system of reference as well as the bounding box may be manually given by a user either at the time of the scan or during offline processing.
- particular methods may be used to identify points of interest.
- high density high gradient areas are identified, with thresholds used to determine which points are selected based on the gradient of surrounding pixels.
- images are processed at various scales to detect preferred points of interest in a key frame which are observable at a particular scale. Selection of key points and/or points of interest as well as their description may be performed in a variety of ways using such transforms, including analysis of feature orientations with offsets (the scale at which surrounding intensity differences or curvature are most pronounced), analysis of surrounding pixels with principal component analysis, and use of steerable filters with Gaussian derivative filters. Additionally, differential invariants may be identified for given key points with selection based on the values invariant to rotation.
- shape context descriptors may be used to represent an area of interest.
- any combination of such selection criteria, along with any other selection criteria that may optimize the creation of compact object representation suitable for assisting with offline object detection, may be used to identify points of interest or key points.
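A toy version of the gradient-threshold detection mentioned above may be sketched as follows; real detectors operate over multiple scales and orientations, and all names here are illustrative assumptions:

```python
def detect_points_of_interest(image, grad_threshold):
    """Mark pixel (r, c) as a point of interest when the magnitude of the
    intensity gradient of the surrounding pixels exceeds `grad_threshold`
    (central differences; image borders are skipped).

    image: 2-D list of grayscale intensities.
    Returns a list of (row, col) point of interest locations.
    """
    points = []
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 > grad_threshold:
                points.append((r, c))
    return points
```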
- FIG. 4 then provides details for one embodiment whereby correspondences may be established for key frames.
- images 112 and 114 of FIG. 1 may be key frames that have correspondences established.
- FIG. 4 shows images 412 and 414 of object 420 taken by device 410 from two different positions, with epipolar plane 423 .
- Image 412 is taken from first position 416 and image 414 is taken from second position 418 .
- Object 420 is shown as having a point of interest X.
- point of interest X is imaged as point of interest x 1 .
- point of interest X is imaged as point of interest x 2 .
- the epipolar line l 2 corresponding to x 1 can be identified in image 414 .
- Point of interest x 2 may be extracted in image 414 along with descriptive information for surrounding pixels if (A) the descriptions of the surrounding pixels are sufficiently close between the two point of interest observations (e.g. the distance in the descriptor domain is below a threshold), and (B) x 2 is below a threshold distance to the epipolar line l 2 .
- the threshold distance in the descriptor domain and the threshold distance from the epipolar line corresponding to x 1 may be selectable parameters within a system. These may be set automatically, or may be selected by a user with a user interface.
- One threshold value for a maximum epipolar line distance may be two pixels, three pixels, or four pixels. Values may be used other than these threshold epipolar line distance values in other embodiments.
- Example descriptor distance threshold values may be set as a fixed difference between descriptive information, or may be set as a fraction of a normalized descriptor value. For example, if a 128-element long descriptor is normalized to a value of 1, the squared distances that would indicate the same point of interest is observed may be a portion of that normalized range, such as between 0.2 and 0.35 of the normalized value. In other words, this checks that the area surrounding a point of interest is consistently identified as associated with other points of interest when multiple key frames are merged.
- the two thresholds together are essentially a check to make sure that the two points of interest are actually capable of being corresponding points of interest given the position and information associated with the points of interest. For both of these thresholds, relaxing the parameter leads to a higher number of correspondences, and thus potentially higher number of points of interest successfully extracted. In other words, as two-dimensional points of interest are correlated with other two-dimensional points of interest to create three-dimensional key points, more three-dimensional key points are identified as the thresholds are relaxed, at the price of a higher number of errors. These errors may be in the form of incorrect or fictitious points of interest or key points which include partially or completely incorrect data. Many of the points of interest floating outside the object shown in FIG. 3A , such as point of interest 306 a , are presumably identified and triangulated using erroneous correspondences. Later, filtering and segmentation may identify and remove a portion of these fictitious points.
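The two-threshold check described above may be sketched as follows, with the epipolar line given by its implicit coefficients (a, b, c) of ax + by + c = 0; the default threshold values merely fall within the ranges the text mentions, and all names are illustrative assumptions:

```python
import math

def plausible_correspondence(x2, epipolar_line, d1, d2,
                             max_line_dist=3.0, max_desc_dist=0.3):
    """Return True when a candidate match passes both checks:
    (A) the squared descriptor distance between the two point of interest
        observations is below `max_desc_dist` (descriptors assumed
        normalized), and
    (B) point x2 lies within `max_line_dist` pixels of the epipolar line
        in the second key frame.
    """
    a, b, c = epipolar_line
    line_dist = abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)
    desc_dist = sum((u - v) ** 2 for u, v in zip(d1, d2))
    return desc_dist < max_desc_dist and line_dist < max_line_dist
```

Relaxing either default widens the set of accepted correspondences, at the cost of more fictitious points such as 306 a.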
- Similar calculations may be used for triangulation and bundle adjustment in identifying the location of the points of interest X in a more robust fashion.
- Bundle adjustment may refer to assessing and adjusting the matching of points of interest from three or more different key frames at one time.
- the first projection of the point of interest X at first position 416 is shown as X 0,1 and the second projection of the point of interest X at the second position 418 is shown as X 0,2 .
- the image in FIG. 4 is a target such as target 130
- a system will be able to associate points of interest from different images using previously provided information about the position and orientation of the target. Given the information about these correspondences, the location of the point of interest X can be triangulated.
- Such an estimate may be noisy, since sub-pixel errors in the position of the points of interest x may result in a large error in the calculated position of the point of interest X . These errors may be reduced by multiple observations of the same point. Moreover, by minimizing the re-projection error of an estimated location, bundle adjustment may at the same time correct the initial information about the camera poses in key frames, for example, frames 412 and 414 in FIG. 4 .
- Bundle adjustments may further be used when more than two correspondences and positions are used from more than two key frames, resulting in a much greater confidence in the resulting locations for all associated points of interest X, Y, and Z due to the averaged information. Further, when such a bundle adjustment is performed using both tracking and detection of points of interest, accuracy is further improved.
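The re-projection error that bundle adjustment minimizes may be sketched under a deliberately simplified pinhole model (camera axes aligned with the world axes, no rotation); this simplification and all names are illustrative assumptions, not the patent's camera model:

```python
def reprojection_error(point3d, observations):
    """Total squared re-projection error of one candidate 3-D location
    across several key frames. Each camera is described only by a focal
    length f and a position t; a world point X projects to
    (f * (X - t).x / (X - t).z,  f * (X - t).y / (X - t).z).

    observations: list of ((f, t), (u, v)) pairs, one per key frame,
    where (u, v) is the observed point of interest in that frame.
    """
    total = 0.0
    for (f, t), (u, v) in observations:
        dx = point3d[0] - t[0]
        dy = point3d[1] - t[1]
        dz = point3d[2] - t[2]
        pu, pv = f * dx / dz, f * dy / dz
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

A bundle adjuster would jointly perturb `point3d` and the camera poses to drive this error down over all key frames at once.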
- additional sensors in a device may be used to further improve the accuracy of the relative positioning of the device when capturing key frames.
- Accelerometers, gyroscopes, and various other positioning systems that measure location and movement of a device may be used either to replace or to supplement the above described position measurements. This may provide increased accuracy or reduced processor usage in certain embodiments.
- these positioning systems may be used to determine the position or pose of a device when the device captures a particular key frame. This information may be used to create information about the key points which are derived from two dimensional points of interest in particular key frames.
- the descriptive information about pixels surrounding points of interest may be selected as having a large influence on the matching decision, due to the straightforward process of matching actual pixels between images. For many objects, however, multiple observations of the same point of interest tend to vary gradually but persistently over multiple views. This may be due to changes in surface reflectance, varying self-occlusions on the object, or simply the nature of projective views. Because of this it is possible to end up with several clusters of key point groupings where the descriptors for each grouping satisfy certain common filter requirements, but not others. For example, descriptors within each cluster may satisfy maximum distance requirements, but may not satisfy maximum distance requirements across clusters. This can lead to several three-dimensional key point matches being estimated where only one exists.
- observations in the intersection between descriptor clusters may be made. These provide a link between points of interest and additional information for decision making on whether to merge multiple points of interest or correspondence sets of points. It further provides added robustness to any bundle adjustment, and can serve as a precursor to pruning extra or unnecessary data. Bundle adjustment process may benefit from merging correspondences across clusters of views, as the same point of interest X is then estimated based on more data, instead of estimating two distinct points X and X′, out of which one is fictitious. This may also be combined with other pruning or data filtering techniques to optimize data to be stored as compact object representation in a database, where merged points of interest from multiple frames may be stored as a single key point with associated descriptive data on the object around the key point.
- each point of interest X visible from at least two viewpoints is now represented by its three-dimensional location and multiple descriptors, each one typically corresponding to one distinct key frame where the point was observed during the scan.
- This step is typically followed by segmentation and filtering techniques aimed at removing the key points which are not associated with the object of interest.
- These methods typically rely only on the three-dimensional locations (x 1 , x 2 , x 3 ) of all captured key points, and as such may be combined with steps for bundle adjustment, as the three-dimensional locations are already known at this step.
- the final step in the process for forming a compact object representation following a scan is called feature pruning. Namely, the surviving points of interest X are now associated with at least two, and typically several descriptors.
- The location of a point of interest together with the attached description is sometimes referred to as a “feature.”
- pruning is the combined effect of removing certain descriptors and combining multiple surviving descriptors to form a reduced number of new “pruned” descriptors.
- This pruning may be performed by a module implementing a series of pruning steps. Such steps may filter key points based on repeatability, that is, the number of different images and viewpoints in which a point of interest corresponding to a key point is observed as a feature. Pruning may also filter based on discriminativity, such that from a set of similar features that correspond to the same key point in multiple views, only a fraction is selected. The key points for the remaining fraction of similar features are removed to reduce redundancy in the compact object representation.
- an analysis may be performed that associates a value with key points in order to optimize the size of an object representation.
- a value threshold may be established, such that key points that are redundant or otherwise less valuable are removed, while unique and highly visible key points may be saved with a score above a data value threshold.
- different pruning steps may be used depending on the processing resources available to the device and other choices selected by a user.
- additional parameters that may control the level of key point pruning include: a radius of an epsilon ball in a multi-dimensional descriptor space to determine if pixels around key points are sufficiently similar; a radius of an epsilon ball in a three-dimensional space to determine that distinct key points are bundled very closely together in Euclidean space; a repeatability threshold based on number of views of a particular key point; and a discriminativity threshold based on feature changes identified for a single key point in multiple views.
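The repeatability threshold and descriptor epsilon ball listed above may be sketched in a toy pruning routine; the names, data layout, and merge-by-averaging choice are illustrative assumptions:

```python
def prune_features(features, repeat_threshold, desc_eps):
    """Prune the per-key-point descriptor lists of an object representation.

    Key points observed in fewer than `repeat_threshold` views are dropped
    (repeatability). The remaining descriptors of each key point are then
    merged whenever they fall within an epsilon ball of radius `desc_eps`
    in descriptor space, by running-average, reducing redundancy.

    features: dict mapping a key point id to a list of descriptor vectors.
    """
    pruned = {}
    for kp, descs in features.items():
        if len(descs) < repeat_threshold:
            continue  # not repeatable enough: remove the key point
        merged = []  # each entry: [mean_descriptor, count]
        for d in descs:
            for m in merged:
                if sum((a - b) ** 2 for a, b in zip(m[0], d)) <= desc_eps ** 2:
                    m[0] = [(a * m[1] + b) / (m[1] + 1)
                            for a, b in zip(m[0], d)]
                    m[1] += 1
                    break
            else:
                merged.append([list(d), 1])
        pruned[kp] = [m[0] for m in merged]
    return pruned
```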
- One embodiment may thus involve capturing, using a camera module of a mobile computing device, a plurality of images of a scene.
- Each of the plurality of images of the scene includes an image of at least a portion of a first object.
- a camera position or “camera pose,” consisting of an orientation and a position in three dimensions (six degrees of freedom in total) with respect to a world coordinate system in which the object of interest is unmoving, is presumed known for each one of the captured images.
- the camera pose may be obtained in various ways: either by carefully calibrating a fixed setup (like with a robotic arm), or by detecting and tracking the projective appearance of a known object “target” present in the same scene with the object being scanned.
- a first image of the plurality of images may then be identified as a first key frame, where the first image is captured by the mobile computing device from a first position.
- a second image of the plurality of images may be selected as a second key frame. The second image is captured by the mobile computing device from a second position that is different from the first position.
- a first plurality of points of interest may be identified from the first key frame, where the first plurality of points of interest identifies features from the scene.
- a second plurality of points of interest may be identified from the second key frame.
- a system may then match the first plurality of points of interest and the second plurality of points of interest, and identify key points associated with the object.
- the key points associated with the object may next be associated with at least one description of the area surrounding each key point, and together they may be stored as compact object representations in an object detection database.
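The matching step above can be sketched as nearest-neighbour search in descriptor space. Lowe's ratio test is an assumed matching strategy for this illustration; the patent does not mandate a particular matcher or descriptor.

```python
import numpy as np

def match_points_of_interest(desc1, desc2, ratio=0.8):
    """Match points of interest between two key frames by nearest-
    neighbour descriptor distance with a ratio test: a match is kept
    only if its best candidate is clearly closer than the second best."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]     # two nearest candidates
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches
```

Each surviving match pairs a point of interest in the first key frame with one in the second; such pairs are the correspondences from which key points associated with the object may be established.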
- a device may operate a SLAM system.
- a SLAM system is a standard system that uses imaging data to build up a map of an unknown environment (without a priori knowledge), or to update a map of a known environment (with a priori knowledge from a given map), while at the same time keeping track of the device's current location within it.
- the map data from a standard SLAM system is used to build a map of the object using key points created as described above.
- the SLAM system selects key frames from images as described above, as standard operation of the SLAM system includes the creation of key frames as part of SLAM operation.
- Scene mapping and device position tracking may be used as a tool for extracting salient features and structural properties of the object as described above.
- the overall system may provide the key frames from the SLAM system to a separate extractor and descriptor system. This extractor and descriptor system may then be run on key frames to extract object appearance information.
- Separate SLAM and extractor/descriptor systems may provide benefits in certain embodiments as a simpler and cheaper system for tracking, map building, and localization.
- the overall system may be more complex, but may also provide more efficient discrimination and invariant point of interest detection.
- the descriptor system may then establish the key point correspondence across key frames, and perform any remaining steps.
- Such an embodiment may use SLAM to select and store key frames using a number of criteria, including camera position stability, a number of sufficiently “different” features extracted, and other such metrics. SLAM key frames may thus be used unmodified for detection feature extraction.
- Other embodiments may enable custom key frame selection targeted to automatically create key frames more in tune with database creation. Either of these embodiments enables automated key frame selection which may be hidden from the user as an object is scanned.
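An automated key frame selection rule of the kind described above could be sketched as follows. The baseline and new-feature thresholds are illustrative assumptions for this sketch, not values specified by the patent.

```python
import numpy as np

def should_add_key_frame(cam_pos, key_frame_positions, new_feature_fraction,
                         min_baseline=0.10, min_new_fraction=0.25):
    """Decide whether the current image should become a new key frame,
    using criteria of the kind mentioned above: sufficient camera motion
    from every existing key frame, and a sufficient fraction of
    'different' (previously unseen) features in the current image."""
    if not key_frame_positions:
        return True                      # always keep the first key frame
    nearest = min(np.linalg.norm(np.asarray(cam_pos) - np.asarray(p))
                  for p in key_frame_positions)
    return bool(nearest >= min_baseline
                and new_feature_fraction >= min_new_fraction)
```

Because the rule needs only the tracked camera position and a feature-overlap statistic, it can run silently during scanning, keeping key frame selection hidden from the user.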
- a SLAM system is implemented in a multithreaded fashion, with key point feature extraction running in a background process.
- Descriptors extracted along with points of interest may correspond to the projective view of the object within particular key frames.
- traditional multi-view epipolar geometry techniques may be used by certain embodiments as described above.
- the points of interest may be filtered before matching points of interest between images to create key points, after such matching, or both before and after.
- Further embodiments may use detection as well as SLAM features and their correspondences across multiple key frames to robustly estimate three-dimensional key point location. Further embodiments may also post-process key point data to prune extracted multi-view detection features and create compact object representations for an object detection database.
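The three-dimensional key point location estimate mentioned above is conventionally computed by linear (DLT) triangulation, a standard multi-view geometry technique rather than anything specific to this patent. The sketch below handles the two-view case given each key frame's 3x4 projection matrix.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D key point from its pixel
    observations x1, x2 in two key frames with projection matrices
    P1, P2 (each 3x4). Each observation contributes two linear
    constraints; the solution is the null vector of the stacked system."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # homogeneous solution
    return X[:3] / X[3]          # back to Euclidean coordinates
```

With more than two key frames observing the same key point, the same construction simply stacks two rows per view, which is one way the multi-view correspondences could yield a robust location estimate.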
- a user interface may provide a different key frame selection criterion that can be targeted to optimize compact object representations for database creation.
- a display may present extracted and triangulated key points in near real time to visualize the scanning process. In certain embodiments, this may enable a user to alter parameters on the fly to adjust key point creation as key frames are selected.
- FIG. 5 now describes one implementation of a device 500 according to certain embodiments.
- FIGS. 1 and 5 illustrate a system which, in one embodiment, may include a device 110 or 410 which is used to scan an object.
- Device 500 may be one embodiment of device 110 or device 410 and may perform all of the elements of a method for creating compact object representations for an object detection database.
- specialized modules may be used to implement object scanning including object identification module 521 and scanning and database input module 522 .
- Database 524 may be a specialized compact object representation base or may be part of a larger database system.
- Object identification module 521 may be a module which implements SLAM as described herein, or may be a customized module for identifying key frames.
- object identification module 521 and database input module 522 may be implemented as a single module.
- a control module or a control input for object identification module 521 and/or scanning and database input module 522 may enable manual selection of various scanning aspects. For example, a user may elect to have automatic prompts presented at display output 503 when key frames are sparse at certain angles to determine when more key frames from different angles are needed. Such a system may also enable prompts and directions to specific angles where high value key point data is expected. In certain embodiments, such a system may essentially track a key point density and/or a point of interest density around certain portions of an object. For a given image, the system may determine a spatial relationship between the location from which the image is taken and the location that the nearest key frame was taken from, and use this information along with point of interest information for these locations to determine the value of an additional key frame from the new location.
- the system may thus inform a user when additional key frames would provide high data value from certain angles.
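One illustrative heuristic for such prompting is to find the largest angular gap in the existing key frame coverage around the object: the midpoint of that gap is the viewing angle from which an additional key frame would provide the most data value. Parameterizing viewpoints by a single azimuth angle is an assumption made for this sketch.

```python
import numpy as np

def largest_view_gap(key_frame_angles_deg):
    """Return (gap, midpoint) in degrees, where gap is the largest
    angular span around the object not covered by any key frame and
    midpoint is the angle a prompt could direct the user toward."""
    a = np.sort(np.asarray(key_frame_angles_deg) % 360.0)
    gaps = np.diff(np.append(a, a[0] + 360.0))   # wrap-around gap included
    i = int(np.argmax(gaps))
    midpoint = (a[i] + gaps[i] / 2.0) % 360.0
    return float(gaps[i]), float(midpoint)
```

A prompt might fire whenever the largest gap exceeds some threshold (say 60 degrees), directing the user to the midpoint angle; the threshold would be a tunable, user-selectable parameter.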
- Such a control may also enable a user to customize selection of key frames, or to update selection of key frames for an in-progress scan. In certain embodiments, this may also enable a user to view recorded images and to manually select specific images as key frames. Further still, thresholds for key point pruning and filtering may be set by user selection.
- mobile device 500 includes processor 510 configured to execute instructions for performing operations at a number of components and can be, for example, a general-purpose processor or microprocessor suitable for implementation within a portable electronic device. Processor 510 may thus implement any or all of the specific steps for compact object representation creation described herein. Processor 510 is communicatively coupled with a plurality of components within mobile device 500 . To realize this communicative coupling, processor 510 may communicate with the other illustrated components across a bus 540 .
- Bus 540 can be any subsystem adapted to transfer data within mobile device 500 .
- Bus 540 can be a plurality of computer buses and include additional circuitry to transfer data.
- Memory 520 may be coupled to processor 510 .
- memory 520 offers both short-term and long-term storage and may in fact be divided into several units.
- Memory 520 may be volatile, such as static random access memory (SRAM) and/or dynamic random access memory (DRAM) and/or non-volatile, such as read-only memory (ROM), flash memory, and the like.
- memory 520 can include removable storage devices, such as secure digital (SD) cards.
- memory 520 provides storage of computer-readable instructions, data structures, program modules, and other data for mobile device 500 .
- memory 520 may be distributed into different hardware modules.
- memory 520 stores a plurality of application modules.
- Application modules contain particular instructions to be executed by processor 510 .
- other hardware modules may additionally execute certain applications or parts of applications.
- Memory 520 may be used to store computer-readable instructions for modules that implement scanning according to certain embodiments, and may also store compact object representations as part of a database.
- memory 520 includes an operating system 523 .
- Operating system 523 may be operable to initiate the execution of the instructions provided by application modules and/or manage other hardware modules as well as interfaces with communication modules which may use WAN wireless transceiver 512 and LAN wireless transceiver 542 .
- Operating system 523 may be adapted to perform other operations across the components of mobile device 500 including threading, resource management, data storage control and other similar functionality.
- mobile device 500 includes a plurality of other hardware modules.
- Each of other hardware modules is a physical module within mobile device 500 .
- While some of the hardware modules may be permanently configured as structures, a respective one of the hardware modules may be temporarily configured to perform specific functions or temporarily activated.
- a common example is an application module that may program a camera 501 (i.e., hardware module) for shutter release and image capture. Such a camera module may be used to capture images such as images 122 and 124 of FIG. 1 and images 412 and 414 of FIG. 4 .
- Other hardware modules can be, for example, an accelerometer, a Wi-Fi transceiver, a satellite navigation system receiver (e.g., a GPS module), a pressure module, a temperature module, an audio output and/or input module (e.g., a microphone), a camera module, a proximity sensor, an alternate line service (ALS) module, a capacitive touch sensor, a near field communication (NFC) module, a Bluetooth® transceiver, a cellular transceiver, a magnetometer, a gyroscope, an inertial sensor (e.g., a module that combines an accelerometer and a gyroscope), an ambient light sensor, a relative humidity sensor, or any other similar module operable to provide sensory output and/or receive sensory input.
- one or more functions of the hardware modules may be implemented in software. Further, as described herein, certain hardware modules such as the accelerometer, the GPS module, the gyroscope, the inertial sensor, or other such modules may be used to estimate relative locations between key frames. This information may be used to improve data quality in conjunction with image based techniques described above, or may replace such methods in order to conserve processor resources. In certain embodiments, a user may use a user input module 504 to select such options.
- Mobile device 500 may include a component such as wireless communication module which may integrate an antenna 514 and wireless transceiver 512 with any other hardware, firmware, or software necessary for wireless communications.
- a wireless communication module may be configured to receive signals from various devices such as data sources via networks and access points such as a network access point.
- compact object representations may be communicated to server computers, other mobile devices, or other networked computing devices to be stored in a remote database and used by multiple other devices when the devices execute object recognition functionality.
- mobile device 500 may have a display output 503 and a user input module 504 .
- Display output 503 graphically presents information from mobile device 500 to the user. This information may be derived from one or more application modules, one or more hardware modules, a combination thereof, or any other suitable means for resolving graphical content for the user (e.g., by operating system 523 ).
- Display output 503 can be liquid crystal display (LCD) technology, light-emitting polymer display (LPD) technology, or some other display technology.
- display output 503 may be a capacitive or resistive touch screen sensitive to haptic and/or tactile contact with a user.
- the display output 503 can comprise a multi-touch-sensitive display. Display output 503 may then be used to display any number of outputs associated with an object identification module 521 , such as an augmented reality output using object recognition in conjunction with compact object representations from database 524 . Interface selections may also be displayed to select scanning and storage options. Key points may also be displayed along with an image of the object in real time as an object is scanned.
- FIG. 6 provides a schematic illustration of one embodiment of a computing device 600 that may be used with various other embodiments such as the embodiments described by FIGS. 1-5 as described herein.
- FIG. 6 is meant only to provide a generalized illustration of various components, any or all of which may be utilized as appropriate. In certain embodiments, for example, components of FIG. 6 and FIG. 5 may be included in a single device, or in multiple distributed devices which may comprise one particular embodiment.
- FIG. 6 therefore, broadly illustrates how individual system elements may be implemented in a relatively separated or relatively more integrated manner, and describes elements that may implement specific methods according to embodiments when, for example, controlled by computer-readable instructions from a non-transitory computer-readable storage device, such as storage devices 625 .
- the computing device 600 is shown comprising hardware elements that can be electrically coupled via a bus 605 (or may otherwise be in communication, as appropriate).
- the hardware elements may include: one or more processors 610 , including, without limitation, one or more general-purpose processors and/or one or more special-purpose processors (such as digital signal processing chips, graphics acceleration processors, and/or the like); one or more input devices 615 , which can include, without limitation, a mouse, a keyboard and/or the like; and one or more output devices 620 , which can include, without limitation, a display device, a printer and/or the like.
- the computing device 600 may further include (and/or be in communication with) one or more non-transitory storage devices 625 , which can comprise, without limitation, local and/or network accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, a solid-state storage device such as a random access memory (“RAM”) and/or a read-only memory (“ROM”), which can be programmable, flash-updateable and/or the like.
- Such storage devices may be configured to implement any appropriate data stores, including, without limitation, various file systems, database structures, and/or the like.
- the computing device 600 might also include a communications subsystem 630 , which can include without limitation a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device and/or chipset (such as a Bluetooth device, an 802.11 device, a Wi-Fi device, a WiMax device, cellular communication facilities, etc.), and/or similar communication interfaces.
- the communications subsystem 630 may permit data to be exchanged with a network (such as the network described below, to name one example), other computer systems, and/or any other devices described herein.
- a mobile device such as mobile device 500 may thus include other communication subsystems in addition to those including wireless transceiver 512 and LAN wireless transceiver 542 .
- the computing device 600 will further comprise a non-transitory working memory 635 , which can include a RAM or ROM device, as described above.
- the computing device 600 also can comprise software elements, shown as being currently located within the working memory 635 , including an operating system 640 , device drivers, executable libraries, and/or other code, such as one or more applications 645 , which may comprise computer programs provided by various embodiments, and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein.
- one or more procedures described with respect to the method(s) discussed above might be implemented as code and/or instructions executable by a computer (and/or a processor within a computer); in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general-purpose computer (or other device) to perform one or more operations in accordance with the described methods for scanning an object to identify key frames, points of interest, key points, to create an object representation, to store that object representation in a database, and to retrieve the object representation for object identification in a later scan of an unknown or partially unknown scene.
- a set of these instructions and/or code might be stored on a computer-readable storage medium, such as the storage device(s) 625 described above.
- the storage medium might be incorporated within a computer system, such as computing device 600 .
- the storage medium might be separate from a computer system (e.g., a removable medium, such as a compact disc), and/or provided in an installation package, such that the storage medium can be used to program, configure and/or adapt a general purpose computer with the instructions/code stored thereon.
- These instructions might take the form of executable code, which is executable by the computing device 600 and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computing device 600 (e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc.) then takes the form of executable code.
- Object identification module 521 and scanning and database input module 522 may thus be executable code as described herein. In alternative embodiments, these modules may be hardware, firmware, executable instructions, or any combination of these implementations.
- Some embodiments may include an activity selection subsystem configured to provide some or all of the features described herein relating to the selection of acceptable characteristics for an output of three-dimensional key points created from multiple two-dimensional points of interest derived from single key frames. Such subsystems may comprise hardware and/or software that is specialized (e.g., an application-specific integrated circuit (ASIC), a software method, etc.) or generic (e.g., processor(s) 610 , applications 645 which may, for example, implement any module within memory 635 , etc.). Further, connection to other computing devices such as network input/output devices may be employed.
- The terms “machine-readable medium” and “computer-readable medium,” as used herein, refer to any medium that participates in providing data that causes a machine to operate in a specific fashion.
- various computer-readable media might be involved in providing instructions/code to processor(s) 610 for execution and/or might be used to store and/or carry such instructions/code (e.g., as signals).
- a computer-readable medium is a physical and/or tangible storage medium.
- Such a medium may take many forms, including, but not limited to, non-volatile media, non-transitory media, volatile media, and transmission media.
- Non-volatile media include, for example, optical and/or magnetic disks, such as the storage device(s) 625 .
- Volatile media include, without limitation, dynamic memory, such as the working memory 635 .
- Transmission media include, without limitation, coaxial cables, copper wire and fiber optics, including the wires that comprise the bus 605 , as well as the various components of the communications subsystem 630 (and/or the media by which the communications subsystem 630 provides communication with other devices).
- Common forms of physical and/or tangible computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read instructions and/or code.
- Any such memory may function as memory 520 or memory 635 or as secure memory if structured to maintain security of stored content.
- object representations may have a certain level of associated security, and may be stored in portions of memory 635 associated with certain security or privacy setting.
- the communications subsystem 630 (and/or components thereof) generally will receive the signals, and the bus 605 then might carry the signals (and/or the data, instructions, etc. carried by the signals) to the working memory 635 , from which the processor(s) 610 retrieves and executes the instructions.
- the instructions received by the working memory 635 may optionally be stored on a non-transitory storage device 625 either before or after execution by the processor(s) 610 .
- computing devices may be networked in order to communicate information.
- mobile device 500 may be networked to receive information or communicate with a remote object representation database as described above.
- each of these elements may engage in networked communications with other devices such as web servers, databases, or computers which provide access to information to enable applications via network.
- FIG. 7 illustrates a schematic diagram of a system 700 of networked computing devices that can be used in accordance with various embodiments to implement systems for creating and storing object representations for later use in identifying objects.
- the output object representation may be communicated via networked computers to one or more databases as described by system 700 .
- the system 700 can include one or more user computing devices 705 .
- the user computing devices 705 can be general-purpose personal computers (including, merely by way of example, personal computers and/or laptop computers running any appropriate flavor of Microsoft® Windows® and/or Mac OS® operating systems) and/or workstation computers running any of a variety of commercially-available UNIX® or UNIX-like operating systems.
- These user computing devices 705 can also have any of a variety of applications, including one or more applications configured to perform methods of the invention, as well as one or more office applications, database client and/or server applications, and web browser applications.
- the user computing devices 705 can be any other electronic device, such as a thin-client computer, Internet-enabled mobile telephone, and/or personal digital assistant (PDA), capable of communicating via a network (e.g., the network 710 described below) and/or displaying and navigating web pages or other types of electronic documents.
- Although the exemplary system 700 is shown with three user computing devices 705 a,b,c , any number of user computing devices can be supported.
- Certain embodiments of the invention operate in a networked environment, which can include a network 710 .
- the network 710 can be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially-available protocols, including, without limitation, TCP/IP, SNA, IPX, AppleTalk®, and the like.
- the network 710 can be a local area network (“LAN”), including, without limitation, an Ethernet network, a Token-Ring network and/or the like; a wide-area network (WAN); a virtual network, including, without limitation, a virtual private network (“VPN”); the Internet; an intranet; an extranet; a public switched telephone network (“PSTN”); an infrared network; a wireless network, including, without limitation, a network operating under any of the IEEE 802.11 suite of protocols, the Bluetooth protocol known in the art, and/or any other wireless protocol; and/or any combination of these and/or other networks.
- Network 710 may include access points for enabling access to network 710 by various computing devices.
- Embodiments of the invention can include one or more server computers 760 .
- Each of the server computers 760 a,b may be configured with an operating system, including, without limitation, any of those discussed above, as well as any commercially (or freely) available server operating systems.
- Each of the server computers 760 a,b may also be running one or more applications, which can be configured to provide services to one or more user computing devices 705 and/or other server computers 760 .
- one of the server computers 760 may be a web server, which can be used, merely by way of example, to process requests for web pages or other electronic documents from user computing devices 705 .
- the web server can also run a variety of server applications, including HTTP servers, FTP servers, CGI servers, database servers, Java® servers, and the like.
- the web server may be configured to serve web pages that can be operated within a web browser on one or more of the user computing devices 705 to perform methods of the invention.
- Such servers may be associated with particular IP addresses, or may be associated with modules having a particular URL, and may thus store secure navigation modules which may interact with a mobile device such as mobile device 500 to provide secure indications of geographic points as part of location services provided to mobile device 500 .
- one or more server computers 760 can function as a file server and/or can include one or more of the files (e.g., application code, data files, etc.) necessary to implement methods of various embodiments incorporated by an application running on a user computing device 705 and/or another server computer 760 .
- a file server can include all necessary files, allowing such an application to be invoked remotely by a user computing device 705 and/or server computer 760 .
- the system can include one or more databases 720 .
- the location of the database(s) 720 is discretionary: merely by way of example, a database 720 a might reside on a storage medium local to (and/or resident in) a server 760 a (and/or a user computing device 705 ).
- a database 720 b can be remote from any or all of the user computing devices 705 or server computers 760 , so long as the database 720 b can be in communication (e.g., via the network 710 ) with one or more of these.
- a database 720 can reside in a storage-area network (“SAN”) familiar to those skilled in the art.
- the database 720 can be a relational database, such as an Oracle® database, that is adapted to store, update, and retrieve data in response to SQL-formatted commands.
- the database might be controlled and/or maintained by a database server, as described above, for example.
- Such databases may store information relevant to levels of security.
- embodiments were described as processes which may be depicted in a flow with process arrows. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.
- embodiments of the methods may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
- the program code or code segments to perform the associated tasks may be stored in a computer-readable medium such as a storage medium. Processors may perform the associated tasks.
- the above elements may merely be a component of a larger system, wherein other rules may take precedence over or otherwise modify the application of various embodiments, and any number of steps may be undertaken before, during, or after the elements of any embodiment are implemented.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- Studio Devices (AREA)
- Length Measuring Devices By Optical Means (AREA)
Priority Applications (4)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
PCT/US2014/042005 WO2014201176A1 (en) | 2013-06-11 | 2014-06-11 | Interactive and automatic 3-D object scanning method for the purpose of database creation
KR1020167000414A KR101775591B1 (ko) | 2013-06-11 | 2014-06-11 | Interactive and automatic 3-D object scanning method for the purpose of database creation
US14/302,056 US9501725B2 (en) | 2013-06-11 | 2014-06-11 | Interactive and automatic 3-D object scanning method for the purpose of database creation
JP2016519630A JP6144826B2 (ja) | 2013-06-11 | 2014-06-11 | Interactive and automatic 3-D object scanning method for database creation

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
US201361833889P | 2013-06-11 | 2013-06-11 |
US14/302,056 US9501725B2 (en) | 2013-06-11 | 2014-06-11 | Interactive and automatic 3-D object scanning method for the purpose of database creation
Publications (2)

Publication Number | Publication Date
---|---
US20140363048A1 (en) | 2014-12-11
US9501725B2 (en) | 2016-11-22

Family

ID=52005520

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
US14/302,056 Active 2034-07-21 US9501725B2 (en) | Interactive and automatic 3-D object scanning method for the purpose of database creation | 2013-06-11 | 2014-06-11

Country Status (6)

Country | Link
---|---
US (1) | US9501725B2
EP (1) | EP3008694B1
JP (1) | JP6144826B2
KR (1) | KR101775591B1
CN (1) | CN105247573B
WO (1) | WO2014201176A1
Families Citing this family (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10395125B2 (en) | 2016-10-06 | 2019-08-27 | Smr Patents S.A.R.L. | Object detection and classification with fourier fans |
US10262462B2 (en) * | 2014-04-18 | 2019-04-16 | Magic Leap, Inc. | Systems and methods for augmented and virtual reality |
EP3023808A4 (en) * | 2013-07-18 | 2017-09-06 | Mitsubishi Electric Corporation | Target type identification device |
KR20150049535A (ko) * | 2013-10-30 | 2015-05-08 | Samsung Electronics Co., Ltd. | Electronic device and method of using the same |
WO2015155628A1 (en) * | 2014-04-07 | 2015-10-15 | Eyeways Systems Ltd. | Apparatus and method for image-based positioning, orientation and situational awareness |
US10026010B2 (en) * | 2014-05-14 | 2018-07-17 | At&T Intellectual Property I, L.P. | Image quality estimation using a reference image portion |
JP6399518B2 (ja) * | 2014-12-18 | 2018-10-03 | International Business Machines Corporation | Processing apparatus, processing method, and program |
SI3262443T1 (sl) * | 2015-02-26 | 2023-10-30 | Bruel & Kjaer Sound & Vibration Measurement A/S | Method of detecting the spatial orientation of a transducer by one or more spatial orientation features |
US10042078B2 (en) * | 2015-02-27 | 2018-08-07 | The United States of America, as Represented by the Secretary of Homeland Security | System and method for viewing images on a portable image viewing device related to image screening |
US9595038B1 (en) * | 2015-05-18 | 2017-03-14 | Amazon Technologies, Inc. | Inventory confirmation |
RU2602386C1 (ru) * | 2015-05-26 | 2016-11-20 | Laboratoriya 24 LLC | Method of visualizing an object |
EP3304358A1 (en) * | 2015-06-05 | 2018-04-11 | Hexagon Technology Center GmbH | Method and apparatus for performing a geometric transformation on objects in an object-oriented environment using a multiple-transaction technique |
US9734587B2 (en) * | 2015-09-30 | 2017-08-15 | Apple Inc. | Long term object tracker |
US10318819B2 (en) * | 2016-01-05 | 2019-06-11 | The Mitre Corporation | Camera surveillance planning and tracking system |
EP3408848A4 (en) | 2016-01-29 | 2019-08-28 | Pointivo Inc. | SYSTEMS AND METHOD FOR EXTRACTING INFORMATION ON OBJECTS FROM SCENE INFORMATION |
JP6744747B2 (ja) * | 2016-04-01 | 2020-08-19 | Canon Inc. | Information processing apparatus and control method therefor |
US10019839B2 (en) | 2016-06-30 | 2018-07-10 | Microsoft Technology Licensing, Llc | Three-dimensional object scanning feedback |
US11400860B2 (en) | 2016-10-06 | 2022-08-02 | SMR Patents S.à.r.l. | CMS systems and processing methods for vehicles |
US10339627B2 (en) | 2016-10-10 | 2019-07-02 | Gopro, Inc. | Apparatus and methods for the optimal stitch zone calculation of a generated projection of a spherical image |
JP2018067115A (ja) * | 2016-10-19 | 2018-04-26 | Seiko Epson Corporation | Program, tracking method, and tracking device |
WO2018075628A1 (en) * | 2016-10-19 | 2018-04-26 | Shapeways, Inc. | Systems and methods for identifying three-dimensional printed objects |
US10194097B2 (en) | 2017-01-13 | 2019-01-29 | Gopro, Inc. | Apparatus and methods for the storage of overlapping regions of imaging data for the generation of optimized stitched images |
CN107481284A (zh) * | 2017-08-25 | 2017-12-15 | BOE Technology Group Co., Ltd. | Method, apparatus, terminal and system for measuring target tracking trajectory accuracy |
US11087536B2 (en) | 2017-08-31 | 2021-08-10 | Sony Group Corporation | Methods, devices and computer program products for generation of mesh in constructed 3D images |
WO2019045724A1 (en) * | 2017-08-31 | 2019-03-07 | Sony Mobile Communications Inc. | METHODS, DEVICES AND COMPUTER PROGRAM PRODUCTS FOR GENERATING 3D IMAGES |
DK180470B1 (da) | 2017-08-31 | 2021-05-06 | Apple Inc | Systems, methods, and graphical user interfaces for interacting with augmented and virtual reality environments |
WO2019075276A1 (en) * | 2017-10-11 | 2019-04-18 | Aquifi, Inc. | SYSTEMS AND METHODS FOR IDENTIFYING OBJECT |
US10860847B2 (en) * | 2017-10-17 | 2020-12-08 | Motorola Mobility Llc | Visual perception assistant |
EP3486606A1 (de) * | 2017-11-20 | 2019-05-22 | Leica Geosystems AG | Stereo camera and stereophotogrammetric method |
KR102666508B1 (ko) * | 2018-01-24 | 2024-05-20 | Apple Inc. | Devices, methods, and graphical user interfaces for system-wide behavior for 3D models |
DK180842B1 (da) | 2018-01-24 | 2022-05-12 | Apple Inc | Devices, methods, and graphical user interfaces for system-wide behavior for 3D models |
CN108592919B (zh) * | 2018-04-27 | 2019-09-17 | Baidu Online Network Technology (Beijing) Co., Ltd. | Mapping and positioning method, apparatus, storage medium and terminal device |
US10885622B2 (en) | 2018-06-29 | 2021-01-05 | Photogauge, Inc. | System and method for using images from a commodity camera for object scanning, reverse engineering, metrology, assembly, and analysis |
CN109063567B (zh) * | 2018-07-03 | 2021-04-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Human body recognition method, apparatus and storage medium |
CN111046698B (zh) * | 2018-10-12 | 2023-06-20 | 锥能机器人(上海)有限公司 | Visual positioning method and system with visualized editing |
CN109325970A (zh) * | 2018-12-03 | 2019-02-12 | 舒彬 | An augmented reality system based on target pose tracking |
CN109840500B (zh) * | 2019-01-31 | 2021-07-02 | Shenzhen SenseTime Technology Co., Ltd. | Method and apparatus for detecting three-dimensional human pose information |
CN109931906B (zh) * | 2019-03-28 | 2021-02-23 | 华雁智科(杭州)信息技术有限公司 | Camera ranging method, apparatus and electronic device |
CN111986250B (zh) * | 2019-05-22 | 2024-08-20 | SF Technology Co., Ltd. | Object volume measurement method, apparatus, measuring device and storage medium |
CN112102223B (zh) * | 2019-06-18 | 2024-05-14 | GE Precision Healthcare LLC | Method and system for automatically setting a scan range |
US11288835B2 (en) * | 2019-09-20 | 2022-03-29 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Lighttrack: system and method for online top-down human pose tracking |
SE543108C2 (en) * | 2019-10-23 | 2020-10-06 | Winteria Ab | Method and device for inspection of a geometry, the device comprising image capturing and shape scanning means |
WO2021083475A1 (en) * | 2019-10-28 | 2021-05-06 | Telefonaktiebolaget Lm Ericsson (Publ) | Method for generating a three dimensional, 3d, model |
US11407431B2 (en) | 2019-11-22 | 2022-08-09 | Samsung Electronics Co., Ltd. | System and method for object trajectory prediction in an autonomous scenario |
AU2020404468B2 (en) * | 2019-12-17 | 2023-02-23 | Motion Metrics International Corp. | Apparatus for analyzing a payload being transported in a load carrying container of a vehicle |
EP4111368A4 (en) * | 2020-02-26 | 2023-08-16 | Magic Leap, Inc. | CROSS REALITY SYSTEM WITH BUFFERING FOR LOCATION ACCURACY |
CN111432291B (zh) * | 2020-03-20 | 2021-11-05 | Gaoding (Xiamen) Technology Co., Ltd. | View updating method and apparatus for segmented video frame-extraction scenarios |
US11941860B2 (en) * | 2021-09-24 | 2024-03-26 | Zebra Technologies Corporation | Computational load mitigation for image-based item recognition |
CN114119686A (zh) * | 2021-11-24 | 2022-03-01 | 刘文平 | Multi-source remote sensing image registration method based on spatial-layout similarity computation |
KR20230098944A (ko) * | 2021-12-27 | 2023-07-04 | VIRNECT Co., Ltd. | Keypoint selection method for real-time tracking in a mobile environment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4715539B2 (ja) * | 2006-02-15 | 2011-07-06 | Toyota Motor Corporation | Image processing apparatus, method therefor, and image processing program |
- 2014
- 2014-06-11 JP JP2016519630A patent/JP6144826B2/ja not_active Expired - Fee Related
- 2014-06-11 KR KR1020167000414A patent/KR101775591B1/ko active IP Right Grant
- 2014-06-11 CN CN201480030440.XA patent/CN105247573B/zh active Active
- 2014-06-11 EP EP14738674.2A patent/EP3008694B1/en active Active
- 2014-06-11 WO PCT/US2014/042005 patent/WO2014201176A1/en active Application Filing
- 2014-06-11 US US14/302,056 patent/US9501725B2/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006084385A1 (en) | 2005-02-11 | 2006-08-17 | Macdonald Dettwiler & Associates Inc. | 3d imaging system |
US20070216675A1 (en) * | 2006-03-16 | 2007-09-20 | Microsoft Corporation | Digital Video Effects |
US20100232727A1 (en) * | 2007-05-22 | 2010-09-16 | Metaio Gmbh | Camera pose estimation apparatus and method for augmented reality imaging |
US20090010507A1 (en) * | 2007-07-02 | 2009-01-08 | Zheng Jason Geng | System and method for generating a 3d model of anatomical structure using a plurality of 2d images |
US20100310177A1 (en) | 2009-05-06 | 2010-12-09 | University Of New Brunswick | Method of interest point matching for images |
US20110299770A1 (en) | 2009-12-02 | 2011-12-08 | Qualcomm Incorporated | Performance of image recognition algorithms by pruning features, image scaling, and spatially constrained feature matching |
US20120194513A1 (en) * | 2011-02-01 | 2012-08-02 | Casio Computer Co., Ltd. | Image processing apparatus and method with three-dimensional model creation capability, and recording medium |
US20120300979A1 (en) | 2011-05-27 | 2012-11-29 | Qualcomm Incorporated | Planar mapping and tracking for mobile devices |
US20130148851A1 (en) | 2011-12-12 | 2013-06-13 | Canon Kabushiki Kaisha | Key-frame selection for parallel tracking and mapping |
Non-Patent Citations (3)
Title |
---|
Dong, Zilong; Zhang, Guofeng; Jia, Jiaya; Bao, Hujun. "Keyframe-Based Real-Time Camera Tracking". IEEE 12th International Conference on Computer Vision, 2009. *
International Search Report and Written Opinion, PCT/US2014/042005, ISA/EPO, Sep. 25, 2014.
Pirchheim, Christian; Reitmayr, Gerhard. "Homography-Based Planar Mapping and Tracking for Mobile Phones". IEEE International Symposium on Mixed and Augmented Reality, 2011. *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150081090A1 (en) * | 2013-09-13 | 2015-03-19 | JSC-Echigo Pte Ltd | Material handling system and method |
US10417521B2 (en) * | 2013-09-13 | 2019-09-17 | Jcs-Echigo Pte Ltd | Material handling system and method |
US20160210525A1 (en) * | 2015-01-16 | 2016-07-21 | Qualcomm Incorporated | Object detection using location data and scale space representations of image data |
US10133947B2 (en) * | 2015-01-16 | 2018-11-20 | Qualcomm Incorporated | Object detection using location data and scale space representations of image data |
US20170249742A1 (en) * | 2016-02-25 | 2017-08-31 | Nigella LAWSON | Depth of field processing |
US10003732B2 (en) * | 2016-02-25 | 2018-06-19 | Foodim Ltd | Depth of field processing |
US11740690B2 (en) | 2017-01-27 | 2023-08-29 | Qualcomm Incorporated | Systems and methods for tracking a controller |
Also Published As
Publication number | Publication date |
---|---|
KR20160019497A (ko) | 2016-02-19 |
KR101775591B1 (ko) | 2017-09-06 |
WO2014201176A1 (en) | 2014-12-18 |
JP2016521892A (ja) | 2016-07-25 |
EP3008694A1 (en) | 2016-04-20 |
EP3008694B1 (en) | 2021-01-27 |
CN105247573A (zh) | 2016-01-13 |
JP6144826B2 (ja) | 2017-06-07 |
US20140363048A1 (en) | 2014-12-11 |
CN105247573B (zh) | 2018-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9501725B2 (en) | Interactive and automatic 3-D object scanning method for the purpose of database creation | |
CN110411441B (zh) | System and method for multi-modal mapping and localization | |
CN106471548B (zh) | Method and apparatus for accelerated template matching using peripheral information | |
US9400941B2 (en) | Method of matching image features with reference features | |
US11842514B1 (en) | Determining a pose of an object from RGB-D images | |
KR101072876B1 (ko) | Method and apparatus for estimating the position of a mobile robot | |
US9679384B2 (en) | Method of detecting and describing features from an intensity image | |
JP7147753B2 (ja) | Information processing apparatus, information processing method, and program | |
US20160210761A1 (en) | 3D reconstruction | |
KR20150079730A (ko) | Systems and methods of merging multiple maps for computer vision based tracking | |
JP2016514251A (ja) | Camera aided motion direction and speed estimation | |
US10607350B2 (en) | Method of detecting and describing features from an intensity image | |
WO2022237048A1 (zh) | Pose acquisition method and apparatus, electronic device, storage medium, and program | |
KR20200110120A (ko) | System and method for implementing a road facility management solution based on a 3D-VR multi-sensor system | |
CN118265996A (zh) | Computing device displaying image conversion possibility information | |
KR20220062709A (ko) | Disaster situation awareness system and method via spatial information clustering based on mobile device images | |
KR102249381B1 (ko) | System and method for generating spatial information of a mobile device using three-dimensional image information | |
CN113228117B (zh) | Authoring apparatus, authoring method, and recording medium storing an authoring program | |
KR102249380B1 (ko) | System for generating spatial information of a CCTV device using reference image information | |
CN110617800A (zh) | Emergency remote sensing monitoring method, system and storage medium based on civil airliners | |
CN112927291B (zh) | Pose determination method and apparatus for a three-dimensional object, electronic device, and storage medium | |
KR20220032332A (ko) | Flood level estimation system based on CCTV devices |
CN115457231A (zh) | Method for updating a three-dimensional image and related apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VRCELJ, BOJAN;KNOBLAUCH, DANIEL;KRISHNAMOORTHI, RAGHURAMAN;AND OTHERS;SIGNING DATES FROM 20140617 TO 20141217;REEL/FRAME:034595/0503 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |