WO2011088497A1 - Object recognition method and computer system - Google Patents
Object recognition method and computer system
- Publication number: WO2011088497A1 (application PCT/AU2011/000028)
- Authority: WIPO (PCT)
- Prior art keywords: data, data points, function, dimensional, computer executable
Classifications
- G06T15/00—3D [Three Dimensional] image rendering
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
Definitions
- the present invention relates to a method of 3-dimensional object recognition, and to the computer executable steps, computer readable code, one or more computer readable mediums, and one or more computers adapted for such 3-dimensional object recognition.
- the primary application of the invention is to the field of object recognition in 3-dimensional space, as may be used as part of an artificial intelligence or robotics system. However, it will be appreciated that the invention is not limited to this particular field of use.
- 3-dimensional laser scanners and other non-laser reticle projection systems enable a 3-dimensional object to be scanned and a 3-dimensional digital model to be generated.
- such systems enable the human face and other objects to be scanned for computer games and movie production, and for the generation of 3-dimensional laser images in crystalline polymers.
- Some current 3-dimensional object recognition systems rely upon environment-dependent conditions such as distinct colouring on objects.
- Geometric hashing has been well documented as a means of recognising objects at arbitrary angles and positions within a scene; however, the performance of this algorithm is dependent upon the ability to match the geometry of feature point distributions between scenes, and with a limited number of feature points per object (and thus a limited number of unique feature point distributions) this method degrades significantly when required to recognise an object against a large number of candidates.
- US Patent 2006/0204079 (Toshiba) describes an object recognition method in which 2-dimensional images are generated of a 3-dimensional scene containing objects. Feature points in the 2-dimensional image are identified and sets of object triangles are identified, which are then normalised, enabling 2-dimensional images of cropped regions of the normalised image to be indexed in a hash table and compared.
- the methods disclosed in these two prior art patents have the disadvantage that large amounts of information relating to a scene are lost when a 2-dimensional image is made of a 3-dimensional scene, and therefore much of the "information", after normalisation or other forms of statistical analysis, is essentially lost, resulting in a degraded object recognition capability.
- EP 2133838 (Omron Corp) describes a true 3-dimensional object recognition method. Feature points are identified in the 3-dimensional image, and corresponding 3-dimensionally disposed object triangles are generated, again by grouping together sets of three apexes of the 3-dimensionally disposed object triangles. The object triangles are then normalised and the vertices matched geometrically, based on many different spatial orientations, to pre-stored sets of object triangle dispositions.
- the technique disclosed has the significant disadvantage, similar to the other two prior art patents, that the "rich" information in the original 3-dimensional (in this case) image is significantly degraded, this time because the eventual object identification method relies on comparison of only the transformed feature data points, extracted from the image, with the pre-stored data.
- the object recognition method should operate in 3 dimensions and the data comparison should operate on the largest possible data set, rather than being limited to just the feature data points.
- Such methods have eluded those skilled in the art to date due to the huge amount of data processing and data management required by such methods, thus preventing such a method from being employed as a useful "real-time" object recognition tool.
- the present invention aims to overcome the identified disadvantages of the prior art object recognition methods, and yet provide a method which is "data rich" and extremely efficient, and therefore fast in execution.
- the object recognition method should operate in 3-dimensional space and operate independently of the object position and orientation, and the viewpoint position and orientation in a given scene, operate in a large range of lighting conditions and operate efficiently in a data-processing sense.
- the object recognition method should also operate without the need for artificial colouring to be applied to the objects, and should not require an object to be viewed from all angles during either the training or testing stage.
- the method should make use of the appearance (i.e. the data "richness") of a scene and make use of its 3-dimensional structure, should operate on objects without planar surfaces or sharp edges to aid identification, and should not require complete 3-dimensionally "closed" images of the object to be first obtained during the training stage.
- the method according to the present invention will reduce in performance where object features cannot be extracted for an object in a scene. This can occur in the scenario where an object being recognised has both no distinct areas of high curvature in shape, for example corners, and no distinct areas of high curvature in surface texture.
- where object features cannot be extracted in this way, there are other well known methods to identify object features. Also some "degenerate" objects may look the same regardless of the viewing angle and may still be recognisable regardless of the accuracy and repeatability of the object feature extraction method so disclosed.
- a method of recognising an object comprising: generating one or more sets of characterising data as training data during an initial training data generation stage; generating one or more sets of characterising data as test data during a later test data generation stage; and comparing the training data and test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises:
- Step 1 for a scene containing the object in a 3-dimensional space defined by an axis system, deriving a set of 3-dimensional object data points for the object, each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and
- Step 2 for each set of three feature data points, applying a 3-dimensional transformation function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
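To make the two-step generation just described concrete, the following minimal sketch (in Python with NumPy, an illustrative choice not taken from the patent) strings the pieces together; `extract_feature_points`, `transform_for_triple` and `summarise` are hypothetical placeholders for the first data-processing, the 3-dimensional transformation function and the second data-processing respectively:

```python
import numpy as np
from itertools import combinations

def generate_characterising_data(object_points, intensities,
                                 extract_feature_points,
                                 transform_for_triple,
                                 summarise):
    """Hedged sketch of Steps 1 and 2.

    object_points : (N, 3) array of 3-dimensional coordinates of the object data points
    intensities   : (N, k) array of one or more light intensity values per point
    The three callables stand in for the first data-processing, the
    3-dimensional transformation function and the second data-processing.
    """
    # Step 1: derive feature data points from the coordinates and group sets of three
    feature_points = extract_feature_points(object_points, intensities)
    characterising_sets = []
    for triple in combinations(range(len(feature_points)), 3):
        apexes = feature_points[list(triple)]
        # Step 2: transform every object data point as a function of the three
        # feature data points; the light intensity values are carried across unchanged
        transformed = transform_for_triple(object_points, apexes)
        characterising_sets.append(summarise(transformed, intensities))
    return characterising_sets
```

The same routine is run once over training scenes and again over test scenes, so the resulting sets of characterising data share a common format for comparison.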
- the object recognition method operates in 3 dimensions and this minimizes the "information degradation" which would otherwise occur if the object recognition method operated in only 2 dimensions.
- the object recognition method uses the largest possible data set, the derived object data points, rather than being limited to just using the derived set of feature data points, and this also minimizes the "information degradation" associated with using the latter much smaller and more limited data set.
- the object recognition method operates independently of the object position and orientation, and the viewpoint position and orientation in a given scene, and operates in a large range of lighting conditions.
- the object recognition method operates efficiently in a data-processing sense, because the sets of characterising data generated as training data during the initial training data generation stage can be of the same format as the sets of characterising data generated as test data during the later test data generation stage, thereby facilitating efficient comparison of the training data and test data.
- each object data point is defined by 3-dimensional coordinates and also one or more light intensity values, so information relating to light intensity and "colour" of data points is not lost during the subsequent data processing.
- a set of feature data points is derived corresponding to object features of the object.
- each feature data point is also defined by 3-dimensional coordinates
- the 3-dimensional spatial disposition of the feature data points can be used in the subsequent data processing.
- one or more sets of three feature data points are grouped, thereby defining an associated 3-dimensional spatial disposition defined by these three feature data points.
- the 3-dimensional transformation function is a function of the set of three feature data points, and hence the 3-dimensional spatial disposition defined by these three feature data points.
- each transformed object data point has the same associated one or more light intensity values as the object data point from which it was generated, resulting in the information associated with the one or more light intensity values not being lost in the transformation process.
- the second data processing of the transformed object data points is carried out to generate one of the one or more sets of characterising data, in order to generate a predetermined format for the set of characterising data.
- the scene containing the object is computer software generated.
- the transformation function is a function of a cyclic permutation of the set of three feature data points.
- the transformation function in Step 2 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
- the transformation function in Step 2 also involves shearing of the one or more object data points, based upon the set of three feature data points.
- the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
- the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
- the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
- a unique object index is assigned to the object.
- the first data-processing in Step 1 comprises subdividing the object data points into a region of contiguous object data points with similar one or more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
- each set of three feature data points is derived from a single region.
- one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
- the first function is a luminosity contrast function of the one or more light intensity values.
- the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
- the third function relates to the distance of one or more of the object data points in the region from the base point.
- the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
- one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
- one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
- the base point is derived as a feature data point.
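As an illustration of the region-based derivation above (centroidal base point, feature points at distance extrema on the region boundary), here is a small sketch; it assumes the boundary points are already ordered around the region, which is an assumption about the input rather than something the text specifies:

```python
import numpy as np

def region_feature_points(region_points, boundary_points):
    """region_points   : (N, 3) coordinates of the contiguous object data points in a region
    boundary_points : (M, 3) coordinates of the region's boundary points, ordered around the loop
    Returns the base point (geometric 3-dimensional centroid) and the boundary
    points whose distance from the base point is a local maximum (one possible
    choice of 'third function')."""
    base_point = region_points.mean(axis=0)                 # second function: the centroid
    d = np.linalg.norm(boundary_points - base_point, axis=1)
    features = []
    for i in range(len(d)):
        prev_d = d[i - 1]                                   # boundary treated as a closed loop
        next_d = d[(i + 1) % len(d)]
        if d[i] >= prev_d and d[i] >= next_d:               # relative maximum of the distance
            features.append(boundary_points[i])
    return base_point, np.array(features)
```

Relative minima, or points of high local curvature of the boundary, could be collected in the same pass.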
- the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
- the first data-processing in Step 1 comprises determining at least one point of high local curvature of the object data points, and designating the point as a feature data point.
- the first data-processing in Step 1 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data-point.
- the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
- each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
- the object data points are isolated for the object.
- the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
- the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
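A sketch of how such boolean contrast maps might be formed and combined, assuming the depth and luminosity maps are 2-dimensional NumPy arrays; the local-contrast measure (gradient magnitude), the thresholds and the OR combination are all illustrative assumptions, since the text deliberately leaves the thresholds and the combining function open:

```python
import numpy as np

def gradient_magnitude(channel):
    """Crude local-contrast proxy: magnitude of the discrete gradient of a 2-D map."""
    gy, gx = np.gradient(channel.astype(float))
    return np.hypot(gx, gy)

def object_boundary_mask(depth_map, luminosity_map,
                         depth_thresh=0.5, grad_thresh=0.5, lum_thresh=0.2):
    """Boolean object-boundary mask from thresholded contrast maps (illustrative only)."""
    depth_contrast = gradient_magnitude(depth_map)                # depth contrast map
    depth_gradient_contrast = gradient_magnitude(depth_contrast)  # contrast of the depth gradient
    luminosity_contrast = gradient_magnitude(luminosity_map)      # luminosity contrast map
    # Arbitrary thresholds give the boolean maps; a simple OR is one possible combination
    return ((depth_contrast > depth_thresh)
            | (depth_gradient_contrast > grad_thresh)
            | (luminosity_contrast > lum_thresh))
```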
- the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
- At least one of the viewable points resides on a surface of the object.
- the object data points are derived by 3-dimensional imaging of the scene.
- one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
- the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
- the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
- the viewing properties comprise a view width angle and a view height angle.
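The parallax-based range calculation and the pixel-to-coordinate derivation described above can be sketched with standard stereo and pinhole-camera geometry; the specific relationships used here (focal length from the view width angle, range from baseline, focal length and disparity) are conventional assumptions about how the claimed calculation might be realised, not formulas taken from the patent:

```python
import numpy as np

def range_from_parallax(disparity_px, baseline, focal_px):
    """Range distance of a viewable point from the viewpoint, given the parallax
    offset (disparity, in pixels) between the two slightly offset sub-images."""
    return baseline * focal_px / disparity_px

def pixel_to_world(u, v, range_dist, image_w, image_h,
                   view_width_angle, view_height_angle,
                   viewpoint_pos, viewpoint_rot):
    """Back-project pixel (u, v) at a given range distance into 3-dimensional space,
    using the viewpoint position, orientation (3x3 rotation matrix) and the
    view width/height angles (the predefined viewing properties)."""
    fx = (image_w / 2) / np.tan(view_width_angle / 2)
    fy = (image_h / 2) / np.tan(view_height_angle / 2)
    # Unit ray through the pixel, expressed in the viewpoint's own axis system
    ray = np.array([(u - image_w / 2) / fx, (v - image_h / 2) / fy, 1.0])
    ray /= np.linalg.norm(ray)
    return viewpoint_pos + viewpoint_rot @ (ray * range_dist)
```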
- the first data processing in Step 1 is performed using hardware acceleration, such as available in graphics processors.
- the transformation function in Step 2 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
- the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
- the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
- the transformation function in Step 2 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
- the one or more object data points are also scaled or translated, based upon the length of the reference side.
- the transformation function in Step 2 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
- the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
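A sketch of one way to realise the rotation, translation and scaling just described: the new axis system is built directly from the object triangle (X' parallel to the reference side, Z' perpendicular to the reference plane, Y' through the mid-point of the reference side towards the remaining apex) and every object data point is re-expressed in it. This is an assumption-laden illustration of the transformation function, not the patent's own implementation, and the optional uniform scaling by the reference side length stands in for the scaling/shearing variants:

```python
import numpy as np

def triangle_axis_transform(object_points, apex_a, apex_b, apex_c,
                            reference_length=1.0):
    """Transform (N, 3) object_points into the new axis system defined by the
    object triangle: apex_a-apex_b is the reference side, apex_c the third apex."""
    mid = (apex_a + apex_b) / 2.0                      # new origin: mid-point of the reference side
    x_axis = apex_b - apex_a
    side_length = np.linalg.norm(x_axis)
    x_axis = x_axis / side_length                      # X' parallel to the reference side
    z_axis = np.cross(apex_b - apex_a, apex_c - apex_a)
    z_axis = z_axis / np.linalg.norm(z_axis)           # Z' perpendicular to the reference plane
    y_axis = np.cross(z_axis, x_axis)                  # Y' completes the right-handed set
    if np.dot(y_axis, apex_c - mid) < 0:               # keep Y' pointing towards the remaining apex
        y_axis, z_axis = -y_axis, -z_axis
    rotation = np.vstack([x_axis, y_axis, z_axis])     # rows of R are the new axes
    transformed = (object_points - mid) @ rotation.T   # rotate and translate into (X', Y', Z')
    return transformed * (reference_length / side_length)   # optional scaling by the reference side
```

Under this reading, a cyclic permutation of the three feature data points changes which side is treated as the reference side, so each permutation yields its own transformation and hence its own set of characterising data.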
- in Step 2 a unique transformation index is assigned for the transformation function.
- the transformation function in Step 2 is performed using hardware acceleration, such as available in graphics processors.
- the second data-processing in Step 2 comprises creating an image of the transformed object data points.
- the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
- the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
- the surface mesh points of the untransformed interpolated or non-interpolated 3-dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
- the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
- the image comprises a depth map or a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
- the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
- the image is a 2-dimensional image.
- the image is a 3-dimensional image.
- the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
- the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
- the function of the one or more convolution coefficients is a binning function.
- the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
- each of the one or more subsets is represented as a 1-dimensional binary string.
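One possible reading of the convolution-and-binning chain above, sketched with a 2-dimensional DCT (consistent with the 8 x 8 and 4 x 4 DCT basis functions of Fig. 28): the low-order coefficients are quantised into coarse bins and the bin indices are packed into a 1-dimensional binary string. SciPy's `dct` is used for convenience, and the block size, number of retained coefficients, bin edges and subset-index offset are all illustrative assumptions:

```python
import numpy as np
from scipy.fft import dct

def characterising_subset(image, block=8, keep=4, levels=4, subset_index=0):
    """Return a 1-dimensional binary string characterising a 2-D `image` array.

    A 2-D DCT is applied to a block of the image, the top-left keep x keep
    coefficients are retained, each coefficient is quantised into `levels`
    coarse bins (so the result is insensitive to slight input variations),
    and for a non-zero subset_index an adjacent bin is filled instead of the
    optimum bin."""
    img = np.asarray(image, dtype=float)[:block, :block]
    coeffs = dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')
    coeffs = coeffs[:keep, :keep].ravel()
    edges = np.linspace(coeffs.min(), coeffs.max() + 1e-9, levels + 1)
    bins = np.clip(np.digitize(coeffs, edges) - 1, 0, levels - 1)
    bins = (bins + subset_index) % levels        # shift to an adjacent bin for other subsets
    bits_per_bin = int(np.ceil(np.log2(levels)))
    return ''.join(format(int(b), f'0{bits_per_bin}b') for b in bins)
```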
- the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
- a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points
- the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z')
- each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
- the second data processing in Step 2 is performed using hardware acceleration, such as available in graphics processors.
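To illustrate the image-creation part of this second data-processing (sampling the transformed data at regular coordinate intervals along X' and Y' to build a depth map and a luminosity map), here is a simplified sketch that uses nearest-neighbour sampling of the transformed object data points in place of a true interpolated surface mesh; the grid size, extent and single intensity channel are illustrative assumptions:

```python
import numpy as np

def render_maps(transformed_points, intensities, grid_size=32, extent=1.0):
    """Build a depth map and a luminosity map on a regular (X', Y') grid.

    transformed_points : (N, 3) coordinates in the new axis system (X', Y', Z')
    intensities        : (N,) one light intensity value per transformed point
    Nearest-neighbour sampling stands in for interpolating a surface mesh."""
    depth_map = np.full((grid_size, grid_size), np.nan)
    luminosity_map = np.zeros((grid_size, grid_size))
    # Map X', Y' coordinates to grid cells covering [-extent, extent]
    cols = np.clip(((transformed_points[:, 0] + extent) / (2 * extent) * grid_size).astype(int),
                   0, grid_size - 1)
    rows = np.clip(((transformed_points[:, 1] + extent) / (2 * extent) * grid_size).astype(int),
                   0, grid_size - 1)
    for r, c, z, lum in zip(rows, cols, transformed_points[:, 2], intensities):
        # Keep the point nearest the virtual viewpoint (smallest Z') in each cell
        if np.isnan(depth_map[r, c]) or z < depth_map[r, c]:
            depth_map[r, c] = z
            luminosity_map[r, c] = lum
    return depth_map, luminosity_map
```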
- the set of characterising data in Step 2 also includes data relating to the object index.
- the set of characterising data in Step 2 also includes data relating to the transformation index.
- the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of one or more sets of characterising data comprising the test data, are substantially equal.
- the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
- the one or more sets of characterising data each comprise one or more subsets
- the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
- the one or more sets of characterising data each comprise one or more subsets
- the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
- the function of the image of the transformed object data points is a neural network or a decision tree.
- the function of the one or more convolution coefficients is a neural network or a decision tree.
- the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
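As a sketch of such a quantization-heavy image processing function, the following applies a Gaussian filter and then coarsely quantizes both the spatial resolution and the intensity resolution, so that slight spatial or intensity variations in the input do not change the output; the filter width, block size and number of levels are illustrative choices, not values from the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def quantized_descriptor(image, sigma=2.0, block=4, levels=8):
    """Gaussian-smooth, downsample by block averaging, and quantize intensities."""
    smoothed = gaussian_filter(np.asarray(image, dtype=float), sigma=sigma)
    h, w = smoothed.shape
    h, w = h - h % block, w - w % block                      # trim to a multiple of the block size
    coarse = smoothed[:h, :w].reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    span = coarse.max() - coarse.min() or 1.0                # avoid division by zero on flat images
    return np.floor((coarse - coarse.min()) / span * (levels - 1)).astype(int)
```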
- the transformation function in Step 2 is a geometric hashing function or an affine transformation.
- the set of characterising data also includes one or more object data points.
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
- a first index or set of indices for the indexed database is a function of the respective set of characterising data.
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional hash-table
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database
- a first index or set of indices for the indexed database is one of the one or more subsets of the respective set of characterising data.
- a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
- a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
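A sketch of how the indexed training database and the test-time lookup might be organised, using an ordinary Python dictionary as a single-dimensional hash-table with the binary-string subset itself as the index; the vote-counting ranking at the end is an illustrative addition, not something the text prescribes:

```python
from collections import defaultdict

class TrainingDatabase:
    """Hash-table of training characterising-data subsets, keyed by the subset string."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, subset_string, object_index, transformation_index):
        # First index into the database: the subset of the characterising data itself
        self._table[subset_string].append((object_index, transformation_index))

    def lookup(self, test_subsets):
        """Compare test data with training data by direct lookup of each test subset;
        returns object indices ranked by how many test subsets they matched."""
        votes = defaultdict(int)
        for subset_string in test_subsets:
            for object_index, _ in self._table.get(subset_string, []):
                votes[object_index] += 1
        return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

# Example usage (hypothetical indices):
#   db = TrainingDatabase()
#   db.add(training_subset, object_index=7, transformation_index=3)
#   ranked_candidates = db.lookup(test_subsets)
```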
- the generation of the one or more sets of characterising data comprises:
- each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points;
- the computer executable steps for recognising an object operate in 3-dimensions and this minimizes the "information degradation" which would otherwise occur if the object recognition method operated in only 2-dimensions.
- the computer executable steps for recognising an object use the largest possible data set, the derived object data points, rather than being limited to just using the derived set of feature data points, and this also minimizes the "information degradation" associated with using the latter much smaller and more limited data set.
- the computer executable steps for recognising an object operate independently of the object position and orientation, and the viewpoint position and orientation in a given scene, and operate in a large range of lighting conditions.
- the computer executable steps for recognising an object operate efficiently in a data-processing sense, because the sets of characterising data generated as training data during the initial training data generation stage can be of the same format as the sets of characterising data generated as test data during the later test data generation stage, thereby facilitating efficient comparison of the training data and test data.
- each object data point is defined by 3-dimensional coordinates and also one or more light intensity values, so information relating to light intensity and "colour" of data points is not lost during the subsequent data processing.
- a set of feature data points is derived corresponding to object features of the object.
- each feature data point is also defined by 3-dimensional coordinates
- the 3-dimensional spatial disposition of the feature data points can be used in the subsequent data processing.
- one or more sets of three feature data points are grouped, thereby defining an associated 3-dimensional spatial disposition defined by these three feature data points.
- the 3-dimensional transformation function is a function of the set of three feature data points, and hence the 3-dimensional spatial disposition defined by these three feature data points;
- each transformed object data point has the same associated one or more light intensity values as the object data point from which it was generated, resulting in the information associated with the one or more light intensity values not being lost in the transformation process.
- the second data processing of the transformed object data points is carried out to generate one of the one or more sets of characterising data, in order to generate a predetermined format for the set of characterising data.
- the scene containing the object is computer software generated.
- the transformation function is a function of a cyclic permutation of the set of three feature data points.
- the transformation function in Step 3 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
- the transformation function in Step 3 also involves shearing of the one or more object data points, based upon the set of three feature data points.
- the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
- the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
- the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
- a unique object index is assigned to the object.
- the first data-processing in Step 2 comprises subdividing the object data points into a region of contiguous object data points with similar one or more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
- each set of three feature data points is derived from a single region.
- one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
- the first function is a luminosity contrast function of the one or more light intensity values.
- the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
- the third function relates to the distance of one or more of the object data points in the region from the base point.
- the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
- one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
- one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
- the base point is derived as a feature data point.
- the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
- the first data-processing in Step 2 comprises determining at least one point of high local curvature of the object data points, and designating the point as a feature data point.
- the first data-processing in Step 2 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data point.
- the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
- each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
- the object data points are isolated for the object.
- the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
- one of the coordinates of the corresponding object data point will be set to a predetermined maximum value.
- the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
- the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
- At least one of the viewable points resides on a surface of the object.
- the object data points are derived by 3-dimensional imaging of the scene.
- one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
- the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
- the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
- the viewing properties comprise a view width angle and a view height angle.
- the first data processing in Step 2 is performed using hardware acceleration, such as available in graphics processors.
- the transformation function in Step 3 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
- the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
- the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
- the transformation function in Step 3 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
- the one or more object data points are also scaled or translated, based upon the length of the reference side.
- the transformation function in Step 3 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
- the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
- in Step 3 a unique transformation index is assigned for the transformation function.
- the transformation function in Step 3 is performed using hardware acceleration, such as available in graphics processors.
- the second data-processing in Step 3 comprises creating an image of the transformed object data points.
- the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
- the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
- the surface mesh points of the untransformed interpolated or non-interpolated 3-dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
- the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
- the image comprises a depth map or a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
- the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
- the image is a 2-dimensional image.
- the image is a 3-dimensional image.
- the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
- the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
- the function of the one or more convolution coefficients is a binning function.
- the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
- each of the one or more subsets is represented as a 1-dimensional binary string.
- the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
- a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points
- the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z'), each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
- the second data processing in Step 3 is performed using hardware acceleration, such as available in graphics processors.
- the set of characterising data in Step 3 also includes data relating to the object index.
- the set of characterising data in Step 3 also includes data relating to the transformation index.
- the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of one or more sets of characterising data comprising the test data, are substantially equal.
- the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
- the one or more sets of characterising data each comprise one or more subsets
- the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
- the one or more sets of characterising data each comprise one or more subsets
- the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
- the function of the image of the transformed object data points is a neural network or a decision tree.
- the function of the one or more convolution coefficients is a neural network or a decision tree.
- the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
- the transformation function in Step 3 is a geometric hashing function or an affine transformation.
- the set of characterising data also includes one or more object data points.
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
- a first index or set of indices for the indexed database is a function of the respective set of characterising data.
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional hash-table.
- the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
- a first index or set of indices for the indexed database is one of the one or more subsets of the respective set of characterising data.
- a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
- a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
- one or more computer readable mediums comprising the computer readable code of any one of the preceding paragraphs.
- one or more computers adapted for recognising an object comprising:
- where more than one computer is employed during the training data generation stage, more than one scene data input system can be employed during this stage for receiving the scene data, resulting in a correspondingly "richer" data set.
- where more than one computer is employed during the test data generation stage, more than one scene data input system can be employed during this stage for receiving the scene data, resulting in a correspondingly "richer" data set.
- the scene data input systems employed according to the present invention may be a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may comprise other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-dimensional medical imaging applications.
- one of the computers has an output system for outputting the comparison data.
- the comparison data may be in the form of a rating (e.g. a numerical recognition rating) or a simpler "yes/no" indication (e.g. a graphic, an indicator light, or an acoustic "beep") corresponding to the recognition of the object.
- Fig. 1 shows a cubic object residing in a 3-dimensional space and viewed from two viewpoints
- Fig. 2 shows a flow diagram of the data generation method according to the present invention
- Fig. 3 shows a 2-dimensional image produced from one of the viewpoints in Fig. 1,
- Fig. 4 shows diagrammatically a method of calculating the distance of a viewable point from a viewpoint using a parallax technique
- Fig. 5 shows a depth map based on the image in Fig. 3 and other depth data
- Fig. 6 shows the object data points produced from the image in Fig. 3,
- Fig. 7 shows an RGB map of the image in Fig. 3,
- Fig. 8 shows a luminosity map based on the RGB map in Fig. 7,
- Fig. 9 shows a luminosity contrast map based on the luminosity map in Fig. 8,
- Fig. 10 shows a depth contrast map based on the depth map in Fig. 5,
- Fig. 11 shows a depth gradient map based on the depth map in Fig. 5,
- Fig. 12 shows a depth gradient contrast map based on the depth gradient map in Fig. 11 and the depth contrast map in Fig. 10,
- Fig. 13 shows a region based upon the luminosity contrast map in Fig. 9,
- Fig. 14 shows a combination of relative minima and maxima from the centroidal base point of the region in Fig. 13,
- Fig. 15 shows the corners map in Fig. 14, also with relevant object triangles
- Fig. 16 shows the start of a 3-dimensional transformation function with the object triangle in the original coordinates system,
- Fig. 17 shows step A of the transformation function in which a third new axis (Z') is aligned perpendicular to the reference plane of the object triangle,
- Fig. 18 shows step B of the transformation function in which the third new axis (Z') is positioned such that it passes through the mid-point between the two apexes at the extremities of the reference side,
- Fig. 19 shows step C of the transformation function in which a first new axis (X') is aligned parallel to the reference side, and a second new axis (Y') is directed through the mid-point in the direction of the third apex of the object triangle,
- Fig. 20 shows step D of the transformation function in which the coordinates (x, y, z) of the object data points are transformed into transformed object data points, with new coordinates (x', y', z') in the new axis system,
- Fig. 21 shows an RGB representation of a set of object data points for a similar object containing a distinct surface pattern, also showing the location of an object triangle.
- Fig. 22 shows an interpolated 3-dimensional surface mesh with surface mesh points,
- Fig. 23 shows the interpolated 3-dimensional surface mesh in Fig. 22 with light intensity values of the mesh surface polygons,
- Fig. 24 shows the interpolated 3-dimensional surface mesh in Fig. 22 with light intensity values of the mesh surface polygons, also showing the location of the object triangle,
- Fig. 25 shows a 2-dimensional image of the interpolated 3-dimensional surface mesh in Fig. 22,
- Fig. 26 shows a 2-dimensional image of transformed object data points of the object shown in Fig. 21 based upon the object triangle shown in Fig. 21,
- Fig. 27 shows a set of characterising data comprising either one or two subsets
- Fig. 28 shows a 2-dimensional 8 x 8 and a 2-dimensional 4 x 4 DCT basis function,
- Fig. 29 shows the comparison of 3 subsets of test data to 1735 subsets of training data based on YCbCr Y-only (i.e. luminosity-only) binary strings, and a match existing between Subset 3 of the training data and Subset 2 of the test data,
- Fig. 30 shows an RGB representation of a set of object data points for another object, containing a distinct surface pattern, also showing the location of an object triangle, shown with respect to the new axis system,
- Fig. 31 shows the object in Fig. 30 after transformation using the transformation function, in which the coordinates of the object data points in Fig. 30 are transformed into transformed object data points, with new coordinates, shown with respect to the new axis system
- Fig. 32 shows the start of the scaling transformation function as applied to the transformed object data in Fig. 31, with the transformed object triangle and the predefined equilateral triangle shown with respect to the new coordinates system, where a predefined side of the predefined triangle is parallel to the new X' axis, and centred about the new coordinates system,
- Fig. 33 shows Step A of the scaling transformation function in which the object data is scaled such that the reference side of the object triangle is of the same length as a predefined side of the predefined triangle,
- Fig. 34 shows Step B of the scaling transformation function in which the object data is scaled in the new Y' axis direction such that the perpendicular distance between the reference side of the object triangle and the third apex of the object triangle and the perpendicular distance between the predefined side of the predefined triangle and the third apex of the predefined triangle are the same, and
- Fig. 35 shows Step C of the scaling transformation function in which the object data is sheared along the new X' axis.
- Referring to Fig. 1, an object recognition method according to a first embodiment of the present invention will be described based, for simplicity, on a single cubic object 1 residing in a single scene 2 in a 3-dimensional cartesian space defined by an X, Y, Z axis system passing through an origin 3.
- Fig. 1 also shows two alternative viewpoints 4 and 5 of the object 1, each with respective viewpoint positions 6 and 7 and respective viewpoint orientations 8 and 9 in the space. It will be recognised by those skilled in the art of object recognition that the same object recognition method could also be employed in the much more complex environment in which multiple different objects reside in one or more different scenes, and for much more complex objects such as those with many locations of high curvature or complex surface texture (pattern).
- the method herein described can therefore be "looped through" for more than one scene, for more than one viewpoint and for more than one object.
- it might be necessary in a particular application for a vision system of an industrial robot to be able to recognise a set of three objects, say objects O1, O2 and O3.
- the vision system could be arranged to successively view O1 in a first scene from a number of viewpoint positions and orientations in that scene, then view O2 in a second scene also from a number of viewpoint positions and orientations, and then view O3 in a third scene also from a number of viewpoint positions and orientations.
- the one or more sets of characterising data so generated for each object according to the present invention would then constitute the "training data”.
- the one or more scenes each containing the one or more objects need not be real-life scenes as viewed by a camera-based vision system. Rather the "views" of the various scene/object combinations could be computer software generated based on 3-dimensional models using relevant viewing utilities in well known 3D design packages such as Catia, ProEngineer, SolidWorks or Autocad.
- the capability of the object recognition system of the industrial robot could be measured by arranging the (or another) vision system to view a different scene comprising one or more of the objects O1, O2 and O3, positioned and orientated differently to before, during a "test data generation stage". Also, during this stage, previously "untrained" objects could potentially be introduced in an attempt to "contaminate" the scene. The one or more sets of characterising data so generated for each object would constitute the "test data".
- the method according to the present invention allows the segmentation of the sets of characterising data pertinent to a particular object (say O2), and then the comparison of this "test data" with the previously generated "training data" for that object (say O2) using a comparison algorithm, in order to quantifiably measure the "recognition level" of that object.
- relevant test data need not be generated using a camera-based vision system. Rather the "views" of the various scene/object combinations could be computer software generated based on 3-dimensional solid models.
- Procedure A comprises two steps:
- position a digital camera (or other digital imaging system) at an arbitrary original viewpoint 4, with a viewpoint position 6 and a viewpoint orientation 8 in space, to image the scene 2, with known optics parameters, for example a view width angle 10 and a view height angle 11. Also set a 2-dimensional pixel resolution for the camera.
- Each new viewpoint, during each pass through this loop, must be at a known relative position and orientation from the original viewpoint 4.
- This loop firstly involves generation of a 2D image 12 of object 1 as shown in Fig. 3, comprising an array of pixels, each pixel 14 (for example) corresponding to a viewable point 13 (for example) in the scene 2.
- Viewable points are shown here in Fig. 3 as all being positioned on the surface of the object 1. However, in the general sense, viewable points can also be located somewhere else in the scene 2, not on the object 1.
- each pixel has associated with it Red-Green-Blue (RGB) light intensity values
- other well known colour or monochrome light intensity value parameter sets could also be employed, for example, for colour representation in the image, the well known YCbCr light intensity value model.
- a range distance 15 of each viewable point 13 from the viewpoint 4 is now calculated using a parallax technique. This is facilitated by creating at least two 2D sub-images 12A & 12B from two alternative laterally separated viewpoints, each alternative viewpoint itself of known position and orientation and, by implication, offset by a predetermined separation distance. Then, either sequentially or using different cameras, the range distance 15 can be calculated by techniques well known in the art, using the parallax offset between the different positions of pixel 14 in the two sub-images 12A & 12B of the corresponding viewable point 13.
- the calculation of the range distance 15 involves creation of a contrast map of the 2D image (refer to Fig. 4a), and then performing sub-pixel edge detection on this contrast map (refer to Fig. 4b). This may be repeated for a number of pixel value lateral offset positions, adjusting the position, in the direction of the sub-viewpoint separation distance, of pixel values in one sub-image (refer to Fig. 4c), testing the sub-image associated with each sub-viewpoint, and measuring how similar they are (refer to Fig. 4d).
- the range distance 15 between the viewable point 13 on the surface of the object 1 and the viewpoint 4 can be calculated based upon their pixel value offset separation in the two sub-images 12A & 12B.
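The relationship between the pixel offset (disparity) and the range distance can be illustrated with a short sketch. All code examples in this section are illustrative Python/NumPy sketches rather than the patented implementation; here the usual pinhole-camera assumptions are made, and the focal length, baseline and disparity values are hypothetical.

```python
# Illustrative sketch: recovering a range distance from the pixel offset
# (disparity) between two laterally separated sub-images. The pinhole-camera
# model and the numeric values below are assumptions of this sketch.
import numpy as np

def range_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Range of a viewable point from its disparity in pixels."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    # Points with zero disparity are effectively at infinite range.
    with np.errstate(divide="ignore"):
        return np.where(disparity_px > 0,
                        focal_length_px * baseline_m / disparity_px,
                        np.inf)

# Example: a 6 px offset seen by cameras 0.12 m apart with an 800 px focal length.
print(range_from_disparity(6.0, 800.0, 0.12))  # -> 16.0 m
```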
- a depth map as shown in Fig. 5 can now be generated based on the range distances calculated for each of the viewable points 13 as visible from the viewpoint 4.
- the cartesian coordinates (x, y, z) in the space of each viewable point 13 in scene 2 can now be calculated based on the location of the corresponding pixel 14 in the array, its corresponding range distance 15, viewpoint position 6 and viewpoint orientation 8 in the space, and the viewing properties of the viewpoint 4 such as the relevant view width angle 10 and the view height angle 11.
- a set of 3-dimensional object data points 17 is thereby generated for object 1 residing in the scene 2 based on the "view" from the viewpoint 4 in the space as shown in Fig. 6.
- Each object data point is defined by corresponding 3-dimensional coordinates (x, y, z) in the 3-dimensional space and corresponding one or more light intensity values.
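A minimal sketch of this back-projection step follows, assuming a pinhole camera, that the range is measured along the pixel ray, and that the viewpoint pose (position 6, orientation 8) is supplied as a translation vector and a rotation matrix; these representational choices are assumptions, not taken from the specification.

```python
# Sketch: back-projecting a pixel and its range distance into (x, y, z)
# scene coordinates, given the view width/height angles and viewpoint pose.
import numpy as np

def pixel_to_point(u, v, rng, width_px, height_px,
                   view_width_angle, view_height_angle,
                   viewpoint_pos, viewpoint_rot):
    # Direction of the pixel ray in the camera frame.
    x_cam = np.tan(view_width_angle / 2) * (2.0 * u / (width_px - 1) - 1.0)
    y_cam = np.tan(view_height_angle / 2) * (1.0 - 2.0 * v / (height_px - 1))
    ray = np.array([x_cam, y_cam, 1.0])
    ray /= np.linalg.norm(ray)
    # Scale the unit ray by the measured range and move it into the scene frame.
    return viewpoint_pos + viewpoint_rot @ (rng * ray)

point = pixel_to_point(320, 240, 2.5, 640, 480,
                       np.radians(60), np.radians(45),
                       np.zeros(3), np.eye(3))
print(point)
```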
- the range distance values used to calculate the 3-dimensional (x, y, z) coordinates of the set of object data points in the space may be capped to a predefined saturation value rather, for example, than the range distance being allowed to vary from very low values (for viewable points very close to the viewpoint) up to effectively infinity (for viewable points at great distance in the far background).
- This capping of the range distance values reduces the dynamic range of the range distance values so derived, and hence allows a greater distance resolution to be achieved for viewable points closer to the viewpoint, for example viewable points associated with objects in the foreground of the scene.
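The capping itself is a one-line operation; the 10 m saturation value in this sketch is an arbitrary assumption.

```python
# Sketch: cap range distances to a saturation value so that near-field
# resolution is preserved; the 10 m cap is an arbitrary assumption.
import numpy as np

ranges = np.array([0.8, 2.5, 7.3, 55.0, np.inf])
capped = np.minimum(ranges, 10.0)   # every range beyond 10 m saturates at 10 m
print(capped)                       # [ 0.8  2.5  7.3 10.  10. ]
```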
- a first data-processing is now applied to these coordinates (x, y, z) of the 3-dimensional object data points to derive a set of feature data points corresponding to object features of the object 1.
- one type of object feature, specifically "corners" of the object (regions of high curvature on the object 1), is derived as follows.
- a luminosity map shown in Fig. 8 is generated from the RGB map shown in Fig. 7, in which each luminosity map pixel value is equal to the red plus green plus blue component values of the corresponding RGB map pixel.
- a luminosity contrast map shown in Fig. 9 is generated, in which each pixel value is equal to a function of the difference in luminosity between neighbouring pixels.
- a depth contrast map, shown in Fig. 10, is generated, in which each pixel value is equal to a function of the difference in depth between neighbouring pixels.
- a depth gradient map, shown in Fig. 11, is generated, in which each pixel value has a vector associated with it which is equal to the gradient of the change in depth between neighbouring pixels.
- a depth gradient contrast map, shown in Fig. 12, is generated, in which each pixel value is equal to a function of the difference in depth gradient (in all directions) between neighbouring pixels.
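The map pipeline above can be sketched as follows. The specification only requires "a function of the difference between neighbouring pixels"; the summed absolute 4-neighbour difference used here is one possible choice and is an assumption of this sketch.

```python
# Sketch: luminosity map (R + G + B), contrast maps and a depth gradient map.
import numpy as np

def luminosity_map(rgb):                     # rgb: (H, W, 3) array
    return rgb[..., 0] + rgb[..., 1] + rgb[..., 2]

def contrast_map(values):                    # values: (H, W) luminosity or depth
    padded = np.pad(values.astype(float), 1, mode="edge")
    centre = padded[1:-1, 1:-1]
    diffs = [np.abs(padded[1:-1, :-2] - centre),   # left neighbour
             np.abs(padded[1:-1, 2:]  - centre),   # right neighbour
             np.abs(padded[:-2, 1:-1] - centre),   # upper neighbour
             np.abs(padded[2:,  1:-1] - centre)]   # lower neighbour
    return sum(diffs)

def depth_gradient_map(depth):               # per-pixel gradient vector field
    gy, gx = np.gradient(depth.astype(float))
    return np.stack([gx, gy], axis=-1)

rgb = np.random.randint(0, 256, size=(4, 4, 3))
depth = np.random.rand(4, 4)
print(contrast_map(luminosity_map(rgb)).shape, depth_gradient_map(depth).shape)
```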
- sets of object data points corresponding to unique objects may be isolated based upon, for example, contiguous regions of zero or near-zero depth contrast or contiguous regions of zero or near-zero luminosity contrast.
- a unique object index is assigned. It should be noted that if procedure B is executed for more than one viewpoint, the indices of the objects isolated for the current viewpoint are mapped to the indices of the objects isolated for previous viewpoints, based upon the coordinates of the viewable points in the current image, the coordinates of the viewable points in previous images and their corresponding object indices.
- a function of the original object data such as the luminosity contrast map, the depth contrast map, or depth gradient contrast map, is used to identify a region 40 of contiguous object data points shown in Fig. 13, corresponding in this case to a side of the cubic object 1.
- the luminosity contrast map in Fig. 9 is chosen to be used in this case to generate the single contiguous region of zero or near-zero luminosity contrast.
- a base point is then calculated based upon one or more points contained within this region 40.
- the base point calculated is chosen in the case of this embodiment to be the centroid 41 of the region, corresponding to the average 3-dimensional position (x, y, z) of all pixels in the region 40 - or all pixels in the contiguous region's 40 outer boundary, which happens to provide the same centroid position in this particular case.
- the coordinates (x, y, z) of the centroid 41 provide an initial feature data point for object 1.
- Additional object features may now be calculated based upon a function of the relative position of the centroid 41 and the coordinates of all other points in the region 40. Sequentially, taking adjacent points, the position (x, y, z) of every point 42, 43, 44 on the outer boundary of the region 40 is then compared to the position (x, y, z) of the centroid 41 measuring, for example, a distance 45, and it is noted where relative minima and maxima occur in this distance 45 on the outer boundary of the region 40. Note that, in other embodiments, a routine could look for relative minima or maxima on the outer boundary of the region 40 without respect to this distance 45 or the centroid 41. In still other embodiments the relative minima or maxima could be looked for in the surface of the contiguous region 40 with respect to the centroid 41, or the relative minima and maxima could be looked for in the surface of the contiguous region 40.
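A sketch of this centroid-and-extrema step is given below. It assumes the region boundary has already been extracted as an ordered array of (x, y, z) points; region extraction and boundary ordering are outside the sketch.

```python
# Sketch: centroid of a contiguous region plus boundary points where the
# distance to the centroid is a local (cyclic) minimum or maximum.
import numpy as np

def corner_features(boundary_points):
    centroid = boundary_points.mean(axis=0)
    dist = np.linalg.norm(boundary_points - centroid, axis=1)
    prev_d, next_d = np.roll(dist, 1), np.roll(dist, -1)   # cyclic neighbours
    is_max = (dist > prev_d) & (dist > next_d)
    is_min = (dist < prev_d) & (dist < next_d)
    return centroid, boundary_points[is_max | is_min]

# Square boundary in the z = 0 plane: the four corners are distance maxima.
t = np.linspace(0, 1, 5)[:-1]
square = np.vstack([np.column_stack([t, np.zeros_like(t), np.zeros_like(t)]),
                    np.column_stack([np.ones_like(t), t, np.zeros_like(t)]),
                    np.column_stack([1 - t, np.ones_like(t), np.zeros_like(t)]),
                    np.column_stack([np.zeros_like(t), 1 - t, np.zeros_like(t)])])
centroid, corners = corner_features(square)
print(centroid, corners)
```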
- the identified feature data points 18 are object features corresponding to virtual corners of the object 1.
- two sets of 3-dimensional "data points" are created: (a) a set of object data points where each object data point is defined by the 3-dimensional coordinates (x, y, z) of the respective viewable points 13 in the image 12 pertaining to the particular object 1 when viewed from this particular viewpoint 4 in the space, plus its associated RGB light intensity values, and (b) a set of feature data points, each defined by a 3-dimensional coordinate, and corresponding to the object features of the object - the virtual corners 18 plus the centroids 41 in this case for each of the identified regions 40.
- for each of the objects, combinations of the set of feature data points 18 (virtual corners in this case) pertaining to object 1 are now grouped as multiple sets of three feature data points, where each group of three feature data points constitutes an "object triangle" defined by three apexes, with the coordinates of the apexes equivalent to the coordinates of the respective feature data points.
- the three apexes 19, 20 and 21 define the object triangle 22.
- the possible object triangles, four in the case of Fig. 15, are differently 3-dimensionally orientated in the space, although all four reside in a single plane in this case.
- the 3-dimensional orientation of reference plane 26 which is coplanar with object triangle 22 is fully defined by the 3-dimensional coordinates of the three respective apexes 19, 20 and 21.
- the 3-dimensional orientations of each of the three reference sides 23, 24 and 25 of object triangle 22 are also each defined by the 3-dimensional coordinates of two of the three apexes 19, 20 and 21 of object triangle 22, for example the 3-dimensional orientation of reference side 23 is defined by the 3-dimensional coordinates of apexes 20 and 21.
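The grouping of feature data points into object triangles, and the later use of cyclic orderings to select a reference side, can be sketched as below; enumerating every combination of three feature points is one possible strategy and is an assumption of this sketch.

```python
# Sketch: every combination of three feature points forms one object triangle;
# each cyclic ordering of its apexes later selects a different reference side.
import numpy as np
from itertools import combinations

def object_triangles(feature_points):
    """Yield (apex_a, apex_b, apex_c) index triples, one per cyclic ordering."""
    for i, j, k in combinations(range(len(feature_points)), 3):
        # The three cyclic permutations correspond to the three reference sides.
        for tri in ((i, j, k), (j, k, i), (k, i, j)):
            yield tri

features = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tris = list(object_triangles(features))
print(len(tris))   # 4 feature points -> 4 triangles x 3 cyclic orderings = 12
```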
- a unique transformation index is assigned, and the coordinates of the set of object data points are transformed into transformed object data points with respective new coordinates in a new axis system, where the transformation function is a function of the coordinates of the three feature data points, and hence the corresponding coordinates of the three apexes 19, 20 and 21 of the particular object triangle 22.
- the transformation performed here is preferably carried out at a computer hardware or firmware level, for example by using a PC 3D graphics card.
- each transformation function may involve use of an affine transformation or a transformation function used within a standard geometric hashing algorithm.
- translation and rotation in 3D may be implemented (3D rigid transformation) where the set of three feature data points define the plane of rotation and translation of the set of object data points, or translation, rotation and scaling in 3D may be implemented where the set of three feature data points define the translation, rotation, and scaling of the set of object data points.
- each transformation function in this first embodiment is a function of the orientation of the particular reference side 23, 24 or 25 and the orientation of the reference plane 26 of the respective object triangle 22.
- the transformation function can therefore be considered as a function of a cyclic permutation of the set of three feature data points (i.e. a cyclically ordered set of the three feature data points), these three feature data points corresponding to the three apexes 19, 20 and 21 of the given object triangle 22 - since the three apexes 19, 20 and 21 define the orientation of the reference plane 26 and the cyclically ordered three sets of two apexes (i.e. apexes 21 - 20, 20 - 19, and 19 - 21) define the orientation of the respective three reference sides 23, 24 and 25.
- the transformation function comprises shifting the position of the origin to a new origin position and reorientating the axis system to a new axis orientation such that, in the new axis system (X', Y', Z'), defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are a function of the apexes of the respective object triangle, and hence the set of three feature data points corresponding to these apexes.
- the origin 3 is shifted to a new origin position 27 and reorientated to a new axis orientation 28 such that, in the new X', Y', Z' axis system, a third new axis (Z') is aligned perpendicular to the reference plane 26 of the object triangle 22 (Fig. 17), this third new axis (Z') passes through a mid-point 29 between the apexes 19 and 21 at the extremities of reference side 25 (Fig. 18),
- a first new axis (X') of the new axis system is aligned parallel to this reference side 25, and a second new axis (Y') of the new axis system is directed through mid-point 29 in the direction 35 of the third apex 20 of the object triangle (Fig. 19).
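A compact sketch of this axis transformation follows, assuming the cyclic ordering (apex_a, apex_b, apex_c) in which the side apex_a-apex_b is the reference side and apex_c is the third apex; the new origin is the mid-point of the reference side, X' lies along the reference side, Z' is perpendicular to the reference plane, and Y' points from the mid-point towards the third apex.

```python
# Sketch of the 3-dimensional transformation function of Figs 16-20.
import numpy as np

def triangle_transform(apex_a, apex_b, apex_c):
    mid = 0.5 * (apex_a + apex_b)                       # new origin position
    x_axis = apex_b - apex_a
    x_axis /= np.linalg.norm(x_axis)                    # X' along the reference side
    z_axis = np.cross(apex_b - apex_a, apex_c - apex_a)
    z_axis /= np.linalg.norm(z_axis)                    # Z' normal to the reference plane
    y_axis = np.cross(z_axis, x_axis)                   # Y' towards the third apex
    rot = np.vstack([x_axis, y_axis, z_axis])           # rows = new axis directions
    return rot, mid

def transform_points(points, rot, origin):
    return (points - origin) @ rot.T                    # (x', y', z') coordinates

a, b, c = np.array([0., 0., 0.]), np.array([2., 0., 0.]), np.array([1., 1.5, 0.])
rot, origin = triangle_transform(a, b, c)
print(transform_points(np.array([a, b, c]), rot, origin))
```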
- the coordinates of the set of object data may be further transformed using a scaling transformation function.
- the scaling transformation function may involve scaling the object data based upon the length of reference side 25.
- the scaling transformation function may involve further transforming the set of object data, such that in the new axis system, the three apexes of the object triangle are made coincident with the three apexes of a predefined triangle.
- This scaling transformation function is demonstrated using another object, where referring to Fig. 30 and 31, this object is pentagonal, and the pentagonal object is first transformed using the transformation function illustrated in Fig. 16, 17, 18 and 19. Referring to the sequence shown in Fig. 32, 33, 34 and 35,
- the three apexes of the object triangle in Fig. 31 are made coincident with the three apexes of a predefined equilateral triangle 46 in the new X', Y', Z' axis system, where a predefined side 49 of the predefined triangle 46 is parallel to the new X' axis (Fig. 32), by scaling the object data such that the reference side 25 of the object triangle is of the same length 47 as a predefined side 49 of the predefined triangle 46 (Fig. 33),
- by scaling the object data in the new Y' axis direction such that the perpendicular distance 48 between the reference side of the object triangle and the third apex of the object triangle and the perpendicular distance between the predefined side 49 of the predefined triangle 46 and the third apex of the predefined triangle 46 are the same (Fig. 34), and by shearing the object data along the new X' axis (Fig. 35).
- Fig. 32, 33, 34, and 35 only show the object data bound by the object triangle, but all of the object data may be transformed using this scaling transformation function.
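Because the earlier rigid transform already places the object triangle in the X'-Y' plane with its reference side on X' and its mid-point at the origin, the scaling transformation reduces to an X' scale, a Y' scale and an X' shear, as sketched below; the predefined side length of 1.0 and the placement of the equilateral triangle's mid-point at the origin are assumptions.

```python
# Sketch of the scaling transformation function of Figs 32-35 (Steps A, B, C).
import numpy as np

def scale_shear_to_equilateral(points_xyz, tri_xyz, side=1.0):
    (ax, ay), (bx, by), (cx, cy) = tri_xyz[:, :2]       # apexes in the X'-Y' plane
    sx = side / abs(bx - ax)                            # Step A: match side length
    height = side * np.sqrt(3.0) / 2.0
    sy = height / abs(cy)                               # Step B: match triangle height
    out = points_xyz.astype(float).copy()
    out[:, 0] *= sx
    out[:, 1] *= sy
    shear = -(cx * sx) / (cy * sy)                      # Step C: move third apex onto Y'
    out[:, 0] += shear * out[:, 1]
    return out

tri = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.4, 1.5, 0.0]])
print(scale_shear_to_equilateral(tri, tri))             # -> equilateral triangle of side 1
```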
- the coordinates (x, y, z) of the set of object data points are therefore appropriately transformed into new coordinates (x', y', z') in the new X', Y', Z' axis system, hence forming a set of transformed object data points as shown in Fig. 20.
- Fig. 21 shows an RGB representation of a set of object data points of a similar cubic object, the similar object containing such a distinct surface pattern, with the location of an object triangle also shown.
- an interpolated 3-dimensional surface mesh 30 is formed after the transformation of the set of object data points is carried out, defined by surface mesh points 31.
- the coordinates of each surface mesh point 31 correspond to the coordinates of a transformed object data point 33 associated with the object 1 and the relevant axis transformation or, as in the case of surface mesh point 31B, correspond to an interpolation of the coordinates of one or more of the transformed object data points 33.
- Each polygon 32 of the surface mesh 30 is formed by proximate surface mesh points 31 and 31B and has associated with it the light intensity values (or "RGB values" in this first embodiment) derived as a function of the RGB values associated with nearby transformed object data points.
- Fig. 24 shows the interpolated 3-dimensional surface mesh 30 (as shown in Fig. 22) including light intensity values of the mesh surface polygons (as shown in Fig. 23), and also showing the location of the object triangle.
- an as yet untransformed interpolated 3-dimensional surface mesh defined by surface mesh points may be formed before the transformation of the set of object data points is carried out.
- the coordinates of each surface mesh point will correspond to the coordinates of an object data point associated with the object or correspond to an interpolation of the coordinates of one or more of the object data points.
- Each polygon of the surface mesh would again be formed by proximate surface mesh points and have associated with it the light intensity values (or "RGB values" in this first embodiment) derived as a function of the RGB values associated with nearby object data points.
- the 3-dimensional surface mesh points, together with the relevant polygons, would then be transformed according to the same axis transformation as described previously.
- the 3-dimensional transformation function applied to the object data points may be equivalent to viewing the untransformed interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein this virtual viewpoint is a function of the set of three feature data points.
- the transformed object data points in this case would correspond to one or more of the viewable map images defined below (a 2-dimensional viewable RGB map image, a 2-dimensional viewable depth map image, or a viewable 3-dimensional mesh image).
- a 2-dimensional viewable RGB map image shown in Fig. 25 of the surface mesh 30 shown in Fig. 24 is generated.
- the image is generated using computer graphics by generating a virtual image of the surface mesh 30 from a particular virtual viewpoint, the virtual image comprising virtual image data points 34.
- a virtual mesh viewpoint (not shown) is orientated along the Z' axis passing through the new origin position 27 corresponding to the midpoint 29 of one of the three reference sides 23, 24 or 25 (25 as shown in Fig. 23), and the X' axis aligned with the relevant reference side (25 as shown).
- the virtual image data points 34 in the 2-dimensional virtual image are generated by interpolating the (already potentially interpolated) 3-dimensional surface mesh 30 at particular coordinate intervals along the X' axis and along the Y' axis, each virtual image data point 34 corresponding to a viewable surface mesh point 36 on the surface mesh and having one or more light intensity values derived as a function of the light intensity values (or "RGB values" in this first embodiment) of the transformed object data points proximate to the point i.e. the light intensity values of the polygon of surface mesh 30 at the viewable surface mesh point 36.
- Each virtual image data point 34 corresponds to a viewable surface mesh point 36 on the surface mesh 30 and has RGB values derived as a function of the RGB values of transformed object data points proximate to the viewable surface mesh point 36, and coordinates derived as a function of the transformed coordinates of the object data points proximate to the viewable surface mesh point 36. It will be recognised by those skilled in the art of object recognition that this 2-dimensional image of the mesh may also be referred to as a "pattern".
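As a much-simplified stand-in for the mesh interpolation described above, the sketch below samples the transformed object data points directly onto a regular grid of X'-Y' intervals by nearest-neighbour lookup, producing a small viewable RGB map and the matching viewable depth (Z') map; a real implementation would interpolate the 3-dimensional surface mesh instead, and the grid extent and resolution are arbitrary assumptions.

```python
# Sketch: rasterise transformed object data points into 2-dimensional
# viewable RGB and depth map images on a regular X'-Y' grid.
import numpy as np

def viewable_maps(points_xyz, rgb, extent=1.0, resolution=16):
    axis = np.linspace(-extent, extent, resolution)
    gx, gy = np.meshgrid(axis, axis)                       # X'-Y' sample grid
    rgb_map = np.zeros((resolution, resolution, 3))
    depth_map = np.zeros((resolution, resolution))
    for r in range(resolution):
        for c in range(resolution):
            d2 = (points_xyz[:, 0] - gx[r, c]) ** 2 + (points_xyz[:, 1] - gy[r, c]) ** 2
            nearest = np.argmin(d2)                        # closest transformed point
            rgb_map[r, c] = rgb[nearest]
            depth_map[r, c] = points_xyz[nearest, 2]       # Z' value
    return rgb_map, depth_map

pts = np.random.randn(200, 3)
cols = np.random.rand(200, 3)
rgb_map, depth_map = viewable_maps(pts, cols)
print(rgb_map.shape, depth_map.shape)    # (16, 16, 3) (16, 16)
```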
- the viewing properties of the virtual mesh viewpoint and the Z' position of the object triangle determine the viewable area of the image (i.e. the size of the object or part thereof displayed in the 2-dimensional image). Therefore, the viewing properties of the virtual mesh viewpoint may be defined, or object data may be translated on the Z' axis, or the object data may be scaled, or the new axis system (X', Y', Z') may be scaled, such that the image formed is dependent upon the size of the object triangle (or length of reference side 25).
- alternatively, the image formed will be independent of the size of the object triangle, and thus only dependent upon the size/scale of the object data in the coordinate system, meaning the image generated of a large object may only contain a small part of the object, centered at the midpoint of the reference side of the object triangle.
- a 2-dimensional viewable depth map image (a 2-dimensional array including Z' values of each of the viewable surface mesh points 31) may also be created along with the above described 2-dimensional viewable RGB map image (a 2-dimensional array of RGB values of each of the viewable 3- dimensional surface mesh points 31).
- the virtual image data points 34 may also be referred to as "pixels”.
- a viewable 3-dimensional mesh image may be created which encapsulates the viewable transformed object data points in its 3-dimensional format.
- Fig. 26 shows a 2-dimensional viewable RGB map image of the set of transformed object data points (or associated interpolated 3-dimensional mesh) of the similar object shown in Fig. 21, where the transformation of the object data points for this object has occurred based upon the object triangle shown in Fig. 21.
- one or more viewable maps may be generated at a different scale. As above, this may also be achieved by a number of methods; the viewing properties of the virtual mesh viewpoint may be modified, or object data may be transformed (translated) on the Z' axis, or the new axis system (X', Y', Z') may be scaled.
- a set of characterising data is then generated based on the transformed object data points and the associated viewable map images - in the case of this first embodiment, the viewable RGB map image and the viewable depth map image.
- the set of characterising data comprises the respective unique object index and the unique transformation index, the transformed object data points themselves, the one or more viewable map images, along with one or more functions of one or more of the viewable map images.
- a "function of one or more of the viewable map images" is henceforth referred to in this specification as a "subsef of the set of characterising data.
- the method of second data-processing, of the transformed object data points will determine the format of the set of characterising data generated, and this in turn is dependent on which type of comparison algorithm is eventually used to compare the training data and test data.
- a “database comparison algorithm” will be used, and this makes use of an ordinary database structure where fast indexing and lookup may only be performed on a single key (or row/dimension within the database).
- the subset is generated as a 1-dimensional binary string and is used for database indexing purposes.
- the database comparison algorithm is described after the end of procedure A in this specification.
- the function of one or more of the viewable map images may involve an image processing function to generate the subset, such as a colour saturation contrast map or gaussian filter of the viewable RGB map and/or viewable depth map image.
- this image processing function should result in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
- the function of one or more of the viewable map images may involve a neural network or decision tree to generate the subset.
- the function of one or more of the viewable map images involves a spatial convolution function.
- Binned coefficient values of a spatial convolution of the viewable RGB map image, and binned coefficient values of a spatial convolution of the viewable depth map image may be recorded as the subset.
- the subset, represented as a 1-dimensional binary string, may contain, for example, 64 x 3 bits, representing a 4 x 4 array of spatial convolution coefficient values binned at a 4 bit resolution, for each light intensity value type (e.g. RGB) or function thereof (e.g. YCbCr).
- the spatial convolution technique utilised may be a low resolution DCT (discrete cosine transform) such as is implemented in image compression algorithms well known to the art of image processing (e.g. JPEG). Note that, as per this embodiment, it is useful to convert data from the RGB colour model to the YCbCr colour model for the purposes of generating the DCT coefficients.
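One way of forming such a 192-bit subset is sketched below: the viewable RGB map is converted to YCbCr, a low-resolution 2-dimensional DCT is taken per channel, and each coefficient is binned to 4 bits. The 4 x 4 DCT size matches the example above, but the binning range and the approximate BT.601 YCbCr weights are assumptions of this sketch.

```python
# Sketch: subset as a 1-dimensional binary string from a 4 x 4 viewable RGB map.
import numpy as np

def dct_matrix(n):                # orthonormal DCT-II basis matrix
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(block):
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

def rgb_to_ycbcr(rgb):            # rgb in 0..255, approximate BT.601 weights
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def subset_binary_string(rgb_map_4x4, coeff_range=2048.0):
    ycbcr = rgb_to_ycbcr(rgb_map_4x4.astype(float))
    bits = []
    for ch in range(3):                                   # Y, Cb, Cr in turn
        coeffs = dct2(ycbcr[..., ch]).ravel()
        binned = np.clip((coeffs + coeff_range) / (2 * coeff_range) * 16, 0, 15)
        bits += [format(int(v), "04b") for v in binned]   # 4 bits per coefficient
    return "".join(bits)                                  # 16 coeffs x 4 bits x 3 = 192 bits

rgb_map = np.random.randint(0, 256, size=(4, 4, 3))
s = subset_binary_string(rgb_map)
print(len(s), s[:16])
```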
- Fig. 28 shows 2-dimensional 8 x 8 and 4 x 4 DCT basis functions for such a DCT calculation, well known in the art of image data processing, and an integral part of the universally used JPEG and MPEG data compression algorithms.
- a higher order DCT may be performed (e.g. 8 x 8), and only a selection of the coefficients (e.g. 7 higher order coefficients out of the 64 coefficients) may be binned and used in the formation of the subset.
- the exact coefficients selected from a given DCT may depend upon a number of factors, including the light intensity value type, variations in object lighting conditions, and recognition accuracy and repeatability. For example, the highest order coefficient may be dropped to cater for variations in lighting conditions.
- additional bits may be used to generate the subset, unrelated to the viewable map data.
- Fig. 27 shows four alternative possible structures: Fig. 27, with Option #1, shows a set of characterising data comprising a 192 bit long Subset A, which has been generated based upon the viewable RGB map image.
- Fig. 27, with Option #2 shows an alternative set of characterising data comprising two subsets, 192 bit long Subset B and 192 bit long Subset C, Subset B generated based upon the viewable RGB map image and Subset C generated based upon the viewable RGB map image also.
- Fig. 27, with Option #3 shows a still alternative set of characterising data comprising a 384 bit long Subset D, which has been generated based upon both the viewable RGB map image and the viewable depth map image.
- Fig. 27, with Option #4 shows a still alternative set of characterising data comprising a 64 bit long Subset E, in which colour information is ignored in the generation of the subset.
- Subset E has been generated based upon the viewable RGB map image, but only the Y (luminosity) DCT coefficients derived therefrom have been used in the formation of the subset.
- the raw DCT coefficients may be stored separately in the set of characterising data along with subset (binned coefficient values) for later verification purposes only.
- one or more bits of the 1-dimensional binary string may be generated as a function of the spatial convolution coefficient values.
- This function could be a neural network, or decision tree, where these are input data to the neural network(s) or decision tree(s).
- a subset could be generated using a neural network or decision tree directly in conjunction with the viewable RGB map image or the viewable depth map image to produce one or more bits of the 1-dimensional binary string.
- a subset could be generated using an image processing function (such as a Gaussian filter) of the viewable RGB map image, or the viewable depth map image where it has been configured to result in significant quantization (spatial resolution and intensity resolution reduction) of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
- the quantization should ideally be independent of aliasing or the pixel map structure, and produce a repeatable (the same) output with slight variations to the input maps, and hence ensure the training data will match the test data.
- DCT coefficient values could be stored separately without the need for a 1-dimensional binary string.
- Each set of characterising data generated according to the present invention is used for the purposes of object recognition.
- the characterising data generating procedure according to the present invention can be executed sequentially for multiple scenes, each with multiple objects and viewpoints.
- Training data can be initially generated during a training data generation stage and this training data later compared with test data, also generated using the same procedure during a test data generation stage, and a comparison algorithm then employed to compare the training data and test data, and enabling the recognition of one or more of the test objects.
- the one or more sets of characterising data pertaining to the test data are compared with the one or more sets of characterising data pertaining to the training data.
- a network based artificial intelligence algorithm may be used to train one or more networks (such as neural networks or decision trees) with the training data.
- the input values into the network are the values of one or more
- the network based artificial intelligence algorithm is used to test the network against the test data object(s), thereby determining whether or not the testing object(s) is (or are) recognised and which of the trained objects this (or these) test data object(s) corresponds to.
- an efficient method is implemented using a database comparison algorithm, where an indexed database is created of training data, and where the subset of each set of characterising data of the training data is used to index, and hence identify, the corresponding sets of characterising data stored in the database.
- the training data may be stored in an indexed database or a multi-dimensional hash table, where the index chosen may be the subset itself or another function of the set of characterising data.
- When a set of characterising data being generated is training data, two or more subsets of the set of characterising data may be formed, each with slight variations to the binning of the coordinates of the transformed object data points, and/or the binning of the DCT coefficients for example. When the one or more sets of characterising data currently being generated is test data, each set of characterising data will only include a single subset.
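A minimal sketch of such an indexed store follows, using an in-memory hash map keyed by the subset binary string; a relational database with the subset as an indexed column would serve the same purpose, and the record fields shown are illustrative only.

```python
# Sketch: training data indexed by subset binary string for fast lookup.
from collections import defaultdict

training_index = defaultdict(list)

def store_training_record(subsets, record):
    """record: dict with object index, transformation index, viewable maps, etc."""
    for subset in subsets:                  # training sets may carry more than one subset
        training_index[subset].append(record)

store_training_record(["0101" * 48, "0110" * 48],
                      {"object_index": 2, "transformation_index": 17})
print(len(training_index))                  # two index keys -> the same stored record
```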
- Fig. 27 with Option #2 shows a set of characterising data comprising two subsets, 192 bit long Subset B and 192 bit long Subset C.
- the second of the two subsets has been modified slightly compared to Subset B, such that one of its 4 x 4 (16 total) 4-bit binned DCT coefficient values for the Y (luminosity) "colour” has been increased arbitrarily by 1.
- for example, a value of 0100 for bits 1 - 4 (i.e. a value of 4 expressed in base 10), corresponding to the top-left (lowest frequency) binned DCT coefficient value, would increase by 1 to a value of 0101 (a value of 5 expressed in base 10).
- Other similar multiple subsets could also be included in this set of characterising data (not shown), in which other binned coefficient values for any of the three "colours" Y, Cb or Cr, are increased in value by 1 or decreased in value by 1.
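Generating such perturbed training subsets can be sketched as below: each 4-bit binned coefficient is raised or lowered by one, so that a test subset falling just over a bin boundary still hits a stored index key. Perturbing every coefficient is an assumption; an implementation might restrict this to selected coefficients or colours.

```python
# Sketch: produce +/-1 binned-coefficient variants of a subset binary string.
def perturbed_subsets(subset_bits, bits_per_bin=4):
    variants = [subset_bits]
    for start in range(0, len(subset_bits), bits_per_bin):
        value = int(subset_bits[start:start + bits_per_bin], 2)
        for delta in (-1, +1):
            if 0 <= value + delta < 2 ** bits_per_bin:
                variant = (subset_bits[:start]
                           + format(value + delta, "0%db" % bits_per_bin)
                           + subset_bits[start + bits_per_bin:])
                variants.append(variant)
    return variants

print(len(perturbed_subsets("0100" * 16)))   # 1 original + 2 variants per bin = 33
```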
- the set of characterising data generated according to the present invention is used for purposes of object recognition, and this involves the matching of one or more sets of characterising data in the test data, with one or more sets of characterising data in the training data, using a comparison algorithm.
- the comparison algorithm comprises a database search or database lookup.
- the comparison algorithm first involves the matching of one or more subsets of the set of characterising data in the test data, with one or more subsets of the set of characterising data in the training data, by performing a database lookup using the subset as an index.
- Fig. 29 shows diagrammatically the comparison of 3 subsets of test data to 1735 subsets of training data with the subsets generated as YCbCr Y (luminosity) binary strings (i.e. similar to the Subset E previously referred to), generating one match: a match between Subset #3 of the training data and Subset #2 of the test data.
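The lookup side of the database comparison algorithm can be sketched as below: each test subset is used directly as an index key, so a match such as the one shown in Fig. 29 is found in (amortised) constant time per test subset rather than by scanning all 1735 training subsets; the record contents are illustrative assumptions.

```python
# Sketch: match test subsets against the indexed training data.
def match_test_subsets(test_subsets, training_index):
    matches = []
    for i, subset in enumerate(test_subsets, start=1):
        for record in training_index.get(subset, []):
            matches.append((i, record))     # (test subset number, training record)
    return matches

training_index = {"0110" * 48: [{"object_index": 2, "transformation_index": 17}]}
print(match_test_subsets(["0001" * 48, "0110" * 48, "1111" * 48], training_index))
```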
- the comparison algorithm may then also compare the previously mentioned viewable map images (or the 3-dimensional viewable map image) stored in the matched sets of characterising data.
- Step 1 in this second embodiment specifically relates to the computer executable step of receiving an input of scene data for a scene containing the object in a 3-dimensional space defined by an axis system.
- the input of scene data may be provided by a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may be provided by other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-dimensional medical imaging applications.
- the input scene data could also be computer software generated based upon one or more 3-dimensional models.
- a first computer and associated scene input data system could be used to generate the training data for one or more scenes containing the object during the training data generation stage, and a second computer and another associated scene input data system could be used to generate the test data for one or more other scenes also containing the object during the later test data generation stage.
- One of these computers, or a third computer, could be used to compare the training data and the test data, and generate an output of comparison data in the form of a rating (e.g. a numerical recognition rating) or a more simple "yes/no" indication (e.g. a graphic, an indicator light, or an acoustic "beep") corresponding to the recognition of the object.
- the first and second computers and the associated scene input data systems may be a common system, with that common computer also being used to compare the training data and the test data and thereby generate an output of the above described comparison data.
- a computer network or cloud could be employed to perform the generation of training data, and/or the generation of test data, and/or the data comparison, where the computer network or cloud could be connected to one or more devices for imaging one or more scenes.
- the scene data input system employed according to this fifth embodiment may be a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may comprise other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-dimensional medical imaging applications.
- the input scene data could also be computer software generated based upon one or more 3-dimensional models.
- the invention may be embodied using devices conforming to other network standards and for other applications, including, for example, other WLAN standards and other wireless standards.
- wireless and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a non-solid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not. In the context of this document, the term “wired” and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a solid medium. The term does not imply that the associated devices are coupled by electrically conductive wires.
- calculating refers to the action and/or processes of a computer, computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.
- processor may refer to any device or portion of a device that processes electronic data, e.g., from registers and/or memory to transform that electronic data into other electronic data that, e.g., may be stored in registers and/or memory.
- a "computer” or a “computing device” or a “computing machine” or a “computing platform” may include one or more processors.
- the methodologies described herein are, in one embodiment, performable by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that when executed by one or more of the processors carry out at least one of the methods described herein.
- processors capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken are included.
- the processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM.
- a computer-readable carrier medium may form, or be included in a computer program product.
- a computer program product can be stored on a computer usable carrier medium, the computer program product comprising a computer readable program means for causing a processor to perform a method as described herein.
- the one or more processors may operate as a standalone device or may be connected, e.g., networked, to other processor(s). In a networked deployment, the one or more processors may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer or distributed network environment (e.g. cloud).
- the one or more processors may form a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- machine shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
- each of the methods described herein is in the form of a computer-readable carrier medium carrying a set of instructions, e.g., a computer program that is for execution on one or more processors.
- embodiments of the present invention may be embodied as a method, an apparatus such as a special purpose apparatus, an apparatus such as a data processing system, or a computer-readable carrier medium.
- the computer- readable carrier medium carries computer readable code including a set of instructions that when executed on one or more processors cause a processor or processors to implement a method.
- aspects of the present invention may take the form of a method, an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
- the present invention may take the form of carrier medium (e.g., a computer program product on a computer-readable storage medium) carrying computer-readable program code embodied in the medium.
- the software may further be transmitted or received over a network via a network interface device.
- the carrier medium is shown in an example embodiment to be a single medium, the term “carrier medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “carrier medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by one or more of the processors and that cause the one or more processors to perform any one or more of the methodologies of the present invention.
- a carrier medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
- a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method.
- an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
- Coupled should not be interpreted as being limitative to direct connections only.
- the terms “coupled” and “connected”, along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other.
- the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means.
- Coupled may mean that two or more elements are either in direct physical or electrical contact or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
- a vehicle or "vehicular" or other similar term as used herein is inclusive of motor vehicles in general such as passenger automobiles including sports utility vehicles (SUV), buses, trucks, various commercial vehicles, watercraft including a variety of boats and ships, aircraft, and the like, and includes hybrid vehicles, electric vehicles, plug-in hybrid electric vehicles, hydrogen-powered vehicles and other alternative fuel vehicles (e.g. fuels derived from resources other than petroleum).
- a hybrid vehicle is a vehicle that has two or more sources of power, for example both gasoline-powered and electric-powered vehicles.
Abstract
A method of recognising an object is described, which is to be performed on or with the aid of one or more computers, the method comprising the generation of one or more sets of characterising data as training data during an initial training data generation stage, the generation of one or more sets of characterising data as test data during a later test data generation stage, and comparison of the training data and test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises, for a scene containing the object in a 3-dimensional space defined by an axis system, deriving a set of 3-dimensional object data points for the object, each object data point defined by 3- dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and for each set of three feature data points, applying a 3- dimensional transformation function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
Description
OBJECT RECOGNITION METHOD AND COMPUTER SYSTEM
Technical Field
The present invention relates to a method of 3-dimensional object recognition, and to the computer executable steps, computer readable code, one or more computer readable mediums, and one or more computers adapted for such 3-dimensional object recognition. The primary application of the invention is to the field of object recognition in 3-dimensional space, as may be used as part of an artificial intelligence or robotics system. However, it will be appreciated that the invention is not limited to this particular field of use.
Background
Associated technologies required for 3-dimensional object recognition have already entered the commercial market, such as 3-dimensional laser scanners and other non-laser reticle projection systems which enable a 3-dimensional object to be scanned and a 3-dimensional digital model to be generated. For example such systems enable the human face and other objects to be scanned for computer games and movie production, and for the generation of 3-dimensional laser images in crystalline polymers.
Some current 3-dimensional object recognition systems rely upon environment-dependent conditions such as the distinct colouring on objects.
Other current 3-dimensional object recognition systems rely upon complete 3-dimensional imaging of an object in order to create a normalised set of data pertaining to the object, for example normalisation methods employing an inferred centre of mass of an object. This method is demonstrated in the paper "A Geometric Approach to 3D Object Comparison", Novotni et al., pp. 0167, International Conference on Shape Modelling and Applications, 2001.
Geometric hashing has been well documented as a means of recognising objects at arbitrary angles and positions within a scene, however the performance of this algorithm is dependent upon the ability to match the geometry of feature point distributions between scenes, and with a limited number of feature points per object (and thus a limited number of unique feature point distributions) this method degrades significantly when required to recognise an object against a large number of candidates. See
"Geometric Hashing; An Overview", Wolfson et al, Computing in Science and Erigineering, Vol. 4, 1997, pp 10-21.
A number of prior art patents describe 2-dimensional object recognition methods in which feature points are identified in a 2-dimensional image, and object triangles are then generated by grouping together sets of three apexes of the 2-dimensionally disposed object triangles. For example US Patent 2009/0167760 (Nokia) describes a method in which 2-dimensional images of 3-dimensional scenes containing objects are compared based on the statistical distribution of the centroids of the object triangles for different images and also their "colour" distribution, and the method then attempts to match objects in images based on similar centroidal and colour distributions in a type of hash-table. Similarly US Patent 2006/0204079 (Toshiba) describes an object recognition method in which 2-dimensional images are generated of a 3-dimensional scene containing objects. Feature points in the 2-dimensional image are identified and sets of object triangles are identified, which are then normalised, enabling 2-dimensional images of cropped regions of the normalised image to be indexed in a hash table and compared. The methods disclosed in these two prior art patents have the disadvantage that large amounts of information relating to a scene are lost when a 2-dimensional image is made of a 3-dimensional scene, and therefore much of the "information", after normalisation or other forms of statistical analysis, is essentially lost, resulting in a degraded object recognition capability.
However, prior art patents such as EP 2133838 (Omron Corp) describe a true 3-dimensional object recognition method. Feature points are identified in the 3-dimensional image, and corresponding 3-dimensionally disposed object triangles are generated, again by grouping together sets of three apexes of the 3-dimensionally disposed object triangles. The object triangles are then normalised and the vertices matched geometrically, based on many different spatial orientations, to pre-stored sets of object triangle dispositions. Although this is truly a 3-dimensional object recognition method, the technique disclosed has the significant disadvantage, similar to the other two prior art patents, that the "rich" information in the original 3-dimensional (in this case) image is significantly degraded, this time because the eventual object identification method relies on comparison of only the transformed feature data points, extracted from the image, with the pre-stored data.
To minimize "information degradation", and therefore enable the method to stand the best chance of recognising objects in a scene, it is apparent that the object recognition method should operate in 3- dimensions and the data comparison should operate on the largest possible data set, rather than being limited to just the feature data points. However such methods have eluded those skilled i the art to date due to the huge amount of data processing and data management required by such methods, and thus preventing such a method to be employed as a useful "real-time" object recognition tool.
The present invention aims to overcome the identified disadvantages of the prior art object recognition methods, and yet provide a method which is "data rich" and extremely efficient, and therefore fast in execution.
Ideally the object recognition method should operate in 3-dimensional space and operate independently of the object position and orientation, and the viewpoint position and orientation in a given scene, operate in a large range of lighting conditions and operate efficiently in a data-processing sense. The object recognition method should also operate without the need for artificial colouring to be applied to the objects, and should not require an object to be viewed from all angles during either the training or testing stage. The method should make use of the appearance (i.e. the data "richness") of a scene and make use of its 3-dimensional structure, should operate on objects without planar surfaces or sharp edges to aid identification, and should not require complete 3-dimensionally "closed" images of the object to be first obtained during the training stage.
It should be realized that the method according to the present invention, indeed almost any such method, will reduce in performance where object features cannot be extracted for an object in a scene. This can occur in the scenario where an object being recognised has both no distinct areas of high curvature in shape, for example corners, and no distinct areas of high curvature in surface texture. However there are other well known methods to identify object features. Also some "degenerate" objects may look the same regardless of the viewing angle and may still be recognisable regardless of the accuracy and repeatability of the object feature extraction method so disclosed.
It is to be understood that, if any prior art information is referred to herein, such reference does not constitute an admission that the information forms part of the common general knowledge in the art, in Australia or any other country.
Summary of Invention
According to one aspect of the present invention, there is provided a method of recognising an object, to be performed on or with the aid of one or more computers, comprising: generating one or more sets of characterising data as training data during an initial training data generation stage; generating one or more sets of characterising data as test data during a later test data generation stage; and comparing the training data and test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises:
Step 1: for a scene containing the object in a 3-dimensional space defined by an axis system, deriving a set of 3-dimensional object data points for the object, each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and
Step 2: for each set of three feature data points, applying a 3-dimensional transformation function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
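By way of illustration only, the following minimal sketch (in Python, using placeholder callables) shows one possible arrangement of Steps 1 and 2. The helper names extract_feature_points, triangle_transform and characterise are hypothetical stand-ins for the first data-processing, the 3-dimensional transformation function and the second data-processing respectively, and nothing in the sketch limits the method.

```python
import itertools

def generate_characterising_data(object_points, intensities,
                                 extract_feature_points,
                                 triangle_transform,
                                 characterise):
    """object_points: (N, 3) array of coordinates; intensities: (N, C) light intensity values."""
    # Step 1: derive feature data points and group them into sets of three
    feature_points = extract_feature_points(object_points, intensities)
    triangles = [feature_points[list(idx)]
                 for idx in itertools.combinations(range(len(feature_points)), 3)]

    characterising_sets = []
    for tri in triangles:
        # Step 2: transform every object data point as a function of the three
        # feature data points; the light intensity values are carried across unchanged
        transformed = triangle_transform(object_points, tri)
        characterising_sets.append(characterise(transformed, intensities))
    return characterising_sets
```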
Advantageously, the object recognition method operates in 3-dimensions and this minimizes the "information degradation" which would otherwise occur if the object recognition method operated in only 2-dimensions.
Advantageously, the object recognition method uses the largest possible data set, the derived object data points, rather than being limited to just using the derived set of feature data points, and this also minimizes the "information degradation" associated with using the latter much smaller and more limited data set.
Advantageously, the object recognition method operates independently of the object position and orientation, and the viewpoint position and orientation in a given scene, and operates in a large range of lighting conditions.
Advantageously, the object recognition method operates efficiently in a data-processing sense, because the sets of characterising data generated as training data during the initial training data generation stage can be of the same format as the sets of characterising data generated as test data during the later test data generation stage, thereby facilitating efficient comparison of the training data and test data.
Advantageously, since each object data point is defined by 3-dimensional coordinates and also one or more light intensity values, information relating to light intensity and "colour" of data points is not lost during the subsequent data processing.
Advantageously, a set of feature data points is derived corresponding to object features of the object.
Advantageously, since each feature data point is also defined by 3-dimensional coordinates, the 3- dimensional spatial disposition of the feature data points can be used in the subsequent data processing.
Advantageously, one or more sets of three feature data points are grouped, thereby defining an associated 3-dimensional spatial disposition defined by these three feature data points.
Advantageously, the 3-dimensional transformation function is a function of the set of three feature data points, and hence the 3-dimensional spatial disposition defined by these three feature data points.
Advantageously, each transformed object data point has the same associated one or more light intensity values as the object data point from which it was generated, resulting in the information associated with the one or more light intensity values not being lost in the transformation process.
Advantageously, the second data processing of the transformed object data points is carried out to generate one of the one or more sets of characterising data, in order to generate a predetermined format for the set of characterising data.
Preferably, during the training data generation stage or during the test data generation stage, the scene containing the object is computer software generated.
Preferably, the transformation function is a function of a cyclic permutation of the set of three feature data points.
Preferably, the transformation function in Step 2 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
Preferably, the transformation function in Step 2 also involves shearing of the one or more object data points, based upon the set of three feature data points.
Preferably, the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
Preferably, the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
Preferably, the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
Preferably, a unique object index is assigned to the object.
Preferably, the first data-processing in Step 1 comprises subdividing the object data points into a region of contiguous object data points with similar one or more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
Preferably, each set of three feature data points is derived from a single region.
Preferably, one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
Preferably, the first function is a luminosity contrast function of the one or more light intensity values.
Preferably, the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
Preferably, the third function relates to the distance of one or more of the object data points in the region from the base point.
Preferably, the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
Preferably, one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
Preferably, one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
Preferably, the base point is derived as a feature data point.
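A minimal sketch of this region-based feature derivation is given below, assuming the boundary points of the region are supplied in order around a closed boundary; the base point is taken as the geometric centroid and feature data points as relative minima and maxima of the distance from it. The helper name region_feature_points and this particular extremum test are illustrative assumptions only.

```python
import numpy as np

def region_feature_points(region_points, boundary_points):
    """region_points: (N, 3) object data points of one region;
    boundary_points: (M, 3) boundary points, ordered around the boundary."""
    base = region_points.mean(axis=0)                  # geometric 3-D centroid (base point)
    d = np.linalg.norm(boundary_points - base, axis=1)
    features = [base]                                  # the base point itself may be a feature point
    for i in range(len(d)):
        prev_d, next_d = d[i - 1], d[(i + 1) % len(d)]
        if d[i] >= max(prev_d, next_d) or d[i] <= min(prev_d, next_d):
            features.append(boundary_points[i])        # relative maxima / minima of distance
    return np.array(features)
```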
Preferably, the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
Preferably, the first data-processing in Step 1 comprises determining at least one point of high local curvature of the object data points, and designating the point as a feature data point.
Preferably, the first data-processing in Step 1 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data point.
Preferably, the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
Preferably, each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
Preferably, either during the calculating of the position coordinates for each viewpoint, or after the calculating of the position coordinates for all viewpoints, the object data points are isolated for the object.
Preferably, the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
Preferably, if the range distance of a viewable point in the scene is above a certain threshold, one of the coordinates of the corresponding object data point will be set to a predetermined maximum value.
Preferably, the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
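The sketch below illustrates one possible way of forming such boolean contrast maps and combining them into an object mask. The gradient-magnitude contrast measure, the thresholds and the logical-OR combination are assumptions for illustration and not the only combination contemplated.

```python
import numpy as np

def local_contrast(img):
    # magnitude of the local gradient as a simple contrast measure
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def object_mask(depth_map, luminosity_map, depth_thresh=0.5, lum_thresh=0.1):
    depth_contrast = local_contrast(depth_map)        # depth contrast map
    lum_contrast = local_contrast(luminosity_map)     # luminosity contrast map
    # boolean maps from arbitrary thresholds, combined here with a logical OR
    return (depth_contrast > depth_thresh) | (lum_contrast > lum_thresh)
```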
Preferably, the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
Preferably, at least one of the viewable points resides on a surface of the object.
Preferably, the object data points are derived by 3-dimensional imaging of the scene.
Preferably, one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
Preferably, for each viewpoint, the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
Preferably, the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
Preferably, the viewing properties comprise a view width angle and a view height angle.
Preferably, the first data processing in Step 1 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the transformation function in Step 2 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
Preferably, the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
Preferably, the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
Preferably, the transformation function in Step 2 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
Preferably, the one or more object data points are also scaled or translated, based upon the length of the reference side.
Preferably, the transformation function in Step 2 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
Preferably, the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
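A minimal sketch of such a triangle-based transformation function is given below, assuming the reference side runs between apexes a and b and that c is the remaining apex; the optional scaling and shearing to a predefined triangle described above is omitted. The function name and argument layout are illustrative assumptions.

```python
import numpy as np

def triangle_transform(points, a, b, c):
    """points: (N, 3) object data points; a, b, c: the three feature data points,
    with the reference side taken as a-b and c the remaining apex."""
    origin = 0.5 * (a + b)                        # mid-point of the reference side
    x_axis = (b - a) / np.linalg.norm(b - a)      # X' parallel to the reference side
    normal = np.cross(b - a, c - a)
    z_axis = normal / np.linalg.norm(normal)      # Z' perpendicular to the reference plane
    y_axis = np.cross(z_axis, x_axis)             # Y' in the plane, through the mid-point
    if np.dot(c - origin, y_axis) < 0:            # make Y' point towards the third apex
        y_axis, z_axis = -y_axis, -z_axis
    R = np.vstack([x_axis, y_axis, z_axis])       # rows are the new axes
    return (points - origin) @ R.T                # rotated and translated coordinates
```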
Preferably, in Step 2, a unique transformation index is assigned for the transformation function.
Preferably, the transformation function in Step 2 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the second data-processing in Step 2 comprises creating an image of the transformed object data points.
Preferably, the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
Preferably, the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
Preferably, the surface mesh points of the untransformed interpolated or non-interpolated 3-dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
Preferably, the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
Preferably, the image comprises a depth map or a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
Preferably, the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
Preferably, the image is a 2-dimensional image.
Preferably, the image is a 3-dimensional image.
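The sketch below illustrates one possible second data-processing step of this kind, rasterising the transformed object data points into a depth map and an RGB luminosity map over the X'-Y' plane; the grid size, the coordinate extent and the nearest-point rule are arbitrary assumptions.

```python
import numpy as np

def rasterise(transformed_pts, rgb, size=64, extent=1.0):
    """transformed_pts: (N, 3) coordinates in the new (X', Y', Z') system;
    rgb: (N, 3) light intensity values carried over from the object data points."""
    depth = np.full((size, size), np.inf)        # pixels with no point remain at infinity
    colour = np.zeros((size, size, 3))
    # map X', Y' in [-extent, extent] onto pixel indices
    ij = ((transformed_pts[:, :2] + extent) / (2 * extent) * (size - 1)).astype(int)
    ij = np.clip(ij, 0, size - 1)
    for (i, j), z, c in zip(ij, transformed_pts[:, 2], rgb):
        if z < depth[j, i]:                      # keep the nearest point per pixel
            depth[j, i], colour[j, i] = z, c
    return depth, colour
```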
Preferably, the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
Preferably, the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
Preferably, the function of the one or more convolution coefficients is a binning function.
Preferably, the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
Preferably, each of the one or more subsets is represented as a 1-dimensional binary string.
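By way of illustration, the sketch below forms one such subset by applying a 2-dimensional DCT to the image, coarsely binning a fixed number of coefficients and packing the bin indices into a 1-dimensional binary string. The coefficient count, bin count and 3-bits-per-coefficient layout are assumptions, not requirements of the method.

```python
import numpy as np

def dct_matrix(n):
    # orthonormal DCT-II matrix: row k is the k-th basis vector
    x = np.arange(n)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x[None, :] + 1) * x[:, None] / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D

def binary_subset(image, n_coeffs=16, bins=8):
    # image is assumed square (e.g. a rasterised depth or luminosity map)
    D = dct_matrix(image.shape[0])
    coeffs = D @ image @ D.T                      # 2-D DCT of the image (a spatial convolution)
    flat = coeffs.flatten()[:n_coeffs]            # keep the first n_coeffs coefficients
    edges = np.linspace(flat.min(), flat.max() + 1e-9, bins - 1)
    q = np.digitize(flat, edges)                  # bin index 0..bins-1 per coefficient
    return ''.join(format(int(b), '03b') for b in q)
```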
Preferably, the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
Preferably, a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points, the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z'), each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
Preferably, the second data processing in Step 2 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the set of characterising data in Step 2 also includes data relating to the object index.
Preferably, the set of characterising data in Step 2 also includes data relating to the transformation index.
Preferably, the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of the one or more sets of characterising data comprising the test data, are substantially equal.
Preferably, the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
Preferably, the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
Preferably, the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
Preferably, the function of the image of the transformed object data points is a neural network or a decision tree.
Preferably, the function of the one or more convolution coefficients is a neural network or a decision tree.
Preferably, the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
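A minimal sketch of such a quantizing image-processing function is shown below, with a simple box blur standing in for a Gaussian-like filter followed by a coarse intensity quantization, so that slight spatial or intensity variations in the input do not change the output. The kernel size and level count are assumptions.

```python
import numpy as np

def smooth_and_quantize(image, kernel_size=5, levels=8):
    # simple separable box blur standing in for a Gaussian-like filter
    k = np.ones(kernel_size) / kernel_size
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, image)
    blurred = np.apply_along_axis(lambda col: np.convolve(col, k, mode='same'), 0, blurred)
    # coarse intensity quantization, insensitive to slight input variations
    lo, hi = blurred.min(), blurred.max()
    return np.minimum((levels * (blurred - lo) / (hi - lo + 1e-12)).astype(int), levels - 1)
```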
Preferably, the transformation function in Step 2 is a geometric hashing function or an affine transformation.
Preferably, the set of characterising data also includes one or more object data points.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
Preferably, a first index or set of indices for the indexed database is a function of the respective set of characterising data.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional hash-table.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
Preferably, a first index or set of indices for the indexed database is one of the one or more subsets of the respective set of characterising data.
Preferably, a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
Preferably, a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
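The sketch below illustrates one possible realisation of this indexed lookup, storing training subsets in a hash table keyed by one subset (for example the binary string described earlier) and matching a test subset by an identical-index lookup with simple voting over the stored object indices; probing of similar indices could be added. The function names are illustrative only.

```python
from collections import defaultdict

def build_training_index(training_sets):
    """training_sets: iterable of (index_subset, object_index, payload) tuples,
    where index_subset is e.g. the 1-dimensional binary string of a subset."""
    table = defaultdict(list)
    for index_subset, object_index, payload in training_sets:
        table[index_subset].append((object_index, payload))
    return table

def match_test_set(table, test_index_subset):
    """Identical-index lookup of a test subset; returns the object index with the
    most matching training entries, or None if there is no match."""
    votes = defaultdict(int)
    for object_index, _ in table.get(test_index_subset, []):
        votes[object_index] += 1
    return max(votes, key=votes.get) if votes else None
```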
According to another aspect of the present invention, there is provided computer executable steps for recognising an object, to be performed on or with the aid of one or more computers, comprising:
generating one or more sets of characterising data as training data during an initial training data generation stage; generating one or more sets of characterising data as test data during a later test data generation stage; and calculating comparison data as a function of the training data and the test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises:
1. receiving an input of scene data for a scene containing the object in a 3-dimensional space defined by an axis system;
2. deriving a set of 3-dimensional object data points for the object, each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and
3. for each set of three feature data points, applying a 3-dimensional transformation function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
Advantageously, the computer executable steps for recognising an object operate in 3-dimensions and this minimizes the "information degradation" which would otherwise occur if the object recognition method operated in only 2-dimensions.
Advantageously, the computer executable steps for recognising an object use the largest possible data set, the derived object data points, rather than being limited to just using the derived set of feature data points, and this also minimizes the "information degradation" associated with using the latter much smaller and more limited data set.
Advantageously, the computer executable steps for recognising an object operate independently of the object position and orientation, and the viewpoint position and orientation in a given scene, and operate in a large range of lighting conditions.
Advantageously, the computer executable steps for recognising an object operate efficiently in a data-processing sense, because the sets of characterising data generated as training data during the initial training data generation stage can be of the same format as the sets of characterising data generated as test data during the later test data generation stage, thereby facilitating efficient comparison of the training data and test data.
Advantageously, since each object data point is defined by 3-dimensional coordinates and also one or more light intensity values, information relating to light intensity and "colour" of data points is not lost during the subsequent data processing.
Advantageously, a set of feature data points is derived corresponding to object features of the object.
Advantageously, since each feature data point is also defined by 3-dimensional coordinates, the 3- dimensional spatial disposition of the feature data points can be used in the subsequent data processing.
Advantageously, one or more sets of three feature data points are grouped, thereby defining an associated 3-dimensional spatial disposition defined by these three feature data points.
Advantageously, the 3-dimensional transformation function is a function of the set of three feature data points, and hence the 3-dimensional spatial disposition defined by these three feature data points.
Advantageously, each transformed object data point has the same associated one or more light intensity values as the object data point from which it was generated, resulting in the information associated with the one or more light intensity values not being lost in the transformation process.
Advantageously, the second data processing of the transformed object data points is carried out to generate one of the one or more sets of characterising data, in order to generate a predetermined format for the set of characterising data.
Preferably, during the training data generation stage or during the test data generation stage, the scene containing the object is computer software generated.
Preferably, the transformation function is a function of a cyclic permutation of the set of three feature data points.
Preferably, the transformation function in Step 3 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
Preferably, the transformation function in Step 3 also involves shearing of the one or more object data points, based upon the set of three feature data points.
Preferably, the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
Preferably, the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
Preferably, the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
Preferably, a unique object index is assigned to the object.
Preferably, the first data-processing in Step 2 comprises subdividing the object data points into a region of contiguous object data points with similar one or more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
Preferably, each set of three feature data points is derived from a single region.
Preferably, one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
Preferably, the first function is a luminosity contrast function of the one or more light intensity values.
Preferably, the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
Preferably, the third function relates to the distance of one or more of the object data points in the region from the base point.
Preferably, the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
Preferably, one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
Preferably, one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
Preferably, the base point is derived as a feature data point.
Preferably, the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
Preferably, the first data-processing in Step 2 comprises determining at least one point of high local curvature of the object data points, and designating the point as a feature data point.
Preferably, the first data-processing in Step 2 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data point.
Preferably, the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
Preferably, each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
Preferably, either during the calculating of the position coordinates for each viewpoint, or after the calculating of the position coordinates for all viewpoints, the object data points are isolated for the object.
Preferably, the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
Preferably, if the range distance of a viewable point in the scene is above a certain threshold, one of the coordinates of the corresponding object data point will be set to a predetermined maximum value.
Preferably, the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
Preferably, the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
Preferably, at least one of the viewable points resides on a surface of the object.
Preferably, the object data points are derived by 3-dimensional imaging of the scene.
Preferably, one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
Preferably, for each viewpoint, the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
Preferably, the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
Preferably, the viewing properties comprise a view width angle and a view height angle.
Preferably, the first data processing in Step 2 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the transformation function in Step 3 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
Preferably, the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
Preferably, the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
Preferably, the transformation function in Step 3 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
Preferably, the one or more object data points are also scaled or translated, based upon the length of the reference side.
Preferably, the transformation function in Step 3 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
Preferably, the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
Preferably, in Step 3, a unique transformation index is assigned for the transformation function.
Preferably, the transformation function in Step 3 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the second data-processing in Step 3 comprises creating an image of the transformed object data points.
Preferably, the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
Preferably, the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
Preferably, the surface mesh points of the untransformed interpolated or non-interpolated 3- dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
Preferably, the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
Preferably, the image comprises a depth map or a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
Preferably, the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
Preferably, the image is a 2-dimensional image.
Preferably, the image is a 3-dimensional image.
Preferably, the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
Preferably, the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
Preferably, the function of the one or more convolution coefficients is a binning function.
Preferably, the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
Preferably, each of the one or more subsets is represented as a 1-dimensional binary string.
Preferably, the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
Preferably, a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points, the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z'), each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
Preferably, the second data processing in Step 3 is performed using hardware acceleration, such as available in graphics processors.
Preferably, the set of characterising data in Step 3 also includes data relating to the object index.
Preferably, the set of characterising data in Step 3 also includes data relating to the transformation index.
Preferably, the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of the one or more sets of characterising data comprising the test data, are substantially equal.
Preferably, the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
Preferably, the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
Preferably, the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
Preferably, the function of the image of the transformed object data points is a neural network or a decision tree.
Preferably, the function of the one or more convolution coefficients is a neural network or a decision tree.
Preferably, the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
Preferably, the transformation function in Step 3 is a geometric hashing function or an affine transformation.
Preferably, the set of characterising data also includes one or more object data points.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
Preferably, a first index or set of indices for the indexed database is a function of the respective set of characterising data.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional hash-table.
Preferably, the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
Preferably, a first index or set of indices for the indexed database is one of the one or more subsets of the respective set of characterising data.
Preferably, a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
Preferably, a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
According to another aspect of the present invention, there is provided computer readable code for carrying out the computer executable steps of any one of the preceding paragraphs when run on the one or more computers.
According to another aspect of the present invention, there is provided one or more computer readable mediums comprising the computer readable code of any one of the preceding paragraphs.
According to another aspect of the present invention, there is provided one or more computers adapted for recognising an object, comprising:
- respective one or more processors;
- respective computer readable memory operatively connected to the respective one or more processors and storing the computer readable code of any one of the preceding paragraphs;
- respective one or more scene data input systems for receiving the scene data; and
- an output system for outputting the comparison data.
Advantageously, if more than one computer is employed during the training data generation stage, more than one scene data input system can be employed during this stage for receiving the scene data, resulting in a correspondingly "richer" data set.
Advantageously, if more than one computer is employed during the test data generation stage, more than one scene data input system can be employed during this stage for receiving the scene data, resulting in a correspondingly "richer" data set.
Advantageously, the scene data input systems employed according to the present invention may be a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may comprise other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-dimensional medical imaging applications.
Advantageously, one of the computers has an output system for outputting the comparison data.
Advantageously, the comparison data may be in the form of a rating (e.g. a numerical recognition rating) or a simpler "yes/no" indication (e.g. a graphic, an indicator light, or an acoustic "beep") corresponding to the recognition of the object.
Brief Description of Drawings
Fig. 1 shows a cubic object residing in a 3-dimensional space and viewed from two viewpoints,
Fig. 2 shows a flow diagram of the data generation method according to the present invention,
Fig. 3 shows a 2-dimensional image produced from one of the viewpoints in Fig. 1,
Fig. 4 shows diagrammatically a method of calculating the distance of a viewable point from a viewpoint using a parallax technique,
Fig. 5 shows a depth map based on the image in Fig. 3 and other depth data,
Fig. 6 shows the object data points produced from the image in Fig. 3,
Fig. 7 shows an RGB map of the image in Fig. 3,
Fig. 8 shows a luminosity map based on the RGB map in Fig. 7,
Fig. 9 shows a luminosity contrast map based on the luminosity map in Fig. 8,
Fig. 10 shows a depth contrast map based on the depth map in Fig. 5,
Fig. 11 shows a depth gradient map based on the depth map in Fig. 5,
Fig. 12 shows a depth gradient contrast map based on the depth gradient map in Fig. 11 and the depth contrast map in Fig. 10,
Fig. 13 shows a region based upon the luminosity contrast map in Fig. 9,
Fig. 14 shows a combination of relative minima and maxima from the centroidal base point of the region in Fig. 13,
Fig. 15 shows the corners map in Fig. 14, also with relevant object triangles,
Fig. 16 shows the start of a 3-dimensional transformation function with the object triangle in the original coordinate system,
Fig. 17 shows step A of the transformation function in which a third new axis (Z') is aligned perpendicular to the reference plane of the object triangle,
Fig. 18 shows step B of the transformation function in which the third new axis (Z') is positioned such that it passes through the mid-point between the two apexes at the extremities of the reference side,
Fig. 19 shows step C of the transformation function in which a first new axis (X') is aligned parallel to the reference side, and a second new axis (Y') is directed through the mid-point in the direction of the third apex of the object triangle,
Fig. 20 shows step D of the transformation function in which the coordinates (x, y, z) of the object data points are transformed into transformed object data points, with new coordinates (x', y', z') in the new axis system,
Fig. 21 shows an RGB representation of a set of object data points for a similar object containing a distinct surface pattern, also showing the location of an object triangle,
Fig. 22 shows an interpolated 3-dimensional surface mesh with surface mesh points,
Fig. 23 shows an interpolated 3-dimensional surface mesh in Fig. 22 with light intensity values of the mesh surface polygons,
Fig. 24 shows the interpolated 3-dimensional surface mesh in Fig. 22 with light intensity values of the mesh surface polygons, also showing the location of the object triangle,
Fig. 25 shows a 2-dimensional image of the interpolated 3-dimensional surface mesh in Fig. 22,
Fig. 26 shows a 2-dimensional image of transformed object data points of the object shown in Fig. 21 based upon the object triangle shown in Fig. 21,
Fig. 27 shows a set of characterising data comprising either one or two subsets,
Fig. 28 shows a 2-dimensional 8 x 8 and a 2-dimensional 4 x 4 DCT basis function,
Fig. 29 shows the comparison of 3 subsets of test data to 1735 subsets of training data based on YCbCr Y-only (i.e. luminosity-only) binary strings, and a match existing between Subset 3 of the training data and Subset 2 of the test data,
Fig. 30 shows an RGB representation of a set of object data points for another object, containing a distinct surface pattern, also showing the location of an object triangle, shown with respect to the new axis system,
Fig. 31 shows the object in Fig. 30 after transformation using the transformation function, in which the coordinates of the object data points in Fig. 30 are transformed into transformed object data points, with new coordinates, shown with respect to the new axis system,
Fig. 32 shows the start of the scaling transformation function as applied to the transformed object data in Fig. 31, with the transformed object triangle and the predefined equilateral triangle shown with respect to the new coordinate system, where a predefined side of the predefined triangle is parallel to the new X' axis, and centred about the new coordinate system,
Fig. 33 shows Step A of the scaling transformation function in which the object data is scaled such that the reference side of the object triangle is of the same length as a predefined side of the predefined triangle,
Fig. 34 shows Step B of the scaling transformation function in which the object data is scaled in the new Y' axis direction such that the perpendicular distance between the reference side of the object triangle and the third apex of the object triangle and the perpendicular distance between the predefined side of the predefined triangle and the third apex of the predefined triangle are the same, and
Fig. 35 shows Step C of the scaling transformation function in which the object data is sheared along the new X' axis.
Best Mode for Carrying Out the Invention
Referring to Fig. 1, an object recognition method according to a first embodiment of the present invention will be described based, for simplicity, on a single cubic object 1 residing in a single scene 2 in a 3-dimensional cartesian space defined by an X, Y, Z axis system passing through an origin 3. Fig. 1 also shows two alternative viewpoints 4 and 5 of the object 1, each with respective viewpoint positions 6 and 7 and respective viewpoint orientations 8 and 9 in the space. It will be recognised by those skilled in the art of object recognition that the same object recognition method could also be employed in the much more complex environment in which multiple different objects reside in one or more different scenes, and for much more complex objects such as those with many locations of high curvature or complex surface texture (pattern).
Referring to the flow diagram for the object recognition method in Fig. 2, it can be seen that the method herein described can therefore be "looped through" for more than one scene, for more than one viewpoint and for more than one object. For example, it might be necessary in a particular application for a vision system of an industrial robot to be able to recognise a set of three objects, say objects O1, O2 and O3, in relatively complex scenes in which one or more of the objects O1, O2 and O3 are resident. During a "training data generation stage", for example, the vision system could be arranged to successively view O1 in a first scene from a number of viewpoint positions and orientations in that scene, then view O2 in a second scene also from a number of viewpoint positions and orientations, and then view O3 in a third scene also from a number of viewpoint positions and orientations. The one or more sets of characterising data so generated for each object according to the present invention would then constitute the "training data". It should be noted that the one or more scenes each containing the one or more objects need not be real-life scenes as viewed by a camera-based vision system. Rather the "views" of the various scene/object combinations could be computer software generated based on 3-dimensional models using relevant viewing utilities in well known 3D design packages such as Catia, ProEngineer, SolidWorks or Autocad.
It should be recognised that an object recognition method according to one or more embodiments of this present invention could be used to recognise a like object, not necessarily the original object.
After this training data generation stage, the capability of the object recognition system of the industrial robot could be measured by arranging the (or another) vision system to view a different scene comprising one or more of the objects O1, O2 and O3, positioned and orientated differently to before, during a "test data generation stage". Also, during this stage, potentially previously "untrained" objects could be introduced in an attempt to "contaminate" the scene. The one or more sets of characterising data so generated for each object would constitute the "test data". The method according to the present invention allows the segmentation of the sets of characterising data pertinent to a particular object (say O2), and then the comparison of this "test data" with the previously generated "training data" for that object (say O2) using a comparison algorithm, in order to quantifiably measure the "recognition level" of that object. Again it should be noted that the relevant test data need not be generated using a camera-based vision system. Rather the "views" of the various scene/object combinations could be computer software generated based on 3-dimensional solid models.
For the reasons stated above, the method according to the first embodiment of the present invention will herein be described by passing through the procedure of generating a set of characterising data once, based on a single object 1 residing in a single scene 2 and viewed from a single viewpoint 4, with associated viewpoint position 6 and viewpoint orientation 8. For clarity, repeatable procedures will be identified in the description by the designations "Perform the following procedure X" and "End of procedure X".
Perform the following procedure A for each scene containing the object in a 3-dimensional space defined by an axis system.
Procedure A comprises 2 steps:
Step 1:
Set up a digital camera (or other digital imaging system) at an arbitrary original viewpoint 4, with a viewpoint position 6 and a viewpoint orientation 8 in space, to image the scene 2, with known optics parameters, for example a view width angle 10 and a view height angle 11. Also set a 2-dimensional pixel resolution for the camera.
Perform the following Procedure B for each new viewpoint.
Each new viewpoint, during each pass through this loop, must be at a known relative position and orientation from the original viewpoint 4. This loop firstly involves generation of a 2D image 12 of object 1 as shown in Fig. 3, comprising an array of pixels, each pixel 14 (for example) corresponding to a viewable point 13 (for example) in the scene 2. Viewable points are shown here in Fig. 3 as all being positioned on the surface of the object 1. However, in the general sense, viewable points can also be
located somewhere else in the scene 2, not on the object 1. In this first embodiment each pixel has associated with it Red-Green-Blue (RGB) light intensity values; however, other well known colour or monochrome light intensity value parameter sets could also be employed, for example, for colour representation in the image, the well known YCbCr light intensity value model.
A range distance 15 of each viewable point 13 from the viewpoint 4 is now calculated using a parallax technique. This is facilitated by creating at least two 2D sub-images 12A & 12B from two alternative laterally separated viewpoints, each alternative viewpoint itself of known position and orientation and, by implication, offset by a predetermined separation distance. Then, either sequentially or using different cameras, the range distance 15 can be calculated by techniques well known in the art, using the parallax offset between the different positions of pixel 14 in the two sub-images 12A & 12B of the corresponding viewable point 13.
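As a non-limiting illustration of the parallax relationship just described, the following Python sketch estimates the range distance from the pixel-value offset (disparity) of a viewable point between the two sub-images 12A and 12B, assuming rectified, laterally offset sub-viewpoints and a pinhole camera of known focal length; the function and parameter names are illustrative assumptions only.

import numpy as np

def range_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Estimate a range distance (such as distance 15) from stereo parallax.

    disparity_px    : pixel offset of the same viewable point in the two sub-images
    baseline_m      : lateral separation of the two sub-viewpoints (metres)
    focal_length_px : camera focal length expressed in pixels

    Assumes rectified, laterally offset sub-images (pinhole model).
    """
    disparity_px = np.asarray(disparity_px, dtype=float)
    # Suppress the divide-by-zero warning for points at effectively infinite range.
    with np.errstate(divide="ignore"):
        rng = baseline_m * focal_length_px / disparity_px
    return rng

# Example: a 12-pixel disparity, 0.1 m baseline and 800 px focal length gives ~6.7 m.
print(range_from_disparity(12.0, 0.1, 800.0))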
Now referring in more detail to one particular embodiment of this technique in Figs. 4a - 4d, the calculation of the range distance 15 involves creation of a contrast map of the 2D image (refer to Fig. 4a), and then performing sub-pixel edge detection on this contrast map (refer to Fig. 4b). This may be repeated for a number of pixel value lateral offset positions, adjusting the position, in the direction of the sub-viewpoint separation distance, of pixel values in one sub-image (refer to Fig. 4c), testing the sub-image associated with each sub-viewpoint, and measuring how similar they are (refer to Fig. 4d). Upon locating the most similar sub-images 12A & 12B, the range distance 15 between the viewable point 13 on the surface of the object 1 and the viewpoint 4 can be calculated based upon their pixel value offset separation in the two sub-images 12A & 12B.
A depth map as shown in Fig. 5 can now be generated based on the range distances calculated for each of the viewable points 13 as visible from the viewpoint 4.
The Cartesian coordinates (x, y, z) in the space of each viewable point 13 in scene 2 can now be calculated based on the location of the corresponding pixel 14 in the array, its corresponding range distance 15, viewpoint position 6 and viewpoint orientation 8 in the space, and the viewing properties of the viewpoint 4 such as the relevant view width angle 10 and the view height angle 11. A set of 3-dimensional object data points 17 is thereby generated for object 1 residing in the scene 2 based on the "view" from the viewpoint 4 in the space as shown in Fig. 6. Each object data point is defined by corresponding 3-dimensional coordinates (x, y, z) in the 3-dimensional space and corresponding one or more light intensity values.
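A minimal sketch of this back-projection is given below, assuming a simple pinhole model in which the view width angle 10 and view height angle 11 span the pixel array and the viewpoint orientation 8 is supplied as a rotation matrix; the function names and the exact camera model are assumptions rather than a definitive implementation.

import numpy as np

def pixel_to_xyz(u, v, rng, width_px, height_px,
                 view_width_angle, view_height_angle,
                 viewpoint_pos, viewpoint_rot):
    """Back-project pixel (u, v) with range distance 'rng' into scene coordinates.

    view_width_angle / view_height_angle : full field-of-view angles in radians,
        corresponding to the view width angle 10 and view height angle 11.
    viewpoint_pos : (3,) viewpoint position 6 in the scene axis system.
    viewpoint_rot : (3, 3) rotation matrix for viewpoint orientation 8
                    (camera axes expressed in scene axes).
    """
    # Angular offset of the pixel from the optical axis.
    ax = (u / (width_px - 1) - 0.5) * view_width_angle
    ay = (v / (height_px - 1) - 0.5) * view_height_angle
    # Unit ray direction in the camera frame (z forward).
    d = np.array([np.tan(ax), np.tan(ay), 1.0])
    d /= np.linalg.norm(d)
    # Scale by the range distance and transform into the scene axis system.
    return np.asarray(viewpoint_pos) + np.asarray(viewpoint_rot) @ (rng * d)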
It should be noted that, if for example the above described parallax technique is used to calculate range distance for each viewable point, there will be an increasingly larger absolute error in this range distance for viewable points increasingly further from the viewpoint. Consequently, in some other embodiments (not shown), the range distance values used to calculate the 3-dimensional (x, y, z) coordinates of the set of object data points in the space may be capped to a predefined saturation value, rather than, for example, the range distance being allowed to vary from very low values (for viewable points very close to the viewpoint) up to effectively infinity (for viewable points at great distance in the far background). This capping of the range distance values reduces the dynamic range of the range distance values so derived, and hence allows a greater distance resolution to be achieved for viewable points closer to the viewpoint, for example viewable points associated with objects in the foreground of the scene.
Returning to the first embodiment herein described, a first data-processing is now applied to these coordinates (x, y, z) of the 3-dimensional object data points to derive a set of feature data points corresponding to object features of the object 1.
Based on the light intensity values (RGB values in this case) and corresponding coordinates of each pixel position 14 in the image 12, a set of such "object features" can now be identified. In the first embodiment described herein with respect to the present invention, one type of object feature, specifically "corners" of the object, or regions of high curvature on the object 1, is derived as follows.
Using the RGB map shown in Fig. 7 (the light intensity RGB values of every pixel 14 in the image 12), a corresponding luminosity map shown in Fig. 8 is generated, in which each luminosity map pixel value is equal to the red plus green plus blue component values of the corresponding RGB map pixel.
Using the luminosity map in Fig. 8, a luminosity contrast map shown in Fig. 9 is generated, in which each pixel value is equal to a function of the difference in luminosity between neighbouring pixels.
Also, using the depth map already described in reference to Fig. 5 (the range distance values of every pixel in the image 12), a depth contrast map is generated shown in Fig. 10, in which each pixel value is equal to a function of the difference in depth between neighbouring pixels.
Also, using the depth map in Fig. 5, a depth gradient map is generated shown in Fig. 11, in which each pixel value has a vector associated with it which is equal to the gradient of the change in depth between neighbouring pixels.
Using this depth gradient map in Fig. 11, a depth gradient contrast map is generated shown in Fig. 12, in which each pixel value is equal to a function of the difference in depth gradient (in all directions) between neighbouring pixels.
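The maps described with reference to Figs. 8 to 12 could, for example, be sketched as follows in Python, where the particular contrast function used (the maximum absolute difference to 4-connected neighbours) is an illustrative assumption.

import numpy as np

def neighbour_contrast(m):
    """Simple contrast map: maximum absolute difference to 4-connected neighbours.
    The particular contrast function is an illustrative assumption."""
    pad = np.pad(m, 1, mode="edge")
    diffs = [np.abs(m - pad[1:-1, :-2]), np.abs(m - pad[1:-1, 2:]),
             np.abs(m - pad[:-2, 1:-1]), np.abs(m - pad[2:, 1:-1])]
    return np.max(diffs, axis=0)

def build_maps(rgb, depth):
    """rgb: (H, W, 3) RGB map, depth: (H, W) range-distance (depth) map."""
    luminosity = rgb[..., 0].astype(float) + rgb[..., 1] + rgb[..., 2]   # Fig. 8
    lum_contrast = neighbour_contrast(luminosity)                        # Fig. 9
    depth_contrast = neighbour_contrast(depth.astype(float))             # Fig. 10
    gy, gx = np.gradient(depth.astype(float))                            # Fig. 11
    grad_mag = np.hypot(gx, gy)          # magnitude of the depth gradient vector
    grad_contrast = neighbour_contrast(grad_mag)                         # Fig. 12
    return luminosity, lum_contrast, depth_contrast, (gx, gy), grad_contrast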
Note that at this point in the procedure, in another embodiment of the present invention, sets of object data points corresponding to unique objects may be isolated based upon, for example, contiguous regions of zero or near-zero depth contrast or contiguous regions of zero or near-zero luminosity contrast. For every object isolated, a unique object index is assigned. It should be noted that, if procedure B is executed for more than one viewpoint, the indices of the objects isolated for the current viewpoint are mapped to the indices of the objects isolated for previous viewpoints, based upon the coordinates of the viewable points in the current image, the coordinates of the viewable points in previous images, and their corresponding object indices.
Returning to the description of the first embodiment, a function of the original object data, such as the luminosity contrast map, the depth contrast map, or depth gradient contrast map, is used to identify a region 40 of contiguous object data points shown in Fig. 13, corresponding in this case to a side of the cubic object 1. The luminosity contrast map in Fig. 9 is chosen to be used in this case to generate the single contiguous region of zero or near-zero luminosity contrast.
A base point is then calculated based upon one or more points contained within this region 40. The base point calculated is chosen in the case of this embodiment to be the centroid 41 of the region, corresponding to the average 3-dimensional position (x, y, z) of all pixels in the region 40 - or all pixels in the contiguous region's 40 outer boundary, which happens to provide the same centroid position in this particular case.
The coordinates (x,y,z) of the centroid 41 provide an initial feature data point for object 1.
Additional object features may now be calculated based upon a function of the relative position of the centroid 41 and the coordinates of all other points in the region 40. Sequentially, taking adjacent points, the position (x,y,z) of every point 42, 43, 44 on the outer boundary of the region 40 is then compared to the position (x, y, z) of the centroid 41 measuring, for example, a distance 45, and it is noted where relative minimas and maximas occur in this distance 45 on the outer boundary of the region 40. Note that, in other embodiments, a routine could look for relative minimas or maximas on the outer boundary of the region 40 without respect to this distance 45 or the centroid 41. In still other embodiments the relative minimas or maximas could be looked for in the surface of the contiguous
region 40 with respect to the centroid 41, or the relative minimas and maximas could be looked for in the surface of the contiguous region 40.
These relative minimas and maximas are used to generate 3-dimensional coordinates (x,y,z) of additional feature data points 18 as shown in Fig. 14. In this particular case, the identified feature data points 18 are object features corresponding to virtual corners of the object 1.
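One possible sketch of this corner-finding step is shown below, assuming the outer-boundary points of the region 40 are already available in sequential (adjacent) order; the simple extrema test shown is an illustrative choice.

import numpy as np

def region_feature_points(region_pts, boundary_pts):
    """region_pts   : (N, 3) coordinates of all points in region 40.
       boundary_pts : (M, 3) outer-boundary points in sequential (adjacent) order.
       Returns the centroid 41 plus those boundary points where the distance 45 to
       the centroid is a relative minimum or maximum (candidate virtual corners 18)."""
    centroid = region_pts.mean(axis=0)                       # centroid 41
    d = np.linalg.norm(boundary_pts - centroid, axis=1)      # distance 45
    prev_d, next_d = np.roll(d, 1), np.roll(d, -1)           # cyclic neighbours
    is_max = (d > prev_d) & (d > next_d)
    is_min = (d < prev_d) & (d < next_d)
    features = [centroid] + [boundary_pts[i] for i in np.where(is_max | is_min)[0]]
    return np.array(features)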
For each of the objects 1 isolated from this particular viewpoint 4, two sets of 3-dimensional "data points" are created: (a) a set of object data points where each object data point is defined by the 3-dimensional coordinates (x, y, z) of the respective viewable points 13 in the image 12 pertaining to the particular object 1 when viewed from this particular viewpoint 4 in the space, plus its associated RGB light intensity values, and (b) a set of feature data points, each defined by a 3-dimensional coordinate, and corresponding to the object features of the object - the virtual corners 18 plus the centroids 41 in this case for each of the identified regions 40.
This is now the end of Procedure B.
Referring to Fig. 15, for each of the objects, combinations of the set of feature data points 18 (virtual corners in this case) pertaining to object 1 are now grouped as multiple sets of three feature data points, where each group of three feature data points constitutes an "object triangle" defined by three apexes, with the coordinates of the apexes equivalent to the coordinates of the respective feature data points. For example, in Fig. 15, the three apexes 19, 20 and 21 define the object triangle 22. The possible object triangles, four in the case of Fig. 15, are differently 3-dimensionally orientated in the space, although all four reside in a single plane in this case. The 3-dimensional orientation of reference plane 26 which is coplanar with object triangle 22 is fully defined by the 3-dimensional coordinates of the three respective apexes 19, 20 and 21. The 3-dimensional orientations of each of the three reference sides 23, 24 and 25 of object triangle 22 are also each defined by the 3-dimensional coordinates of two of the three apexes 19, 20 and 21 of object triangle 22, for example the 3-dimensional orientation of reference side 23 is defined by the 3-dimensional coordinates of apexes 20 and 21.
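A hedged sketch of this grouping step is given below, restricting the triples to feature data points of a single contiguous region as described in the following paragraph; the data layout used here is an assumption.

from itertools import combinations

def object_triangles(feature_points_by_region):
    """feature_points_by_region: {region_id: [(x, y, z), ...]} of feature data points.
    Groups feature data points into candidate object triangles (sets of three apexes,
    analogous to apexes 19, 20 and 21). Restricting each triple to a single contiguous
    region keeps the number of combinations manageable, as noted below."""
    triangles = []
    for pts in feature_points_by_region.values():
        for tri in combinations(pts, 3):
            triangles.append(tri)
    return triangles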
To minimise the number of the feature point combinations and the number of object triangles so defined, all three virtual corners of each set of three virtual corners must originate from the virtual corner calculations for a single contiguous region, rather than originating from more than one contiguous region's virtual corner/feature point calculation process. In other embodiments of the present invention, the Delaunay triangulation method, a well known basic calculation in the field of
object recognition, can be used to generate sets of three 3-dimensional feature data points for object 1.

Step 2:
For each of the three reference sides 23, 24 and 25 of each of the object triangles 22 of each of the objects 1 for each of the viewpoints 4, a unique transformation index is assigned, and the coordinates of the set of object data points are transformed into transformed object data points with respective new coordinates in a new axis system, where the transformation function is a function of the coordinates of the three feature data points, and hence the corresponding coordinates of the three apexes 19, 20 and 21 of the particular object triangle 22. In order to accelerate the data processing, the transformation performed here is preferably carried out at a computer hardware or firmware level, for example by using a PC 3D graphics card.
In some embodiments each transformation function may involve use of an affine transformation or a transformation function used within a standard geometric hashing algorithm. For example, translation and rotation in 3D may be implemented (3D rigid transformation) where the set of three feature data points define the plane of rotation and translation of the set of object data points, or translation, rotation and scaling in 3D may be implemented where the set of three feature data points define the translation, rotation, and scaling of the set of object data points.
However, as detailed below, each transformation function in this first embodiment is a function of the orientation of the particular reference side 23, 24 or 25 and the orientation of the reference plane 26 of the respective object triangle 22. The transformation function can therefore be considered as a function of a cyclic permutation of the set of three feature data points (i.e. a cyclically ordered set of the three feature data points), these three feature data points corresponding to the three apexes 19, 20 and 21 of the given object triangle 22 - since the three apexes 19, 20 and 21 define the orientation of the reference plane 26 and the cyclically ordered three sets of two apexes (i.e. apexes 21 - 20, 20 - 19, and 19 - 21) define the orientation of the respective three reference sides 23, 24 and 25.
The transformation function comprises shifting the position of the origin to a new origin position and reorientating the axis system to a new axis orientation such that, in the new axis system (X', Y', Z'), defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation, and new origin position, are a function of the apexes of the respective object triangle, and hence the set of three feature data points corresponding to these apexes.
Referring to the axis transformation sequence shown in Fig. 16, 17, 18 and 19, the origin 3 is shifted to
a new origin position 27 and reorientated to a new axis orientation 28 such that, in the new X', Y', Z' axis system, a third new axis (Z') is aligned perpendicular to the reference plane 26 of the object triangle 22 (Fig. 17), this third new axis (Z') passes through a mid-point 29 between the apexes 19 and 21 at the extremities of reference side 25 (Fig. 18), a first new axis (X') of the new axis system is aligned parallel to this reference side 25, and a second new axis (Y') of the new axis system is directed through mid-point 29 in the direction 35 of the third apex 20 of the object triangle (Fig. 19).
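The axis transformation of Figs. 16 to 19 can be sketched, for example, as the construction of a new origin and a rotation from the three apexes, assuming reference side 25 runs between apexes 19 and 21 and apex 20 is the third apex; the function names are illustrative.

import numpy as np

def triangle_axis_transform(a19, a21, a20):
    """Build the new axis system (X', Y', Z') for reference side 25.

    a19, a21 : apexes at the extremities of reference side 25
    a20      : third apex of the object triangle (defines the +Y' direction 35)

    Returns (origin, R) such that R @ (p - origin) maps an object data point p
    into the new X', Y', Z' coordinates.
    """
    a19, a21, a20 = (np.asarray(p, float) for p in (a19, a21, a20))
    origin = 0.5 * (a19 + a21)                       # mid-point 29
    x_axis = a21 - a19
    x_axis /= np.linalg.norm(x_axis)                 # X' parallel to reference side 25
    n = np.cross(a21 - a19, a20 - a19)
    z_axis = n / np.linalg.norm(n)                   # Z' perpendicular to plane 26
    y_axis = np.cross(z_axis, x_axis)                # Y' towards the third apex 20
    if np.dot(y_axis, a20 - origin) < 0:             # enforce direction 35
        y_axis, z_axis = -y_axis, -z_axis
    R = np.stack([x_axis, y_axis, z_axis])           # rows are the new axes
    return origin, R

def transform_points(points, origin, R):
    """Transform (N, 3) object data points into the new axis system."""
    return (np.asarray(points, float) - origin) @ R.T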
In another embodiment, the coordinates of the set of object data may be further transformed using a scaling transformation function. The scaling transformation function may involve scaling the object data based upon the length of reference side 25. Alternatively, the scaling transformation function may involve further transforming the set of object data, such that in the new axis system, the three apexes of the object triangle are made coincident with the three apexes of a predefined triangle. This scaling transformation function is demonstrated using another object, where referring to Fig. 30 and 31, this object is pentagonal, and the pentagonal object is first transformed using the transformation function illustrated in Fig. 16, 17, 18 and 19. Referring to the sequence shown in Fig. 32, 33, 34, and 35, the three apexes of the object triangle in Fig. 31 are made coincident with the three apexes of a predefined equilateral triangle 46 in the new X', Y', Z' axis system, where a predefined side 49 of the predefined triangle 46 is parallel to the new X' axis (Fig. 32), by scaling the object data such that the reference side 25 of the object triangle is of the same length 47 as the predefined side 49 of the predefined triangle 46 (Fig. 33), by scaling the object data in the new Y' axis direction such that the perpendicular distance 48 between the reference side of the object triangle and the third apex of the object triangle and the perpendicular distance between the predefined side 49 of the predefined triangle 46 and the third apex of the predefined triangle 46 are the same (Fig. 34), and by shearing the object data along the new X' axis (Fig. 35). Fig. 32, 33, 34, and 35 only show the object data bound by the object triangle, but all of the object data may be transformed using this scaling transformation function.
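One possible sketch of this scaling and shearing sequence (Figs. 33 to 35) is given below, working on object data that has already been transformed into the X', Y', Z' axis system; the placement of the predefined equilateral triangle 46 and the function names are assumptions.

import numpy as np

def scale_shear_to_predefined(points_xyz, apex19_t, apex21_t, apex20_t, side_len):
    """Further transform already-transformed object data so that the object
    triangle's apexes coincide with a predefined equilateral triangle 46 whose
    side 49 (length 'side_len') is parallel to the X' axis.

    points_xyz                   : (N, 3) transformed object data points
    apex19_t, apex21_t, apex20_t : transformed apex coordinates (in the Z' = 0 plane)
    The predefined triangle 46 is assumed to have apexes at (-s/2, 0), (s/2, 0)
    and (0, s*sqrt(3)/2) in the X'-Y' plane; this placement is an assumption.
    """
    pts = np.asarray(points_xyz, float).copy()
    L = np.linalg.norm(np.asarray(apex21_t, float) - np.asarray(apex19_t, float))
    h_obj = apex20_t[1]                    # perpendicular distance 48
    h_pre = side_len * np.sqrt(3) / 2.0    # height of the predefined triangle 46
    sx = side_len / L                      # match side 25 to side 49 (Fig. 33)
    sy = h_pre / h_obj                     # match the perpendicular distances (Fig. 34)
    pts[:, 0] *= sx
    pts[:, 1] *= sy
    # Shear along X' so the scaled third apex lands on the predefined apex (Fig. 35).
    k = (0.0 - apex20_t[0] * sx) / h_pre
    pts[:, 0] += k * pts[:, 1]
    return pts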
The coordinates (x, y, z) of the set of object data points are therefore appropriately transformed into new coordinates (x', y', z') in the new X', Y', Z' axis system, hence forming a set of transformed object data points as shown in Fig. 20.
It should be noted that, unlike the simple cubic object described herein in reference to the first embodiment, the surfaces of real world objects may comprise distinctive surface patterns and other "micro features" which will greatly aid the eventual object recognition process. Fig. 21 shows an RGB
representation of a set of object data points of a similar cubic object, the similar object containing such a distinct surface pattern, with the location of an object triangle also shown.
Referring to Figs. 22 and 23, an interpolated 3-dimensional surface mesh 30 is formed after the transformation of the set of object data points is carried out, defined by surface mesh points 31. The coordinates of each surface mesh point 31 corresponds to the coordinates of a transformed object data point 33 associated with the object 1 and the relevant axis transformation or, as in the case of surface mesh point 31B, corresponds to an interpolation of the coordinates of one or more of the transformed object data points 33. Each polygon 32 of the surface mesh 30 is formed by proximate surface mesh points 31 and 31B and has associated with it the light intensity values (or "RGB values" in this first embodiment) derived as a function of the RGB values associated with nearby transformed object data points. Fig. 24 shows the interpolated 3-dimensional surface mesh 30 (as shown in Fig. 22) including light intensity values of the mesh surface polygons (as shown in Fig. 23), and also showing the location of the object triangle.
In another (not shown) embodiment, an as yet untransformed interpolated 3-dimensional surface mesh defined by surface mesh points may be formed before the transformation of the set of object data points is carried out. The coordinates of each surface mesh point will correspond to the coordinates of an object data point associated with the object or correspond to an interpolation of the coordinates of one or more of the object data points. Each polygon of the surface mesh would again be formed by proximate surface mesh points and have associated with it the light intensity values (or "RGB values" in this first embodiment) derived as a function of the RGB values associated with nearby object data points. The 3-dimensional surface mesh points, together with the relevant polygons, would then be transformed according to the same axis transformation as described previously. Alternatively, the 3-dimensional transformation function applied to the object data points may be equivalent to viewing the untransformed interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein this virtual viewpoint is a function of the set of three feature data points. The transformed object data points in this case would correspond to one or more of the viewable map images defined below (a 2-dimensional viewable RGB map image, a 2-dimensional viewable depth map image, or a viewable 3-dimensional mesh image).
Returning to the first embodiment described, for each of the axis transformations, a 2-dimensional viewable RGB map image shown in Fig. 25 of the surface mesh 30 shown in Fig. 24 is generated. The
image is generated using computer graphics by generating a virtual image of the surface mesh 30 from a particular virtual viewpoint, the virtual image comprising virtual image data points 34. A virtual mesh viewpoint (not shown) is orientated along the Z' axis passing through the new origin position 27 corresponding to the midpoint 29 of one of the three reference sides 23, 24 or 25 (25 as shown in Fig. 23), and the X' axis aligned with the relevant reference side (25 as shown).
The virtual image data points 34 in the 2-dimensional virtual image are generated by interpolating the (already potentially interpolated) 3-dimensional surface mesh 30 at particular coordinate intervals along the X' axis and along the Y' axis, each virtual image data point 34 corresponding to a viewable surface mesh point 36 on the surface mesh and having one or more light intensity values derived as a function of the light intensity values (or "RGB values" in this first embodiment) of the transformed object data points proximate to the point, i.e. the light intensity values of the polygon of surface mesh 30 at the viewable surface mesh point 36. Each virtual image data point 34 corresponds to a viewable surface mesh point 36 on the surface mesh 30 and has RGB values derived as a function of the RGB values of transformed object data points proximate to the viewable surface mesh point 36, and coordinates derived as a function of the transformed coordinates of the object data points proximate to the viewable surface mesh point 36. It will be recognised by those skilled in the art of object recognition that this 2-dimensional image of the mesh may also be referred to as a "pattern".
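As a simplified, non-limiting stand-in for this mesh-based rendering, the transformed object data points could be interpolated directly on a regular X'-Y' grid, for example as sketched below; the grid resolution, viewable extent and interpolation method are assumptions, and occlusion handling is omitted.

import numpy as np
from scipy.interpolate import griddata

def viewable_maps(pts_t, rgb, res=64, extent=1.0):
    """Sample transformed object data points on a regular X'-Y' grid, as a simple
    stand-in for rendering the interpolated surface mesh 30 from the virtual mesh
    viewpoint. 'res' (grid resolution) and 'extent' (half-width of the viewable
    area, centred on the new origin 27) are illustrative assumptions.

    pts_t : (N, 3) transformed object data points (X', Y', Z')
    rgb   : (N, 3) associated light intensity values
    Returns a 2-dimensional viewable RGB map image and a 2-dimensional viewable
    depth map image (grid cells outside the data's convex hull are NaN).
    """
    xs = np.linspace(-extent, extent, res)
    gx, gy = np.meshgrid(xs, xs)
    xy = pts_t[:, :2]
    depth_map = griddata(xy, pts_t[:, 2], (gx, gy), method="linear")   # Z' values
    rgb_map = np.stack([griddata(xy, rgb[:, c], (gx, gy), method="linear")
                        for c in range(3)], axis=-1)
    return rgb_map, depth_map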
Note that the viewing properties of the virtual mesh viewpoint and the Z' position of the object triangle determine the viewable area of the image (i.e. the size of the object or part thereof displayed in the 2-dimensional image). Therefore, the viewing properties of the virtual mesh viewpoint may be defined, or object data may be translated on the Z' axis, or the object data may be scaled, or the new axis system (X', Y', Z') may be scaled, such that the image formed is dependent upon the size of the object triangle (or length of reference side 25). Alternatively, no such dependency may be enforced, and the image formed will be independent of the size of the object triangle, and thus only dependent upon the size/scale of the object data in the coordinate system, meaning the image generated of a large object may only contain a small part of the object, centered at the midpoint of the reference side of the object triangle.
A 2-dimensional viewable depth map image (a 2-dimensional array including Z' values of each of the viewable surface mesh points 31) may also be created along with the above described 2-dimensional viewable RGB map image (a 2-dimensional array of RGB values of each of the viewable 3-
dimensional surface mesh points 31).
For both the viewable RGB map image, and viewable depth map image, the virtual image data points 34 may also be referred to as "pixels".
In another embodiment, a viewable 3-dimensional mesh image may be created which encapsulates the viewable transformed object data points in its 3-dimensional format.
Fig. 26 shows a 2-dimensional viewable RGB map image of the set of transformed object data points (or associated interpolated 3-dimensional mesh) of the similar object shown in Fig. 21, where the transformation of the object data points for this object has occurred based upon the object triangle shown in Fig. 21.
In still another embodiment, one or more viewable maps may be generated at a different scale. As above, this may also be achieved by a number of methods; the viewing properties of the virtual mesh viewpoint may be modified, or object data may be transformed (translated) on the Z' axis, or the new axis system (X', Y', Z') may be scaled.
A set of characterising data is then generated based on the transformed object data points and the associated viewable map images - in the case of this first embodiment, the viewable RGB map image and the viewable depth map image. In the case of this embodiment, the set of characterising data comprises the respective unique object index and the unique transformation index, the transformed object data points themselves, the one or more viewable map images, along with one or more functions of one or more of the viewable map images. A "function of one or more of the viewable map images" is henceforth referred to in this specification as a "subset" of the set of characterising data.
The method of second data-processing of the transformed object data points will determine the format of the set of characterising data generated, and this in turn is dependent on which type of comparison algorithm is eventually used to compare the training data and test data. In this embodiment a "database comparison algorithm" will be used, and this makes use of an ordinary database structure where fast indexing and lookup may only be performed on a single key (or row/dimension within the database). In accordance with this scenario, the subset is generated as a 1-dimensional binary string and is used for database indexing purposes. The database comparison algorithm is described after the end of procedure A in this specification.
In an alternate embodiment, the function of one or more of the viewable map images may involve an
image processing function to generate the subset, such as a colour saturation contrast map or Gaussian filter of the viewable RGB map and/or viewable depth map image. Ideally this image processing function should result in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, where this quantization is insensitive to slight spatial or intensity variations in the input data.
In another alternate embodiment, the function of one or more of the viewable map images may involve a neural network or decision tree to generate the subset.
However, in this first embodiment the function of one or more of the viewable map images involves a spatial convolution function. Binned coefficient values of a spatial convolution of the viewable RGB map image, and binned coefficient values of a spatial convolution of the viewable depth map image, may be recorded as the subset. The subset, represented as a 1-dimensional binary string, may contain, for example, 64 x 3 bits, representing a 4 x 4 array of spatial convolution coefficient values binned at a 4-bit resolution, for each light intensity value type (e.g. RGB) or function thereof (e.g. YCbCr). The spatial convolution technique utilised may be a low resolution DCT (discrete cosine transformation) such as is implemented in image compression algorithms well known to the art of image processing (e.g. JPEG). Note that, as per this embodiment, it is useful to convert data from the RGB color model to the YCbCr color model for the purposes of generating the DCT coefficients. Fig. 28 shows 2-dimensional 8 x 8 and 4 x 4 DCT basis functions for such a DCT calculation, well known in the art of image data processing, and an integral part of the universally used JPEG and MPEG data compression algorithms.
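A minimal sketch of such a subset generation is given below, computing a low resolution orthonormal DCT of one channel of a viewable map image and binning the 4 x 4 lowest order coefficients at 4-bit resolution into a 1-dimensional binary string; the binning scale is an assumption and would in practice be fixed so that training data and test data quantize identically.

import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis functions)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def subset_bits(channel_map, n_coeffs=4, bits=4, coeff_scale=None):
    """Binned low-order DCT coefficients of one viewable map channel as a bit string.

    channel_map : 2-D array (e.g. the Y component of the viewable RGB map image)
    n_coeffs    : keep the top-left n_coeffs x n_coeffs DCT coefficients
    bits        : bin resolution per coefficient (4 bits -> values 0..15)
    coeff_scale : coefficient range used for binning; in practice this would be a
                  fixed value so training and test data bin identically (assumption)
    """
    h, w = channel_map.shape
    coeffs = dct_matrix(h) @ channel_map @ dct_matrix(w).T
    low = coeffs[:n_coeffs, :n_coeffs].ravel()
    if coeff_scale is None:
        coeff_scale = np.max(np.abs(low)) or 1.0
    levels = 2 ** bits
    binned = np.clip(((low / coeff_scale) + 1.0) * 0.5 * (levels - 1), 0, levels - 1)
    return "".join(format(int(round(b)), "0{}b".format(bits)) for b in binned)

# Example: a 192-bit subset from the Y, Cb and Cr channels would be the concatenation
# subset_bits(y) + subset_bits(cb) + subset_bits(cr), each contributing 4 x 4 x 4 = 64 bits.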
All DCT coefficients of a 4 x 4 DCT basis function have been utilized in the first embodiment; however, in another embodiment, a higher order DCT may be performed (e.g. 8 x 8), and only a selection of the coefficients (e.g. 7 higher order coefficients out of the 64 coefficients) may be binned and used in the formation of the subset. The exact coefficients selected from a given DCT may depend upon a number of factors, including the light intensity value type, variations in object lighting conditions, and recognition accuracy and repeatability. For example, the highest order coefficient may be dropped to cater for variations in lighting conditions.
In still another embodiment, additional bits may be used to generate the subset, unrelated to the viewable map data.
There are many possible structures for the set of characterising data according to the present invention. Fig. 27 shows four alternative possible structures:
Fig. 27, with Option #1, shows a set of characterising data comprising a 192 bit long Subset A, which has been generated based upon the viewable RGB map image.
Fig. 27, with Option #2, shows an alternative set of characterising data comprising two subsets, 192 bit long Subset B and 192 bit long Subset C, Subset B generated based upon the viewable RGB map image and Subset C generated based upon the viewable RGB map image also.
Fig. 27, with Option #3, shows a still alternative set of characterising data comprising a 384 bit long Subset D, which has been generated based upon both the viewable RGB map image and the viewable depth map image.
Fig. 27, with Option #4, shows a still alternative set of characterising data comprising a 64 bit long Subset E, in which colour information is ignored in the generation of the subset. Subset E has been generated based upon the viewable RGB map image, but only the Y (luminosity) DCT coefficients derived therefrom have been used in the formation of the subset.
In a more complex embodiment (not shown), the raw DCT coefficients (i.e. unbinned) may be stored separately in the set of characterising data along with the subset (binned coefficient values) for later verification purposes only.
Alternatively, in a not shown embodiment, one or more bits of the 1-dimensional binary string may be generated as a function of the spatial convolution coefficient values. This function could be a neural network, or decision tree, where the spatial convolution coefficient values are the input data to the neural network(s) or decision tree(s).
Alternatively, in another not shown embodiment, a subset could be generated using a neural network or decision tree directly in conjunction with the viewable RGB map image or the viewable depth map image to produce one or more bits of the 1-dimensional binary string.
Alternatively, in still another not shown embodiment, a subset could be generated using an image processing function (such as a Gaussian filter) of the viewable RGB map image, or the viewable depth map image, where it has been configured to result in significant quantization (spatial resolution and intensity resolution reduction) of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data. The quantization should ideally be independent of aliasing or the pixel map structure, and produce a repeatable (the same) output with slight variations to the input maps, and hence ensure the training data will match the test data.
Alternatively, in still another not shown embodiment, where a multi-dimensional database is available and can provide fast lookups, DCT coefficient values could be stored separately without the need for a 1-dimensional binary string.
This is now the end of Procedure A.
Each set of characterising data generated according to the present invention is used for the purposes of object recognition. As mentioned at the start of the description of this first embodiment, the characterising data generating procedure according to the present invention can be executed sequentially for multiple scenes, each with multiple objects and viewpoints. Training data can be initially generated during a training data generation stage, and this training data can later be compared, using a comparison algorithm, with test data generated by the same procedure during a later test data generation stage, thereby enabling the recognition of one or more of the test objects.
In a rudimentary embodiment, for each of the one or more reference sides of each of the object triangles of each of the objects of the test data, and for each of the one or more reference sides of each of the object triangles of each of the objects of the training data, the one or more sets of characterising data pertaining to the test data are compared with the one or more sets of characterising data pertaining to the training data.
In another embodiment, a network based artificial intelligence algorithm may be used to train one or more networks (such as neural networks or decision trees) with the training data. In a network based artificial intelligence algorithm the input values into the network are the values of one or more viewable map images or a function of one or more viewable map images, and the output values of the network are a function of an object index or an object transformation index (for example the object index + the transformation index). The network based artificial intelligence algorithm is used to test the network against the test data object(s), thereby determining whether or not the testing object(s) is (or are) recognised and which of the trained objects this (or these) test data object(s) corresponds to.
However, in this first embodiment, an efficient method is implemented using a database comparison algorithm, where an indexed database of training data is created, and where the subset of each set of characterising data of the training data is used to index, and hence identify, the corresponding sets of characterising data stored in the database. The training data may be stored in an indexed database or a multi-dimensional hash table, where the index chosen may be the subset itself or another function of the set of characterising data.
When a set of characterising data being generated is training data, two or more subsets of the set of characterising data may be formed, each with slight variations to the binning of the coordinates of the transformed object data points, and/or the binning of the DCT coefficients for example. When the one or more sets of characterising data currently being generated is test data, each set of characterising data will only include a single subset. The generation of such multiple subsets within each set of characterising data can enable instantaneous database lookup in scenarios where slight variations exist between two matching sets of characterising data since, for example, a very fast (data processing efficient) direct comparison (lookup) is possible between the binary string of the subset pertaining to the test data and the binary strings of the one or more subsets pertaining to the training data. As referred to above, Fig. 27 with Option #2 shows a set of characterising data comprising two subsets, 192 bit long Subset B and 192 bit long Subset C. In this scenario, the second of the two subsets (Subset C in this case) has been modified slightly compared to Subset B, such that one of its 4 x 4 (16 total) 4-bit binned DCT coefficient values for the Y (luminosity) "colour" has been increased arbitrarily by 1. Thus, as per the example shown for Option #2 in Fig. 27, a value of 0100 for bits 1 - 4 (i.e. a value of 4 expressed in base 10), corresponding to the top-left (lowest frequency) binned DCT coefficient value, would increase by 1 to a value of 0101 (a value of 5 expressed in base 10). Similar other multiple subsets could also be included in this set of characterising data (not shown), in which other binned coefficient values for any of the three "colours" Y, Cb or Cr, are increased in value by 1 or decreased in value by 1.
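The generation of such slightly varied training subsets could be sketched, for example, as follows, producing one variant per binned coefficient per plus-or-minus-1 step; the 4-bit coefficient width matches the example above, but the layout is otherwise an assumption.

def training_subset_variants(subset_bits_str, bits=4):
    """Generate variant subsets for training data by increasing or decreasing each
    binned coefficient value by 1 (as with Subset C relative to Subset B), so that a
    direct lookup still succeeds when the test data falls into an adjacent bin."""
    variants = [subset_bits_str]
    n = len(subset_bits_str) // bits
    for i in range(n):
        start = i * bits
        val = int(subset_bits_str[start:start + bits], 2)
        for delta in (-1, +1):
            v = val + delta
            if 0 <= v < 2 ** bits:
                variants.append(subset_bits_str[:start]
                                + format(v, "0{}b".format(bits))
                                + subset_bits_str[start + bits:])
    return variants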
As referred to above, the set of characterising data generated according to the present invention is used for purposes of object recognition, and this involves the matching of one or more sets of characterising data in the test data, with one or more sets of characterising data in the training data, using a comparison algorithm. In this first embodiment the comparison algorithm comprises a database search or database lookup. The comparison algorithm first involves the matching of one or more subsets of the set of characterising data in the test data, with one or more subsets of the set of characterising data in the training data, by performing a database lookup using the subset as an index.
Every set of characterising data in the test data is therefore searched for, to produce relevant matching sets of characterising data in the training data. The object indices corresponding to the sets of characterising data in the training data which result in matches are recorded, and tallied. An object is considered recognised based upon a function of the number of sets of characterising data associated with that object that produced matches.
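A hedged sketch of such a database comparison algorithm is given below, using an in-memory dictionary keyed by the subset binary string; the field names of the sets of characterising data and the recognition threshold are illustrative assumptions.

from collections import defaultdict

def build_training_index(training_sets):
    """training_sets: iterable of sets of characterising data, each assumed to carry
    'subsets' (one or more 1-dimensional binary strings), 'object_index' and
    'transformation_index' entries (illustrative field names)."""
    index = defaultdict(list)
    for cd in training_sets:
        for subset in cd["subsets"]:
            index[subset].append(cd)          # the subset is used as the database index
    return index

def recognise(test_sets, index, threshold=3):
    """Tally the training objects whose characterising data match the test data; an
    object is considered recognised when its tally reaches 'threshold' (the exact
    recognition function is application dependent and assumed here)."""
    tally = defaultdict(int)
    for cd in test_sets:
        for match in index.get(cd["subsets"][0], []):   # test data has a single subset
            tally[match["object_index"]] += 1
    return {obj: count for obj, count in tally.items() if count >= threshold}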
By way of example, Fig. 29 shows diagrammatically the comparison of 3 subsets of test data to 1735 subsets of training data with the subsets generated as YCbCr Y (luminosity) binary strings (i.e. similar to the Subset E previously referred to), generating one match: a match between Subset #3 of the training data and Subset #2 of the test data. Note that the subset numbers in Fig. 29 do not correspond to the transformation indices. The training data subsets have been ordered in the database according to their binary string value (index). For every subset listed in this diagram, there exists a mapping to its corresponding set of characterising data.
Upon locating a match using the index lookup, the comparison algorithm may then also compare the previously mentioned viewable map images (or the 3-dimensional viewable map image) stored in the set of characterising data for verification of the match.
In accordance with a second embodiment of the present invention there are provided computer executable steps, these computer executable steps corresponding substantially to the steps in the method as previously described for recognising an object in reference to the first embodiment. Note that Steps 2 and 3 in this second embodiment respectively correspond to, and employ the methodology of, Steps 1 and 2 in the first embodiment. Step 1 in this second embodiment specifically relates to the computer executable step of receiving an input of scene data for a scene containing the object in a 3-dimensional space defined by an axis system. The input of scene data may be provided by a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may be provided by other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-dimensional medical imaging applications. The input scene data could also be computer software generated based upon one or more 3-dimensional models.
In accordance with a third embodiment of the present invention there is provided computer readable code for carrying out the computer executable steps as described in reference to the second
embodiment, when run on the one or more computers.
In accordance with a fourth embodiment of the present invention there is provided one or more computer readable mediums comprising the computer readable code as described in reference to the third embodiment.
In accordance with a fifth embodiment of the present invention there is provided one or more computers adapted for recognising an object, comprising:
- respective one or more processors;
- respective computer readable memory operatively connected to the respective one or more processors and storing the computer readable code described in reference to the third embodiment;
- respective one or more scene data input systems for receiving the scene data; and
- an output system for outputting the comparison data.
In accordance with this fifth embodiment, a first computer and associated scene input data system could be used to generate the training data for one or more scenes containing the object during the training data generation stage, and a second computer and another associated scene input data system could be used to generate the test data for one or more other scenes also containing the object during the later test data generation stage. One of these computers, or a third computer, could be used to compare the training data and the test data, and generate an output of comparison data in the form of a rating (e.g. a numerical recognition rating) or a more simple "yes/no" indication (e.g. a graphic, an indicator light, or an acoustic "beep") corresponding to the recognition of the object. Of course there are many different possibilities, including the first and second computers, and the associated scene input data systems, being a common system, and that common computer also being used to compare the training data and the test data and thereby generate an output of the above described comparison data. In another embodiment a computer network or cloud could be employed to perform the generation of training data, and/or the generation of test data, and/or the data comparison, where the computer network or cloud could be connected to one or more devices for imaging one or more scenes.
The scene data input system employed according to this fifth embodiment may be a conventional 3-dimensional imaging system, for example two stereoscopically arranged (laterally offset) digital cameras, or alternatively may comprise other more complex systems, for example a scanning laser interferometric imaging system, a scanning laser or LED-light time-of-flight imaging system, or computed tomographic (CT) or magnetic resonance imaging (MRI) scanning systems used in 3-
dimensional medical imaging applications. The input scene data could also be computer software generated based upon one or more 3-dimensional models.
Interpretation
Wireless
The invention may be embodied using devices conforming to other network standards and for other applications, including, for example, other WLAN standards and other wireless standards.
Applications that can be accommodated include IEEE 802.11 wireless LANs and links, and wireless Ethernet.
In the context of this document, the term "wireless" and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a non-solid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not. In the context of this document, the term "wired" and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a solid medium. The term does not imply that the associated devices are coupled by electrically conductive wires.
Processes:
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing", "computing",
"calculating", "determining'', "analysing", "generating", "deriving" or the like, refer to the action and/or processes of a computer οτ computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.
Processor:
In a similar manner, the term "processor" may refer to any device or portion of a device that processes electronic data, e.g., from registers and/or memory to transform that electronic data into other electronic data that, e.g., may be stored in registers and/or memory. A "computer" or a "computing device" or a "computing machine" or a "computing platform" may include one or more processors. The methodologies described herein are, in one embodiment, performable by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that
when executed by one or more of the processors carry out at least one of the methods described herein. Any processor capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken is included. Thus, one example is a typical processing system that includes one or more processors. The processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM.
Computer-Readable Medium:
Furthermore, a computer-readable carrier medium may form, or be included in a computer program product. A computer program product can be stored on a computer usable carrier medium, the computer program product comprising a computer readable program means for causing a processor to perform a method as described herein.
Networked or Multiple Processors:
In alternative embodiments, the one or more processors operate as a standalone device or may be connected, e.g., networked to other processor(s). In a networked deployment, the one or more processors may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer or distributed network environment (e.g. cloud). The one or more processors may form a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
Note that while some diagram(s) only show(s) a single processor and a single memory that carries the computer-readable code, those in the art will understand that many of the components described above are included, but not explicitly shown or described in order not to obscure the inventive aspect. For example, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
Additional Embodiments:
Thus, one embodiment of each of the methods described herein is in the form of a computer-readable carrier medium carrying a set of instructions, e.g., a computer program that is for execution on one or more processors. Thus, as will be appreciated by those skilled in the art, embodiments of the present invention may be embodied as a method, an apparatus such as a special purpose apparatus, an apparatus such as a data processing system, or a computer-readable carrier medium. The computer-readable carrier medium carries computer readable code including a set of instructions that when
executed on one or more processors cause a processor or processors to implement a method.
Accordingly, aspects of the present invention may take the form of a method, an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of carrier medium (e.g., a computer program product on a computer-readable storage medium) carrying computer-readable program code embodied in the medium.
Carrier Medium:
The software may further be transmitted or received over a network via a network interface device. While the carrier medium is shown in an example embodiment to be a single medium, the term "carrier medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "carrier medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by one or more of the processors and that cause the one or more processors to perform any one or more of the methodologies of the present invention. A carrier medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
Implementation:
It will be understood that the steps of methods discussed are performed in one embodiment by an appropriate processor (or processors) of a processing (i.e., computer) system executing instructions (computer-readable code) stored in storage. It will also be understood that the invention is not limited to any particular implementation or programming technique and that the invention may be
implemented using any appropriate techniques for implementing the functionality described herein. The invention is not limited to any particular programming language or operating system.
Means For Carrying out a Method or Function
Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a processor device, computer system, or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
Coupled
Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limitative to direct connections only. The terms "coupled" and "connected", along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. "Coupled" may mean that two or more elements are either in direct physical or electrical contact or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
Embodiments:
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
Similarly it should be appreciated that in the above description of example embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description of Specific Embodiments are hereby expressly incorporated into this Detailed Description of Specific Embodiments, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art.
For example, in the following claims, any of the claimed embodiments can be used in any combination.

Specific Details
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practised without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Terminology
In describing the preferred embodiment of the invention illustrated in the drawings, specific terminology will be resorted to for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents which operate in a similar manner to accomplish a similar technical purpose. Terms such as "forward", "rearward", "radially", "peripherally", "upwardly", "downwardly", and the like are used as words of convenience to provide reference points and are not to be construed as limiting terms.
Different Instances of Objects
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
Comprising and Including
In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" are used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
Any one of the terms: "including" or "which includes" or "that includes" as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, "including" is synonymous with and means "comprising".
Scope of Invention
Thus, while there has been described what are believed to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made
thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as fall within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added or deleted to methods described within the scope of the present invention.
Although the invention has been described with reference to specific examples, it will be appreciated by those skilled in the art that the invention may be embodied in many other forms. For example, the term "vehicle" or "vehicular" or other similar term as used herein is inclusive of motor vehicles in general such as passenger automobiles including sports utility vehicles (SUV), buses, trucks, various commercial vehicles, watercraft including a variety of boats and ships, aircraft, and the like, and includes hybrid vehicles, electric vehicles, plug-in hybrid electric vehicles, hydrogen-powered vehicles and other alternative fuel vehicles (e.g. fuels derived from resources other than petroleum). As referred to herein, a hybrid vehicle is a vehicle that has two or more sources of power, for example both gasoline-powered and electric-powered vehicles.
Industrial Applicability
It is apparent from the above, that the arrangements described are applicable to the robotics and manufacturing industries.
Claims
1. A method of recognising an object, to be performed on one or more computers,
comprising: generating one or more sets of characterising data as training data during an initial training data generation stage; generating one or more sets of characterising data as test data during a later test data generation stage; and comparing the training data and test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises:
1. for a scene containing the object in a 3-dimensional space defined by an axis system, deriving a set of 3-dimensional object data points for the object, each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and
2. for each set of three feature data points, applying a 3-dimensional transformation function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
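As an editorial aid only, and not part of the claims, the following minimal Python sketch shows the shape of the characterising-data generation that Steps 1 and 2 describe and that both the training and test stages share. The function names, the callable arguments and the (N, 3)/(N, K) array layout are assumptions introduced purely for illustration.

```python
import numpy as np

def characterise(points_xyz, intensities, feature_triples, transform, describe):
    """points_xyz: (N, 3) object data point coordinates; intensities: (N, K)
    light intensity values, carried through the transformation unchanged;
    feature_triples: iterable of (3, 3) arrays (one row per feature data point);
    transform: Step 2 transformation, (points, triple) -> transformed points;
    describe: Step 2 second data-processing of the transformed points."""
    characterising_sets = []
    for triple in feature_triples:
        transformed = transform(points_xyz, np.asarray(triple, dtype=float))
        characterising_sets.append(describe(transformed, intensities))
    return characterising_sets
```

The same routine would serve the training data generation stage and the later test data generation stage; only the source of the scene data and the destination of the resulting characterising data differ.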
2. The method as claimed in Claim 1 wherein, during the training data generation stage or during the test data generation stage, the scene containing the object is computer software generated.
3. The method as claimed in Claim 1 wherein the transformation function is a function of a cyclic permutation of the set of three feature data points.
4. The method as claimed in Claim 1, wherein the transformation function in Step 2 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
5. The method as claimed in Claim 4, wherein the transformation function in Step 2 also involves shearing of the one or more object data points, based upon the set of three feature data points.
6. The method as claimed in Claim 3 wherein the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
7. The method as claimed in Claim 1 wherein the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
8. The method as claimed in Claim 1 wherein the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
9. The method as claimed in Claim 1 wherein a unique object index is assigned to the object.
10. The method as claimed in Claim 1 wherein the first data-processing in Step 1 comprises subdividing the object data points into a region of contiguous object data points with similar one or
more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
11. The method as claimed in Claim 10 wherein each set of three feature data points is derived from a single region.
12. The method as claimed in Claim 10, wherein one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
13. The method as claimed in Claim 10, wherein the first function is a luminosity contrast function of the one or more light intensity values.
14. The method as claimed in Claim 10, wherein the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
15. The method as claimed in Claim 12, wherein the third function relates to the distance of one or more of the object data points in the region from the base point.
16. The method as claimed in Claim 12, wherein the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
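As an illustration only of Claims 10 and 14 to 16, one possible way to derive a base point and distance-extremum feature data points for a single region is sketched below; region_xyz is assumed to be an (N, 3) NumPy array of the region's contiguous object data point coordinates, and the names are placeholders rather than the patent's terminology.

```python
import numpy as np

def region_features(region_xyz):
    """Base point as the geometric 3-dimensional centroid (Claim 14) and
    feature data points at the distance extrema from it (Claim 16)."""
    base_point = region_xyz.mean(axis=0)
    d = np.linalg.norm(region_xyz - base_point, axis=1)
    nearest = region_xyz[np.argmin(d)]    # absolute minimum of the distance
    farthest = region_xyz[np.argmax(d)]   # absolute maximum of the distance
    return base_point, nearest, farthest  # the base point itself may also serve (Claim 19)
```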
17. The method as claimed in Claim 10, wherein one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
18. The method as claimed in Claim 10, wherein one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
19. The method as claimed in Claim 10, wherein the base point is derived as a feature data point.
20. The method as claimed in Claim 1, wherein the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
21. The method as claimed in Claim 1 wherein the first data-processing in Step 1 comprises determining at least one point of high local curvature of the object data points, and designating the point as a feature data point.
22. The method as claimed in Claim 1 wherein the first data-processing in Step 1 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data point.
23. The method as claimed in Claim 1, wherein the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
24. The method as claimed in Claim 23, wherein each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
25. The method as claimed in Claim 24, wherein, either during the calculating of the position coordinates for each viewpoint, or after the calculating of the position coordinates for all viewpoints, the object data points are isolated for the object.
26. The method as claimed in Claim 23, wherein the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
27. The method as claimed in Claim 23, wherein if the range distance of a viewable point in the scene is above a certain threshold, one of the coordinates of the corresponding object data point will be set to a predetermined maximum value.
28. The method as claimed in Claim 23, wherein the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
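By way of example only, a boolean depth contrast map of the kind recited in Claim 28 could be produced as below; the gradient-magnitude contrast measure and the threshold value are assumptions, not requirements of the claim.

```python
import numpy as np

def boolean_depth_contrast(depth, threshold=0.05):
    """depth: (H, W) range image.  Returns an (H, W) boolean map that is True
    where neighbouring depths differ sharply (candidate object boundaries)."""
    gy, gx = np.gradient(depth.astype(float))   # local depth changes
    contrast = np.hypot(gx, gy)                 # depth contrast map
    return contrast > threshold                 # arbitrary threshold -> boolean map
```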
29. The method as claimed in Claim 24, wherein the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
30. The method as claimed in Claim 23, wherein at least one of the viewable points resides on a surface of the object.
31. The method as claimed in Claim 1, wherein the object data points are derived by 3-dimensional imaging of the scene.
32. The method as claimed in Claim 31, wherein one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
33. The method as claimed in Claim 23 wherein, for each viewpoint, the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
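A standard pinhole-model range-from-parallax calculation consistent with Claim 33 is sketched below; the parameter values in the comment are purely illustrative.

```python
def range_from_parallax(disparity_px, baseline_m, focal_px):
    """Range of a viewable point from the viewpoint: two sub-images taken a
    known baseline apart yield a parallax offset (disparity) in pixels, and
    range = focal_length * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")   # no measurable parallax: treat as beyond range
    return focal_px * baseline_m / disparity_px

# e.g. sub-images offset 0.05 m either side of the viewpoint (0.1 m baseline),
# an 800-pixel focal length and an 8-pixel disparity give a 10 m range.
```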
34. The method as claimed in Claim 24, wherein the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
35. The method as claimed in Claim 24, wherein the viewing properties comprise a view width angle and a view height angle.
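Claims 34 and 35 can be read together as a back-projection from a pixel and its range distance to 3-dimensional scene coordinates. The sketch below assumes a symmetric pinhole model parameterised by the view width and height angles, a rotation matrix R for the viewpoint orientation and a translation t for its position; these conventions are illustrative rather than mandated by the claims.

```python
import numpy as np

def pixel_to_world(u, v, rng, width, height, view_w_deg, view_h_deg, R, t):
    """u, v: pixel column/row; rng: range distance of the viewable point;
    R: (3, 3) viewpoint orientation; t: (3,) viewpoint position coordinates."""
    # Ray direction of the pixel in the viewpoint's own frame.
    x = np.tan(np.radians(view_w_deg) / 2.0) * (2.0 * u / (width - 1) - 1.0)
    y = np.tan(np.radians(view_h_deg) / 2.0) * (2.0 * v / (height - 1) - 1.0)
    ray = np.array([x, y, 1.0])
    ray /= np.linalg.norm(ray)
    # Scale the unit ray to the measured range and move it into the scene's axis system.
    return R @ (ray * rng) + np.asarray(t, dtype=float)
```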
36. The method as claimed in Claim 1 wherein the first data processing in Step 1 is performed using hardware acceleration, such as available in graphics processors.
37. The method as claimed in Claim 6, wherein the transformation function in Step 2 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
38. The method as claimed in Claim 37, wherein the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
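One way to realise the new axis system of Claim 38 is to build an orthonormal frame directly from the object triangle, as sketched below; a and b are taken to be the apexes of the reference side and c the remaining apex, which is an assumed convention rather than part of the claim.

```python
import numpy as np

def triangle_frame(a, b, c):
    """Returns (R, origin) with the rows of R being the new axes X', Y', Z'
    expressed in the old axis system, so that p' = R @ (p - origin)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    origin = 0.5 * (a + b)                      # mid-point of the reference side
    x_axis = (b - a) / np.linalg.norm(b - a)    # X' parallel to the reference side
    normal = np.cross(b - a, c - a)
    z_axis = normal / np.linalg.norm(normal)    # Z' perpendicular to the reference plane
    y_axis = np.cross(z_axis, x_axis)           # Y' in the plane, through the mid-point
    if np.dot(y_axis, c - origin) < 0:          # point Y' towards the remaining apex
        y_axis, z_axis = -y_axis, -z_axis
    return np.stack([x_axis, y_axis, z_axis]), origin
```

Scaling the frame, or shifting the origin along Z', by the length of the reference side would give the variant of Claim 39.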
39. The method as claimed in Claim 37, wherein the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
40. The method as claimed in Claim 6, wherein the transformation function in Step 2 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
41. The method as claimed in Claim 40, wherein the one or more object data points are also scaled or translated, based upon the length of the reference side.
42. The method as claimed in Claim 40, wherein the transformation function in Step 2 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
43. The method as claimed in Claim 42, wherein the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the
same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
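For Claims 42 and 43, the in-plane scaling and shearing that carries the object triangle's apexes onto those of a predefined triangle can be obtained by solving a small affine system, as in the illustrative sketch below; the apexes are assumed to have already been expressed as 2-dimensional coordinates in the triangle's plane, for example via the frame built for Claim 38.

```python
import numpy as np

def triangle_to_triangle_affine(src_2d, dst_2d):
    """src_2d, dst_2d: (3, 2) apex coordinates of the object triangle and of
    the predefined triangle.  Returns A (2x2) and b (2,) with dst = A @ src + b
    for each apex, i.e. the rotation/scale/shear plus translation mapping one
    triangle's apexes onto the other's."""
    src_h = np.hstack([np.asarray(src_2d, dtype=float), np.ones((3, 1))])
    M, *_ = np.linalg.lstsq(src_h, np.asarray(dst_2d, dtype=float), rcond=None)
    return M[:2].T, M[2]
```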
44. The method as claimed in Claim 1 wherein, in Step 2, a unique transformation index is assigned for the transformation function.
45. The method as claimed in Claim 1 wherein the transformation function in Step 2 is performed using hardware acceleration, such as available in graphics processors.
46. The method as claimed in Claim 1 wherein the second data-processing in Step 2 comprises creating an image of the transformed object data points.
47. The method as claimed in Claim 46 wherein the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
48. The method as claimed in Claim 1 wherein the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
49. The method as claimed in Claim 48, wherein the surface mesh points of the untransformed interpolated or non-interpolated 3-dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
50. The method as claimed in Claim 48, wherein the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
51. The method as claimed in Claim 46 wherein the image comprises a depth map or a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
52. The method as claimed in Claim 46 wherein the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
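A toy rendering of the transformed object data points into the depth map and luminosity map of Claims 46 and 51 to 53 might look as follows; the grid size, the coordinate extent and the nearest-point-wins rule are assumptions for illustration only.

```python
import numpy as np

def render_maps(points_xyz, intensities, size=64, extent=1.0):
    """points_xyz: (N, 3) transformed coordinates (X', Y', Z');
    intensities: (N, K) light intensity values (e.g. an RGB triple per point)."""
    depth = np.full((size, size), np.inf)
    lumin = np.zeros((size, size, intensities.shape[1]))
    cols = np.clip(((points_xyz[:, 0] / extent + 1.0) * 0.5 * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(((points_xyz[:, 1] / extent + 1.0) * 0.5 * (size - 1)).astype(int), 0, size - 1)
    for r, c, z, i in zip(rows, cols, points_xyz[:, 2], intensities):
        if z < depth[r, c]:          # keep the nearest transformed point per pixel
            depth[r, c] = z
            lumin[r, c] = i
    return depth, lumin              # 2-dimensional images (Claim 53)
```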
53. The method as claimed in Claim 46 wherein the image is a 2-dimensional image.
54. The method as claimed in Claim 46 wherein the image is a 3-dimensional image.
55. The method as claimed in Claim 46 wherein the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
56. The method as claimed in Claim 55 wherein the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
57. The method as claimed in Claim 56 wherein the function of the one or more convolution coefficients is a binning function.
58. The method as claimed in Claim 57 wherein the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
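One possible reading of the binning of Claims 57 and 58 is a staggered quantisation in which the subset index number shifts the bin edges, so that a coefficient near a boundary falls into an adjacent bin rather than the optimum bin for some subsets; the bin width and the staggering rule below are assumptions.

```python
import numpy as np

def bin_coefficients(coeffs, subset_index, bin_width=0.25):
    """Quantise convolution coefficients into integer bins.  For alternate
    subset indices the bin edges are offset by half a bin width, so values in
    the upper half of a bin land in the adjacent bin instead of the optimum one."""
    offset = 0.5 * bin_width * (subset_index % 2)
    return np.floor((np.asarray(coeffs, dtype=float) + offset) / bin_width).astype(int)
```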
59. The method as claimed in Claim 55 wherein each of the one or more subsets is represented as a 1-dimensional binary string.
60. The method as claimed in Claim 47 wherein the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
61. The method as claimed in Claim 60 wherein a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points, the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z'), each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
62. The method as claimed in Claim 1 wherein the second data processing in Step 2 is performed using
hardware acceleration, such as available in graphics processors.
63. The method as claimed in Claim 9 wherein the set of characterising data in Step 2 also includes data relating to the object index.
64. The method as claimed in Claim 44 wherein the set of characterising data in Step 2 also includes data relating to the transformation index.
65. The method as claimed in Claim 1 wherein the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of the one or more sets of characterising data comprising the test data, are substantially equal.
66. The method as claimed in Claim 1 wherein the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
67. The method as claimed in Claim 1 wherein the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
68. The method as claimed in Claim 1 wherein the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
69. The method as claimed in Claim 55 wherein the function of the image of the transformed object
data points is a neural network or a decision tree.
70. The method as claimed in Claim 56 wherein the function of the one or more convolution coefficients is a neural network or a decision tree.
71. The method as claimed in Claim 55 wherein the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
72. The method as claimed in Claim 1 wherein the transformation function in Step 2 is a geometric hashing function or an affine transformation.
73. The method as claimed in Claim 1 wherein the set of characterising data also includes one or more object data points.
74. The method as claimed in Claim 1 wherein the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
75. The method as claimed in Claim 74 wherein a first index or set of indices for the indexed database is a function of the respective set of characterising data.
76. The method as claimed in Claim 1 wherein the one or more sets of characterising data generated as training data are stored in a single or multi-dimensional hash-table.
77. The method as claimed in Claim 55 wherein the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
78. The method as claimed in Claim 77 wherein a first index or set of indices for the indexed database
is one of the one or more subsets of the respective set of characterising data.
79. The method as claimed in Claim 75 wherein a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
80. The method as claimed in Claim 78 wherein a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
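Claims 74 to 80 describe storing the training characterising data in an indexed database or hash-table and matching test data by an index lookup. A minimal sketch follows; the dictionary-of-lists database, the "subsets" layout of each characterising set and the binary-string index (cf. Claim 59) are illustrative assumptions rather than the claimed structure.

```python
from collections import defaultdict

def make_index(subset_bits):
    """First/second index as a 1-dimensional binary string derived from a subset."""
    return "".join("1" if b else "0" for b in subset_bits)

def store_training(database, characterising_sets):
    for cset in characterising_sets:                            # training stage
        database[make_index(cset["subsets"][0])].append(cset)

def lookup_test(database, test_set):
    return database.get(make_index(test_set["subsets"][0]), []) # test stage

db = defaultdict(list)
store_training(db, [{"object_index": 7, "subsets": [[1, 0, 1, 1]]}])
print(lookup_test(db, {"subsets": [[1, 0, 1, 1]]}))             # -> matching training set(s)
```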
81. Computer executable steps for recognising an object, to be performed on or with the aid of one or more computers, comprising: generating one or more sets of characterising data as training data during an initial training data generation stage; generating one or more sets of characterising data as test data during a later test data generation stage; and calculating comparison data as a function of the training data and the test data using a comparison algorithm, wherein, during each of the training data generation and the test data generation stages, the generation of the one or more sets of characterising data comprises:
1. receiving an input of scene data for a scene containing the object in a 3-dimensional space defined by an axis system;
2. deriving a set of 3-dimensional object data points for the object, each object data point defined by 3-dimensional coordinates and one or more light intensity values and, based on a first data-processing of the coordinates of the object data points, deriving a set of feature data points corresponding to object features of the object, each feature data point also defined by 3-dimensional coordinates, and grouping one or more sets of three feature data points; and
3. for each set of three feature data points, applying a 3-dimensional transformation
function to the coordinates of one or more object data points to generate respective transformed object data points with respective new coordinates, each transformed object data point having the same associated one or more light intensity values as the object data point from which it was generated, wherein the 3-dimensional transformation function is a function of the set of three feature data points, and carrying out a second data processing of the transformed object data points to generate one of the one or more sets of characterising data.
82. Computer executable steps as claimed in Claim 81 wherein, during the training data generation stage or during the test data generation stage, the scene containing the object is computer software generated.
83. Computer executable steps as claimed in Claim 81 wherein the transformation function is a function of a cyclic permutation of the set of three feature data points.
84. Computer executable steps as claimed in Claim 81, wherein the transformation function in Step 3 involves rotation and translation, or rotation, translation and scaling, of the one or more object data points, based upon the set of three feature data points.
85. Computer executable steps as claimed in Claim 84, wherein the transformation function in Step 3 also involves shearing of the one or more object data points, based upon the set of three feature data points.
86. Computer executable steps as claimed in Claim 83 wherein the function of a cyclic permutation of the set of three feature data points is a function of the orientation of a reference plane and the orientation and position of a reference side of an object triangle, the three apexes of the object triangle having 3-dimensional coordinates corresponding to the respective 3-dimensional coordinates of the set of three feature data points.
87. Computer executable steps as claimed in Claim 81 wherein the one or more light intensity values correspond to one or more light frequencies, or frequency bands, or a function of these frequencies or frequency bands.
88. Computer executable steps as claimed in Claim 81 wherein the object data points correspond to locations on a surface of the object and the one or more light intensity values characterise the light radiating from the object data points.
89. Computer executable steps as claimed in Claim 81 wherein a unique object index is assigned to the object.
90. Computer executable steps as claimed in Claim 81 wherein the first data-processing in Step 2 comprises subdividing the object data points into a region of contiguous object data points with similar one or more light intensity values or a similar first function of their one or more light intensity values, deriving a base point for the region as a second function of the coordinates of the object data points in the region and deriving one or more feature data points of the set of feature data points.
91. Computer executable steps as claimed in Claim 90 wherein each set of three feature data points is derived from a single region.
92. Computer executable steps as claimed in Claim 90, wherein one or more feature data points are calculated as a third function of the base point and the contiguous object data points of the region.
93. Computer executable steps as claimed in Claim 90, wherein the first function is a luminosity contrast function of the one or more light intensity values.
94. Computer executable steps as claimed in Claim 90, wherein the second function is a geometric 3-dimensional centroid of the coordinates of the object data points in the region.
95. Computer executable steps as claimed in Claim 92, wherein the third function relates to the distance of one or more of the object data points in the region from the base point.
96. Computer executable steps as claimed in Claim 92, wherein the third function is a relative and/or absolute minima and/or maxima of the distance of one or more of the object data points in the region from the base point.
97. Computer executable steps as claimed in Claim 90, wherein one or more of the object data points in the region are on a boundary of the region, and one or more feature data points are derived as a function of these object data points and the base point.
98. Computer executable steps as claimed in Claim 90, wherein one or more feature points are derived as a function of one or more object data points on a boundary of the region, based upon the local curvature of the boundary.
99. Computer executable steps as claimed in Claim 90, wherein the base point is derived as a feature data point.
100. Computer executable steps as claimed in Claim 81, wherein the 3-dimensional transformation function is equivalent to viewing the object data points or a function of the object data points from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
101. Computer executable steps as claimed in Claim 81 wherein the first data-processing in Step 2 comprises determining at least one point of high local curvature of the object data points, and
designating the point as a feature data point.
102. Computer executable steps as claimed in Claim 81 wherein the first data-processing in Step 2 comprises determining at least one point of high local curvature of the one or more light intensity values, and designating the point as a feature data point.
103. Computer executable steps as claimed in Claim 81, wherein the object data points are derived by 3-dimensionally imaging the scene from one or more viewpoints, and, for each of the viewpoints, generating a 3-dimensional image of the object, each image comprising an array of pixels, each pixel corresponding to a viewable point in the scene at a range distance from the viewpoint and being characterised by one or more light intensity values and 3-dimensional coordinates in the space.
104. Computer executable steps as claimed in Claim 103, wherein each viewpoint is characterised by position coordinates, an orientation in the space, and predefined viewing properties.
105. Computer executable steps as claimed in Claim 104, wherein, either during the calculating of the position coordinates for each viewpoint, or after the calculating of the position coordinates for all viewpoints, the object data points are isolated for the object.
106. Computer executable steps as claimed in Claim 103, wherein the range distance between each viewpoint and viewable point in the scene is capped, so as to always be equal to or below a predetermined maximum range distance value when calculating the 3-dimensional image of the object.
107. Computer executable steps as claimed in Claim 103, wherein if the range distance of a viewable point in the scene is above a certain threshold, one of the coordinates of the corresponding object data point will be set to a predetermined maximum value.
108. Computer executable steps as claimed in Claim 103, wherein the object data points are isolated by determining boundaries of the object in the image based on a boolean depth contrast map derived by applying an arbitrary threshold to a depth contrast map of the image, or a boolean depth gradient contrast map derived by applying an arbitrary threshold to a depth gradient contrast map of the image, or a boolean luminosity (overall light intensity) contrast map derived by applying an arbitrary threshold to a luminosity contrast map of the image, or a linear or a non-linear functional combination of these maps.
109. Computer executable steps as claimed in Claim 104, wherein the position coordinates and the orientation of each viewpoint in the space are predefined for a scene.
110. Computer executable steps as claimed in Claim 103, wherein at least one of the viewable points resides on a surface of the object.
111. Computer executable steps as claimed in Claim 81, wherein the object data points are derived by 3-dimensional imaging of the scene.
112. Computer executable steps as claimed in Claim 111, wherein one or more of the object data points have one of their coordinates set to a predetermined maximum value during 3-dimensional imaging of the scene.
113. Computer executable steps as claimed in Claim 103 wherein, for each viewpoint, the 3-dimensional imaging comprises creating at least two 2-dimensional sub-images from two alternative viewpoints, slightly offset by a predetermined distance either side of the viewpoint, either sequentially or using at least two corresponding imaging systems, and a resulting parallax offset between the locations of the corresponding viewable point in the resulting pixel arrays of the sub-images, and/or the difference between the corresponding one or more light intensity values, is used to calculate the range distance of the viewable point from the viewpoint.
114. Computer executable steps as claimed in Claim 104, wherein the 3-dimensional coordinates in the space of the viewable point are derived based on the location of its corresponding pixel in the array, the range distance of the viewable point from the viewpoint, and the position coordinates, the orientation, and the viewing properties of the viewpoint.
115. Computer executable steps as claimed in Claim 104, wherein the viewing properties comprise a view width angle and a view height angle.
116. Computer executable steps as claimed in Claim 81 wherein the first data processing in Step 2 is performed using hardware acceleration, such as available in graphics processors.
117. Computer executable steps as claimed in Claim 86, wherein the transformation function in Step 3 comprises shifting the position of an origin of the axis system to a new origin position and reorientating the axis system to a new axis orientation such that, in a new axis system (X', Y', Z') defined by a first new axis (X'), a second new axis (Y'), and a third new axis (Z'), the new axis orientation and new origin position are based on the 3-dimensional orientation of the reference plane and the 3-dimensional orientation and position of the reference side of the object triangle.
118. Computer executable steps as claimed in Claim 117, wherein the new axis system (X', Y', Z') is positioned and orientated such that the third new axis (Z') is aligned perpendicular to the reference plane of the object triangle and passes through a mid-point of the reference side, the first new axis (X') is aligned parallel to the reference side, and the second new axis (Y') is directed through the mid-point of the reference side in the direction of the remaining apex of the object triangle.
119. Computer executable steps as claimed in Claim 117, wherein the new axis system (X', Y', Z') is also scaled or the new origin position is shifted in the Z' direction, based upon the length of the reference side.
120. Computer executable steps as claimed in Claim 86, wherein the transformation function in Step 3 comprises rotating the one or more object data points based upon the 3-dimensional orientation of the reference plane, rotating the one or more object data points based upon the 3-dimensional orientation of the reference side, and translating the one or more object data points based upon the 3-dimensional position of the reference side.
121. Computer executable steps as claimed in Claim 120, wherein the one or more object data points are also scaled or translated, based upon the length of the reference side.
122. Computer executable steps as claimed in Claim 120, wherein the transformation function in Step 3 also comprises scaling and shearing the one or more object data points based upon the three apexes of the object triangle, such that the apexes of the object triangle, as transformed, are made coincident with the apexes of a predefined triangle.
123. Computer executable steps as claimed in Claim 122, wherein the reference side of the object triangle is transformed to be the same length as a predefined side of the predefined triangle, and the perpendicular distance between the reference side and the third apex of the object triangle is transformed to be the same as the perpendicular distance between the predefined side and the third apex of the predefined triangle.
124. Computer executable steps as claimed in Claim 81 wherein, in Step 3, a unique transformation index is assigned for the transformation function.
125. Computer executable steps as claimed in Claim 81 wherein the transformation function in Step 3 is performed using hardware acceleration, such as available in graphics processors.
126. Computer executable steps as claimed in Claim 81 wherein the second data-processing in Step 3
comprises creating an image of the transformed object data points.
127. Computer executable steps as claimed in Claim 126 wherein the image is created from an interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of a transformed object data point, or an interpolation of the coordinates of one or more transformed object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby transformed object data points.
128. Computer executable steps as claimed in Claim 81 wherein the set of object data points is used to form an untransformed interpolated or non-interpolated 3-dimensional surface mesh defined by 3-dimensional surface mesh points, the 3-dimensional coordinates of each surface mesh point corresponding to the coordinates of an object data point, or an interpolation of the coordinates of one or more object data points, each polygon of the surface mesh formed by proximate surface mesh points and having associated with it one or more light intensity values derived as a function of the one or more light intensity values of the nearby object data points.
129. Computer executable steps as claimed in Claim 128, wherein the surface mesh points of the untransformed interpolated or non-interpolated 3-dimensional surface mesh are transformed with the set of object data points to produce an interpolated or non-interpolated 3-dimensional surface mesh.
130. Computer executable steps as claimed in Claim 128, wherein the 3-dimensional transformation function is equivalent to viewing the untransformed interpolated or non-interpolated 3-dimensional surface mesh from a virtual viewpoint, wherein the virtual viewpoint is a function of the set of three feature data points.
131. Computer executable steps as claimed in Claim 126 wherein the image comprises a depth map or
a function of the depth map calculated based upon the 3-dimensional coordinates of the transformed object data points.
132. Computer executable steps as claimed in Claim 126 wherein the image comprises a luminosity map of one or more light intensity values (such as an RGB map or a YCbCr map) or a function of the luminosity map calculated based upon the 3-dimensional coordinates of, and the one or more light intensity values of, the transformed object data points.
133. Computer executable steps as claimed in Claim 126 wherein the image is a 2-dimensional image.
134. Computer executable steps as claimed in Claim 126 wherein the image is a 3-dimensional image.
135. Computer executable steps as claimed in Claim 126 wherein the set of characterising data comprises one or more subsets, each subset comprising the image, or a function of the image.
136. Computer executable steps as claimed in Claim 135 wherein the function of the image is a spatial convolution function, and this is applied to the image to generate one or more convolution coefficients, or a function of the one or more convolution coefficients.
137. Computer executable steps as claimed in Claim 136 wherein the function of the one or more convolution coefficients is a binning function.
138. Computer executable steps as claimed in Claim 137 wherein the binning function is also a function of a subset index number uniquely identifying a subset of the set of characterising data such that, for one or more of the convolution coefficients, an adjacent bin is filled instead of an optimum bin.
139. Computer executable steps as claimed in Claim 135 wherein each of the one or more subsets is represented as a 1-dimensional binary string.
140. Computer executable steps as claimed in Claim 127 wherein the image is generated by generating a virtual image of the surface mesh points from a particular virtual viewpoint, the virtual image comprising virtual image data points, each virtual image data point comprising one or more light intensity values of a viewable surface mesh point.
141. Computer executable steps as claimed in Claim 140 wherein a new axis system is defined by a first new axis (X'), second new axis (Y') and third new axis (Z'), a new axis orientation and a new origin position being a function of the set of three feature data points, the viewable surface mesh points are generated by interpolating the surface mesh at particular coordinate intervals along the first new axis (X') and along the second new axis (Y'), or along the first new axis (X') and along the second new axis (Y') and along the third new axis (Z'), each viewable surface mesh point corresponding to a point on the surface mesh and having one or more light intensity values derived as a function of the light intensity values of the transformed object data points or polygon of the surface mesh in close proximity to the point, and coordinates derived as a function of the coordinates of the transformed object data points or surface mesh points in close proximity to the point.
142. Computer executable steps as claimed in Claim 81 wherein the second data processing in Step 3 is performed using hardware acceleration, such as available in graphics processors.
143. Computer executable steps as claimed in Claim 89 wherein the set of characterising data in Step 3 also includes data relating to the object index.
144. Computer executable steps as claimed in Claim 124 wherein the set of characterising data in Step 3 also includes data relating to the transformation index.
145. Computer executable steps as claimed in Claim 81 wherein the comparison algorithm comprises determining if one of the one or more sets of characterising data comprising the training data, and one of the one or more sets of characterising data comprising the test data, are substantially equal.
146. Computer executable steps as claimed in Claim 81 wherein the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more sets of characterising data comprising the training data, and the one or more sets of characterising data comprising the test data.
147. Computer executable steps as claimed in Claim 81 wherein the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises determining if one or more of the subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data, are substantially equal.
148. Computer executable steps as claimed in Claim 81 wherein the one or more sets of characterising data each comprise one or more subsets, and the comparison algorithm comprises using a neural network to determine the degree of equality of the one or more subsets of the sets of characterising data comprising the training data, and the one or more subsets of the sets of characterising data comprising the test data.
149. Computer executable steps as claimed in Claim 135 wherein the function of the image of the transformed object data points is a neural network or a decision tree.
150. Computer executable steps as claimed in Claim 136 wherein the function of the one or more convolution coefficients is a neural network or a decision tree.
151. Computer executable steps as claimed in Claim 135 wherein the function of the image of the transformed object data points is an image processing function (such as a Gaussian filter) resulting in
significant quantization, and hence spatial resolution or intensity resolution reduction, of the output, and where this quantization is insensitive to slight spatial or intensity variations in the input data.
152. Computer executable steps as claimed in Claim 81 wherein the transformation function in Step 3 is a geometric hashing function or an affine transformation.
153. Computer executable steps as claimed in Claim 81 wherein the set of characterising data also includes one or more object data points.
154. Computer executable steps as claimed in Claim 81 wherein the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
155. Computer executable steps as claimed in Claim 154 wherein a first index or set of indices for the indexed database is a function of the respective set of characterising data.
156. Computer executable steps as claimed in Claim 81 wherein the one or more sets of characterising data generated as training data are stored in a single or multidimensional hash-table.
157. Computer executable steps as claimed in Claim 135 wherein the one or more sets of characterising data generated as training data are stored in a single or multidimensional indexed database.
158. Computer executable steps as claimed in Claim 157 wherein a first index or set of indices for the indexed database is one of the one or more subsets of the respective set of characterising data.
159. Computer executable steps as claimed in Claim 155 wherein a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is also a function of the set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
160. Computer executable steps as claimed in Claim 158 wherein a set of characterising data generated as test data is compared with the training data by performing a lookup or search within the indexed database based upon a second index or set of indices, where the second index or set of indices is one of the one or more subsets of the respective set of characterising data, and the first index or set of indices and the second index or set of indices are either identical or similar.
161. Computer readable code for carrying out the computer executable steps of Claim 81 when run on the one or more computers.
162. One or more computer readable mediums comprising the computer readable code of Claim 161.
163. One or more computers adapted for recognising an object, comprising:
- respective one or more processors;
- respective computer readable memory operatively connected to the respective one or more processors and storing the computer readable code of Claim 161;
- respective one or more scene data input systems for receiving the scene data; and
- an output system for outputting the comparison data.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2010900180 | 2010-01-19 | | |
AU2010900180A AU2010900180A0 (en) | 2010-01-19 | | Object Recognition Method |
AU2010903967 | 2010-09-05 | | |
AU2010903967A AU2010903967A0 (en) | 2010-09-05 | | Object Recognition Method |
AU2010904309A AU2010904309A0 (en) | 2010-09-24 | | Object Recognition Method |
AU2010904309 | 2010-09-24 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011088497A1 true WO2011088497A1 (en) | 2011-07-28 |
Family
ID=44306288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2011/000028 WO2011088497A1 (en) | 2010-01-19 | 2011-01-12 | Object recognition method and computer system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2011088497A1 (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5724447A (en) * | 1994-05-17 | 1998-03-03 | Olympus Optical Co., Ltd. | Optical transform system for three-dimensional object recognition |
US20060204079A1 (en) * | 2005-03-10 | 2006-09-14 | Kabushiki Kaisha Toshiba | Pattern recognition apparatus and method |
EP2133838A1 (en) * | 2007-03-09 | 2009-12-16 | Omron Corporation | Recognition processing method and image processing device using the same |
US20080247651A1 (en) * | 2007-04-09 | 2008-10-09 | Denso Corporation | Apparatus for recognizing object in image |
US20090167760A1 (en) * | 2007-12-27 | 2009-07-02 | Nokia Corporation | Triangle Mesh Based Image Descriptor |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016141282A1 (en) * | 2015-03-04 | 2016-09-09 | The Regents Of The University Of California | Convolutional neural network with tree pooling and tree feature map selection |
CN112639819A (en) * | 2018-07-05 | 2021-04-09 | 优创半导体科技有限公司 | Object detection using multiple sensors and reduced complexity neural networks |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Suhail et al. | Light field neural rendering | |
Yang et al. | Go-ICP: A globally optimal solution to 3D ICP point-set registration | |
TWI569229B (en) | Method for registering data | |
US20030091227A1 (en) | 3-D reconstruction engine | |
CN109544681B (en) | Fruit three-dimensional digitization method based on point cloud | |
US7149368B2 (en) | System and method for synthesis of bidirectional texture functions on arbitrary surfaces | |
CN101866497A (en) | Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system | |
CN104794713B (en) | Chamber crop digitalized image method based on ARM and binocular vision | |
WO2015006224A1 (en) | Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis | |
KR20140060454A (en) | Backfilling points in a point cloud | |
CN111145338B (en) | Chair model reconstruction method and system based on single-view RGB image | |
Bewley et al. | Advantages of exploiting projection structure for segmenting dense 3D point clouds | |
US11935209B1 (en) | Systems and methods for dynamic backfilling of a three-dimensional object | |
CN112330530A (en) | Image processing method, device, equipment and storage medium | |
JP7432793B1 (en) | Mapping methods, devices, chips and module devices based on three-dimensional point clouds | |
WO2011088497A1 (en) | Object recognition method and computer system | |
Heisterklaus et al. | Image-based pose estimation using a compact 3d model | |
Giachetti et al. | SHREC15 Track: Retrieval of non-rigid (textured) shapes using low quality 3D models | |
Madhusudana et al. | Revisiting dead leaves model: Training with synthetic data | |
Hlubik et al. | Advanced point cloud estimation based on multiple view geometry | |
US12039771B2 (en) | Methods and apparatuses for detecting a change in an area of interest | |
CN114723915A (en) | Dense point cloud generation method based on multi-view infrared | |
CN117523547B (en) | Three-dimensional scene semantic perception method, system, equipment and medium | |
US12125164B1 (en) | Systems and methods for dynamic backfilling of a three-dimensional object | |
CN117291944B (en) | Image processing method and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11734239; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 11734239; Country of ref document: EP; Kind code of ref document: A1 |