DESCRIPTION
Method for Describing Planar Curves Using Morphological Scale Spaces
Field of the invention
The invention relates to a method for constructing a descriptor from binary silhouette images. The input image is a black-and-white (binary) image containing the silhouette (a complete closed region of white pixels) of an object in front of a completely black background. The method calculates curvatures for both the Gaussian scale-space and the morphological scale-space of the curve. These planar orientations are then mapped to a color image in order to describe the silhouette object. This descriptor can be used for object recognition purposes.
Background of the invention
The United States patent document US6711293, an application in the state of the art, discloses a method to detect salient features on images, in which differences of Gaussians are used to construct a scale-space. The present invention resembles the mentioned method in the sense that it performs feature extraction using a scale-space constructed from differences of Gaussians. However, the present invention generates a scale-space using (a continuous mathematical representation of) closed curves and a curvature operator, whereas the mentioned method generates the scale-space using differences of Gaussians of image pixels. In addition, the output of the mentioned method is a list of salient points on the image, whereas the output of the present invention is specifically a list of salient points on the contours of a silhouette.
The academic paper "Scale-based Description and Recognition of Planar Curves and Two-dimensional Shapes" (F. Mokhtarian and A. Mackworth) discloses a method for constructing a representation for closed curves using a scale-space representation. The mentioned method resembles the present invention in the sense that it uses "circle of curvature" values to construct the scale-space. However, the present method uses differences of the levels of the scale-space and, unlike the mentioned method, identifies salient points over the curve together with their scale information.
The patent application PCT/IB2012/050883, "System and Method for Identifying Scale Invariant Features of Object Outlines on Images", resembles the present application in the sense that it constructs a descriptor of the silhouette image using the orientations of the extracted feature points. However, the present application also uses the morphological scale-space of the curve, thus introducing a more informative description.
The United States patent application US2010080469 discloses a system and method of generating feature descriptors for image identification. The input image is Gaussian-blurred at different scales. A difference-of-Gaussian space is obtained from differences of adjacent Gaussian-blurred images. Key points are identified in the difference-of-Gaussian space. For each key point, primary sampling points are defined with three-dimensional relative positions from the key point, reaching into planes of different scales. Secondary sampling points are identified for each primary sampling point. Secondary image gradients are obtained between the image at a primary sampling point and the images at the secondary sampling points corresponding to this primary sampling point. The secondary image gradients form the components of the primary image gradients at the primary sampling points. The primary image gradients are concatenated to obtain a descriptor vector for the input image. The descriptor vector thus obtained is scale invariant and requires a number of additions equal to the number of primary sampling points multiplied by the number of secondary sampling points.
The United States patent application US2013223730 discloses a feature descriptor extracting method in a feature descriptor extracting apparatus. The method involves receiving an image from which a feature descriptor will be extracted; extracting, as a feature point, a point at which the change in a pixel statistical value of the image is large; extracting a patch centered on the feature point; blocking the patch to calculate a statistical value for each of a plurality of patch blocks; calculating a morphological gradient by using the statistical values of the block-converted patch; and extracting a feature descriptor by using the morphological gradient in consideration of the required feature descriptor complexity.
The United States patent application US20040184677 discloses a method that detects silhouette edges in images. An ambient image of a scene is acquired with ambient light. A set of illuminated images of the scene is also acquired, each illuminated image being acquired with a different light source illuminating the scene. The ambient image is combined with the set of illuminated images to detect cast shadows, and silhouette edge pixels are located from the cast shadows.
Objects of the invention
The object of the invention is to provide a method to construct a descriptor from binary silhouette images.
Another object of the invention is to construct the orientations of all points of the silhouette at all morphological scale levels.
Another object of the invention is to provide fast recognition using the learned distance vector.
Detailed description of the invention
A method for describing planar curves using morphological scale spaces, fulfilling the objects of the present invention, is illustrated in the attached figures,
where:
Figure 1 is the flowchart of the method for describing planar curves using morphological scale spaces
Figure 2 is the flowchart of the checking the type of distance vector and calculating the distance
Figure 3 shows the transition from the GSS of the curve, to the orientation vector calculation, to the orientation scale-space
A method for describing planar curves using morphological scale spaces (100) comprises the steps of;
- taking input data from the camera means and creating the curve from the input data (101),
- sampling the arc-length of the curve by using a continuous representation with the formula of the parametric curve (102),
- constructing the orientation scale-space with the variable-scale Gaussian function using the parametric curve and the orientation angle (103),
- combining all local information created in step 102 and step 103, and creating silhouette orientation images (104),
- finding the minimum distance match for two silhouettes created in step 104 (105),
- applying the closing operation to multiple levels of the silhouette's morphological scale-space and obtaining a new scale-space in which the binary silhouette is closed with operators of increasing size (106),
- matching the calculations found in step 105 and step 106 (107),
- checking the type of distance vector and calculating the distance (108),
- sending the output to the imaging means (109).
In the method for describing planar curves using morphological scale spaces (100), the step "checking the type of distance vector and calculating the distance (108)" comprises the sub-steps of;
- if the distance is linear, the weighted linear sum of the distance vector is calculated to obtain a scalar distance value (201),
- if the distance is non-linear, an artificial neural network is trained on the non-linear distance (202).
The shape of an object is usually obtained via a segmentation operation which outputs a binary silhouette and/or a contour. This contour is a closed planar curve sampled in pixel coordinates. In step 102, "sampling the arc-length of the curve by using continuous representation with the formula of parametric curve", a uniform-length parametrization is useful if a scale-space of the curve is to be constructed. For this purpose a continuous representation (B-spline) is used with the equation 1;
C(r) = Σ_i P_i · B_i,k(r) (Eqn. 1)
In the equation 1, C(r) stands for the parametric curve, whereas P_i is the i-th control point and B_i,k is the k-th order basis function for the i-th control point. If the equation is written in matrix form:

C = J · P (Eqn. 2)

In the equation 2, P is the N-by-2 control point matrix and J is the L-by-N basis matrix (for the L pixels of the silhouette). Thus, by using the L pixels of the silhouette (i.e. C(r)) and by calculating the basis function for the L silhouette pixels (i.e. the J(k, r) matrix), we can calculate the control point matrix P, which is our continuous representation. For each row of the J matrix, the r parameter of each pixel must be known. For this purpose, first the chain code of the closed curve is extracted. The chain code carries the distance between two neighboring pixels (1 or √2 units). Starting from an arbitrary point, the r parameter is calculated using the arc-length (i.e. the chain code). Assuming no ill-conditioning, the control points are calculated by pseudo-inversion with equation 3:

P = J⁺ · C (Eqn. 3)
If the silhouette is obtained via an active-contours-based method (i.e. as a result of an automatic or semi-automatic object segmentation operation) in which the curve is already defined with a parametric model (such as in Brigger et al. (2000) [1]), the curve fitting step is not needed. Using a parametric representation such as this continuous representation, it is very easy to uniformly sample the curve. If the r parameter is chosen uniformly between 0 and rmax, arcs of uniform length can be obtained. Each object contour is sampled into 512 points, which divide the curve into 512 equal-length arcs. It is also possible to use affine-length parametrization (such as in Awrangjeb et al. (2007) [2]); however, since it is the GSS (Gaussian scale-space) and not the CSS (curvature scale-space) that the method requires, and since affine-length parametrization is more fragile under noise, arc-length parametrization is preferred. In addition, the proposed method performs better when the curves are sampled in arc-length.
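As an illustration, the least-squares fit of equations 1-3 and the chain-code parametrization can be sketched in Python; the control-point count, spline order and the SciPy helper used here are our own implementation choices, not part of the claimed method:

```python
import numpy as np
from scipy.interpolate import BSpline

def fit_control_points(contour, n_ctrl=64, k=3):
    """Least-squares B-spline fit of a sampled contour (Eqns. 1-3):
    solve C = J @ P for the N-by-2 control point matrix P.
    contour: (L, 2) array of boundary pixels in traversal order."""
    # r parameter from accumulated chain-code step lengths (1 or sqrt(2) px).
    steps = np.linalg.norm(np.diff(contour, axis=0), axis=1)
    r = np.concatenate([[0.0], np.cumsum(steps)])
    r /= r[-1]                                    # normalise r into [0, 1]
    # Clamped knot vector giving n_ctrl order-k basis functions.
    t = np.concatenate([np.zeros(k),
                        np.linspace(0.0, 1.0, n_ctrl - k + 1),
                        np.ones(k)])
    J = BSpline.design_matrix(r, t, k).toarray()  # L-by-N basis matrix J(k, r)
    P, *_ = np.linalg.lstsq(J, contour, rcond=None)  # pseudo-inverse solution
    return t, P
```

The fitted model `BSpline(t, P, k)` can then be evaluated at 512 uniformly spaced r values to obtain the equal-length arc sampling described above.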
In step 103, "constructing the orientation scale-space with the variable-scale Gaussian function with parametric curve and orientation angle", the orientation angle at a point is defined as the angle between the orientation vector and the x-axis, where the orientation vector is the unit vector perpendicular to the tangent line at that point:
O(r) = atan2(x'(r), −y'(r)) (Eqn. 4)

In the equation 4, x' and y' denote the first derivatives of the x and y components of the closed curve C(r) along the curve parameter r. Since O(r) can take values from 0 to 2π radians, the atan2 function (a two-argument variation of the arctangent function that can distinguish diametrically opposite directions) is used. Consequently, the scale-space of a curve L(r, σ) is defined as:

L(r, σ) = g(r, σ) * C(r) (Eqn. 5)
In the equation 5, L(r, σ) is the convolution of the variable-scale Gaussian function g(r, σ) (σ being the standard deviation) with the parametric curve C(r). Similarly, the orientation scale-space (OSS) O(r, σ) can be defined as in equation 6:

O(r, σ) = atan2(x'(r, σ), −y'(r, σ)) (Eqn. 6)

The initial curve is incrementally convolved with Gaussians to produce curves separated by a constant factor k in scale-space, shown stacked in the left column (Figure 3). Similar to Lowe (2004) [3], each octave of the scale-space (i.e. each doubling of σ) is divided into s intervals. Once a complete octave is constructed, the Gaussian curve that has twice the initial value of σ is re-sampled into half. The middle column shows the orientation vectors calculated for each sampled point at each interval of each octave. Consequently, for the higher octaves (o > 1), the sequences of orientation angle values are up-sampled to the highest resolution (512). Then, starting from the same point (r = 0) at each interval, the 512 orientation angle values are stacked on top of each other, and an (o·s)-by-512 matrix of orientation angle values is obtained. This matrix is called the orientation scale-space (OSS) and is depicted in the right column (Figure 3).
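A minimal sketch of the OSS construction (omitting the per-octave re-sampling bookkeeping) could look as follows; the circular smoothing, the discrete derivative scheme and the normal-direction convention (−y', x') are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def orientation_scale_space(curve, sigmas):
    """Stack orientation angles O(r, sigma) over increasing Gaussian scales
    (Eqns. 4-6). curve: (512, 2) closed curve sampled in arc-length."""
    oss = np.empty((len(sigmas), len(curve)))
    for i, sigma in enumerate(sigmas):
        # L(r, sigma): circular Gaussian smoothing of each coordinate (Eqn. 5).
        xs = gaussian_filter1d(curve[:, 0], sigma, mode='wrap')
        ys = gaussian_filter1d(curve[:, 1], sigma, mode='wrap')
        # Periodic central differences for x'(r, sigma) and y'(r, sigma).
        dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0
        dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
        # Orientation angle of the normal, mapped into [0, 2*pi) via atan2.
        oss[i] = np.mod(np.arctan2(dx, -dy), 2.0 * np.pi)
    return oss
```

Each row of the returned matrix corresponds to one scale level, matching one interval of the stacked OSS described above.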
In the step 104, "combining all local information, which are created in step 102 and step 103, and creating silhouette orientation images"; extracting the orientation angle at a point provides local information. In order to globally define the silhouette, all the local information should be combined in such a way that the representation carries everything that the local pieces possess, while staying invariant under certain transformations.
In step 105, "finding the minimum distance match for two silhouettes which are created in step 104", the distance D between two SOIs is calculated; the hue differences between the corresponding pixels (at most 0.5 along the hue circle) are accumulated and normalized:

Da,b = (2 / (M·N)) · Σ min(|ha(i, j) − hb(i, j)|, 1 − |ha(i, j) − hb(i, j)|) (Eqn. 7)
In the equation 7, the overall distance Da,b between two SOIs takes values from 0 to 1. SOIs are scale and resolution invariant. Hence, for a scaled or sampled version of the silhouette image, the curve fitting and arc-length parametrization steps virtually construct the same OSS. However the starting point invariance, that is to say, the uncertainty of the position of the first point r0=0 while fitting the curve and the rotation invariance under planar rotations must be handled before two SOIs can be matched. The radial SOI of two identical silhouettes with different starting points will be rotated versions of each other because for radial SOI the radial axis determines the parametric position r. Thus, we may satisfy starting point invariance by searching for a minimum distance match by rotating one of the SOIs.
When a silhouette is introduced with an in-plane rotation, theoretically the relative positions of the contour pixels do not change. However the orientation angles of all pixels are rotated with the same amount. Thus, the hue values for each pixel of the SOI change by the same amount along the hue circle. Since the hue values linearly map to orientation angles, by checking the hue shift between two SOIs, the amount of in-plane rotation can be retrieved. An in-plane rotation may affect the curve fitting algorithm and the starting point may probably change for a rotated version of the silhouette. For this reason, whenever a hue shift check is carried out, a starting point invariance search should also be applied. Thus the
search becomes two-dimensional, where both the hue channel and the radial SOI are rotated in order to find the minimum distance match for two silhouettes with equation 8:

(α*, r0*) = argmin over (α, r0) of Da,b(α, r0) (Eqn. 8)
Two silhouettes and their radial SOIs are depicted. The first silhouette is the 20° rotated version of the other. Experiments show that for a 20° rotation, the best α obtained from equation 8 corresponds to the transformed silhouette whose hue channel is shifted by 20/360 (i.e. approximately a 2-pixel shift when M is 32). The rotation angle can be retrieved as accurately as the resolution of the SOI permits, since positions are quantized into M.
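The circular hue distance and the two-dimensional minimum distance search of equations 7-8 can be sketched as follows; here an SOI is simplified to a matrix of hue values in [0, 1) with the starting point along axis 0, which is only a stand-in for the actual SOI layout:

```python
import numpy as np

def soi_distance(soi_a, soi_b):
    """Eqn. 7 (our reconstruction): mean circular hue difference, each term
    at most 0.5 along the hue circle, scaled so the result lies in [0, 1]."""
    d = np.abs(soi_a - soi_b)
    return 2.0 * np.mean(np.minimum(d, 1.0 - d))

def min_distance_match(soi_a, soi_b, M=32):
    """Eqn. 8: joint search over the hue shift alpha (in-plane rotation)
    and the starting-point shift r0 (curve parametrization)."""
    best = (np.inf, 0, 0)
    for alpha in range(M):                       # hue-circle shift
        shifted = np.mod(soi_b + alpha / M, 1.0)
        for r0 in range(soi_a.shape[0]):         # starting-point shift
            d = soi_distance(soi_a, np.roll(shifted, r0, axis=0))
            if d < best[0]:
                best = (d, alpha, r0)
    return best                                  # (distance, alpha, r0)
```

As noted in the text, the alpha range can be restricted (e.g. to +/- M/12) to trade rotation invariance for speed.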
In some cases, limited robustness to rotation can be sufficient. For instance, an unstabilized platform may experience in-plane rotation (camera roll). In that case, the hue channel search can be limited to +/- 1/12, so that the rotation invariance capability is adjusted according to the needs of the problem. This way the computational burden is lightened as well.
In step 106, "applying closing operation to the multiple levels of the silhouette's morphological scale-space and obtaining new scale-space which has the binary silhouette with operators with increasing size"; it is assumed that silhouettes of the same class will have similar orientation distributions along their boundaries. Although this happens to be true in most cases, when silhouettes have small articulated parts or unexpected discontinuities, matching may fail. In order to overcome this problem, the proposed representation is applied to multiple levels of the silhouette's morphological scale-space (MSS). This new scale-space is obtained simply by closing (dilation + erosion) the binary silhouette with operators of increasing size (Equation 9). The closing operation is applied on the binary image before the chain code is extracted.
Bo(x, y) = B(x, y) · f(o, o) (Eqn. 9)

In the equation 9, the · operator denotes the morphological closing operation, which is applied to the binary silhouette B(x, y). The structuring element f(·, ·) is parametrized by the pixel size o. At each MSS level, o is increased so that the closing operations affect a larger region. In our experiments o is k·20 pixels, where k is the MSS level starting from 0.
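The MSS construction of equation 9 can be sketched as below; the structuring-element sizing o = k·20 px follows the text, while the square element shape and the SciPy routine are our implementation choices:

```python
import numpy as np
from scipy.ndimage import binary_closing

def morphological_scale_space(silhouette, m=3, step=20):
    """Eqn. 9: close the binary silhouette B(x, y) with structuring elements
    f of increasing pixel size o = k * step, for MSS levels k = 0 .. m."""
    levels = []
    for k in range(m + 1):
        if k == 0:
            levels.append(silhouette.astype(bool))   # level 0: original B
        else:
            se = np.ones((k * step, k * step), dtype=bool)
            levels.append(binary_closing(silhouette.astype(bool), structure=se))
    return levels
```

The chain-code extraction and SOI construction described above are then repeated on each returned level.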
By applying the minimum distance formula to mutually corresponding levels of the MSS of the two silhouettes, an extended distance feature vector can be obtained:

Da,b = [D0a,b(α, r0), D1a,b(α, r0), ..., Dma,b(α, r0)] (Eqn. 10)

In the equation 10, Dia,b(α, r0) denotes the distance between the SOIs of silhouettes a and b extracted from their corresponding i-th MSS level, where i ranges from 0 to m.
In the step 108, "checking the type of distance vector and calculating the distance", the distance feature vector between two silhouettes is calculated, and the search in the rotation invariance dimension can be limited according to the needs of the problem. The computational complexity of this step is trivial compared to other methods in the literature that include dynamic programming and inner-distance calculation (Ling and Jacobs, 2007 [4]). The mutual distance between two planar curves is defined by the vector Da,b. Using this vector, a classifier can be trained to cluster different categories of silhouettes. Since the vector to be learned is not a self-descriptor but a mutual distance definition, these types of problems are referred to as distance learning problems.
In the step 201, "if the distance is linear, then the weighted linear sum of the distance vector is calculated to obtain a scalar distance value", the weighted linear sum of the distance vector Da,b is calculated to obtain a scalar distance value da,b:
da,b = wT · Da,b (Eqn. 11)

In order to estimate the optimum weight vector w, the cost function in equation 12 is minimized over a training set of distance vectors:

E(w) = Σ (wT · Da,b − la,b)² (Eqn. 12)

In the equation 12, la,b is the label of the training vector Da,b. If a and b belong to the same category, la,b is 0; it is 1 if they do not.
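The linear stage of equations 11-12 reduces to an ordinary least-squares problem; a sketch with synthetic pair data (the training set here is illustrative, not from the experiments):

```python
import numpy as np

def learn_weights(D, labels):
    """Minimise Eqn. 12, the sum of (w^T D_ab - l_ab)^2 over training pairs.
    D: (n_pairs, n_features) stack of distance vectors Da,b;
    labels: 0 for same-category pairs, 1 otherwise."""
    w, *_ = np.linalg.lstsq(D, labels, rcond=None)
    return w

def scalar_distance(w, d_vec):
    """Eqn. 11: weighted linear sum giving the scalar distance da,b."""
    return float(w @ d_vec)
```

With noise-free synthetic labels the least-squares solution recovers the generating weights exactly.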
In the step 202, "if the distance is non-linear, then training an artificial neural network is used on the non-linear distance"; using a linearly weighted sum of the distance vectors Da,b assumes that the distance categories within the Da,b space are linearly separable. However, this complex space, which is constructed by using the Gaussian and morphological scale-spaces of curves, may consist of categories clustered in a nonlinear geometry. For this reason, it is logical to check the performance of a non-linear distance classifier and compare it with the linearly weighted model. For this purpose, an artificial neural network with 3(m+1) input nodes (where (m+1) = 4 is the number of MSS layers in Equation 10), h hidden-layer nodes and a single output-layer node, is trained.
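A non-linear classifier as in step 202 can be sketched with a single hidden layer; the 3(m+1) = 12 input dimension follows the text, while the synthetic training data and the scikit-learn model are stand-ins for the actual distance vectors and network:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

m_plus_1, h = 4, 16                        # MSS levels and hidden nodes
rng = np.random.default_rng(0)
X = rng.random((300, 3 * m_plus_1))        # synthetic distance vectors Da,b
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # stand-in same/different labels

# 12 input nodes, h hidden-layer nodes, a single output node.
net = MLPClassifier(hidden_layer_sizes=(h,), solver='lbfgs',
                    max_iter=2000, random_state=0)
net.fit(X, y)
```

Comparing this model's accuracy against the linearly weighted distance of step 201 indicates whether the category geometry is in fact non-linear.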
References
[1]. Brigger, P., Hoeg, J., Unser, M., 2000. B-spline snakes: A flexible tool for parametric contour detection. IEEE Transactions on Image Processing 9, 1484-1496.
[2]. Awrangjeb, M., Lu, G., Murshed, M., 2007. An affine resilient curvature scale-space corner detector, in: IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2007), pp. 1233-1236.
[3]. Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91-110.
[4]. Ling, H., Jacobs, D.W., 2007. Shape classification using the inner- distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 286-299.