US7412429B1: Method for data classification by kernel density shape interpolation of clusters
 Publication number: US7412429B1 (application US 11/940,739)
 Authority: US (United States)
 Prior art keywords: cluster, estimate value, density estimate, clusters, method
 Legal status: Active (an assumption, not a legal conclusion)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/62—Methods or arrangements for recognition using electronic means
 G06K9/6267—Classification techniques
 G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or nonparametric approaches
 G06K9/627—Classification techniques relating to the classification paradigm, e.g. parametric or nonparametric approaches based on distances between the pattern to be recognised and training or reference patterns
 G06K9/6271—Classification techniques relating to the classification paradigm, e.g. parametric or nonparametric approaches based on distances between the pattern to be recognised and training or reference patterns based on distances to prototypes
 G06K9/6272—Classification techniques relating to the classification paradigm, e.g. parametric or nonparametric approaches based on distances between the pattern to be recognised and training or reference patterns based on distances to prototypes based on distances to cluster centroïds
 G06K9/6273—Smoothing the distance, e.g. Radial Basis Function Networks

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/62—Methods or arrangements for recognition using electronic means
 G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
 G06K9/6218—Clustering techniques
 G06K9/622—Nonhierarchical partitioning techniques
 G06K9/6226—Nonhierarchical partitioning techniques based on the modelling of probability density functions

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N20/00—Machine learning
Description
1. Field of the Invention
Exemplary embodiments of the present invention relate to data classification, and more particularly, to shape interpolation of clustered data.
2. Description of Background
Data mining involves sorting through large amounts of data and extracting relevant predictive information. Traditionally used by business intelligence organizations and financial analysts, data mining is increasingly being used in the sciences to extract information from the enormous datasets generated by modern experimental and observational methods. Through the use of sophisticated algorithms, data mining can identify trends within data that go beyond simple analysis.
Many data mining applications depend on partitioning data elements into related subsets. Classification and clustering are therefore important tasks in data mining. Clustering is the unsupervised categorization of objects into different groups, or more precisely, the organizing of a collection of patterns (usually represented as a vector of measurements, or a point in a multidimensional space) into clusters based on similarity. A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. The goal of clustering is to determine an intrinsic grouping, or structure, in a set of unlabeled data. Clustering can be used to perform statistical data analysis in many fields, including machine learning, data mining, document retrieval, pattern recognition, medical imaging and other image analysis, and bioinformatics.
Classification is a statistical procedure in which individual items are placed into groups based on quantitative information on one or more traits inherent in the items and based on a training set of previously labeled (or preclassified) patterns. As with clustering, a dataset is divided into groups based upon proximity such that the members of each group are as “close” as possible to one another, and different groups are as “far” as possible from one another, where distance is measured with respect to specific trait(s) that are being analyzed.
An important difference should be noted when comparing clustering and classification. In classification, a collection of labeled patterns is provided, and the problem is to label a newly encountered, yet unlabeled, pattern. Typically, the given training patterns are used to learn descriptions of the classes, which in turn are used to label a new pattern. In the case of clustering, the problem is to group a given collection of unlabeled patterns into meaningful clusters. In a sense, clusters can be seen as labeled patterns obtained solely from the data. Classification therefore often follows clustering, although classification may also be performed without explicit clustering (for example, Support Vector Machine classification, described below). In situations in which classification is performed once the clusters have been identified, new data is typically classified by projecting the data into the multidimensional space of the clusters and classifying the new data point based on proximity, that is, distance, to the nearest cluster centroid. The centroid of a cluster having a finite set of points can be computed as the arithmetic mean of each coordinate of the points.
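As an illustration, the centroid computation and nearest-centroid classification described above can be sketched as follows (a minimal sketch with hypothetical names and data, not the claimed method):

```python
def centroid(points):
    """Arithmetic mean of each coordinate over a finite set of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest_centroid(x, centroids):
    """Label of the cluster whose centroid is closest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

# Two toy clusters in a 2-dimensional feature space.
clusters = {"A": [(0, 0), (2, 0), (1, 2)], "B": [(9, 9), (11, 9), (10, 11)]}
cents = {label: centroid(pts) for label, pts in clusters.items()}
print(nearest_centroid((3, 1), cents))  # A
```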
The variety of techniques for representing data, measuring proximity between data elements, and grouping data elements has produced a rich assortment of classification and clustering methods.
In Support Vector Machine (SVM) classification, when classifying a new data point based on proximity, the distance is taken to the nearest data points coming from the clusters (even though there is no explicit representation of the clusters), called support vectors. Each data point is represented by a p-dimensional input vector (a list of p numbers) that is mapped to a higher-dimensional space in which a maximal separating hyperplane is constructed. Each of these data points belongs to only one of two classes, and SVM aims to separate the classes with a (p − 1)-dimensional hyperplane. Two parallel hyperplanes are constructed, one on each side of the hyperplane that separates the data. To achieve maximum separation between the two classes, the separating hyperplane is selected so as to maximize the distance between the two parallel hyperplanes; that is, the nearest distance between a point in one parallel hyperplane and a point in the other is maximized.
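Once such a hyperplane has been found, classification reduces to checking which side of it a point falls on. The following sketch shows only that decision rule for a fixed, hypothetical hyperplane (w, b); it is not an SVM trainer:

```python
def svm_side(x, w, b):
    """Side of the separating hyperplane w·x + b = 0 that point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical 2-D hyperplane x + y - 10 = 0 separating two classes.
w, b = (1.0, 1.0), -10.0
print(svm_side((9, 9), w, b), svm_side((1, 2), w, b))  # 1 -1
```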
In fuzzy clustering, data elements can belong to more than one cluster, and cluster membership is based on a proximity test to each cluster. Associated with each element is a set of membership levels that indicate the strength of the association between that data element and the particular clusters of which it is a member. The process of fuzzy clustering involves assigning these membership levels and then using them to assign data elements to one or more clusters. Thus, points on the edge of a cluster may belong to the cluster to a lesser degree than points in the center of the cluster.
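A minimal sketch of such membership levels, in the style of fuzzy c-means (the centroids and fuzzifier m here are illustrative assumptions): each point receives a membership in every cluster, the memberships sum to 1, and closer clusters receive higher weight.

```python
import math

def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style membership levels for point x across all clusters."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b))) or 1e-12
    d = {c: dist(x, v) for c, v in centroids.items()}
    p = 2.0 / (m - 1.0)
    # Standard inverse-distance membership formula; memberships sum to 1.
    return {c: 1.0 / sum((d[c] / d[k]) ** p for k in d) for c in d}

u = fuzzy_memberships((2, 0), {"A": (0, 0), "B": (10, 0)})
print(u["A"] > u["B"])  # True
```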
In categorical classification methods based on decision tree variants, the classification is based on the likelihood of the data point coming from any of the clusters based on the sharing of attribute values. Using a decision tree model, observations about an item are mapped to conclusions about its target cluster. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications.
Classification using proximity to either centroids of clusters or support vectors is generally inadequate to properly classify data points. To provide for more accurate classification, the shape of the cluster should be taken into account.
The shortcomings of the prior art can be overcome and additional advantages can be provided through exemplary embodiments of the present invention that are related to a method for obtaining a shape interpolated representation of shapes of one or more clusters in an image of a dataset that has been clustered. The method comprises generating a density estimate value of each grid point of a set of grid points sampled from the image at a specified resolution for each cluster in the image using a kernel density function; evaluating the density estimate value of each grid point for each cluster to identify a maximum density estimate value of each grid point and a cluster associated with the maximum density estimate value of each grid point; and adding each grid point for which the maximum density estimate value exceeds a specified threshold to the cluster associated with the maximum density estimate value for the grid point to form a shape interpolated representation of the one or more clusters.
The shortcomings of the prior art can also be overcome and additional advantages can also be provided through exemplary embodiments of the present invention that are related to computer program products and data processing systems corresponding to the above-summarized method, which are also described and claimed herein.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
As a result of the summarized invention, technically we have achieved a solution that can be implemented to interpolate cluster shapes by utilizing kernel density estimation to create a smoother approximation in a manner that is able to preserve the overall perception of the shapes given by the data points in a multidimensional feature space. Exemplary embodiments can be implemented to perform precise classification by more accurately identifying outlier data points.
The subject matter that is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description of exemplary embodiments of the present invention taken in conjunction with the accompanying drawings in which:
The detailed description explains exemplary embodiments of the present invention, together with advantages and features, by way of example with reference to the drawings. The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified. All of these variations are considered a part of the claimed invention.
While the specification concludes with claims defining the features of the invention that are regarded as novel, it is believed that the invention will be better understood from a consideration of the description of exemplary embodiments in conjunction with the drawings. It is of course to be understood that the embodiments described herein are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed in relation to the exemplary embodiments described herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriate form. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of the invention.
Exemplary embodiments of the present invention described herein can be implemented to perform data classification using shape interpolation of clusters. Shape interpolation is the process of transforming one object continuously into another. Modeling of cluster shapes has thus far been limited to representations either as a collection of isolated points sharing the same cluster label or through global parametric models such as mixtures of Gaussians. Cluster structure, however, cannot adequately be described as a collection of isolated points, and the parametric models typically smooth the arbitrary distributions that characterize clusters by approximately fitting the distributions to a geometric shape having predetermined boundaries; they therefore also cannot accurately represent the perceptible regions of the shape of a cluster. Classical parametric densities are unimodal, that is, they have a single local maximum, while many practical problems involve multimodal densities. Furthermore, traditional surface interpolation methods used in computer vision are not applicable to higher-dimensional point distributions.
Exemplary embodiments described herein can be implemented to interpolate cluster shapes in a manner that is able to preserve the overall perception of the shapes given by the data points in a multidimensional feature space. In exemplary embodiments of the present invention, to generate a continuous manifold characterizing a cluster, the given sample points already present in the cluster are treated as anchor points and a probability density function, which is a function that represents a probability distribution in terms of integrals, is hypothesized from observed data. More specifically, exemplary embodiments can be implemented to represent cluster shapes using a model that is based on density estimation. Density estimation involves the construction of an estimate, based on observed data, of an unobservable underlying probability density function. The unobservable density function is viewed as the density according to which a large population is distributed, and the data are usually thought of as a random sample from that population.
Because of the sparseness of multidimensional datasets in comparison to feature space dimensions, it can be useful for exemplary embodiments to first obtain a clustering of the dataset that provides dense representation of the shapes of the clusters in which the clusters are viewed as regions of the pattern space in which the patterns are dense, separated by regions of low pattern density. Clusters can then be identified by searching for regions of high density, called modes, in the pattern space. The close fit provided by a dense representation of the cluster shapes would help in later classification of new data points, as the classification would be based on membership within multidimensional manifolds rather than distance alone.
Even more specifically, exemplary embodiments as described herein utilize kernel density estimation, a method of estimating the probability density function of a random variable. Kernel density estimation is a nonparametric technique for density estimation in which a known density function, the kernel, is averaged across the observed data points to create a smooth approximation. Nonparametric procedures can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known. Although less smooth density estimators, such as the histogram density estimator, can be made asymptotically consistent, other density estimators are often either discontinuous or converge at slower rates than the kernel density estimator. Rather than grouping observations together in bins, the kernel density estimator can be thought of as placing a small “bump,” determined by the kernel function, at each observation. As a result, the estimator consists of a “sum of bumps” and creates a smoother, finer approximation of the regions of cluster shapes that does not depend on end points or bounded, predetermined shapes.
In exemplary embodiments, to obtain a dense representation of the shapes of the clusters at block 110, two stages of clustering can be performed. In the first stage, an unsupervised, nonparametric clustering method, such as, for example, perceptual clustering, can be performed on the initial dataset to determine the number of cluster shapes. In the second stage, the data points in each separate cluster shape are clustered a second time using a supervised, partitional clustering method, such as, for example, the k-means or k-medoid algorithm, to partition each cluster shape into a desired number of smaller cluster regions that provide a dense representation of the clusters.
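The second, partitional stage might be sketched as a minimal k-means (a hypothetical stand-in; the text also permits k-medoid and does not fix a particular algorithm or dataset):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternate assignment to the nearest center and
    re-computation of centers as coordinate-wise means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # Empty groups keep their previous center.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

regions = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)
print(sorted(len(g) for g in regions))  # [3, 3]
```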
After clustering is performed in exemplary process 100, a smooth interpolation of the shapes of the clusters is obtained at block 120 by using a kernel density function that will be described in greater detail below. First, however, some terminology for the model used in the present exemplary embodiment will be outlined.
In the model of the present exemplary embodiment, given n sample points {X_1, X_2, . . . X_n} belonging to a cluster c, the contribution of each data point can be smoothed out over a local neighborhood of that data point. The contribution of data point X_i to the estimate at some point X depends on how far apart X_i and X are. The extent of this contribution depends upon the shape of the kernel function adopted and on the bandwidth, which determines the range of the local estimation neighborhood for each data point. In the present exemplary embodiment, denoting the kernel function as K and its bandwidth by h, the estimated density at any point x is given by

P(x) = (1/(nh)) Σ_{i=1..n} K((x − X_i)/h),

where ∫K(t)dt = 1 to ensure that the estimate P(x) integrates to 1.
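This one-dimensional estimate can be transcribed directly; the Gaussian kernel used here is one illustrative choice of K that integrates to 1:

```python
import math

def kernel_density(x, samples, h):
    """P(x) = (1/(n*h)) * sum_i K((x - X_i)/h) with a Gaussian kernel K."""
    def K(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    n = len(samples)
    return sum(K((x - xi) / h) for xi in samples) / (n * h)

data = [1.0, 1.2, 0.8, 5.0]
# Density is higher near the tight group of samples around 1.0 than in the gap.
print(kernel_density(1.0, data, 0.5) > kernel_density(3.0, data, 0.5))  # True
```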
In exemplary embodiments, the kernel function K can be chosen to be a smooth unimodal function such as a Gaussian kernel. It should be noted that choosing the Gaussian as the kernel function is different from fitting the distribution to a mixture-of-Gaussians model; in the present situation, the Gaussian is used only as a function that weights the data points. In exemplary embodiments, a multivariate Gaussian could be used. In the present exemplary embodiment, a simpler approximation in terms of a product of one-dimensional kernels is used. Thus, the shape of a cluster c consisting of sample points {X_1, X_2, . . . X_n} at any arbitrary point X in the M-dimensional space is given by the approximation equation

P(X | c) = (1/n) Σ_{i=1..n} Π_{j=1..M} (1/h_j) K((f_j − f_{ji})/h_j),

where X = (f_1, f_2, . . . f_M), (f_{1i}, f_{2i}, . . . f_{Mi}) are the values of sample X_i along the feature dimensions, and (f̄_1, f̄_2, . . . f̄_M) are the sample means along the respective dimensions.
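The product-of-one-dimensional-kernels approximation can be sketched as follows, with Gaussian per-dimension kernels and illustrative sample points and bandwidths:

```python
import math

def cluster_density(x, samples, h):
    """P(X|c) = (1/n) * sum_i prod_j (1/h_j) * K((f_j - f_ji)/h_j),
    with a 1-D Gaussian kernel K applied per feature dimension."""
    def K(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    total = 0.0
    for s in samples:
        prod = 1.0
        for xj, sj, hj in zip(x, s, h):
            prod *= K((xj - sj) / hj) / hj
        total += prod
    return total / len(samples)

pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
# Density near the cluster exceeds density far from it.
print(cluster_density((0.5, 0.3), pts, (1.0, 1.0)) >
      cluster_density((5.0, 5.0), pts, (1.0, 1.0)))  # True
```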
In exemplary embodiments, any suitable choice of bandwidth that is neither too small nor too large for performing kernel density estimation can be used. In the present exemplary embodiment, the bandwidth estimation formula that is used is one typically adopted for most practical applications and can be expressed by the following equation:

h_j = 0.9 · min( std(f_j), iqr(f_j)/1.34 ) · n^(−1/5),

where f_j = (f_{j1}, f_{j2}, . . . f_{jn}) are the features assembled from dimension j for all samples in the cluster, std(f_j) is their standard deviation, iqr(f_j) is the interquartile range of f_j, and n is the number of samples in the cluster. This bandwidth generally produces a less smooth but more accurate density estimate.
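The original equation image is not reproduced above; the following sketch assumes the widely used Silverman-style rule of thumb that the surrounding text appears to describe, and uses one of several common quartile conventions for the interquartile range:

```python
import statistics

def iqr(values):
    """Interquartile range via the median-split quartile convention."""
    v = sorted(values)
    mid = len(v) // 2
    lower = v[:mid]
    upper = v[mid + 1:] if len(v) % 2 else v[mid:]
    return statistics.median(upper) - statistics.median(lower)

def bandwidth(feature_values):
    """Rule-of-thumb bandwidth for one feature dimension:
    h = 0.9 * min(std, iqr / 1.34) * n**(-1/5)."""
    n = len(feature_values)
    spread = min(statistics.stdev(feature_values), iqr(feature_values) / 1.34)
    return 0.9 * spread * n ** (-0.2)

h = bandwidth([1.0, 1.1, 0.9, 1.2, 0.8, 5.0])
print(h > 0)  # True
```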
At block 120 of exemplary process 100, the kernel density interpolation of the above approximation equation is applied by sampling the image on a grid at a specified image resolution for each selected clustering level. To interpolate the shape of the clusters, the multidimensional image can be sampled with a fine grid having as much resolution as desired for the interpolation. For example, the image resolution could be specified as 256×256, 128×128, 64×64, etc. in exemplary embodiments. In the present exemplary embodiment, the sampling resolution is selected as 256×256 so that a dense representation of shape will be obtained. This can eliminate small, noisy samples that lie in single connected components, as the bandwidth will reduce to zero when the kernel density approximation equation is applied to such samples.
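Grid sampling at a chosen resolution can be sketched as follows for the two-dimensional case (the bounding box and the small demo resolution are illustrative; the text's 256×256 would simply be `(256, 256)`):

```python
def sample_grid(bounds, resolution):
    """Uniform grid over a 2-D bounding box; resolution (rx, ry) gives the
    number of grid points per axis, e.g. (256, 256)."""
    (xmin, xmax), (ymin, ymax) = bounds
    rx, ry = resolution
    xs = [xmin + (xmax - xmin) * i / (rx - 1) for i in range(rx)]
    ys = [ymin + (ymax - ymin) * j / (ry - 1) for j in range(ry)]
    return [(x, y) for x in xs for y in ys]

grid = sample_grid(((0.0, 1.0), (0.0, 1.0)), (4, 4))
print(len(grid))  # 16
```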
In exemplary embodiments in which a two-stage clustering is performed at block 110 to generate a number of cluster shapes and a desired number of smaller cluster regions for each cluster shape, the kernel density interpolation performed at block 120 can be applied to interpolate the shape of each smaller cluster region. A close-fit estimation of the cluster shapes that resulted from the first clustering stage can then be obtained by uniting the interpolated shapes of the second-stage smaller cluster regions for each first-stage cluster shape. As a result, classification can be performed based upon more accurate approximations of regions of cluster shapes, rather than simply based on proximity to a centroid or according to the boundary points of a predetermined shape.
At block 130, after performing the kernel density interpolation, the kernel density estimate is evaluated from each cluster at each grid point using the above equation for determining the estimated density, and the maximum value of the estimate for each grid point is retained as an estimate along with the associated cluster label for the grid point. At block 140, for each grid point, if the maximum value of the density estimate for that grid point is above a chosen threshold, the grid point is classified as belonging to the associated cluster and therefore added to that cluster. At block 150, for each cluster, the new shape of the cluster is formed as the set of grid points added to that cluster at block 140, along with the sample points of the cluster that were previously isolated at block 110.
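Blocks 130 through 150 can be sketched end-to-end as follows (clusters, bandwidths, and threshold are illustrative; the density uses the product-of-one-dimensional-Gaussian-kernels approximation described earlier):

```python
import math

def density(x, samples, h):
    """Product-of-1-D-Gaussian-kernels density of point x under a cluster."""
    def K(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    total = 0.0
    for s in samples:
        prod = 1.0
        for xj, sj, hj in zip(x, s, h):
            prod *= K((xj - sj) / hj) / hj
        total += prod
    return total / len(samples)

def interpolate_shapes(clusters, grid, h, threshold):
    """For each grid point, keep the cluster with the maximum density estimate;
    if that maximum exceeds the threshold, add the point to that cluster's
    shape alongside its original sample points."""
    shapes = {c: list(pts) for c, pts in clusters.items()}
    for g in grid:
        best = max(clusters, key=lambda c: density(g, clusters[c], h))
        if density(g, clusters[best], h) > threshold:
            shapes[best].append(g)
    return shapes

clusters = {"A": [(0.0, 0.0), (1.0, 0.0)], "B": [(10.0, 10.0), (11.0, 10.0)]}
shapes = interpolate_shapes(clusters, [(0.5, 0.0), (10.5, 10.0)], (1.0, 1.0), 0.01)
print(len(shapes["A"]), len(shapes["B"]))  # 3 3
```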
As a result of the exemplary shape interpolation process described above, a dense representation of clusters can be obtained. The resulting shape of each cluster will resemble the original cluster shape and therefore can be more indicative of a classification region around the cluster than the use of support vectors alone.
Although the exemplary embodiments described thus far have involved performing an explicit computation, in other exemplary embodiments, shape interpolation using kernel density estimation can be carried out dynamically during classification to find the nearest cluster. As a result, instead of using the centroid of the cluster as a prototypical member for computing the nearest distance, a new sample can be assigned to the cluster with the highest kernel density estimate.
The exemplary shape interpolation processes described above can be implemented to classify new data points by testing membership in a shape interpolated from a cluster of data points using kernel density estimation. Kernel density estimation as described herein utilizes a nonparametric function to provide a good, dense interpolation of the shape around a cluster. The details of the exemplary shape interpolation process illustrated in the accompanying flow diagram can be summarized as follows:
1. Perform clustering of the data points using any clustering algorithm.
2. Let there be n sample points {X_{1}, X_{2}, . . . X_{n}} belonging to a cluster c.
3. Perform a dense shape interpolation using a kernel density function. That is, at a point X in the multidimensional space surrounding c, the contribution of data point X_i to the estimate at X depends on how far apart X_i and X are. The extent of this contribution depends upon the shape of the kernel function adopted and the bandwidth. Denoting the kernel function as K and its bandwidth by h, the estimated density at any point x is

P(x) = (1/(nh)) Σ_{i=1..n} K((x − X_i)/h),

where ∫K(t)dt = 1 to ensure that the estimate P(x) integrates to 1. In exemplary embodiments, the kernel function K can be chosen to be a smooth unimodal function.
4. Given any new point X, the class to which X belongs is the one for which the value of the approximation equation

P(X | c) = (1/n) Σ_{i=1..n} Π_{j=1..M} (1/h_j) K((f_j − f_{ji})/h_j)

is the maximum.
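Step 4 can be sketched as a self-contained classifier (the clusters, bandwidths, and query points are illustrative assumptions):

```python
import math

def classify(x, clusters, h):
    """Assign new point x to the cluster with the highest kernel density
    estimate, using a product of 1-D Gaussian kernels per dimension."""
    def K(t):
        return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    def density(pt, samples):
        total = 0.0
        for s in samples:
            prod = 1.0
            for xj, sj, hj in zip(pt, s, h):
                prod *= K((xj - sj) / hj) / hj
            total += prod
        return total / len(samples)
    return max(clusters, key=lambda c: density(x, clusters[c]))

clusters = {"A": [(0.0, 0.0), (1.0, 0.0)], "B": [(10.0, 10.0), (11.0, 10.0)]}
print(classify((0.2, 0.1), clusters, (1.0, 1.0)))  # A
```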
By approximating the shape of clusters at a chosen level through a dense, kernel density function-based interpolation of sparse datasets, noise and region-merging inconsistencies can also be removed in exemplary embodiments.
The capabilities of exemplary embodiments of the present invention described above can be implemented in software, firmware, hardware, or some combination thereof, and may be realized in a centralized fashion in one computer system or in a distributed fashion in which different elements are spread across several interconnected computer systems. Any kind of computer system, or other apparatus adapted for carrying out the methods and/or functions described herein, is suitable. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein. Exemplary embodiments of the present invention can also be embedded in a computer program product, which comprises features enabling the implementation of the methods described herein and which, when loaded in a computer system, is able to carry out these methods.
Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.
Therefore, one or more aspects of exemplary embodiments of the present invention can be included in an article of manufacture (for example, one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately. Furthermore, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the exemplary embodiments of the present invention described above can be provided.
For instance, exemplary embodiments of the present invention can be implemented within the exemplary embodiment of a hardware configuration provided for a computer system in the accompanying drawings.
The above-described program or modules implementing exemplary embodiments of the present invention can run on processor 12 and the like to perform shape interpolation. The program or modules implementing exemplary embodiments may be stored in an external storage medium. In addition to system disk 27, an optical recording medium such as a DVD or a PD, a magneto-optical recording medium such as an MD, a tape medium, a semiconductor memory such as an IC card, and the like may be used as the storage medium. Moreover, the program may be provided to computer system 10 through the network by using, as the recording medium, a storage device such as a hard disk or a RAM provided in a server system connected to a dedicated communication network or the Internet.
While exemplary embodiments of the present invention have been described, it will be understood that those skilled in the art, both now and in the future, may make various modifications without departing from the spirit and the scope of the present invention as set forth in the following claims. These following claims should be construed to maintain the proper protection for the present invention.
Claims (5)
Priority Applications (1)
- US 11/940,739 (US7412429B1), priority date 2007-11-15, filed 2007-11-15: Method for data classification by kernel density shape interpolation of clusters
Applications Claiming Priority (3)
- US 11/940,739 (US7412429B1), priority date 2007-11-15, filed 2007-11-15: Method for data classification by kernel density shape interpolation of clusters
- US 12/142,949 (US7542953B1), priority date 2007-11-15, filed 2008-06-20: Data classification by kernel density shape interpolation of clusters
- US 12/164,532 (US7542954B1), priority date 2007-11-15, filed 2008-06-30: Data classification by kernel density shape interpolation of clusters
Related Child Applications (1)
- US 12/142,949 (continuation, US7542953B1), priority date 2007-11-15, filed 2008-06-20: Data classification by kernel density shape interpolation of clusters
Publications (1)
- US7412429B1 (en), published 2008-08-12
Family
ID=39678812
Family Applications (3)
- US 11/940,739 (US7412429B1), Active, priority date 2007-11-15, filed 2007-11-15: Method for data classification by kernel density shape interpolation of clusters
- US 12/142,949 (US7542953B1), Active, priority date 2007-11-15, filed 2008-06-20: Data classification by kernel density shape interpolation of clusters
- US 12/164,532 (US7542954B1), Active, priority date 2007-11-15, filed 2008-06-30: Data classification by kernel density shape interpolation of clusters
Family Applications After (2)
Application Number  Title  Priority Date  Filing Date 

US12/142,949 Active US7542953B1 (en) 2007-11-15 2008-06-20 Data classification by kernel density shape interpolation of clusters
US12/164,532 Active US7542954B1 (en) 2007-11-15 2008-06-30 Data classification by kernel density shape interpolation of clusters
Country Status (1)
Country  Link 

US (3)  US7412429B1 (en) 
Cited By (11)
Publication number  Priority date  Publication date  Assignee  Title 

US20060242610A1 (en) * 2005-03-29 2006-10-26 IBM Corporation Systems and methods of data traffic generation via density estimation
US20100033745A1 (en) * 2005-06-09 2010-02-11 Canon Kabushiki Kaisha Image processing method and apparatus
US20100332425A1 (en) * 2009-06-30 2010-12-30 Cuneyt Oncel Tuzel Method for Clustering Samples with Weakly Supervised Kernel Mean Shift Matrices
US8280484B2 (en) 2007-12-18 2012-10-02 The Invention Science Fund I, LLC System, devices, and methods for detecting occlusions in a biological subject
US20150019464A1 (en) * 2013-07-09 2015-01-15 Robert Bosch Gmbh Method and apparatus for supplying interpolation point data for a data-based function model calculation unit
CN104715460A (en) * 2015-03-30 2015-06-17 江南大学 Quick image super-resolution reconstruction method based on sparse representation
US20160232658A1 (en) * 2015-02-06 2016-08-11 International Business Machines Corporation Automatic ground truth generation for medical image collections
US20160283490A1 (en) * 2013-02-04 2016-09-29 TextWise Company, LLC Method and System for Visualizing Documents
US9672471B2 (en) 2007-12-18 2017-06-06 Gearbox Llc Systems, devices, and methods for detecting occlusions in a biological subject including spectral learning
US10229347B2 (en) * 2017-05-14 2019-03-12 International Business Machines Corporation Systems and methods for identifying a target object in an image
US10417530B2 (en) * 2016-09-30 2019-09-17 Cylance Inc. Centroid for improving machine learning classification and info retrieval
Families Citing this family (9)
Publication number  Priority date  Publication date  Assignee  Title 

US20090232355A1 (en) * 2008-03-12 2009-09-17 Harris Corporation Registration of 3D point cloud data using eigenanalysis
US8571331B2 (en) * 2009-11-30 2013-10-29 Xerox Corporation Content-based image selection for automatic photo album generation
US20110200249A1 (en) * 2010-02-17 2011-08-18 Harris Corporation Surface detection in images based on spatial data
EP2715614A1 (en) * 2011-05-27 2014-04-09 Telefonaktiebolaget LM Ericsson (PUBL) A method of conditioning communication network data relating to a distribution of network entities across a space
CN103729539B (en) * 2012-10-12 2017-06-16 国际商业机器公司 Method and system for detecting and describing visual features on a visualization
US9311899B2 (en) 2012-10-12 2016-04-12 International Business Machines Corporation Detecting and describing visible features on a visualization
CN106484758B (en) * 2016-08-09 2019-08-06 浙江经济职业技术学院 A real-time stream density estimation method based on grid and cluster optimization
CN106339416A (en) * 2016-08-15 2017-01-18 常熟理工学院 Grid-based data clustering method for quickly searching density peaks
US10026014B2 (en) 2016-10-26 2018-07-17 Nxp Usa, Inc. Method and apparatus for data set classification based on generator features
Citations (5)
Publication number  Priority date  Publication date  Assignee  Title 

US5671294A (en) 1994-09-15 1997-09-23 The United States Of America As Represented By The Secretary Of The Navy System and method for incorporating segmentation boundaries into the calculation of fractal dimension features for texture discrimination
US20030147558A1 (en) 2002-02-07 2003-08-07 Loui Alexander C. Method for image region classification using unsupervised and supervised learning
US20060217925A1 (en) 2005-03-23 2006-09-28 Taron Maxime G Methods for entity identification
US20070003137A1 (en) 2005-04-19 2007-01-04 Daniel Cremers Efficient kernel density estimation of shape and intensity priors for level set segmentation
US7202791B2 (en) 2001-09-27 2007-04-10 Koninklijke Philips N.V. Method and apparatus for modeling behavior using a probability distribution function

2007
2007-11-15 US US11/940,739 patent/US7412429B1/en active Active

2008
2008-06-20 US US12/142,949 patent/US7542953B1/en active Active
2008-06-30 US US12/164,532 patent/US7542954B1/en active Active
Patent Citations (5)
Publication number  Priority date  Publication date  Assignee  Title 

US5671294A (en) 1994-09-15 1997-09-23 The United States Of America As Represented By The Secretary Of The Navy System and method for incorporating segmentation boundaries into the calculation of fractal dimension features for texture discrimination
US7202791B2 (en) 2001-09-27 2007-04-10 Koninklijke Philips N.V. Method and apparatus for modeling behavior using a probability distribution function
US20030147558A1 (en) 2002-02-07 2003-08-07 Loui Alexander C. Method for image region classification using unsupervised and supervised learning
US20060217925A1 (en) 2005-03-23 2006-09-28 Taron Maxime G Methods for entity identification
US20070003137A1 (en) 2005-04-19 2007-01-04 Daniel Cremers Efficient kernel density estimation of shape and intensity priors for level set segmentation
Cited By (17)
Publication number  Priority date  Publication date  Assignee  Title 

US7684963B2 (en) * 2005-03-29 2010-03-23 International Business Machines Corporation Systems and methods of data traffic generation via density estimation using SVD
US20060242610A1 (en) * 2005-03-29 2006-10-26 IBM Corporation Systems and methods of data traffic generation via density estimation
US20100033745A1 (en) * 2005-06-09 2010-02-11 Canon Kabushiki Kaisha Image processing method and apparatus
US7936929B2 (en) * 2005-06-09 2011-05-03 Canon Kabushiki Kaisha Image processing method and apparatus for removing noise from a document image
US9672471B2 (en) 2007-12-18 2017-06-06 Gearbox Llc Systems, devices, and methods for detecting occlusions in a biological subject including spectral learning
US8280484B2 (en) 2007-12-18 2012-10-02 The Invention Science Fund I, LLC System, devices, and methods for detecting occlusions in a biological subject
US20100332425A1 (en) * 2009-06-30 2010-12-30 Cuneyt Oncel Tuzel Method for Clustering Samples with Weakly Supervised Kernel Mean Shift Matrices
US8296248B2 (en) * 2009-06-30 2012-10-23 Mitsubishi Electric Research Laboratories, Inc. Method for clustering samples with weakly supervised kernel mean shift matrices
US20160283490A1 (en) * 2013-02-04 2016-09-29 TextWise Company, LLC Method and System for Visualizing Documents
US9805313B2 (en) * 2013-07-09 2017-10-31 Robert Bosch Gmbh Method and apparatus for supplying interpolation point data for a data-based function model calculation unit
US20150019464A1 (en) * 2013-07-09 2015-01-15 Robert Bosch Gmbh Method and apparatus for supplying interpolation point data for a data-based function model calculation unit
US20160232658A1 (en) * 2015-02-06 2016-08-11 International Business Machines Corporation Automatic ground truth generation for medical image collections
US9842390B2 (en) * 2015-02-06 2017-12-12 International Business Machines Corporation Automatic ground truth generation for medical image collections
CN104715460A (en) * 2015-03-30 2015-06-17 江南大学 Quick image super-resolution reconstruction method based on sparse representation
US10417530B2 (en) * 2016-09-30 2019-09-17 Cylance Inc. Centroid for improving machine learning classification and info retrieval
US10229347B2 (en) * 2017-05-14 2019-03-12 International Business Machines Corporation Systems and methods for identifying a target object in an image
US10395143B2 (en) * 2017-05-14 2019-08-27 International Business Machines Corporation Systems and methods for identifying a target object in an image
Also Published As
Publication number  Publication date 

US7542953B1 (en) 2009-06-02
US7542954B1 (en) 2009-06-02
US20090132594A1 (en) 2009-05-21
US20090132568A1 (en) 2009-05-21
Similar Documents
Publication  Publication Date  Title 

Sheikh et al.  Mode-seeking by medoidshifts  
Comaniciu et al.  Distribution free decomposition of multivariate data  
Srivastava et al.  Statistical shape analysis: Clustering, learning, and testing  
Zhou et al.  Manifold elastic net: a unified framework for sparse dimension reduction  
Kaski et al.  Bankruptcy analysis with self-organizing maps in learning metrics  
Kristan et al.  Multivariate online kernel density estimation with Gaussian kernels  
Chen et al.  Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions  
Dueck et al.  Non-metric affinity propagation for unsupervised image categorization  
Zass et al.  A unifying approach to hard and probabilistic clustering  
EP1073272B1 (en)  Signal processing method and video/audio processing device  
EP1739593A1 (en)  Generic visual categorization method and system  
Jayasumana et al.  Kernel methods on Riemannian manifolds with Gaussian RBF kernels  
Bouveyron et al.  Robust supervised classification with mixture models: Learning from data with uncertain labels  
Ghasedi Dizaji et al.  Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization  
WO2006113248A2 (en)  Partially supervised machine learning of data classification based on localneighborhood laplacian eigenmaps  
Awad et al.  Efficient learning machines: theories, concepts, and applications for engineers and system designers  
US7724961B2 (en)  Method for classifying data using an analytic manifold  
Tang et al.  ENN: Extended nearest neighbor method for pattern recognition [research frontier]  
US7558425B1 (en)  Finding structures in multidimensional spaces using imageguided clustering  
Jia et al.  Bagging-based spectral clustering ensemble selection  
Pang et al.  Fast Haar transform based feature extraction for face representation and recognition  
US8725660B2 (en)  Applying nonlinear transformation of feature values for training a classifier  
Horn et al.  The method of quantum clustering  
Wang et al.  CLUES: A nonparametric clustering method based on local shrinking  
US20050114382A1 (en)  Method and system for data segmentation 
Legal Events
Date  Code  Title  Description 

AS  Assignment 
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HAAS, PETER J.;LAKE, JOHN M.;LOHMAN, GUY M.;AND OTHERS;REEL/FRAME:020124/0696;SIGNING DATES FROM 2007-11-08 TO 2007-11-09 

AS  Assignment 
Owner name: INTERNATIONAL BUSINESS MACHINES, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SYEDA-MAHMOOD, TANVEER;HAAS, PETER J.;LAKE, JOHN M.;AND OTHERS;REEL/FRAME:020159/0543;SIGNING DATES FROM 2007-11-13 TO 2007-11-19 

STCF  Information on status: patent grant 
Free format text: PATENTED CASE 

REMI  Maintenance fee reminder mailed  
SULP  Surcharge for late payment  
FPAY  Fee payment 
Year of fee payment: 4 

AS  Assignment 
Owner name: SAP AG, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:028540/0522 Effective date: 2012-06-29 

AS  Assignment 
Owner name: SAP SE, GERMANY Free format text: CHANGE OF NAME;ASSIGNOR:SAP AG;REEL/FRAME:033625/0334 Effective date: 2014-07-07 

FPAY  Fee payment 
Year of fee payment: 8 