WO2001016880A2 - Topographic map and methods and systems for data processing therewith - Google Patents
Topographic map and methods and systems for data processing therewith
- Publication number
- WO2001016880A2 (PCT/BE2000/000099)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- implemented method
- computer implemented
- input data
- topographic map
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2137—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- the present invention relates to systems and methods suitable for the analysis of multi- dimensional data sets from which it is desired to obtain relational information indicating attribute relationships in groups of data. Such information may be used for prediction, decision making, market analysis, fraud detection etc.
- the present invention relates to methods and systems, especially distributed processing systems.
- Clustering is aimed at partitioning a given data set into subsets of "similar” data, ideally without using a priori knowledge about the properties or even the existence of these subsets.
- Many approaches have been suggested in the past and various optimality criteria developed in the statistics literature (Duda and Hart, 1973; for a recent overview, see Theodoridis and Koutroubas, 1998). Some rely on an estimation of the density function, others rely on a similarity criterion that needs to be optimized.
- the allocation of new neurons is performed when needed, depending on a "vigilance" parameter: when the current input is not sufficiently similar to any of the already allocated neuron weights, then a new neuron is recruited and its weight set equal to the current input.
- the choice of the vigilance parameter is often quite sensitive to input noise.
- Clustering has also been formalized in a more probabilistic framework, in an attempt not to make assumptions about the shape of the input distribution: the main principle is that each datum is assigned in probability to each cluster, a feature that is usually called "fuzzy membership in clusters." This membership function definition has been adopted by Rose and co-workers (1990), in their optimal vector quantizer, and more recently by Graepel et al. (1997), in their soft topographic vector quantizer.
- Density-based clustering is much less considered for unsupervised competitive learning algorithms, albeit that the SOM has been used for non-parametric density estimation purposes, but not for capturing the fine-structure of the input density (see Kohonen, 1995, p. 152): the weight density yields an estimate of the input density, provided that there is a linear relationship between the two. Density function estimation should not be confused with probability distribution estimation or with (cumulative) distribution function estimation (also called repartition function estimation). However, contrary to what was originally assumed (Kohonen, 1984), the weight density is not a linear function of the input density (Ritter, 1991; Dersch and Tavan, 1995) and hence, the Voronoi regions will not be active with equal probability (no equiprobabilistic maps).
- Examples of the first category are Conscience Learning (CL; DeSieno, 1988), The Integer Markovian Artificial Neural Network (TInMANN; Van den Bout and Miller, 1989), and Frequency-Sensitive Competitive Learning (FSCL; Ahalt et al., 1990; Galanopoulos and Ahalt, 1996); an example of the second category is the learning scheme Bauer, Der and Herrmann (1996) (BDH algorithm) introduced for topographic map formation under the continuum limit.
- a still different approach to density estimation is provided by the popular (Gaussian) mixture model: the parameters of the model can be trained, in an unsupervised manner, so that they become the most likely ones for the given data set (maximum likelihood learning, Redner and Walker, 1984).
- An aspect of the present invention is an unsupervised competitive learning rule for equiprobabilistic topographic map formation, called the kernel-based Maximum Entropy learning Rule (kMER) since it maximizes the information-theoretic entropy of the map's output, as well as systems, especially distributed processing systems, for carrying out the rule. Since kMER adapts not only the neuron weights but also the radii of the kernels centered at these weights, and since these radii are updated so that they model the local input density at convergence, these radii can be used directly in variable kernel density estimation.
- the data density function at any neuron is assumed to be convex and a cluster of related data comprises one or more neurons.
- the data density function may be modeled with a single radius at each neuron, e.g. by a hypersphere.
- Another aspect of the present invention is a processing engine and a method for developing a kernel-based topographic map which is then used in data model-based applications.
- the receptive field of each kernel is not disjunct from the others, i.e. the receptive fields overlap.
- the engine may include a tool for self-organising and unsupervised learning, a monitoring tool for maximising the degree of topology preservation achieved, for example, by using the overlapping kernels, and a tool for automatically adjusting the kernel widths to achieve equiprobabilism.
- Applications include variable kernel density estimation with the equiprobabilistic topographic maps, with density-based cluster maps and with equiprobabilistic variable kernel-based regression.
- the receptive fields of the kernels may be convex, e.g. hyperspheroids or hyperspheres, but the present invention is not limited thereto.
- Another aspect of the present invention is a processing engine and a method for two-step data mining of numerical data using a kernel-based topographic map.
- the map may be equiprobabilistic.
- the engine proceeds in two steps: in the first step the engine develops in an unsupervised manner a kernel-based and topographic map-based data model using numerical input data.
- the data model generated may be used for a variety of applications, such as clustering analysis, statistical feature extraction and model-based regression applications.
- a monitoring tool maximises the degree of topology preservation achieved during kernel-based topographic map formation.
- the engines and methods described above may be local or distributed.
- the data model and the processing on the data model may be local or distributed.
- Another aspect of the present invention is density-based clustering and unsupervised classification for topographic maps, as well as systems, especially distributed processing systems, for carrying out the classification.
- the topographic map learning rule in accordance with the present invention is called the kernel-based Maximum Entropy learning Rule (kMER).
- all neurons of a neural network have an equal probability to be active (equiprobabilistic map) and, in addition, pilot density estimates are obtained that are compatible with the variable kernel density estimation method.
- the neurons receive their cluster labels by performing a suitable method, such as hill-climbing, on the density estimates which are located at the neuron weights only.
- the present invention also includes several methods for determining the cluster boundaries and the clustering may be used for the case where the cluster regions are used for unsupervised classification purposes.
- Another aspect of the present invention is a modular data analysis system and method for the detection of clusters of related data, comprising a processing engine running an unsupervised competitive learning algorithm to generate a kernel-based topographic map, wherein each kernel represents a data density value, the receptive field of any kernel being assumed to be convex and a cluster comprising one or more neurons.
- unsupervised classification of the clusters may be carried out.
- Topographic maps have been widely used for (supervised) classification purposes: the neurons of the converged maps are given class labels depending on their activation in response to samples drawn from each class.
- the present invention includes in one embodiment a method in which the neuron weights are labeled by performing hill-climbing on the density estimates located at the neuron weights.
- using the learning rule called the kernel-based Maximum Entropy learning Rule (kMER), pilot density estimates can be readily obtained since kMER is aimed at producing an equiprobabilistic topographic map.
- with kMER a pilot density estimate can be readily obtained but, more importantly, it is expressed in terms of the kernel radii used in the VK method: as a result of this, the RF radii σᵢ in kMER can be simply exchanged for the kernel radii in the VK method.
- Figure 1 shows a distributed network with which the present invention can be used.
- Figure 2 shows a further distributed network with which the present invention can be used.
- FIG. 4 shows the receptive field (RF) definition used in kMER.
- the arrow indicates the update of the RF center wᵢ, given the present input v (not to scale); the dashed circles indicate the updated RF regions Sᵢ and Sⱼ.
- the range of the neighborhood function is assumed to have vanished so that wⱼ is not updated.
- the shaded area indicates the overlap between Sᵢ and Sⱼ before the update.
- Figure 5 shows an optimized kMER algorithm, batch version, in accordance with an embodiment of the present invention.
- Figs. 6A and 6B show the three Gaussians example: dependency of the number of clusters found on the free parameter L of the SKIZ method (A) and the free parameter k of the hill-climbing method (B). The presence of large plateaus is indicative of the actual number of clusters in the input distribution.
- Fig. 6C shows the Iris Plants database example: number of clusters as a function of the free parameter k.
- Figure 7 shows a computationally-efficient tree-based hill-climbing algorithm in accordance with an embodiment of the present invention.
- ID: the neuron's lattice number.
- Top: lattice number of the neuron which has the highest density estimate of all k + 1 neurons in the current hypersphere.
- Ultimate Top: lattice number of the neuron which represents a local maximum in the input density.
- Figure 8 shows a scatter plot of a two-dimensional "curved" distribution.
- Figure 9 shows monitoring of the overlap variability (OV) during kMER learning on the sample distribution shown in Fig. 8, in accordance with an embodiment of the present invention.
- the thin dashed line indicates the optimal TP-value (zero).
- Fig. 11 shows system components at Level 1 during training in accordance with an embodiment of the present invention.
- Figure 12 shows system components at Level 1 during application and at Level 2 during training. Shown is the ith Level 2 subsystem: it is trained on the spectra that receive the ith cluster label at Level 1, F¹(1, i, s). The Level 2 subsystem's components during application are similar to those shown for Level 1.
- Figure 13 shows a plot of $MSE[\hat{p}_{VK}, \hat{p}_P]$ as a function of $\rho_s$ at Level 1.
- the neurons of the 20 x 20 lattice that belong to the same cluster are labeled with the same gray scale.
- Figure 16 shows a thresholded and normalized spectrogram of eight notes played on three musical instruments consecutively with uniform white noise added
- the lattice is sized 7 x 7 neurons.
- Figure 19 shows a clustering tree.
- the instruments are indicated with subscripts; the "F” note with the higher pitch is labeled as F.
- the leaf nodes are highlighted with an ellipse drawn around the labels.
- Figure 20 shows embodiments of the present invention demonstrating vertical modularity.
- (A) The data model and the regression model are developed separately and operate in sequence.
- (B) Several regression models can be grafted onto a common data model.
- Figure 21 shows kernel-based maximum entropy learning and the data model in accordance with embodiments of the present invention.
- (A) Definition of the type of formal neurons used. Shown are the activation regions Sᵢ and Sⱼ of neurons i and j (large circles) of a lattice A (not shown). The shaded area indicates the intersection between regions Sᵢ and Sⱼ (overlap). The receptive field centers wᵢ and wⱼ are indicated with small open dots.
- Neuron i has a localized receptive field kernel K(x − wᵢ), centered at wᵢ in input space V ⊆ ℝᵈ. The kernel radius corresponds with the radius of activation region Sᵢ.
- the present input x ∈ V is indicated by the black dot and falls outside Sᵢ.
- Figure 22 shows an optimized kMER algorithm, batch version, adapted for the case where the values of some input vector components are missing (marked as "don't cares" or x).
- Figure 23 shows an algorithm for determining missing input vector components in accordance with an embodiment of the present invention.
- the data model comprises a topographic map of N neurons of which the activation regions are determined with kMER (locations wᵢ and radii σᵢ).
- the regression model consists of N Gaussian kernels, with the same locations and (scaled) radii as the activation regions in the data model (cf. the icons next to g₁ and g_N).
- the outputs of these kernels, g₁, ..., g_N, are weighted by the parameters W₁, ..., W_N so as to produce the regression model's output O.
- Figure 25 shows embodiments of the present invention having horizontal modularity: partitioning of the data set into input space data subsets (A), or into subspace data sets (B), and a mixture of the two (C).
- Figure 26 shows a further embodiment of the present invention having horizontal modularity: the case of several subsets and one regression module.
- Figure 27 shows an embodiment of the present invention having multiple subsets and multiple regression modules. The latter are arranged at two levels.
- Figure 28 shows an embodiment of the present invention having multiple subsets and regression modules and a single decision module.
- Figure 29 shows an embodiment of the present invention having multiple subspaces and multiple regression modules at two levels.
- Figure 30 shows an embodiment of the present invention involving a mixed case, heuristic strategy: multiple subsets of complete and incomplete input vectors. The presence of incomplete data vectors is ignored.
- Figure 31 shows a further embodiment of the present invention involving a mixed case, correct strategy: the presence of complete data vectors is ignored.
- Fig. 1 shows a basic network 10 in accordance with an embodiment of the present invention. It comprises two major subsystems which may be called data processing nodes 2 and data intelligence nodes 3, 4.
- a data processing node 2 comprises one or more microprocessors that have access to various amounts of data which may include very large amounts of data.
- the microprocessor(s) may be comprised in any suitable processing engine which has access to peripheral devices such as memory disks or tapes, printers, modems, visual display units or other processors.
- Such a processing engine may be a workstation, a personal computer or a mainframe computer, for example, or a program running on such devices.
- the data may be available locally in a data store 1 or may be accessible over a network connection such as via the Internet, a company Intranet, a microwave link, a LAN or WAN, etc.
- the data may be structured such as is usually available in a data warehouse or may be "as is", that is unstructured provided it is stored in an electronically accessible form, e.g. stored on hard discs in a digital or analogue format.
- the processing engine is provided with software programs for running an algorithm in accordance with the present invention to develop a topographic representation of the input data.
- a competitive learning algorithm is used for producing a topographic equiprobabilistic density representation (or map) of the input data having a linear mapping of real input data densities to the density represented by the topographic representation (the algorithm is described in more detail below).
- This topographic map may be described as a graph 7 of data models that represent the data processed.
- Each topographic map is a data structure in accordance with the present invention.
- the node 2 can use a persistent medium to store the graph of data models 7, or it can send it directly to other nodes in the network, e.g. nodes 3 and 4.
- the data processing node 2 can be distributed over a network, for example a LAN or WAN.
- a data processing node 2 can run on most general purpose computer platforms, containing one or more processors, memory and (optional) physical storage.
- the supported computer platforms include (but are not limited to) PC's, UNIX servers and mainframes.
- Data processing nodes 2 can be interconnected with most existing and future communication links, including LAN and WAN systems.
- a data processing node 2 can also have a persistent storage system capable of storing all the information (data) that has been used to generate the graph 7 of data models that are stored or maintained by this node, or a sub-graph of the graph 7.
- the graph 7 of data models can be saved regularly to the persistent medium for the analysis and monitoring of changes and the evolutions in the data (e.g. trend detection).
- a data processing node 2 can also run other or additional algorithms that can be used to process data or pre-process or prepare data ready for analysis (e.g. to capture time dynamics in data sets).
- the data sets that are used can be both structured data (e.g. databases) or unstructured data (e.g. music samples, samples of pictures, etc.).
- the only limitation is that the data should be offered in a format that can be processed by the chosen computer platform as described above.
- a data processing node 2 can provide a data intelligence node 3, 4 with an individual data model, a sub-graph or the complete graph 7 of data models. Normally, only the data models are returned not the data itself. However, it is (optionally) possible to return also the data that is used to generate the data models.
- the graph of data models that is built up and maintained by a data processing node 2 contains: 1) A number of datanodes that contain the data models that describe at least a part of the data set. 2) A number of directed links between the datanodes. Note: datanodes should not be confused with nodes of a distributed system such as nodes 2, 3 and 4 in Fig. 1.
- Datanode refers to a neural network which models at least a portion ofthe input data to be processed.
- a datanode may be a software node.
- a datanode is a neural network which models a part of the topographic equiprobabilistic density map, for example, a part generated by clustering after application of the novel competitive learning algorithm in accordance with the present invention.
- the datanodes and directed links are preferably organized in a hierarchical system, that is in a tree, in which topographic maps from two or more levels overlap.
- a tree of datanodes is shown schematically in Fig. 19 and its generation described with reference thereto.
- the top level datanode contains the data model of the complete data set, and all the other datanodes in the tree can be reached from this top level via the directed links. It has only originating directed links; it has no terminating directed link from another datanode.
- a leaf datanode is a datanode that has no originating directed links to other datanodes. In a system that has gone through the complete initial training phase of the competitive learning algorithm in accordance with the present invention, this means that the data in this datanode has a substantially homogeneous distribution and that it is not relevant to split the data further. It can be said that a leaf node cannot be resolved into clusters which can be separated, that is, it is "homogeneous", or any discernible clusters are of such minor data density difference that this difference lies below a threshold level.
- All the other (intermediate) datanodes in the tree between the top datanode and a leaf datanode describe: a) A subset of the complete dataset with common characteristics, but that can be refined further (i.e. without a uniform distribution). b) The data model of the data described by this datanode.
- the top-level model is generated first (i.e. generation of the top datanode).
- This top-level model is divided into several parts in accordance with rules described in detail below. The division is not a simple tiling of the top level topographic map.
- Each of these parts describes a subset of the complete dataset, and a data model is generated for each such subset, i.e. either intermediate or leaf datanodes or a mixture of the two is generated.
- the data described by an intermediate datanode can be divided further into other intermediate datanodes and/or leaf datanodes. If it is not possible to divide an intermediate datanode further, this intermediate datanode is a leaf datanode.
- the graph produced is a tree, from top-datanode to leaf datanode.
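To make the graph structure concrete, the following is a minimal Python sketch of such a tree of datanodes. The names (Datanode, leaf_datanodes) are illustrative and not taken from the text, and the data model held by each datanode is reduced to the kMER weights and radii; an actual data processing node would attach more state.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Datanode:
    """A node in the graph (tree) of data models.

    Each datanode holds the data model (here: neuron weights and
    kernel radii of a kMER-trained topographic map) for the subset
    of the data it describes, plus directed links to its children.
    """
    weights: np.ndarray                  # N x d neuron weights
    radii: np.ndarray                    # N kernel radii
    children: List["Datanode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf datanode has no originating directed links: its data
        # could not be resolved into further clusters.
        return len(self.children) == 0

def leaf_datanodes(top: Datanode) -> List[Datanode]:
    """Collect all leaf datanodes reachable from the top datanode."""
    if top.is_leaf():
        return [top]
    leaves = []
    for child in top.children:
        leaves.extend(leaf_datanodes(child))
    return leaves
```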
- One of the additional advantages of the data processing nodes 2 in accordance with the present invention is the capability to distribute functions over the network 10.
- Integration or cost reasons: it is sometimes not feasible to install a complete data store, e.g. a data warehouse, in one location.
- in a system 20 designed for master/slave processing of sub-graphs with a common data source in accordance with an embodiment of the present invention, new data, or data that is used to retrain the data models, is provided to a master data processing node such as 15.
- this node 15 retrains the main data model (top datanode in the graph of data models).
- it may determine to which intermediate datanode(s) this data point belongs and send it to the processor(s) responsible for this (these) intermediate datanode(s).
- Such an intermediate datanode can belong to another data processing node, somewhere else in the network 20, called a slave data processing node 12 - 14.
- a slave processing node can also act as a master processing node for another slave processing node, depending on the network configuration; e.g. in Fig. 2, the nodes 11, 12 are slaves of the master data processing node 13 on top of the data, but this node 13 is also in a slave relationship to the master node 15.
- a distributing master processing node 16 collects all the initial data and carries out top level clustering. It determines a plurality of independent clusters, e.g. two. It decides to process the first of the clusters and sends the second cluster with its associated data to the processing node 15 for parallel processing of the data. Data updates will be received by the distributing master node 16, and the master node 16 determines if the data is to be processed by itself or by the alternative processing engine in node 15. To do this, the master node 16 keeps a mapping between the records and the relevant processing engine.
- the tree structure of the graph 7 of data models may be at least partly mapped to a hierarchical master/slave network 20 as described above.
- A specific embodiment of the use of a master/slave network will now be described.
- the complete graph 7 of data models is set up on every processing node.
- the individual processing nodes assume their allotted task of processing their specific part of the tree. For this purpose they each receive the relevant data subset from the master processing node. From the other nodes in the system, each processing node receives the weighting factor updates for the neurons other than the ones the node processes. With the updated factors introduced into the graph 7, each node can process new data while the influence of other intermediate datanodes and leaf nodes is modeled correctly.
- This aspect of the present invention is a direct consequence of the linear density mapping of the topographic map generated by the competitive learning algorithm in accordance with the present invention. Only if the data which is associated with a cluster or clusters can be separated cleanly and accurately from the total data is it possible to safely process parts of the graph 7 in a distributed way. If the data density estimation is only approximate, the labeling of data to be associated with a specific cluster/neurons is inaccurate. This means that a significant proportion of wrong data is shipped to each slave node (that is, data which should not be contributing in the processing of the datanode processed on this processing node but should be contributing on another node). This falsifies the results at each node.
- slave/slave processing of a complete graph without a common data source but with a centralization point is provided.
- the total graph 7 of data models can be generated by several data processing nodes in a slave/slave configuration that have individual access to different (separated) data sets.
- the different slave data processing units process their own data and send the updates (revised neural weighting factors) regularly to a central data processing node, that unifies the different updates into a single consistent data model.
- a distributed network 20 as shown in Fig. 2 may be operated with vertically or horizontally distributed data; these form separate embodiments of the present invention.
- each local processing node processes its own data. If it is necessary to query the whole data model, it is possible to devise queries which are sent around the network and collect the answers from each processing node and bring it to the questioning node, e.g. the master node.
- Schemes in accordance with the present invention which involve local processing of distributed data and local generation of the data model may be applied for cost or time-to-install reasons.
- it is an aspect of the present invention to leave data where it is and to only ship queries, answers or at most abstracted versions of the data (data models) around the network.
- an alternative embodiment involves generating a data model locally from the local data at one node and retrieving only the data models from other processing nodes. This would mean, in the example above, that the data models from various countries would be brought together at one processing node. This would allow querying of all these models on a single computing platform. This may seem wasteful in data transfer in comparison to only sending the query around the network and collecting the answers.
- a data processing node 16 can serve only as a device that collects the graph of data models from the other data processing nodes, and updates the data analysis nodes if required.
- Data intelligence nodes provide the additional logic (called application components) needed to run the applications such as the following.
- the data analysis system in accordance with embodiments of the present invention can be used in the following non-limiting list of applications: direct marketing, quality assurance, predictions, data analysis, fraud detection, optimizations, behavior analysis, decision support systems, trend detection and analysis, intelligent data filtering, intelligent splitting of data sets, data mining, outlier detection.
- Application components relate to programs run to analyze the graph 7 of data models, or part of it, to obtain a useful, concrete and tangible result. It is an aspect of the present invention that the data model generation is kept separate from the applications. When the application is partly mixed into the data model generation, the data model becomes restricted in its use.
- One aspect of the present invention is that the data model is as neutral as possible, i.e. it is not determined by the specifics of the data nor by the specifics of the application.
- the Data intelligence node 17, 18 for machine-machine interaction is designed to offer data mining intelligence to other applications.
- the Data intelligence node 4 for exploratory data mining applications that allows a human analyst 5 to analyze data interactively.
- a data intelligence node 17, 18 for machine-machine interactions is a node containing one or more processors that has access to at least a sub-graph of the graph 7 of data models as generated by a data processing node.
- the node 17, 18 contains the predefined logic to execute a specific application as needed by another computer system.
- this node 17, 18 offers the possibility to answer queries relating to these specific applications through a number of standardized machine-to- machine interfaces, such as a database interface (ODBC/JDBC), middle-ware interfaces (CORBA, COM) and common exchange formats (text files, XML streams, HTML, SGML).
- Some of the applications that can be offered through this system include:
- Intelligent splitting of data sets: save a (huge) amount of data in such a way that data with the same characteristics is saved together, and that these groups of data can be distributed over the network.
- This node 17, 18 can be connected to a data processing server 16 that serves at least a sub-graph of the graph 7 of data models using any LAN or WAN connection.
- a data intelligence node with a persistent storage system can store the (graph of) data models locally and an application component can be made available that can detect if the data model has to be resynchronized.
- a node 17, 18 can run on the same platform as a data processing node, it can run as a part of another application or it can run on small handheld devices (small computers, mobile phones). It is also possible to combine a data processing node and a data intelligence node on a single (physical) machine.
- a data intelligence node 4 for exploratory data mining is a node containing one or more processors that has access to at least a sub-graph of the graph 7 of data models, analogous to a data intelligence node 17, 18 for machine-machine interaction, but this node 4 also requires a visualization device that allows the user to browse in the graph 7 of data models and the results provided by the data models.
- the user can analyze the data and run an application component for selected data (e.g. any of the applications described above). Specific application components can help the analyst to detect specific behavior patterns.
- Fraud detection: detect exceptional behavior that can be an indication of fraud.
- Trend detection and analysis: analyze the evolution of the graphs of data models that are stored regularly, in order to detect and analyze trends and evolutions.
- the data to be processed is first prepared (if this is necessary) and then subjected to the novel unsupervised competitive learning rule (kMER) in accordance with the present invention as part of data density-based clustering. Then a non-parametric density model is built up from the neuron weights and the radii obtained by kMER.
- the identification of the clusters in the density estimate and the regions spanned by them, i.e. their "influence zones", may be performed by any of a number of clustering techniques, all of which represent individual embodiments of the present invention.
- one such technique is the SKIZ technique (Skeleton by Influence Zones, see Serra, 1982; Herbin et al., 1996).
- in the preferred embodiment (best mode) of the present invention, hill-climbing is used. In this novel clustering procedure for kMER, first the RF regions are clustered by applying a hill-climbing technique on the density estimates located at the neuron weights, and then the cluster regions are demarcated.
- the clustering is done to reveal occult relationships within the data, i.e. the clustering is done without assuming knowledge about the number of clusters or their shapes beforehand.
- the cluster regions are then used for unsupervised classification purposes.
- Density-based clustering with kMER and SKIZ in accordance with an embodiment of the present invention may be performed starting from either structured or unstructured data. First, a topographic map is developed with kMER for a given set of M data samples. Second, a non-parametric density model is built from the neuron weights and their radii obtained with kMER at convergence. Third, the SKIZ method is used for identifying the number of clusters in the density estimate and their "influence zones".
- M = 900 samples are drawn from three equally-probable Gaussians centered at (-0.4, -0.3), (0.4, -0.3), and (0.0, 0.3) in the unit square [-1, 1]², with the standard deviations all equal to 0.2 (Fig. 3A).
- the weights wᵢ and radii σᵢ are adapted by using two learning rules which together form kMER.
- the mathematical details, including a proof of convergence are given elsewhere (Van Hulle, 1998).
- the RF centers wᵢ are updated proportionally to the fuzzy membership Ξᵢ and in the general direction of v (Fig. 4): $\Delta w_i = \eta \sum_{j \in A} \Lambda(i, j, \sigma_\Lambda)\, \Xi_j(v)\, \mathrm{Sgn}(v - w_i), \ \forall i \in A$ (3), with Sgn(.) the sign function taken componentwise and σ_Λ the neighborhood range (in lattice space coordinates) of the neighborhood function Λ (cf. the position of the dashed circle Sᵢ in Fig. 4).
- the kernel radii σᵢ are updated in such a way that, at convergence, each neuron becomes active with the same probability, ρ/N. Together, the two rules generate, at convergence, an equiprobabilistic topographic map or, in other words, a map for which the neurons maximize the (unconditional) information-theoretic entropy of the map's output.
- the optimized algorithm is given in Fig. 5.
- the time-complexity is O(NMd).
- the N weights are initialized by sampling a uniform grid in the unit square [-1, 1]², and the radii are initialized by sampling the uniform distribution [0, 0.1].
- a Gaussian neighborhood function Λ is used and its range is decreased exponentially during learning (cf. the cooling scheme of eq. (26) below). A code sketch of one kMER epoch follows.
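As an illustration, the following Python sketch runs one incremental epoch of the two kMER rules as reconstructed above (eq. (3) plus the radius rule). The exact forms of the fuzzy membership Ξᵢ(v) and of the radius update are assumptions consistent with the text (the radius rule is written so that each neuron tends towards an activation probability of roughly ρ/N); all parameter names are illustrative.

```python
import numpy as np

def kmer_epoch(V, W, sigma, lattice, eta, rho, sigma_L):
    """One epoch of kMER-style learning (incremental sketch).

    V       : (M, d) input samples
    W       : (N, d) neuron weights (RF centers)
    sigma   : (N,)   RF radii
    lattice : (N, 2) lattice coordinates of the neurons
    sigma_L : neighborhood range of the Gaussian function Lambda
    """
    N = W.shape[0]
    # Gaussian neighborhood function Lambda(i, j, sigma_L), precomputed.
    ldist2 = np.sum((lattice[:, None, :] - lattice[None, :, :])**2, axis=-1)
    Lam = np.exp(-ldist2 / (2.0 * sigma_L**2))
    for v in V:
        dist = np.linalg.norm(v - W, axis=1)
        code = (dist <= sigma).astype(float)       # 1_{S_i}(v)
        # Fuzzy membership Xi_i(v): activations normalized to sum to 1
        # (assumed form; every neuron whose RF contains v shares credit).
        xi = code / max(code.sum(), 1.0)
        # Eq. (3): move each RF center in the general direction of v,
        # weighted by the neighborhood-smoothed memberships.
        W += eta * (Lam @ xi)[:, None] * np.sign(v - W)
        # Radius rule (assumed form): grow sigma_i when neuron i is
        # inactive, shrink it when active, so that at convergence each
        # neuron is active with probability ~ rho/N (equiprobabilism).
        sigma += eta * (rho / N * (1.0 - code) - code)
        np.clip(sigma, 1e-6, None, out=sigma)
    return W, sigma
```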
- the converged lattice can be used for building a non-parametric model of the input density p(v).
- Two approaches can be adopted in practice. First, one can determine the winning frequencies of all neurons for a given data set and assume fixed, equally sized volumes for the corresponding RF regions (Voronoi regions, or hyperspheres as in kMER's case). This leads to the Parzen window technique, which allocates fixed-radius kernels (here, Gaussians) to obtain the density estimate p̂, with Z a normalizing factor.
- Second, the RF region volumes can be determined in such a manner that they yield equal winning frequencies (i.e. equiprobabilism). This is basically what kMER does, and it leads to variable kernel density estimation. Since the lattice generated by kMER not only consists of the neuron weights but also the radii of the neurons' RF regions, with the radii adapted to the local input density, the method can go beyond the Parzen window technique and cast density estimation into a format which is similar to the one used for variable kernel density estimation (VK) (Silverman, 1992).
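Both estimates can be expressed by one routine. The sketch below is a hedged reading of the text: a Gaussian kernel with the standard d-dimensional normalization is assumed; the fixed kernel (Parzen) case passes one common radius, while the variable kernel (VK) case passes the scaled kMER radii ρ_s σᵢ.

```python
import numpy as np

def gaussian_kernel_density(v, W, radii):
    """Kernel density estimate at point v from N Gaussian kernels
    centered at the neuron weights W, with per-kernel radii.

    With all radii equal this is the Parzen (fixed kernel) estimate;
    with radii = rho_s * sigma (the scaled kMER RF radii) it is the
    variable kernel (VK) estimate."""
    N, d = W.shape
    sq = np.sum((v - W)**2, axis=1) / radii**2
    Z = (2.0 * np.pi)**(d / 2.0) * radii**d      # per-kernel normalization
    return float(np.sum(np.exp(-0.5 * sq) / Z) / N)

# Usage (names illustrative):
#   p_fixed = gaussian_kernel_density(v, W, np.full(len(W), h))
#   p_vk    = gaussian_kernel_density(v, W, rho_s * sigma)
```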
- the integral can be approximated as $\frac{1}{N^2 h^d} \sum_{i,j} K^{(2)}\!\left(\frac{w_i - w_j}{h}\right)$, with h the kernel bandwidth and $K^{(2)}$ the convolution of the kernel with itself.
- when the kernel is Gaussian, and when fixed kernel density estimation is performed, it can be shown that, asymptotically, when N → ∞, the theoretically best choice of ρ_s is obtained (Stone, 1984).
- cross-validation may yield poor results.
- This metric is derived from the following heuristic.
- the MSE performance ofthe fixed kernel estimate can be considered as an upper bound for the variable kernel performance since in the former case, it is assumed that all radii are equal and, thus, that the distribution is locally uniform.
- the maximal MSE error made for the variable kernel estimate will be lower.
- a least-squares cross-validation is performed by constructing fixed and variable kernel density estimates, and by minimizing the discrepancy between the two, i.e. $MSE[\hat{p}_{VK}, \hat{p}_P]$, with respect to ρ_s, to obtain the optimal value ρ_s,opt.
- the whole procedure is then repeated along the ρ_r axis to obtain the optimal mix (ρ_r,opt, ρ_s,opt).
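In code, this cross-validation can be sketched as a grid scan over candidate ρ_s values. The sketch reuses the gaussian_kernel_density helper above and evaluates the discrepancy at a set of probe points rather than in closed form; the function names and the probe-grid approach are assumptions.

```python
import numpy as np

def mse_discrepancy(W, sigma, rho_s, h, probes):
    """Mean squared discrepancy between the variable kernel estimate
    (radii rho_s * sigma) and the fixed kernel estimate (radius h),
    evaluated at the probe points."""
    p_vk = np.array([gaussian_kernel_density(v, W, rho_s * sigma) for v in probes])
    p_p  = np.array([gaussian_kernel_density(v, W, np.full(len(W), h)) for v in probes])
    return np.mean((p_vk - p_p)**2)

def optimal_rho_s(W, sigma, h, probes, candidates):
    """Scan candidate rho_s values (e.g. in steps of 0.1) and return
    the minimizer; the same scan is then repeated along the rho_r axis."""
    errors = [mse_discrepancy(W, sigma, r, h, probes) for r in candidates]
    return candidates[int(np.argmin(errors))]
```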
- the above method is an embodiment of the present invention.
- the variable kernel density estimate obtained when optimizing for ρ_s and ρ_r in steps of 0.1 and 2.5, respectively, is shown in Fig. 3D.
- the Skeleton by Influence Zones (SKIZ) technique (Serra, 1982) is applied in order to identify the number of clusters and the regions spanned by them.
- the initial cluster is split into a new set of clusters, and so on.
- a distance function is computed for each connected region.
- the "influence zone" of a connected region contains all points for which the associated distance function is smaller than that of the other regions (e.g. using the city-block distance metric).
- the border between the "influence zones" is marked. If none of the existing connected regions is split at the next threshold level, then these regions and their borders are kept. Finally, when the highest threshold level has been processed, the connected regions are identified with the clusters sought. This whole procedure is then repeated for different values of L: the intrinsic dimensionality of the clustering problem is then identified with the longest plateau in the dependency of the number of clusters on L.
- In order to determine the influence regions, the input space needs to be discretized. Hence, if each input dimension is partitioned into, e.g., b bins, then b^d bins will need to be scanned and processed when the input space V is d-dimensional. Furthermore, the range spanned by the input distribution p(v) must be known in order for the input discretization to be effective. For these reasons, a more computationally-efficient procedure will now be disclosed, which is an embodiment of the present invention.
- the density estimate p̂(wᵢ) is considered at each neuron weight wᵢ only.
- a hypersphere at wᵢ is allocated, the radius of which is chosen in such a way that it contains the k nearest neuron weights.
- within this hypersphere, the neuron is sought with the highest density estimate and its lattice number is noted, e.g. neuron j.
- neuron i "points" to neuron j, which will be called the "top" neuron. This procedure is repeated for each neuron, and those top neurons are sought which point towards themselves.
- the density estimates of these neurons then represent local maxima in the input density, and hence, these neurons belong to separate clusters. These will be called “ultimate top” neurons. Cluster labels are assigned to these neurons and the "points to" relations of the other neurons are scanned in order to determine to which ultimate top neurons they belong, and hence, which cluster labels they should receive. Hence, in this way, the number of clusters present in the density estimate has been determined, for the given choice of the hypersphere parameter k, and the neurons have been labeled accordingly.
- a computationally-efficient version of the hill-climbing algorithm is given in Fig. 7.
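A compact version of the tree-based hill-climbing procedure of Fig. 7 can be sketched as follows; the brute-force nearest-neighbor search and the function name are illustrative, and Fig. 7's ID/Top/Ultimate-Top bookkeeping is condensed into a single "points to" array.

```python
import numpy as np

def hill_climb_clusters(W, density, k):
    """Hill-climbing on the density estimates located at the neuron
    weights (cf. Fig. 7).

    W       : (N, d) neuron weights
    density : (N,)   density estimate at each weight
    k       : number of nearest neighbors in each hypersphere
    Returns one cluster label per neuron.
    """
    N = W.shape[0]
    d2 = np.sum((W[:, None, :] - W[None, :, :])**2, axis=-1)
    top = np.empty(N, dtype=int)
    for i in range(N):
        neigh = np.argsort(d2[i])[:k + 1]           # k + 1 neurons in sphere
        top[i] = neigh[np.argmax(density[neigh])]   # "points to" relation
    # Ultimate top neurons point to themselves: local maxima of the density.
    labels = -np.ones(N, dtype=int)
    for c, u in enumerate(i for i in range(N) if top[i] == i):
        labels[u] = c
    # Every other neuron inherits the label of the ultimate top it reaches
    # (assumes at least one ultimate top exists, which holds absent ties).
    for i in range(N):
        j = i
        while labels[j] < 0:
            j = top[j]
        labels[i] = labels[j]
    return labels
```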
- the present invention is not limited to hyperspheres. It is necessary to decide how to classify the input samples v.
- the neuron which is the closest, in Euclidean terms, to a given input sample can be considered, and that sample can be classified into the class to which the closest neuron belongs. This may be called the minimum Euclidean distance procedure (minEuC).
- a second possibility is nearest-neighbor classification (NNC): the class label which makes up the majority of all class labels in the hypersphere then defines the input sample's class label.
- a third possibility is aimed at performing Bayesian classification (BayesC). The idea is to determine the class-conditional density estimates separately, combine them with the estimated class probabilities (i.e. the number of neurons in each class), and select the class for which the posterior probability is the largest.
- a fourth possibility is to determine the cluster means, determine for each input sample the closest cluster mean, and label the input sample accordingly (meanC). All of these methods are embodiments of the present invention.
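Three of the four labeling procedures translate directly into code; the sketch below assumes integer cluster labels per neuron, and the BayesC variant (which additionally weighs class-conditional densities with class priors, cf. eq. (17) below) is omitted for brevity. Names are illustrative.

```python
import numpy as np

def label_minEuC(v, W, labels):
    """Minimum Euclidean distance labeling: the sample inherits the
    cluster label of the closest neuron weight."""
    return int(labels[np.argmin(np.linalg.norm(v - W, axis=1))])

def label_meanC(v, W, labels):
    """Cluster-mean labeling: the sample inherits the label of the
    closest cluster mean."""
    classes = np.unique(labels)
    means = np.array([W[labels == c].mean(axis=0) for c in classes])
    return int(classes[np.argmin(np.linalg.norm(v - means, axis=1))])

def label_NNC(v, W, labels, k_nn):
    """Nearest-neighbor classification: majority label among the
    k_nn nearest neuron weights."""
    nearest = np.argsort(np.linalg.norm(v - W, axis=1))[:k_nn]
    return int(np.argmax(np.bincount(labels[nearest])))
```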
- Stage 1, kMER: this stage is completely identical to that of the SKIZ-based approach.
- Stage 2, density estimation: this stage is also identical, except that only a spatially-discrete estimate, located at the neuron weights, is retained from the density estimate. Contrary to the SKIZ method, one can choose to minimize the effort in determining the optimal smoothness parameters since, basically, the density estimate is only further used for defining the class labels of the neurons, and not for defining the class regions or boundaries.
- the meanC, the NNC and the BayesC methods discussed earlier are also used; for NNC, k_m is taken equal to 10 (5 or 20 did not yield significantly different results). Then the misclassification rates are determined on a ten times larger test set, the misclassification rates obtained are ranked, and the 6th one, i.e. the median, is taken. The median is used, and not the mean, since otherwise the result would be strongly affected by the occurrence of even a single case where the number of clusters found is incorrect. The results are listed in Table 1 (first row). For the sake of reference, the expected Bayesian misclassification rate is also indicated.
- Table 1: Misclassification rates (in %) for various sample set configurations using kMER learning, the SKIZ and the hill-climbing (HC) methods. Since HC operates in conjunction with a sample labeling method, the performances of various alternatives are listed: minimum Euclidean distance labeling (minEuC), cluster mean labeling (meanC), nearest-neighbor classification (NNC), and Bayesian classification labeling (BayesC). The expected Bayesian misclassification rates are listed in the last column.
- the following competitive learning schemes are also considered, of which some directly perform, or can be made suitable for, topographic map formation: the SOM algorithm, Conscience Learning (CL; DeSieno, 1988), The Integer Markovian Artificial Neural Network (TInMANN; Van den Bout and Miller, 1989), Frequency-Sensitive Competitive Learning (FSCL; Ahalt et al., 1990; Galanopoulos and Ahalt, 1996), and the BDH algorithm (Bauer et al., 1996).
- the fixed kernel density estimation procedure outlined above is applied, including the automatic choice ofthe smoothness parameter, eq. (12). Hill- climbing and meanC labeling are then applied.
- the misclassification rates are given in Table 2. Note that the kMER results are those of Table 1 listed in the meanC column.
- kMER can be applied in combination with VK to model the class-conditional densities p(v | Cⱼ) with a discrete number of kernels, N « M.
- hill-climbing can be used to label the neurons, the density estimates of the neurons with the same label can be grouped, and these estimates used for modeling the class-conditional densities; the prior class probabilities then simply follow from the fact that each neuron is active with equal probability (equiprobabilistic map). This can be done in an efficient way as follows, the method being an embodiment of the present invention.
- the class-conditional densities can be determined and a given sample classified into class $C_{i^*}$ when: $p(v|C_{i^*})\,P(C_{i^*}) > p(v|C_j)\,P(C_j), \ \forall j \neq i^*$, (17) or, equivalently, when its posterior probability satisfies: $P(C_{i^*}|v) > P(C_j|v), \ \forall j \neq i^*$.
- the class boundaries can be approximated by discretizing the input space into bins and by looking for adjacent bins for which the classes differ.
- this process is too time-consuming when the dimensionality increases.
- a computationally more efficient procedure is to consider the proportion of input samples which fall outside the subspace defined by the union of the hyperspheres with radii ρ_s σᵢ.
- for single-kernel classes, this value immediately yields an estimate of the true outlier probability; for multi-kernel classes it is a heuristic, the quality of which improves when the class-conditional density becomes more peaked.
- an efficient way to proceed when the kernels are, e.g., radially-symmetric Gaussians is to generate a set of input samples for a single d-dimensional Gaussian, say with unit radius and zero mean.
- these samples can then be shifted and stretched according to the kernel's actual mean and radius, and the proportion of samples determined that fall outside the union of hyperspheres mentioned earlier. This procedure is then repeated for each kernel of the class. The outlier probability is then estimated by the average of this sample proportion.
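A Monte-Carlo sketch of this outlier-probability estimate, under the stated assumption of radially-symmetric Gaussian kernels; W and radii would be the kernel centers and (scaled) radii of one class, and the function name is illustrative.

```python
import numpy as np

def outlier_probability(W, radii, n_samples=1000, seed=0):
    """Estimate the outlier probability of a class: the proportion of
    kernel-generated samples falling outside the union of the class
    hyperspheres, averaged over the kernels of the class."""
    rng = np.random.default_rng(seed)
    N, d = W.shape
    base = rng.standard_normal((n_samples, d))   # unit-radius, zero-mean
    props = []
    for i in range(N):
        # Shift and stretch according to kernel i's actual mean and radius.
        samples = W[i] + radii[i] * base
        # A sample is inside the union if it lies within any hypersphere.
        d2 = np.sum((samples[:, None, :] - W[None, :, :])**2, axis=-1)
        inside = (d2 <= radii[None, :]**2).any(axis=1)
        props.append(np.mean(~inside))
    return float(np.mean(props))
```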
- topological defects may occur when the neighborhood range is too rapidly decreased.
- metrics have been devised in order to quantify the degree of topology-preservation of a given map, such as the topographic product (Bauer and Pawelzik, 1992) and the topographic function (Villmann et al., 1997).
- the topographic product is explained below.
- such a metric can be used for monitoring the degree of topology-preservation achieved during learning; for the topographic function this holds at least in principle only, since it is computationally much more intensive than the topographic product.
- since kMER uses overlapping, spherical quantization regions (RF regions), instead of non-overlapping, Voronoi-based ones, advantage can be taken of the overlap to assess the degree of topology-preservation achieved in the map, albeit in a heuristic manner. Assume that the equiprobabilistic scale factor ρ is chosen in such a way that the neurons of the untangled map have overlapping RF regions.
- a given map is more likely to be untangled if the number of neurons that are activated by a given input is constant over the training set. Indeed, the number of neurons that will be activated at a topological defect will be higher than in an already untangled part of the lattice. Furthermore, if that number is constant, it also implies that the map is locally smooth. Hence, it is desired to adjust the neighborhood "cooling" scheme in such a manner that the variability in the number of active neurons is minimized over the training set.
- This "variability score", divided by the mean number of active neurons, is then a metric, which will be called the Overlap Variability (OV) metric.
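Taking the standard deviation of the per-input count of active neurons as the "variability score" (an assumption; the text does not fix the statistic), the OV metric can be sketched as:

```python
import numpy as np

def overlap_variability(V, W, sigma):
    """Overlap Variability (OV): variability in the number of neurons
    activated by each input, divided by the mean number of active
    neurons. Lower OV suggests an untangled, locally smooth map."""
    counts = np.array([np.sum(np.linalg.norm(v - W, axis=1) <= sigma)
                       for v in V])
    return float(counts.std() / max(counts.mean(), 1e-12))
```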
- the usual topology-preservation metrics are not very sensitive in detecting small topological defects, or in distinguishing them from locally non-smooth portions of the map.
- when the lattice is already untangled, the topographic product yields lower scores for larger neighborhood ranges.
- the usual metrics are quite heavy to run as a monitoring tool, during the learning phase.
- the neighborhood range is decreased as follows: $\sigma_\Lambda(t) = \sigma_{\Lambda 0}\,\exp\!\left(-2\,\sigma_{\Lambda s}\,\frac{t}{t_{\max}}\right)$, (26) with $\sigma_{\Lambda 0}$ the initial range and $\sigma_{\Lambda s}$ a parameter that controls the slope of the cooling scheme ("gain").
- the monitoring algorithm proceeds as follows. (Note that the algorithm is exemplified in the simulation example discussed next.)
- the map is first trained with a constant neighborhood range, namely σ_Λ0, during a fixed number of epochs, in order to obtain a more or less untangled lattice: this lattice then serves as a common starting point for all simulation runs (a code sketch of the monitoring loop follows below).
- a one-dimensional lattice, i.e. a chain, is used.
- the same initial weights are taken as for the SOM algorithm.
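Putting the pieces together, the monitoring loop can be sketched as below, reusing the kmer_epoch and overlap_variability helpers from the earlier sketches. The candidate "gain" values and the selection of the minimum-OV run are assumptions consistent with the described procedure.

```python
import numpy as np

def cooling_range(t, t_max, sigma_L0, sigma_Ls):
    """Neighborhood cooling scheme of eq. (26)."""
    return sigma_L0 * np.exp(-2.0 * sigma_Ls * t / t_max)

def monitored_training(V, W0, sigma0, lattice, gains, epochs, eta, rho, sigma_L0):
    """Run kMER from the common starting lattice once per candidate
    gain value and keep the run with the lowest overlap variability."""
    best = None
    for g in gains:
        W, sigma = W0.copy(), sigma0.copy()
        for t in range(epochs):
            sL = cooling_range(t, epochs, sigma_L0, g)
            W, sigma = kmer_epoch(V, W, sigma, lattice, eta, rho, sL)
        ov = overlap_variability(V, W, sigma)
        if best is None or ov < best[0]:
            best = (ov, g, W, sigma)
    return best   # (OV score, chosen gain, weights, radii)
```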
- the monitoring results are summarized in Fig. 9.
- the OV and the neighborhood cooling plots are shown in Fig. 9A (thick and thin continuous lines). It is observed that, after a transitional phase, OV stabilizes.
- the topographic product (TP) is plotted (thick dashed line in Fig. 9A). Note that the desired TP-value here is zero (thin dashed line). However, unlike the overlap variability, there is no clear optimum in the topographic product plot.
- the lattices obtained without and with monitoring are shown in Fig. 10A, B, respectively: the former is the result of continuing the first run until 100 epochs have elapsed, while the latter is the result of the fourth run.
- the effect of monitoring is clearly seen: the lattice is untangled and closely matches the theoretical principal curve of the distribution (dashed line).
- What is desired for TP is that it should be as close as possible to zero for the lattice to be untangled as much as possible. (Note that TP can be larger or smaller than zero.)
- the typical real-world application of data analysis involves analyzing multi-dimensional data to extract occult relationships.
- this involves the integration of several components: kMER learning, density estimation and optimal smoothness determination, clustering with the hill-climbing technique or similar, input sample labeling, and the effect of the input dimensionality on the system's performance.
- hierarchical clustering may be performed. This involves determining major clusters in a first application of kMER (level 1) followed by a further application of kMER on individual clusters in level 1 to resolve more detail (clusters, level 2) and so on.
- the result of repeated clustering and application of kMER is a tree of interlinked datanodes.
- An example tree of datanodes is shown in Fig.19.
- the present invention also includes use of kMER to generate the first level clusters followed by application of other methods to analyze these clusters further.
- the clusters are extracted in each level after smoothing, density based clustering and labeling.
- Smoothing is a form of regression analysis which renders the topographic map in a more suitable form for clustering by eliminating local discontinuities and irregularities.
- Density based clustering after smoothing is preferably performed by hill-climbing but the present invention is not limited thereto.
- Labeling is the association of data with the relevant cluster. This operation must be accurate: correct data should be associated with the appropriate cluster if the total data model is to remain consistent despite distributed processing (Figs. 1 and 2).
- a datanode is a kernel-based processing engine (e.g. a neural network) which represents a part of the complete data model generated from the complete data, as well as the associated data.
- a datanode in accordance with the present invention is also a data model. This extraction of data is safe because of the linear mapping between actual and represented data density in the topographic map produced by kMER.
- the most generalized version of the data model forms the top datanode - i.e. level 1, Fig. 19.
- applying kMER to the clusters generated in one level resolves more detail, i.e. generates more clusters. At some point there is little or no detail left for kMER to identify in clusters; then, further development of the tree may be stopped.
- Clusters which cannot be resolved any further are leaf datanodes.
- the leaf datanodes in Fig. 19 are surrounded by ellipses.
- Leaf datanodes may be generated in any level, in Fig. 19 they appear at level 2.
- the topographic map preferably has fewer dimensions than the input space, as this results in a data model which is accurate but compressed in size, but the present invention is not limited thereto.
- the music signal was generated on a Crystal 4232 audio controller (Yamaha OPL3 FM synthesizer) at a sampling rate of 11,025 Hz, using the simulated sound of an oboe, a piano and a clarinet.
- Eight notes (one minor scale) were played on all three instruments consecutively, namely "F", "G", "Ab", "Bb", "C", "Db", "Eb", and the eighth note is an "F" again but one octave higher. This results in a signal track of 14 seconds.
- STFT Short-Time Fourier Transform
- since spectral data are high-dimensional, and since more than one interpretation can be given to describe their similarity, a hierarchical clustering analysis is performed. Several levels in the analysis are distinguished, of which two are developed in Figs. 11 and 12.
- PCA Principal Component Analysis
- an FFT of the windowed signal s(t) = [s(t), ..., s(t − 1023)] is performed first and the amplitude spectra F(s) are projected onto the subspace spanned by the first k Principal Components (PCs) of the training set (boxes labeled "FFT" and "PCA" in Fig. 11).
- a two-dimensional rectangular lattice in this subspace (“kMER") is then developed and the converged lattice used for density-based clustering ("Clustering").
- the converged lattice is used for estimating the density distribution underlying the projected amplitude spectra.
- the scale factor of the kernel-based density estimate is optimized in order to obtain the optimal degree of smoothness.
- density-based clustering is performed, using the hill-climbing technique, and labels are assigned to the neurons of the same clusters ("Clustering").
- after clustering, the individual amplitude spectra are labeled ("Labeling") using the minimum Euclidean distance labeling technique (minEuC). A sketch of this front end follows.
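The Level 1 front end (the "FFT" and "PCA" boxes of Fig. 11) can be sketched as below. The frame length of 1024 samples matches the windowed signal described above; the hop size and the SVD-based PCA are illustrative assumptions, and the subsequent kMER, clustering and labeling stages are the sketches given earlier.

```python
import numpy as np

def level1_pipeline(signal, k_pc, frame=1024, hop=256):
    """Level 1 front end: windowed FFT amplitude spectra projected
    onto the first k_pc principal components of the training set."""
    frames = np.array([signal[t:t + frame]
                       for t in range(0, len(signal) - frame, hop)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))   # amplitude spectra F(s)
    centered = spectra - spectra.mean(axis=0)       # subtract the set average
    # PCA via SVD; the rows of Vt are the principal components (PCs).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    # The projected spectra feed the "kMER", "Clustering" and "Labeling"
    # stages (cf. the kmer_epoch, hill_climb_clusters and label_minEuC
    # sketches given earlier).
    return centered @ Vt[:k_pc].T                   # (n_frames, k_pc)
```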
- the labeled spectra are represented as F¹(1, i, s), where the superscript indicates Level 1, the parameter 1 the inherited cluster label (which is set to 1 by default, since Level 1 is the "root" level of the clustering tree), and i the cluster label that music signal track s receives at Level 1.
- the approach in accordance with the present invention differs from Kohonen's ASSOM (Kohonen, 1995; Kohonen et al., 1997), albeit that the neuron weights could be interpreted as "features" and, thus, the map as a topographic feature map.
- the analysis is continued and a "clustering within the clusters" is performed: the Level 1 procedure is repeated for each cluster separately (ith Level 2 analysis in Fig. 12), starting from the amplitude spectra which received the same Level 1 label, F¹(1, i, s).
- the refinement of a given Level 1 cluster is only done when there is more than one cluster at Level 1, and the Level 2 result is accepted when the clustering analysis is valid.
- a simple heuristic called the "Continued Clustering Analysis" (CCA) is used to decide this.
- the amplitude spectra receive an additional Level 2 label ("Labeling" in Level 2).
- the labeled spectra are represented as F²(i, j, s), where the superscript indicates Level 2, i the cluster label inherited from Level 1, and j the cluster label received at Level 2.
- the PCs of the training set of spectral data are computed, after subtracting the set average.
- Density estimation: the method proceeds by determining the variable kernel density estimate p̂_VK, which corresponds to the converged lattice, by exchanging the neuron weights wᵢ and RF radii σᵢ for Gaussian kernels K with centers wᵢ and kernel radii ρ_s σᵢ.
- the next step is to determine the optimal degree of smoothness.
- optimizing for p r requires several runs ofthe monitoring algorithm (and thus of kMER), only the scaling factor/ ⁇ is optimized, but accordingly the definition ofthe fixed kernel estimate is slightly modified:
- an embodiment of the present invention makes use of an approximate representation of the input data density.
- the representation preferably has fewer dimensions than the input data space; for example, use is made of a principal manifold.
- advantage is taken of the fact that the two-dimensional lattice forms a discrete approximation of the possibly nonlinear manifold in kMER's input space, and a two-dimensional density estimate with respect to this lattice is developed.
- the variable kernel density estimate then becomes:
- the optimal value for ρ_s was 14.375 (note that ρ_r was fixed at 2).
- the cluster map is developed for the smallest value of the kernel scale factor that yields 8 clusters (Fig. 15). Eight contiguous labeled areas are observed, indicating that the neighborhood relations in the input space are preserved, i.e., that there is indeed a topographic map.
- the spectra of all notes and instruments can be identified after two levels of clustering are performed, except for the two "F" notes; although they differ by one octave, they have a similar harmonic structure.
- the corresponding spectra are grouped into one cluster at Level 1 (144 in total).
- at Level 2, two clusters are found: one of them combines the clarinet and the piano playing the "F" note with the lower pitch (48 spectra), and the other combines the "F" note with the higher pitch (for all instruments) and the "F" note with the lower pitch for the oboe (96 spectra).
- at Level 3, the former cluster decomposes into two clusters, one for the clarinet and the other for the piano (24 spectra each), and the latter cluster decomposes into two clusters, of which one represents the "F" note with the higher pitch for the clarinet (25 spectra) and the other a combination (71 spectra) which is, finally, at Level 4, decomposed into the "F" note with the lower pitch for the oboe (24), and the "F" note with the higher pitch for the piano (25) and the oboe (23). Hence, the decomposition into the "F" notes with the lower and the higher pitch, and the musical instruments that play them, is complete.
- P(i) is the prior probability of input sample v being generated from component i of the mixture, and p(v | i) the i-th component density.
- the P(i)'s are often regarded as mixing parameters, in addition to the parameters which specify the component density functions. For the mixing parameters: Σ_i P(i) = 1, with 0 ≤ P(i) ≤ 1.
- kMER's density estimation format can be re-written in terms of a mixture distribution, with the component densities represented by the kernel functions
- an input sample v can be generated by first selecting one of the components i at random, with probability P(i), and then by generating the input sample from the corresponding component density p(v | i).
- the component densities are often referred to as likelihood functions for the observed values of v.
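This generative reading can be sketched directly; isotropic Gaussian component densities are an assumption of this illustration:

```python
import numpy as np

def sample_mixture(priors, centers, radii, rng=None):
    """Pick component i with probability P(i), then draw v from p(v | i)."""
    rng = rng or np.random.default_rng()
    i = rng.choice(len(priors), p=priors)
    return rng.normal(loc=centers[i], scale=radii[i])
```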
- Density estimation with kMER using the VK format can be considered a semi-parametric method when the fact is emphasized that the number of kernels is much lower than the number of samples, N ≪ M; but, since a fixed lattice is used, N does not vary during training. Furthermore, no assumption about a specific type of kernel function is necessary during training, since a pilot density estimate is generated with which the parameters of the variable kernels can be chosen. There is also a more fundamental distinction between the two training procedures, which is discussed below.
- the importance of the mixture models is not restricted to density estimation alone.
- the technique also finds applications in other neural network areas, e.g., in configuring the basis functions in radial basis function (RBF) networks, in conditional density estimation, in soft weight sharing, and in the mixture-of-experts model (for an overview, see Bishop, 1995).
- kMER has also been considered recently for configuring RBFs (Ridella et al., 1999).
- the model parameters are the mixing parameters P(i), the kernel centers w_i, and the kernel radii σ_i, i = 1, ..., N.
- the negative log-likelihood for the input sample set is given by: E = −Σ_μ ln p(v^μ) = −Σ_μ ln [Σ_i P(i) p(v^μ | i)].
- kMER starts from different assumptions and determines the kernel centers and radii in a completely different manner, but there is also a more subtle difference when density estimation is concerned.
- in the maximum likelihood method, first the kernel centers and radii of the component densities p(v | i) are determined, and then the prior probabilities P(i).
- in kMER, the prior probabilities are determined in such a way that they all become equal, by adjusting the kernel radii; at the same time, the component densities are determined, since the kernel centers are also adjusted during learning. In other words, the prior probabilities are not considered to be additional model parameters.
- the data set is partitioned into one or more subsets for which data and regression modules are developed separately.
- the subsets consist of either complete data vectors or incomplete data vectors, or of a mixture of the two.
- the data vectors are incomplete since they contain missing vector components. This could be due to a partitioning into subspaces.
- the ensemble of the data modules forms the data model and, similarly, the ensemble of the regression modules forms the regression model.
- This embodiment of the present invention minimizes the need for communication between the data and regression modules during their development as well as their subsequent use: in this way, delays due to communication are minimized and, since only model parameters need to be communicated, issues concerning data ownership and confidentiality are maximally respected and data transfer is reduced to a minimum.
- the unknown function can be a scalar or a vector function.
- a function y = f(x), x ∈ V ⊂ ℝ^d, needs to be estimated from a given set of M possibly noisy input samples {(x^μ, y^μ)}, μ = 1, ..., M.
- the vector case is often treated as the extension of the scalar case by developing d_y scalar regression models independently, one for each output vector component.
- the regression performance improves if the number of "knots" in the model is not fixed but allowed to depend dynamically on the data points: indeed, "dynamic" knot allocation is known to yield a better regression performance than its static counterpart (Friedman and Silverman, 1989).
- the "knots" are the points in V-space that join piecewise smooth functions, such as splines, which act as interpolating functions for generating values at intermediate positions. Alternatively, kernels are centered at these points (kernel-based regression).
- kernel-based regression proceeds as follows.
- at each training sample, a kernel is centered, such as a circular-symmetrical Gaussian, with a height equal to the corresponding desired output value y^μ; all Gaussians have the same radius (standard deviation) which, in fact, acts as a smoothing parameter.
- the output of the regression model for a given input x can be written as a weighted sum over the kernels, y(x) = Σ_j W_j g_j(x), with, for example, N circular-symmetrical Gaussians.
- the kernels are normalized so that an interpolation between the kernel centers can be carried out, e.g., using normalized Gaussians g_j(x) = K_j(x) / Σ_k K_k(x).
- the positions (centers) w_i of the Gaussians are chosen in such a manner that the regression error, i.e., the discrepancy between the output of the model and the desired output for a given set of input/output pairs, is minimal; the radii are chosen in an ad hoc manner.
- This is basically the Radial Basis Function (RBF) network approach introduced by Moody and Darken (1988).
- RBF: Radial Basis Function
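A sketch of the model output under these conventions, with a common kernel radius and normalized circular-symmetrical Gaussians (the function name is illustrative):

```python
import numpy as np

def rbf_output(x, centers, radius, W):
    """y(x) = sum_j W_j g_j(x), with g_j normalized circular-symmetrical Gaussians."""
    k = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * radius ** 2))
    g = k / np.sum(k)          # normalization enables interpolation between centers
    return float(np.dot(W, g))
```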
- the parameters of the kernel-based regression model, i.e., the regression weights W_j and W_0, kernel centers w_i, and possibly also the common kernel radius σ or the variable radii σ_i, are optimized by minimizing the regression error. This is usually done by a learning algorithm which iteratively adjusts these parameters until the regression error becomes minimal (e.g., for a separate test set of input/output pairs).
- the foregoing may be described as a monolithic approach: all model parameters are optimized or chosen with a given regression task in mind.
- the present embodiment provides a modular one: in essence, the kernel centers w_i and kernel radii σ_i are first determined separately from the regression task. This is done with an optimization procedure that operates on samples x^μ drawn from the input distribution (in V-space). This results in what is called a data model. Then, for a given regression task and, thus, for a given set of input/output pairs, only the regression weights W_j and W_0 are optimized so as to minimize the regression error; the data model parameters are kept constant. This second model, the regression model, is specified by the regression application at hand.
- the data and regression models are, thus, developed separately, and they operate in sequence (vertical modularity, see Fig. 20A).
- the data model is developed in accordance with the kernel-based topographic map formation procedure described above.
- the kernel centers w_i and kernel radii σ_i are obtained in such a manner that an equiprobabilistic map results: each kernel will be activated with equal probability, and the map is a faithful representation of the input distribution.
- the present embodiment provides the following advantages:
- the input samples x^μ may be incomplete, i.e., some of the vector components may be missing (incomplete data handling)
- the present embodiment is modular in two ways: vertically modular (data model and regression model) and horizontally modular (data modules and regression modules).
- the data model consists of a lattice A, with a regular and fixed topology, of arbitrary dimensionality d_A, in d-dimensional input space V ⊂ ℝ^d.
- each lattice node corresponds to a formal neuron which possesses, in addition to the traditional weight vector w_i, a circular (or hyperspherical, in general) activation region S_i with radius σ_i in V-space (Fig. 21A).
- the neural activation state is represented by the code membership function: 1_i(v) = 1 if v ∈ S_i, and 0 otherwise.
- the weights w_i are adapted so as to produce a topology-preserving mapping: neighboring neurons in the lattice code for neighboring positions in V-space.
- the radii σ_i are adapted so as to produce a lattice of which the neurons have an equal probability to be active (equiprobabilistic map), i.e., the probability P(v ∈ S_i) is the same for all neurons i.
- the data model is trained as follows.
- a training set {x^μ} of M input samples is considered.
- the kernel-based Maximum Entropy learning Rule (kMER) updates the neuron weights w_i and radii σ_i as follows (Van Hulle, 1998, 1999b):
- 1. the lattice forms a discrete estimate of a possibly non-linear data manifold in the input space V;
- 2. the lattice defines a topology-preserving mapping from input to lattice space: neighboring neurons in the lattice code for neighboring positions in V-space;
- 3. more neurons will be allocated at high density regions in V-space ("faithful" map, Van Hulle, 1999b).
- the adapted kMER algorithm is given in Fig. 22, in batch mode, but it can equally well be formulated for incremental learning.
- the missing entries can be filled in by applying the incomplete input vector to the map: the neuron with the closest weight vector is determined, ignoring the missing vector components, and the latter are then replaced by the corresponding components of the closest weight vector.
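A sketch of this completion step, assuming missing components are marked with NaN:

```python
import numpy as np

def complete_vector(x, weights):
    """Fill the NaN components of x from the nearest neuron weight (NaNs ignored)."""
    missing = np.isnan(x)
    dists = np.linalg.norm((x - weights)[:, ~missing], axis=1)
    filled = x.copy()
    filled[missing] = weights[np.argmin(dists)][missing]
    return filled
```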
- the algorithm is shown in Fig.
- the circular (spherical, hyperspherical) activation region S_i of each neuron can be supplemented with a kernel, e.g., a normalized Gaussian:
- ΔW_j = η [y^μ − Σ_k W_k g_k(x^μ, ρ_s)] g_j(x^μ, ρ_s), (64) in the case of univariate regression.
- the rule is readily extendible to the multivariate case by developing one series of weights for each output vector component.
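A sketch of one training pass with rule (64), under the normalized-Gaussian kernels above; the learning rate and the epoch count are assumptions of this illustration:

```python
import numpy as np

def train_weights(X, y, centers, radii, rho_s, W, eta=0.01, epochs=100):
    """Delta rule: iteratively adjust the regression weights W_j (univariate case)."""
    for _ in range(epochs):
        for x_mu, y_mu in zip(X, y):
            k = np.exp(-np.sum((x_mu - centers) ** 2, axis=1)
                       / (2.0 * (rho_s * radii) ** 2))
            g = k / np.sum(k)                 # normalized Gaussians g_j(x, rho_s)
            W += eta * (y_mu - W @ g) * g     # Eq. (64)
    return W
```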
- PPR: projection pursuit regression
- the cost function C is cyclically minimized for the residuals of projection direction k until there is little or no change.
- Training of the smoothing factor ρ_s can also be done with the delta rule; however, better results (i.e., more reliable than would be achieved by learning) can be obtained in the following manner.
- the following quantity is defined as the training error: TE = Σ_μ [y^μ − Σ_j W_j g_j(x^μ, ρ_s)]².
- the decision depends on how many training samples are being used, and how many are available for testing.
- TE is plotted as a function of ρ_s, and the ρ_s-value that minimizes TE is sought.
- a clear parabolic (TE, ρ_s) plot is expected. Only three points are then necessary to locate the minimum, theoretically, but the effect of numerical errors must be considered.
- the following strategy can be adopted for determining the three TEs: first, the regression model is trained for an initial value of ρ_s (e.g., 1.0) and the first training error, TE_1, is determined;
- ρ_s is increased, e.g., to 1.1, the regression model is trained again and the second training error, TE_2, is determined;
- ρ_s is decreased, e.g., to 0.9, the regression model is trained again and the third training error, TE_3, is determined.
- from the three points, the location of the minimum can be estimated, or the direction in which the minimum should be sought determined.
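For instance, fitting a parabola through the three (ρ_s, TE) points gives the location of the minimum in closed form; a sketch:

```python
def parabola_argmin(x1, y1, x2, y2, x3, y3):
    """Fit TE = a*x**2 + b*x + c through three (rho_s, TE) points; return the vertex."""
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    return -b / (2.0 * a)   # the rho_s value at the vertex (minimum when a > 0)
```

For example, parabola_argmin(0.9, TE_3, 1.0, TE_1, 1.1, TE_2) returns the estimated optimal ρ_s, subject to the numerical-error caveat above.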
- 2. training of the regression model will be fast since only one stage ("layer") of connection weights needs to be trained;
- 3. the regression procedure adapts itself to the data (self-organization): since with kMER the weight density will be proportional to the input density, the regression model will be optimal in the sense that it will be locally more detailed when there are locally more input data available (higher density in x-space), and vice versa;
- 4. by virtue of the delta rule and the smoothness procedure, regression modeling is easily implementable.
- Each subset of input vectors is used for developing a corresponding data model (data model 1, ..., m).
- the weights w_i and radii σ_i available in the m data models are then used for introducing kernels g_i in the regression module.
- the denominator in the definition of the normalized Gaussians refers to all kernels of all data models.
- the weights W_j and the smoothing factor ρ_s are determined, across the m data models, according to the scalar regression task.
- the former can be extended to a vectorial regression task, and desired output vectors can be considered instead of scalars.
- d_y regression models are developed, one for each output vector component. This can be done in parallel.
14.2. Subsets of data - multiple regression modules
- the previous regression modeling strategy can be extended by breaking up the single regression module into m + 1 regression modules at two levels (Fig. 27): at the first level, there are m regression modules, one for each subset of input/output pairs, and at the second level, there is one module that integrates the outputs of the first level modules in order to produce the final regression output O.
- the W0_j are parameters that can be trained in the same way as explained before, or in a similar way. If the subsets are perfectly disjoint, then one can simply take the sum of all O_j.
Bayesian approach
- Another embodiment, which does not require additional training, will now be introduced: it takes advantage of the fact that the regression surface developed in each module will be more detailed when more samples are available. Hence, if a density estimate is available in each module, then there will be more certainty about the regression output produced by a given module if the corresponding input also corresponds to a high local density.
- the regression output is taken for which the largest number of local input/output pairs were available for training.
- This selection criterion is reminiscent of a Bayesian pattern classification procedure (whence the procedure's name). The selection itself occurs in the decision module. An estimate of the probability density of each module is obtained from the data modules, as explained above with respect to the topographic map embodiments.
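A sketch of such a decision module; each module is represented here by a density estimate (e.g., the vk_density sketch above) paired with its regression function, which is an assumed interface:

```python
import numpy as np

def bayesian_select(x, modules):
    """Return the output of the module with the highest estimated density at x.

    modules: list of (density_fn, regression_fn) pairs, one per module."""
    densities = [density(x) for density, _ in modules]
    best = int(np.argmax(densities))
    return modules[best][1](x)
```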
- the loss terms L_ij can be based on experience or expressed in a more quantitative manner, for example, in monetary terms.
Vectorial regression task
- The previous embodiment can be extended to a vectorial regression application, thus with desired d_y-dimensional output vectors, rather than scalars. d_y regression models are developed in parallel, one for each output vector component. All training processes can run in parallel.
- data module 1 is developed using the subspace vectors (x_1, ..., x_{d_1}), data module 2 using (x_{d_1+1}, ..., x_{d_2}), and finally, data module m using (x_{d_{m-1}+1}, ..., x_d).
- a data module is developed with kMER using the subspace vectors that are locally available. It is assumed that the desired output values (scalars) are available on each data server. m (level 1) regression modules are then locally developed, as explained in section 13.2.1, thus using the subspace vectors as input vectors. The outputs of the m regression modules still have to be integrated into one global regression result. This is done by applying backfitting (see section 13.2.1).
- for each module j, the module's regression output O_j(x^μ) is determined for each input vector of the training set. These outputs (i.e., scalars) are then communicated to all other data servers. Hence, in this way, each data server disposes of the regression outputs produced by all regression modules for all the input vectors used for training.
- a module index i is introduced and set to i ← 1. Let ε be a preset level of training error.
- the total regression error (TRE) is defined as follows: TRE = Σ_μ [y^μ − Σ_k O_k(x^μ)]².
- Module i's regression parameters are adapted so as to reduce the total regression error (gradient descent step, in batch mode); the parameters of all other regression modules are kept constant.
- the regression outputs of module i are determined for each subspace input vector of the training set and communicated to the next module on which a learning step is going to be performed, module i + 1.
- The module index is incremented, i ← i + 1, and another backfitting run is performed.
- once trained, the regression modules can be used as follows. First, the subspace input vectors are applied to each regression module, then the corresponding regression module outputs O_k are determined and communicated to the level 2 regression module which, in turn, calculates the final regression result.
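The backfitting loop described above can be sketched as follows; the module interface (predict / fit_residual) and the stopping rule are assumptions of this illustration:

```python
import numpy as np

def backfit(modules, X_subs, y, eps=1e-3, max_runs=100):
    """Cyclic backfitting over m level-1 regression modules.

    modules: objects with .predict(Xs) and .fit_residual(Xs, r) (assumed interface);
    X_subs:  per-module subspace views of the training inputs;
    y:       desired scalar outputs."""
    outputs = np.stack([m.predict(Xs) for m, Xs in zip(modules, X_subs)])
    for _ in range(max_runs):
        for i, (m, Xs) in enumerate(zip(modules, X_subs)):
            residual = y - (outputs.sum(axis=0) - outputs[i])  # leave module i out
            m.fit_residual(Xs, residual)                       # gradient step, batch mode
            outputs[i] = m.predict(Xs)                         # communicate new outputs
        if np.mean((y - outputs.sum(axis=0)) ** 2) < eps:      # total regression error
            break
    return modules
```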
- the previous embodiment can be easily extended to vectorial regression, thus with d_y-dimensional desired output vectors, rather than scalars: d_y regression models are developed, one for each output vector component. All training processes can run in parallel.
Subsets of subspaces - mixed case
- The case where the data modules are trained on subsets that consist of mixtures of input space vectors as well as subspace vectors is now considered, thus with missing input vector components (see Fig. 25C). There are two possibilities. If there are plenty of input space vectors available for developing the data modules and/or the subspace vectors only have a limited number of missing vector components, then the presence of the missing vector components can be ignored and the strategy mentioned under section 14.2 can be applied: m regression models are developed using the input space vectors as well as the subspace input vectors (Fig. 30). For the subspace input vectors, the data completion procedure explained in section 13.1.2 is used.
- the first strategy is clearly a heuristic one: its success will critically depend on the ratio between the number of complete and incomplete input vectors, and on the number of missing input vector components.
- the second strategy is in principle correct but it requires much more data communication than in the previous case.
- the subspace dimensionality that is used for each data module might correspond to the set of vector components that are common to the training subset or, if this is too extreme, a common set of vector components can be chosen or determined; data completion can then be performed on the missing vector components of this common set, and the components that do not belong to it can simply be ignored.
- such an approach is also a heuristic one.
- a vectorial regression application can be performed by developing d_y regression models in parallel.
- the systems and methods described above may be used to model a physical system using a topographic map.
- a physical parameter of the modeled system may be changed, optimized, or controlled in accordance with an estimated data density determined in accordance with the present invention.
16. Definitions
- Entropy of a variable is the average amount of information obtained by observing the values adopted by that variable. It may also be called the uncertainty of the variable.
- Kernel-based: a particular type of receptive field - it has the shape of a local function, e.g. a function that adopts its maximal value at a certain point in the space in which it is developed, and gradually decreases with distance away from the maximum point.
- Topographic map: a mapping between one space and another in which neighboring positions in the former space are mapped onto neighboring positions in the latter space. Also called a topology-preserving or neighborhood-preserving mapping. If there is a mismatch in the dimensionality between the two spaces, there is no exact definition of topology preservation, and the definition is then restricted to the case where neighboring positions in the latter space code for neighboring positions in the former space (but not necessarily vice versa).
- Receptive field: the area, region, or domain in the input space within which a neuron (generally synonymous with a synaptic element of a neural network) can be stimulated.
- Self-organizing refers to the genesis of globally ordered data structures out of local interactions.
- Non-parametric density estimation: no prior knowledge is assumed about the nature or shape of the input data density.
- Non-parametric regression: no prior knowledge is assumed about the nature or shape of the function to be regressed.
Abstract
Description
Claims
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00955981A EP1222626A2 (en) | 1999-08-30 | 2000-08-30 | Topographic map and methods and systems for data processing therewith |
AU68122/00A AU6812200A (en) | 1999-08-30 | 2000-08-30 | Topographic map and methods and systems for data processing therewith |
EP01925220A EP1295251A2 (en) | 2000-04-13 | 2001-04-13 | Methods and systems for regression analysis of multidimensional data sets |
AU2001252045A AU2001252045A1 (en) | 2000-04-13 | 2001-04-13 | Methods and systems for regression analysis of multidimensional data sets |
PCT/BE2001/000065 WO2001080176A2 (en) | 2000-04-13 | 2001-04-13 | Methods and systems for regression analysis of multidimensional data sets |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15194799P | 1999-08-30 | 1999-08-30 | |
US60/151,947 | 1999-08-30 | ||
GBGB0008985.4A GB0008985D0 (en) | 2000-04-13 | 2000-04-13 | Distributed non-parametric regression modeling with equiprobalistic kernel-basedd topographic maps |
GB0008985.4 | 2000-04-13 | ||
GB0015526A GB0015526D0 (en) | 2000-06-26 | 2000-06-26 | Distributed non-parametric regression modeling with equiprobalistic kernel-based topographic maps |
GB0015526.7 | 2000-06-26 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001016880A2 true WO2001016880A2 (en) | 2001-03-08 |
WO2001016880A3 WO2001016880A3 (en) | 2002-05-16 |
Family ID=56290051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/BE2000/000099 WO2001016880A2 (en) | 1999-08-30 | 2000-08-30 | Topographic map and methods and systems for data processing therewith |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1222626A2 (en) |
AU (1) | AU6812200A (en) |
WO (1) | WO2001016880A2 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004017258A2 (en) * | 2002-08-14 | 2004-02-26 | Wismueller Axel | Method, data processing device and computer program product for processing data |
US7743086B2 (en) | 2007-06-14 | 2010-06-22 | Microsoft Corporation | Distributed kernel density estimation |
WO2011063518A1 (en) * | 2009-11-24 | 2011-06-03 | Zymeworks Inc. | Density based clustering for multidimensional data |
CN103177088A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Biomedicine missing data compensation method |
US20200184134A1 (en) * | 2018-05-08 | 2020-06-11 | Landmark Graphics Corporation | Method for generating predictive chance maps of petroleum system elements |
CN112861669A (en) * | 2021-01-26 | 2021-05-28 | 中国科学院沈阳应用生态研究所 | High-resolution DEM topographic feature enhancement extraction method based on earth surface slope constraint |
CN113468801A (en) * | 2021-06-07 | 2021-10-01 | 太原科技大学 | Method for predicting residual life of gear by estimating nuclear density |
CN115455772A (en) * | 2022-09-15 | 2022-12-09 | 长安大学 | Microcosmic reservoir rock conductivity prediction method |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10261506B2 (en) | 2002-12-05 | 2019-04-16 | Fisher-Rosemount Systems, Inc. | Method of adding software to a field maintenance tool |
CN101655847B (en) * | 2008-08-22 | 2011-12-28 | 山东省计算中心 | Expansive entropy information bottleneck principle based clustering method |
2000
- 2000-08-30 WO PCT/BE2000/000099 patent/WO2001016880A2/en not_active Application Discontinuation
- 2000-08-30 AU AU68122/00A patent/AU6812200A/en not_active Abandoned
- 2000-08-30 EP EP00955981A patent/EP1222626A2/en not_active Withdrawn
Non-Patent Citations (4)
Title |
---|
VAN HULLE M M: "Clustering with kernel-based equiprobabilistic topographic maps" NEURAL NETWORKS FOR SIGNAL PROCESSING VIII. PROCEEDINGS OF THE 1998 IEEE SIGNAL PROCESSING SOCIETY WORKSHOP, CAMBRIDGE, UK, 31 AUG.-2 SEPT 1998, pages 204-213, XP002183512 1998, New York, NY, USA, IEEE, USA ISBN: 0-7803-5060-X * |
VAN HULLE M M: "Density-based clustering with topographic maps" IEEE TRANSACTIONS ON NEURAL NETWORKS, JAN. 1999, IEEE, USA, vol. 10, no. 1, pages 204-207, XP002183511 ISSN: 1045-9227 * |
VAN HULLE M M: "Faithful representations with topographic maps" NEURAL NETWORKS, ELSEVIER SCIENCE PUBLISHERS, BARKING, GB, vol. 12, no. 6, July 1999 (1999-07), pages 803-823, XP004174077 ISSN: 0893-6080 * |
VAN HULLE M M: "Nonparametric regression analysis achieved with topographic maps developed in combination with projection pursuit learning: an application to density estimation and adaptive filtering of grey-scale images" IEEE TRANSACTIONS ON SIGNAL PROCESSING, NOV. 1997, IEEE, USA, vol. 45, no. 11, pages 2663-2672, XP002183199 ISSN: 1053-587X * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004017258A2 (en) * | 2002-08-14 | 2004-02-26 | Wismueller Axel | Method, data processing device and computer program product for processing data |
WO2004017258A3 (en) * | 2002-08-14 | 2004-11-11 | Axel Wismueller | Method, data processing device and computer program product for processing data |
DE10237310B4 (en) * | 2002-08-14 | 2006-11-30 | Wismüller, Axel, Dipl.-Phys. Dr.med. | Method, data processing device and computer program product for data processing |
US7567889B2 (en) | 2002-08-14 | 2009-07-28 | Wismueller Axel | Method, data processing device and computer program product for processing data |
US7743086B2 (en) | 2007-06-14 | 2010-06-22 | Microsoft Corporation | Distributed kernel density estimation |
AU2010324501B2 (en) * | 2009-11-24 | 2016-05-12 | Zymeworks Inc. | Density based clustering for multidimensional data |
US9165052B2 (en) | 2009-11-24 | 2015-10-20 | Zymeworks Inc. | Density based clustering for multidimensional data |
WO2011063518A1 (en) * | 2009-11-24 | 2011-06-03 | Zymeworks Inc. | Density based clustering for multidimensional data |
CN103177088A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Biomedicine missing data compensation method |
CN103177088B (en) * | 2013-03-08 | 2016-05-18 | 北京理工大学 | A kind of biomedical vacancy data make up method |
US20200184134A1 (en) * | 2018-05-08 | 2020-06-11 | Landmark Graphics Corporation | Method for generating predictive chance maps of petroleum system elements |
CN112861669A (en) * | 2021-01-26 | 2021-05-28 | 中国科学院沈阳应用生态研究所 | High-resolution DEM topographic feature enhancement extraction method based on earth surface slope constraint |
CN112861669B (en) * | 2021-01-26 | 2021-12-10 | 中国科学院沈阳应用生态研究所 | High-resolution DEM topographic feature enhancement extraction method based on earth surface slope constraint |
CN113468801A (en) * | 2021-06-07 | 2021-10-01 | 太原科技大学 | Method for predicting residual life of gear by estimating nuclear density |
CN113468801B (en) * | 2021-06-07 | 2024-03-26 | 太原科技大学 | Gear nuclear density estimation residual life prediction method |
CN115455772A (en) * | 2022-09-15 | 2022-12-09 | 长安大学 | Microcosmic reservoir rock conductivity prediction method |
Also Published As
Publication number | Publication date |
---|---|
EP1222626A2 (en) | 2002-07-17 |
WO2001016880A3 (en) | 2002-05-16 |
AU6812200A (en) | 2001-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mueller et al. | Machine learning in materials science: Recent progress and emerging applications | |
Pal et al. | Pattern recognition algorithms for data mining | |
Imandoust et al. | Application of k-nearest neighbor (knn) approach for predicting economic events: Theoretical background | |
Palczewska et al. | Interpreting random forest classification models using a feature contribution method | |
Wu et al. | On quantitative evaluation of clustering systems | |
US8250004B2 (en) | Machine learning | |
US11693917B2 (en) | Computational model optimizations | |
Salehi et al. | SMKFC-ER: Semi-supervised multiple kernel fuzzy clustering based on entropy and relative entropy | |
WO2003083695A1 (en) | Support vector machines for prediction and classification in supply chain management and other applications | |
Tsui et al. | Data mining methods and applications | |
AghaeiRad et al. | Improve credit scoring using transfer of learned knowledge from self-organizing map | |
Tai et al. | Growing self-organizing map with cross insert for mixed-type data clustering | |
Sriwanna et al. | Graph clustering-based discretization of splitting and merging methods (graphs and graphm) | |
EP1222626A2 (en) | Topographic map and methods and systems for data processing therewith | |
Brazdil et al. | Dataset characteristics (metafeatures) | |
ElShawi et al. | csmartml: A meta learning-based framework for automated selection and hyperparameter tuning for clustering | |
Abidi et al. | A new algorithm for fuzzy clustering handling incomplete dataset | |
Camastra et al. | Clustering methods | |
Śmieja et al. | Semi-supervised model-based clustering with controlled clusters leakage | |
WO2001080176A2 (en) | Methods and systems for regression analysis of multidimensional data sets | |
Andonie et al. | Neural networks for data mining: constrains and open problems. | |
Patil et al. | Efficient processing of decision tree using ID3 & improved C4. 5 algorithm | |
Fu et al. | A supervised method to enhance distance-based neural network clustering performance by discovering perfect representative neurons | |
Bernardi et al. | Clustering | |
Rahmani et al. | A self-organising eigenspace map for time series clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2000955981 Country of ref document: EP |
|
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10069841 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2000955981 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000955981 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |