AU2021106050A4

AU2021106050A4 - An efficient technique for heterogenous data using extreme learning approach via unsupervised multiple kernels

Info

Publication number: AU2021106050A4
Application number: AU2021106050A
Authority: AU
Inventors: Salim Amdani; Gajendra Bamnote; Sohel Bhura; Anand Chaudhari; Hemant Deshmukh; Sunil Gupta; Sumedh Ingale; Roshan Karwa; Zeeshan Khan; Ankit Mune; Mahendra Pund; Vijaya Shandilya
Original assignee: Deshmukh Hemant Dr
Current assignee: Deshmukh Hemant Dr
Priority date: 2021-08-19
Filing date: 2021-08-19
Publication date: 2021-11-25
Anticipated expiration: 2029-08-19

Abstract

AN EXTREME LEARNING APPROACH FOR HETEROGENEOUS DATA USING UNSUPERVISED MULTIPLE KERNELS

The present invention relates to an extreme learning approach for heterogeneous data using unsupervised multiple kernels. The proposed invention provides an efficient three-stage unsupervised multiple kernel clustering based extreme learning machine (TMKC-ELM). TMKC-ELM will alternately extract information from multiple sources and learn the heterogeneous data representation with closed-form solutions, which makes it extremely fast. This work will be helpful for analysing social networks from the viewpoint of heterogeneous data. Thus the present invention proposes an efficient three-stage unsupervised multiple kernel extreme learning approach.

Description

AN EFFICIENT TECHNIQUE FOR HETEROGENOUS DATA USING EXTREME LEARNING APPROACH VIA UNSUPERVISED MULTIPLE KERNELS

Technical field of invention:

The present invention relates to the field of computer science and engineering and more particularly relates to an extreme learning approach for heterogeneous data using unsupervised multiple kernels.

Background of the present invention

The background information herein below relates to the present disclosure but is not necessarily prior art.

Heterogeneity is one of the main features of big data, and heterogeneous data contributes to information convergence and big data analytics issues. Before heterogeneous data sources can be unified and integrated, they must be harmonized into a single data structure. Information is obtained from heterogeneous sources, including system samples, warning logs, high-frequency ultrasonic flow and stress measurements, working logs and video recordings. Most data processing and machine learning methods have not handled this heterogeneity well.

Considering the multiple sources of heterogeneous data jointly offers a number of opportunities for improved reliability and robustness of monitoring algorithms. Novel techniques need to be developed to tackle the challenges of heterogeneous data. Testing such algorithms requires benchmark datasets that allow direct comparison of the performance of the methods.

Unsupervised learning is a machine learning technique in which the model does not need to be supervised. Unsupervised machine learning helps to find all kinds of unknown patterns in data.

Advanced unsupervised learning techniques are urgently needed yet challenging in the big data era, due to the increasing requirement to extract knowledge from large amounts of unlabeled heterogeneous data.

Recently, many efforts in unsupervised learning have been made to effectively capture information from heterogeneous data. However, most of them are very time-consuming, which obstructs their further application in big data analytics scenarios where an enormous amount of heterogeneous data is provided but real-time learning is strongly demanded. Researchers have tried to address this problem by proposing a two-stage unsupervised multiple kernel extreme learning machine which alternately extracts information from multiple sources.

Therefore, to overcome the drawbacks of the existing methodology, there exists a need to enable learning without supervised labels. Hence the present invention provides a three-stage multiple kernel-based unsupervised learning approach for heterogeneous data.

Objective of the invention:

The primary object of the present invention is to provide an extreme learning approach for heterogeneous data using unsupervised multiple kernels.

Another object of the present invention is to provide an efficient three-stage multiple kernel-based unsupervised learning objective to learn the optimal kernel combination coefficients.

Summary of the invention

Accordingly, the present invention provides an extreme learning approach for heterogeneous data using unsupervised multiple kernels. The heterogeneous information obtained from various sources will be captured by multiple kernels and combined, through an iterative staged approach guided by a generalized unsupervised objective, into an optimal kernel. Datasets will be pre-processed to remove dirty values. K-Space data will be generated from the several kernels, and pseudo-labels will be assigned by clustering algorithms according to the learned optimal kernel. This work is an attempt to propose an efficient three-stage unsupervised multiple kernel extreme learning approach.

Detailed description of invention

The present invention relates to an extreme learning approach for heterogeneous data using unsupervised multiple kernels. The proposed invention provides a fast unsupervised heterogeneous data learning algorithm, namely three-stage unsupervised multiple kernel clustering based extreme learning machine (TMKC-ELM).

Further, in the preferred embodiment of the present invention, the heterogeneous information obtained from various sources will be captured by multiple kernels and combined, through an iterative staged approach guided by a generalized unsupervised objective, into an optimal kernel. Datasets will be pre-processed to remove dirty values. K-Space data will be generated from the several kernels, and pseudo-labels will be assigned by clustering algorithms according to the learned optimal kernel.

In the present invention, the proposed methodology works in the following phases. The first phase is data collection, wherein benchmark heterogeneous datasets are used for experiments. These datasets can be accessed from the UCI Machine Learning Repository.

The second phase is data pre-processing. Today's real-world databases are highly susceptible to noisy, missing, and inconsistent data because of their typically huge size (often several gigabytes or more) and their likely origin from multiple, heterogeneous sources. Incomplete data can occur for a number of reasons; for example, attributes of interest may not always be available. Data pre-processing is a proven method of resolving such issues.
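The pre-processing phase can be illustrated with a minimal sketch. The specification does not say how dirty values are removed, so the rules below are assumptions chosen for illustration: rows containing a missing value (`None` or NaN) are dropped, and each remaining column is min-max scaled. The function name `preprocess` and the sample data are hypothetical.

```python
import math

def preprocess(rows):
    """Drop rows with missing values, then min-max scale each column to [0, 1].

    Both rules are illustrative assumptions; the specification only states
    that dirty values are removed.
    """
    clean = [
        r for r in rows
        if all(v is not None and not math.isnan(v) for v in r)
    ]
    if not clean:
        return []
    cols = list(zip(*clean))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(r, lows, highs)]
        for r in clean
    ]

raw = [
    [1.0, 10.0],
    [2.0, None],            # missing attribute: row is dropped
    [float("nan"), 30.0],   # not-a-number: row is dropped
    [3.0, 20.0],
]
cleaned = preprocess(raw)
print(cleaned)
```

Dropping whole rows is the simplest policy; imputing missing attributes would be an alternative when data is scarce.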

The third phase of the proposed methodology is K-Space data construction. TMKC-ELM will extract heterogeneous information from multiple sources by p kernel functions. These kernel functions can be designed according to prior knowledge and data characteristics. After the kernel projection, TMKC-ELM gets a set of k base kernel matrices, which are used for the optimal kernel generation and K-Space data construction. Denoting the data set in a K-Space as Z, the transformation from K to Z of a given data set X is formalized. TMKC-ELM will assign K-Space pseudo-labels via a clustering algorithm. The optimal kernel will be generated by a linear combination of the k base kernel matrices according to a set of combination coefficients.
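The kernel projection and linear combination described above can be sketched as follows. The choice of kernels (one linear, two Gaussian), the equal starting coefficients, and all function names are assumptions made for illustration. The specification does not give the formula for the transformation from K to Z; the sketch assumes one plausible reading, in which each K-Space sample corresponds to a pair of data points and holds that pair's value under every base kernel.

```python
import math

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(gamma):
    """Return a Gaussian kernel function with the given width parameter."""
    def k(a, b):
        sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-gamma * sq)
    return k

def kernel_matrix(kernel, X):
    """Base kernel matrix: kernel value for every pair of rows of X."""
    return [[kernel(a, b) for b in X] for a in X]

def combine(base, coeffs):
    """Linear combination of base kernel matrices."""
    n = len(base[0])
    return [
        [sum(c * K[i][j] for c, K in zip(coeffs, base)) for j in range(n)]
        for i in range(n)
    ]

def k_space(base):
    """Assumed K-Space: one sample per pair (i, j), one feature per kernel."""
    n = len(base[0])
    return [[K[i][j] for K in base] for i in range(n) for j in range(n)]

X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
kernels = [linear_kernel, rbf_kernel(0.5), rbf_kernel(2.0)]
base = [kernel_matrix(k, X) for k in kernels]
coeffs = [1.0 / len(base)] * len(base)   # equal weights as a starting point
K_opt = combine(base, coeffs)
Z = k_space(base)
print(len(Z), len(Z[0]))
```

Under this reading, n data points and three kernels yield n squared K-Space samples with three features each.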

Another phase of the proposed methodology is clustering. Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. The goal of clustering is to determine the internal grouping in a set of unlabeled data. It is the user who should supply the criterion for what constitutes a good clustering, in such a way that the result of the clustering will suit their needs.
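As an illustration of how clustering on a kernel can yield pseudo-labels, the sketch below runs kernel k-means directly on a kernel matrix. The specification does not name a clustering algorithm, so kernel k-means, the farthest-first initialisation, and the pairwise "same cluster" pseudo-label are all assumptions; the function names are hypothetical.

```python
import math

def _dist(K, i, j):
    """Squared feature-space distance between points i and j."""
    return K[i][i] - 2.0 * K[i][j] + K[j][j]

def kernel_kmeans(K, n_clusters, n_iter=20):
    n = len(K)
    # Farthest-first seeding: deterministic and spreads the seeds apart.
    seeds = [0]
    while len(seeds) < n_clusters:
        seeds.append(max(range(n),
                         key=lambda i: min(_dist(K, i, s) for s in seeds)))
    labels = [min(range(n_clusters), key=lambda c: _dist(K, i, seeds[c]))
              for i in range(n)]
    for _ in range(n_iter):
        members = [[j for j in range(n) if labels[j] == c]
                   for c in range(n_clusters)]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, idx in enumerate(members):
                if not idx:
                    continue
                m = len(idx)
                cross = sum(K[i][j] for j in idx) / m
                within = sum(K[j][l] for j in idx for l in idx) / (m * m)
                d = K[i][i] - 2.0 * cross + within  # distance to cluster mean
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def pairwise_pseudo_labels(labels):
    """1.0 if a pair of points shares a cluster, else 0.0 (assumed format)."""
    return [1.0 if a == b else 0.0 for a in labels for b in labels]

points = [0.0, 0.1, 5.0, 5.1]
K = [[math.exp(-0.5 * (a - b) ** 2) for b in points] for a in points]
labels = kernel_kmeans(K, n_clusters=2)
pseudo = pairwise_pseudo_labels(labels)
print(labels)
```

The pairwise format lines up with one pseudo-label per pair of data points, which suits a K-Space built from pairwise kernel values.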

The final phase is multiple kernel learning. For nk K-Space data and pseudo-labels, TMKC-ELM will optimize the given objective function to calculate the optimal kernel combination coefficients. For data from many sources, TMKC-ELM calculates the optimal solution in a faster way.
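The objective function itself is not reproduced in the specification. As a hedged sketch of what a closed-form solution could look like, the code below fits the combination coefficients by regularized least squares from K-Space samples Z to pseudo-labels y, using the same form as the standard extreme learning machine output-weight solution, (ZᵀZ + I/C)⁻¹Zᵀy. Applying it directly to Z, the regularization constant C, and the function names are all assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kernel_coefficients(Z, y, C=1e6):
    """Closed-form ridge solution (Z^T Z + I/C)^-1 Z^T y."""
    p = len(Z[0])
    gram = [[sum(z[a] * z[b] for z in Z) + (1.0 / C if a == b else 0.0)
             for b in range(p)] for a in range(p)]
    rhs = [sum(z[a] * t for z, t in zip(Z, y)) for a in range(p)]
    return solve(gram, rhs)

# Toy K-Space data: two kernel features per sample; the pseudo-labels are
# constructed as 0.25 * z1 + 0.75 * z2 so the expected answer is known.
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0.25, 0.75, 1.0, 1.25]
beta = kernel_coefficients(Z, y)
print(beta)
```

Because the solution requires only one small linear solve, whose size equals the number of kernels, no iterative optimisation is needed, which is consistent with the speed claimed for closed-form solutions.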

Thus the proposed invention is an attempt to propose an efficient three-stage unsupervised multiple kernel extreme learning approach.

The many features and advantages of the invention are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS

1. An extreme learning approach for heterogeneous data using unsupervised multiple kernels which provides three-stage unsupervised multiple kernel clustering based extreme learning machine (TMKC-ELM), characterized in that,

the heterogeneous information obtained from various sources will be collected over multiple kernels and implemented with an iterative stage approach;

led by a generalized unsupervised objective into an optimal kernel;

datasets will be pre-processed to remove dirty values;

K-Space data will be generated from several kernels, and pseudo-labels will be assigned by clustering algorithms as per the learned optimal kernel.

2. An extreme learning approach for heterogeneous data using unsupervised multiple kernels as claimed in claim 1, wherein the said methodology works in phases such as

a data collection phase wherein benchmark heterogeneous datasets are used, and these datasets can be accessed from the UCI Machine Learning Repository;

a data pre-processing phase which resolves issues such as noisy, missing, and inconsistent data arising because of their typically huge size;

a K-Space data construction phase which denotes the data set in a K-Space as Z, wherein the transformation from K to Z of a given data set X is formalized;

a clustering phase which determines the internal grouping in a set of unlabeled data;

a multiple kernel learning phase for nk K-Space data and pseudo-labels, which further calculates the optimal kernel combination coefficients.

3. An extreme learning approach for heterogeneous data using unsupervised multiple kernels as claimed in claim provides a fast unsupervised heterogeneous data learning algorithm namely three-stage unsupervised multiple kernel clustering based extreme learning machine (TMKC-ELM).