AU2020103440A4 - A method for optimizing the convergence performance of data learning with minimal computational steps - Google Patents

A method for optimizing the convergence performance of data learning with minimal computational steps

Info

Publication number
AU2020103440A4
Authority
AU
Australia
Prior art keywords
data
learning
kernel
kocm
convergence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020103440A
Inventor
Gulfishan Firdose Ahmed
Raju Barskar
Gaurav Dhiman
S. Gomathi
Rajeev Kumar Gupta
Arpana Dipak Mahajan
Rashmi Rani Patro
Rojalini Patro
Yudhvir Singh
Mukesh Soni
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Patro Rojalini Dr
Singh Yudhvir Dr
Original Assignee
Patro Rojalini Dr
Singh Yudhvir Dr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Patro Rojalini Dr, Singh Yudhvir Dr filed Critical Patro Rojalini Dr
Priority to AU2020103440A priority Critical patent/AU2020103440A4/en
Application granted granted Critical
Publication of AU2020103440A4 publication Critical patent/AU2020103440A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for optimizing the convergence performance of data learning with minimal computational steps. In this invention, a method for maximizing the convergence efficiency of data learning with limited computational steps is proposed to solve the problem of complexity and learning time. This invented method is useful for enhancing both processing performance and computational speed, and can outperform current unsupervised approaches with a wider breadth of applicability to futuristic applications of big data analytics. The invention is described in detail with the help of Figure 1 of sheet 1, showing an overview of the research approach with the block-oriented design of KOCM.

Description

[Figure 1: Heterogeneous sources (social media, machine-generated data, transactional data) feed the KOCM learning approach (unsupervised k-means): learning from unlabelled data, optimization modeling using kernels, and a faster learning approach, followed by KOCM performance validation in terms of 1) complexity and 2) convergence.]
A METHOD FOR OPTIMIZING THE CONVERGENCE PERFORMANCE OF DATA LEARNING WITH MINIMAL COMPUTATIONAL STEPS
Technical field of invention:
The present invention in general relates to the field of computer engineering, and more specifically to a method for optimizing the convergence performance of data learning with minimal computational steps.
Background of the invention:
The background information herein below relates to the present disclosure but is not necessarily prior art.
There are several optimization techniques known in the art that relate to optimization for big data. Some of the known ones are convergent parallel algorithms, the limited memory bundle algorithm, the diagonal bundle method, and network analytics. At present, however, unsupervised learning modeling and deep learning are both envisioned for a better scope of knowledge extraction in big data learning scenarios. Big data streams are mostly generated from multiple sources. The said sources include hidden and unknown patterns of information attributes, which require efficient data learning mechanisms to be incorporated for a better scope of knowledge discovery.
Known in the art are solutions like CN104965851A, US9177550B2 and JP2006285899A, which disclose data analysis solutions. Reference is made to the document entitled 'Big Data Optimization: Recent Developments and Challenges', Ali Emrouznejad; DOI: 10.1007/978-3-319-30265-2, which provides an insight into the various challenges of big data analysis. Further known in the art is 'Multiple kernel clustering with local kernel alignment maximization'; M. Li, X. Liu, L. Wang, Y. Dou, J. Yin, and E. Zhu, 2016, which discloses a solution in which an alignment helps the clustering algorithm to focus on closer sample pairs that shall stay together, and avoids involving unreliable similarity evaluation for farther sample pairs.
It is known in the art that the unsupervised learning approach is vital to extract knowledge from the unlabelled big data stream, and many efforts towards knowledge discovery have currently become wide-ranging. However, most of the traditional approaches of unsupervised learning are error-prone and shrouded with complex problems. Big data mostly contains unlabelled information, and hence an extensive research effort has already been laid towards applying unsupervised learning modeling. However, there exists a gap in the conventional research approaches in terms of complexity and learning time, which restricts their further case studies in a nearly effective big data analytics environment.
In view of the foregoing, there exists the dire need of a solution for a method for optimizing the convergence performance of data learning with minimal computational steps.
Objective of the invention
An objective of the present invention is to attempt to overcome the problems of the prior art and provide a method for optimizing the convergence performance of data learning with minimal computational steps.
It is therefore an object of the invention to provide a solution for big data analysis that includes both computational efficiency and speed of computation.
It is further an object of the invention to enhance both processing performance and computational speed, and to outperform current unsupervised approaches with a wider breadth of applicability to futuristic applications of big data analytics.
These and other objects and characteristics of the present invention will become apparent from the further disclosure to be made in the detailed description given below.
Summary of the invention:
Accordingly, the following invention provides a method for optimizing the convergence performance of data learning with minimal computational steps. In view of these objects, the invention discloses a kernel-based unsupervised learning model that executes optimized learning performance from heterogeneous unlabelled big data and also accomplishes the targets of computational efficiency with an effective convergence solution. In an aspect of the invention is disclosed a Kernel Oriented Controller Modelling (KOCM) system to optimize the convergence performance in data analytics. The system comprises a sub system including a means for two-fold procedural modeling. The said two-fold procedural modeling comprises a first sub unit configured to obtain the heterogeneous data attributes from multiple forms of sources using a plurality of kernel agents, and a second sub unit configured to perform an optimized data learning based on the hidden labeled data attributes obtained by the first sub unit. The kernel agents are configured to initiate the generation of a space-feature vector (sp-vector) for obtaining an optimal kernel factor and hidden labeled data attributes.
Brief description of drawing:
This invention is described by way of example with reference to the following drawing where,
Figure 1 of sheet 1 illustrates an overview of the research approach with the block-oriented design of KOCM.
In order that the manner in which the above-cited and other advantages and objects of the invention are obtained, a more particular description of the invention briefly described above will be referred, which are illustrated in the appended drawing. Understanding that these drawing depict only typical embodiment of the invention and therefore not to be considered limiting on its scope, the invention will be described with additional specificity and details through the use of the accompanying drawing.
Detailed description of the invention:
The present invention provides a method for optimizing the convergence performance of data learning with minimal computational steps. The proposed method maximizes the convergence efficiency of data learning with limited computational steps to solve the problem of complexity and learning time.
The present invention discloses a kernel-based unsupervised learning method. The disclosed learning model assures optimized learning performance from heterogeneous unlabelled big data. It further accomplishes the targets of computational efficiency with effective convergence solution.
In an embodiment of the present invention is disclosed a kernel oriented controller modelling, KOCM, in the form of a system. The herein disclosed system overcomes the limitations of the prior art by introducing a novel heterogeneous data learning algorithm using the KOCM approach. The approach assists the use-cases of big data analytics from the viewpoint of better computational efficiency. The approach incorporates two-fold prime design modeling with unsupervised kernel-based data learning.
The core backbone of the disclosed system is focused on the fact that an unsupervised learning approach is a prerequisite to extract knowledge from the unlabelled big data stream. As already discussed, the traditional approaches fail to serve the purpose since they are prone to error and often result in magnified complexities. Hence, the herein disclosed system attempts to overcome this limitation by introducing a novel heterogeneous data learning approach using the KOCM approach. This approach assists the use-cases of big data analytics from the viewpoint of better computational efficiency. The approach incorporates two-fold prime design modeling with unsupervised kernel-based data learning. Figure 1 shows an overview of the research approach with the block-oriented design of KOCM. As would be evidenced, the primary object of the system is to escalate the data learning speed with lower computational complexity factors.
In an embodiment of the invention is disclosed the KOCM system that comprises a sub system including a means for two-fold procedural modeling. The system comprises a first sub unit configured to obtain the heterogeneous data attributes from multiple forms of sources using a plurality of kernel agents. Further included in the system is a second sub unit configured to perform an optimized data learning based on the hidden labeled data attributes obtained by the first sub unit.
The first stage of KOCM relates to the attaining of an optimization factor. The KOCM design and modeling are conceptualized on the basis of two-fold procedural modeling, where the first procedure involves obtaining the heterogeneous data attributes from multiple forms of sources. The different sources include, but are not limited to, social media, transactional data sources, and so on. The data is obtained by means of multiple kernel agents (KAs).
Further, the KAs enable a controller. The said controller is further configured to analyze the obtained data attributes. The obtained data attributes are analyzed in an optimized clustering environment using the unsupervised learning of k-means. While performing the optimization, the coefficient measurement and estimation significantly affect the speed of the process.
According to this embodiment of the invention, data acquisition using KAs enables multiple functions fKA(x). The said function is represented as:
fKA(x) = {fKA-1(x), fKA-2(x), fKA-3(x), ..., fKA-p(x)}
With fKA(x), the data from multiple sources are obtained. These KAs are designed for specific big data sources in terms of their distinctive characteristic features and prior learned information attributes. The design is thus based on one or more specific features of the big data source, which is based on pre-attained knowledge. The kernel agents are further configured to introduce another function that initiates the generation of a space-feature vector (sp-vector). The sp-vector is constructed from the feature attributes of fKA(x) and is formalized as:
Sp-vector{u(i),u(j)} ← T : fKA(u(i),u(j))
In the above representation, T refers to the transformation process. Here, u(i) and u(j) refer to the data attribute objects taken through multiple KAs. The system is further configured to perform the labeling of KA attributes in the sp-vector space using an approach of combinational coefficient measures (α) and analysis. The said approach is represented as:
Opt-fKA = Σ α(i) × fKA(i), where 1 ≤ i ≤ p ... (1)
In the above equation (1), p is a base factor associated with fKA(i). This process obtains the optimal kernel factor, which considers the coefficient metric evaluation. According to an embodiment of the invention, in stage 1 of the disclosed system, the optimal kernel factor (Opt-fKA) gets generated, which assists in speeding up the learning process. Further, the optimal k-means approach performs clustering of the optimal fKA in the Sp-vector. The clusters formulated by optimized k-means also assist in the labeling of the data clusters. The labeling takes place once the clustering of the Sp-vector data is done. In another embodiment of the invention is presented the second stage of the KOCM. These procedural operations in KOCM perform an optimized data learning based on the hidden labeled data attributes found in Opt-fKA(i) in the post-k-means clustering approach. The following implementation shows the optimized data clustering and data learning paradigm for the KOCM. It shows both the first stage and the second stage operations.
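The first stage described above — kernel agents acquiring data, the sp-vector of pairwise kernel evaluations, the optimal kernel of eq. (1), and k-means clustering — can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the RBF kernel form, the α values, the toy data, and the function names (`make_kernel_agent`, `build_sp_vector`, `optimal_kernel`, `kmeans`) are all assumptions.

```python
import numpy as np

def make_kernel_agent(gamma):
    """A hypothetical kernel agent f_KA; the RBF form is an assumption."""
    def f_ka(u_i, u_j):
        return float(np.exp(-gamma * np.sum((u_i - u_j) ** 2)))
    return f_ka

def build_sp_vector(data, kernel_agents):
    """Sp-vector{u(i),u(j)} <- T : f_KA(u(i),u(j)); here the
    transformation T simply stacks one pairwise similarity matrix
    per kernel agent."""
    n = len(data)
    sp = np.empty((len(kernel_agents), n, n))
    for a, f_ka in enumerate(kernel_agents):
        for i in range(n):
            for j in range(n):
                sp[a, i, j] = f_ka(data[i], data[j])
    return sp

def optimal_kernel(sp, alpha):
    """Opt-fKA = sum over i of alpha(i) * fKA(i), 1 <= i <= p (eq. 1);
    normalising alpha to sum to one is an added assumption."""
    alpha = np.asarray(alpha, dtype=float)
    return np.tensordot(alpha / alpha.sum(), sp, axes=1)

def kmeans(features, k, iters=50):
    """Minimal Lloyd-style k-means standing in for the optimized k-means."""
    centres = features[np.linspace(0, len(features) - 1, k).astype(int)].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        d = ((features[:, None] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centres[c] = features[labels == c].mean(0)
    return labels

# six unlabelled objects from two latent groups, p = 2 kernel agents
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (3, 2)), rng.normal(3, 0.1, (3, 2))])
sp = build_sp_vector(X, [make_kernel_agent(g) for g in (0.1, 1.0)])
opt_K = optimal_kernel(sp, alpha=[0.4, 0.6])
labels = kmeans(opt_K, k=2)   # rows of Opt-fKA used as clustering features
print(labels)
```

Clustering the rows of Opt-fKA with plain k-means is a simple stand-in for kernel k-means, chosen to keep the sketch short.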
In an implementation is disclosed the data clustering and learning paradigm in KOCM of the invention. The steps in this implementation of the invention may be performed by a processing unit; this is by way of example and not by way of limiting the scope of the invention. Multiple heterogeneous source data (Hd) from multiple sources {s1 to sn}, values of the function fKA(x) and the number of clusters (nC) are the basic inputs to the system for this implementation. The implementation of the first stage of KOCM comprises: S11: formulating kernels to import data from multiple heterogeneous resources, followed by the mapping of s(n) to the plurality of kernel agents. S12: formulating, by the KAs, Sp-vector{u(i),u(j)}; this is the data space formulation. S13: computation of the optimal kernel based on the combinational coefficient measures (α) using eq. (1). The completion of step S13 marks the completion of the first stage of implementation in the KOCM.
In a further implementation of the invention is disclosed the second stage of the KOCM. The second stage of this implementation comprises: S21: managing the kernel combinational coefficient matrix. S22: performing unsupervised clustering using fKA(x), done for each of the obtained optimal kernels. S23: performing hidden-labeling of clusters with the coefficient measures (α). S24: performing the training operation with reduced operational cost.
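Steps S23 and S24 above can be sketched as follows, under two interpretive assumptions: that the stage-1 cluster assignments act as the hidden labels, and that "training with reduced operational cost" amounts to fitting a nearest-centroid rule. The helper names (`fit_centroids`, `predict`) and the toy data are illustrative only, not from the patent.

```python
import numpy as np

def fit_centroids(features, hidden_labels):
    """S24 stand-in: 'training' reduces to one centroid per hidden label."""
    return np.stack([features[hidden_labels == c].mean(0)
                     for c in np.unique(hidden_labels)])

def predict(features, centroids):
    """Assign new objects to the nearest trained centroid."""
    d = ((features[:, None] - centroids[None]) ** 2).sum(-1)
    return d.argmin(1)

rng = np.random.default_rng(2)
train = np.vstack([rng.normal(0, 0.2, (20, 3)), rng.normal(4, 0.2, (20, 3))])
hidden = np.array([0] * 20 + [1] * 20)   # assumed output of stage-1 clustering (S23)
centroids = fit_centroids(train, hidden)

new_points = np.array([[0.1, 0.0, -0.1], [4.1, 3.9, 4.0]])
print(predict(new_points, centroids))
```

The training cost here is a single pass over the data, which illustrates (but does not prove) the "reduced operational cost" claim.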
The instant implementation further includes minimizing the learning cost with training efficiency and optimizing the classification accuracy and loss. This implementation of the present invention provides selection of the optimized factor, Opt-fKA, and further provides an output of unsupervised learning with higher accuracy and cluster assignments.
From the detailed implementation of the invention, it is clearly evidenced how the concept of a heterogeneous data learning schema is formulated. It obtains an optimized convergence solution with faster training and also reduces the computational burden on the system within a finite number of iterative steps. The system is also configured to compute a loss factor. The computation of the loss factor advantageously enhances the heterogeneous data learning procedure from multiple sources. This aims to attain better insight into analytics performance. The said loss value is associated with the convergence factor. Advantageously, the disclosed invention accomplishes better learning by mapping significant tasks to the KOCM.
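The loss value associated with the convergence factor can be monitored per iteration as below. Treating the within-cluster sum of squares of the k-means step as that loss is an interpretive assumption, and `kmeans_with_loss` is an illustrative name; the recorded sequence is non-increasing, which is what makes it usable as a convergence signal.

```python
import numpy as np

def kmeans_with_loss(features, k, iters=20):
    """Lloyd-style k-means that records the clustering loss (within-cluster
    sum of squares) at every iteration; the loss sequence never increases,
    so it can serve as a convergence indicator."""
    idx = np.linspace(0, len(features) - 1, k).astype(int)
    centres = features[idx].astype(float)
    losses = []
    for _ in range(iters):
        d = ((features[:, None] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        losses.append(float(d.min(1).sum()))   # loss before the centre update
        for c in range(k):
            if (labels == c).any():
                centres[c] = features[labels == c].mean(0)
    return labels, losses

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels, losses = kmeans_with_loss(X, k=2)
print(losses[0], losses[-1])
```

A caller could stop early once successive losses differ by less than a tolerance, which is one plausible reading of "a finite number of iterative steps".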
The outcome obtained from the disclosed invention, after performing an unsupervised kernel-based clustering approach on heterogeneous unlabelled big data streams from multiple sources, has been evaluated. The system adopts numerical analysis to evaluate the performance of KOCM. A study was conducted to evaluate the performance of the disclosed system in terms of i) cost of computation and ii) clustering accuracy. Figure 2 shows the outcome obtained for the cost of computation in view of the disclosed KOCM, as compared to the solution of the referred prior art.
Reference is made to figure 2, which shows that within finite iterative steps the approach of KOCM disclosed in the present invention attains a very low computational cost. The complexity level is reduced from O(n^3), which applies in the case of the approach of the referred prior art, to O(n^2). Such reduced computational complexity is especially vital, as it advantageously addresses the issue of learning from unlabelled data, which is otherwise very expensive. Figure 2 further shows that for three different types of dataset, KOCM also accomplishes a much lower cost of computation as opposed to the solution of Li et al. The computational cost of Li et al. is comparatively higher, a direct consequence of the expensive computational procedure of the said prior art. In the system disclosed in the present invention, the clustering effectiveness is measured in terms of the normalized mutual information (NMI), accuracy, and purity. Thus, the various embodiments disclosed herein essentially lead to a novel approach of kernel oriented controller modeling (KOCM), which is an unsupervised approach to learn from big unstructured data from various resources.
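The clustering-effectiveness measures named above can be sketched as follows. The square-root normalisation of NMI is one common convention, chosen here as an assumption, and `purity`/`nmi` are illustrative helper names rather than functions from the patent.

```python
import numpy as np

def purity(pred, truth):
    """Fraction of samples whose cluster's majority class matches them."""
    total = sum(np.bincount(truth[pred == c]).max() for c in np.unique(pred))
    return total / len(pred)

def nmi(pred, truth):
    """Normalized mutual information with sqrt normalisation."""
    n = len(pred)
    cont = np.zeros((pred.max() + 1, truth.max() + 1))   # contingency table
    for a, b in zip(pred, truth):
        cont[a, b] += 1
    p = cont / n
    px, py = p.sum(1), p.sum(0)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / np.outer(px, py)[nz])).sum()
    hx = -(px[px > 0] * np.log(px[px > 0])).sum()
    hy = -(py[py > 0] * np.log(py[py > 0])).sum()
    return mi / np.sqrt(hx * hy)

# cluster ids that agree with the ground truth up to relabeling
pred = np.array([0, 0, 1, 1, 1, 0])
truth = np.array([1, 1, 0, 0, 0, 1])
print(purity(pred, truth), nmi(pred, truth))
```

Note that both measures are label-permutation invariant, which is why a relabeled but otherwise perfect clustering still scores 1.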
The many features and advantages of the invention are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.

Claims (5)

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:
1. A method to optimize the convergence performance of data learning with minimal computational steps, wherein the system comprises:
a sub system including a means for two-fold procedural modeling wherein the two-fold procedural modeling comprises:
a first sub unit configured to obtain the heterogeneous data attributes from multiple forms of sources using a plurality of kernel agents; wherein the kernel agents are configured to initiate the generation of a space-feature vector (sp-vector) for obtaining an optimal kernel factor and a hidden labeled data attributes; and
a second sub unit configured to perform an optimized data learning based on the hidden labeled data attributes obtained by the first sub unit.
2. The method as claimed in claim 1, wherein the kernel agents are configured to enable a controller to analyze the obtained data attributes into an optimized environment of clustering using an unsupervised learning of k-means.
3. The method as claimed in claim 1, wherein the plurality of kernel agents is configured for specific big data sources based on one or more distinctive features and prior learned data attributes.
4. The method as claimed in claim 1, wherein the space-feature vector (sp-vector) is: Sp-vector{u(i),u(j)} ← T:fKA(u(i),u(j)); wherein T is a transformation process, fKA is a function for data acquisition using a kernel agent, and u(i), u(j) are the data attribute objects taken through the plurality of kernel agents.
5. The method as claimed in claim 1, wherein the first sub unit is configured to compute a loss factor to enhance the heterogeneous data learning procedure from the multiple sources.
AU2020103440A 2020-11-14 2020-11-14 A method for optimizing the convergence performance of data learning with minimal computational steps Ceased AU2020103440A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020103440A AU2020103440A4 (en) 2020-11-14 2020-11-14 A method for optimizing the convergence performance of data learning with minimal computational steps

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020103440A AU2020103440A4 (en) 2020-11-14 2020-11-14 A method for optimizing the convergence performance of data learning with minimal computational steps

Publications (1)

Publication Number Publication Date
AU2020103440A4 true AU2020103440A4 (en) 2021-01-28

Family

ID=74192126

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020103440A Ceased AU2020103440A4 (en) 2020-11-14 2020-11-14 A method for optimizing the convergence performance of data learning with minimal computational steps

Country Status (1)

Country Link
AU (1) AU2020103440A4 (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636100A (en) * 2024-01-25 2024-03-01 北京航空航天大学杭州创新研究院 Pre-training task model adjustment processing method and device, electronic equipment and medium
CN117636100B (en) * 2024-01-25 2024-04-30 北京航空航天大学杭州创新研究院 Pre-training task model adjustment processing method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
Lee et al. Self-attention graph pooling
US11915104B2 (en) Normalizing text attributes for machine learning models
US20150142808A1 (en) System and method for efficiently determining k in data clustering
CN112364942B (en) Credit data sample equalization method and device, computer equipment and storage medium
Dulac-Arnold et al. Fast reinforcement learning with large action sets using error-correcting output codes for mdp factorization
AU2020103440A4 (en) A method for optimizing the convergence performance of data learning with minimal computational steps
WO2024104510A1 (en) Method and apparatus for analyzing cell components of tissue, and storage medium
CN114781688A (en) Method, device, equipment and storage medium for identifying abnormal data of business expansion project
CN110929761A (en) Balance method for collecting samples in situation awareness framework of intelligent system security system
CN110209895B (en) Vector retrieval method, device and equipment
Liu et al. A weight-incorporated similarity-based clustering ensemble method
Dhoot et al. Efficient Dimensionality Reduction for Big Data Using Clustering Technique
CN106485286B (en) Matrix classification model based on local sensitivity discrimination
EP3985529A1 (en) Labeling and data augmentation for graph data
CN115168326A (en) Hadoop big data platform distributed energy data cleaning method and system
CN112738724B (en) Method, device, equipment and medium for accurately identifying regional target crowd
Hu et al. PWSNAS: Powering weight sharing NAS with general search space shrinking framework
CN110111837B (en) Method and system for searching protein similarity based on two-stage structure comparison
CN112766356A (en) Prediction method and system based on dynamic weight D-XGboost model
CN112800138A (en) Big data classification method and system
Antunes et al. AL and S methods: Two extensions for L-method
AU2020103766A4 (en) Methodology for Optimizing the Performance of Data Learning Convergence with Minimal Computational Steps
CN111507387A (en) Paired vector projection data classification method and system based on semi-supervised learning
Fisset et al. MO-Mine_ clust MO-M ineclust: A Framework for Multi-objective Clustering
Leinweber et al. GPU-based point cloud superpositioning for structural comparisons of protein binding sites

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry