WO2019116388A1

WO2019116388A1 - A method for data mining by identifying and generating data processes and associated scenarios

Info

Publication number: WO2019116388A1
Application number: PCT/IN2018/050830
Authority: WO
Inventors: Sharad Saxena
Original assignee: Xyda Analytic Research Private Limited
Priority date: 2017-12-13
Filing date: 2018-12-12
Publication date: 2019-06-20

Abstract

The present invention discloses a method of tracing end-to-end data processes in a system by identifying a key process document. The disclosed method traces data processes by identifying crucial points in a key process document and using the key process document as a starting point for tracing multiple different interconnected data processes. Using the key process document, data processes are back-traced or forward traced, and associated information and relevant key process documents are traced and collected. Based on the tracing of the data, a comprehensive list of all fields related to various data processes and corresponding information in the system is created, and is further utilized to generate and evaluate test cases and scenarios to improve the data processes for efficient and accurate data management.

Description

Title of Invention : A method for data mining by identifying and generating data processes and associated scenarios

Technical Field

[1] The field of invention generally relates to a method of data mining in a system, and more specifically, it relates to data mining by tracing data processes for scenario generation and evaluation based on forward and backward tracing of data processes.

Background Art

[2] Data management is generally understood as a practice of organizing and maintaining data and related processes in order to meet information lifecycle needs, and plays a crucial role in all major industries.

[3] Commerce, in today’s day and age, has become more and more complex and dynamic. The gamut of modern-day supply and demand has been constantly expanding to accommodate growing needs of its customers. From the fields of business to the fields of IT, every sector has experienced a hike in the demand for goods and services.

[4] More so, the expansion of each of these sectors has pushed the management of the respective sectors to be automated. With data processes in each sector becoming complex with each passing day, it has become crucial to trace and understand the relationship between interconnected data processes.

[5] The existing systems for tracing data processes fall short on being completely independent and reliable without requiring any human input. At times, the existing systems for managing and tracing data processes may also be inadequate to accommodate tremendous data in large businesses.

[6] The existing systems for tracing data processes do not have any provisions for generating situations or scenarios to understand and evaluate data processes.

[7] The various existing methods for test case generation and generating automated test scripts for execution also require human input at times and are not fully automated.

[8] Moreover, human involvement in data management processes may also lead to data being duplicated, thus increasing the redundancy in a system.

[9] Thus, in the light of this discussion, there is a long unresolved need for a method of efficiently tracing data processes and evaluating test cases and scenarios related to the data processes to make data management an easier process.

Object of Invention

[10] The primary object of the invention is to provide a method for data mining in a system or an organization.

[11] Another object of the invention is to provide a method for backward and forward tracing of data processes by identifying a key process document.

[12] Another object of the invention is to provide a method for mining information related to various data processes in a system or an organization.

[13] Another object of the invention is to provide a method for utilizing data processes for the purpose of scenario generation.

[14] Yet another object of the invention is to provide a method of generating scenarios and avoiding duplication of data by assigning unique identities to the generated scenarios.

Summary of Invention

[15] The present invention provides a method of data mining in a system by identifying and generating data processes and corresponding scenarios. Effective data mining is done by tracing data processes in a system.

[16] The method of tracing data processes in a system or an organization comprises identifying a key process document in a particular data function. The key process document acts as a starting point for initiating the tracing of data processes wherein crucial points are extracted from the key process documents. Data processes are back-traced and forward-traced based on the extracted points.

[17] Subsequently, a unique set of data is generated based on the tracing of the data. This unique set of data comprises information related to the data processes and details corresponding to the data processes.

Brief Description of Drawings

[18] This invention is illustrated in the accompanying drawings, throughout which, like reference letters indicate corresponding parts in the various figures.

[19] The embodiments herein will be better understood from the following description with reference to the drawings, in which:

Fig. 1

[20] [Fig. 1] depicts/illustrates in detail the method in which data processes are traced in a system, in accordance with an embodiment of the invention.

Fig. 2

[21] [Fig. 2] depicts/illustrates the method, by way of an exemplary embodiment, in which key process documents are utilized for tracing data functions and related data processes, in accordance with an embodiment of the invention.

Fig. 3

[22] [Fig. 3] depicts/illustrates an exemplary embodiment of a method of tracing data processes in a business environment, in accordance with an embodiment of the invention.

Fig. 4

[23] [Fig. 4] depicts/illustrates a catalogue and its components, in accordance with an embodiment of the invention.

Fig. 5

[24] [Fig. 5] depicts/illustrates the various fields associated with data processes, in accordance with an embodiment of the invention.

Fig. 6

[25] [Fig. 6] depicts/illustrates in detail the associations of each field with data processes, in accordance with an embodiment of the invention.

Fig. 7

[26] [Fig. 7] depicts/illustrates in detail the generation and association of scenarios with distinct fields, in accordance with an embodiment of the invention.

Fig. 8

[27] [Fig. 8] depicts/illustrates an environment comprising scenario generation and identity assigning from a catalogue, in accordance with an embodiment of the invention.

Fig. 9

[28] [Fig. 9] depicts/illustrates in detail the complete method for tracing of data and generation of scenarios, in accordance with an embodiment of the invention.

Description of Embodiments

[29] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and/or detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practised and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.

[30] The present application is a combination/cognate of two Indian provisional applications. Furthermore, the present application claims priority from Indian provisional application number 201741044777 and Indian provisional application number 201741044785.

[31] The present invention discloses, by way of an illustrative embodiment, a method of data mining in a system by tracing data processes and generating and evaluating corresponding scenarios. In the context of the present invention, a function refers to a particular action in a system or a particular objective having a particular manner of working. The method disclosed in the present invention makes use of a key process document to trace data processes in a system, and further generates particular scenarios to evaluate the efficient functioning of the system.

[32] In the context of the present invention, a system may be any data management system implemented in an environment or an organization or may be an independent system to manage the flow of data. A data process may be understood as a process of flow of data associated with a particular data function.

[33] In the context of the present invention, process data may be defined as information about a particular process and metadata may be defined by the commonly known definition of metadata, i.e., data that describes the nature of other data.

[34] A key process document may be understood as any physical document or a digital copy of the physical document corresponding to a particular data function in a system and comprising authorized details of the particular data process.

[35] Throughout this description, a method for data mining by tracing data processes and generating scenarios has been explained with the help of an exemplary embodiment. This exemplary embodiment should not be read as a limitation of this invention and the scope of this description covers other embodiments wherein the disclosed method of data mining by tracing data processes may be utilized.

[36] Referring now to the drawings, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.

[37] Fig. 1 depicts/illustrates in detail the method in which data processes are traced in a system, in accordance with an embodiment of the invention.

[38] In a preferred embodiment, initially a key process document is identified in a particular data function, as depicted at step 102. As aforementioned, a key process document may be any document comprising authorized details of a particular data process. In one embodiment of the invention, a key process document may be a digital copy of a bill, a receipt, a memo, and the like. The key process document may comprise information related to the data process. The key process document may be identified as belonging to a particular data function, depending on the information comprised in the key process document.

[39] In an embodiment, as an example, a key process document comprising particulars of items bought and the amount of cash paid may be identified as belonging to a billing process, i.e., a billing function of a system or an organization.

[40] Further, at step 104, tracing of data processes takes place. The tracing of data processes comprises backward tracing and forward tracing. Based on the key process document, various data processes related to a particular business function are traced. Data processes and corresponding documents preceding the key process document are traced in the backward tracing, and data processes and corresponding documents succeeding the key process document are traced in the forward tracing. For example, if the key process document is identified as a sales order, documents associated with preceding steps of a sales order, such as ordering raw material, manufacturing cost details, etc. are identified in the backward tracing stage. Succeeding documents such as delivery costs, etc. are identified in the forward tracing stage.
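By way of a non-limiting illustration, the backward and forward tracing of step 104 may be sketched as a traversal over links between documents. The following Python sketch is not part of the disclosed method as filed; the link table FOLLOWS, the document names, and the functions invert and trace are assumptions introduced here purely for illustration, and a breadth-first walk is only one possible way of following the links.

```python
from collections import deque

# Hypothetical document-flow links (an assumption for illustration):
# each key maps a document to the documents that directly follow it.
FOLLOWS = {
    "raw material order": ["manufacturing cost details"],
    "manufacturing cost details": ["sales order"],
    "sales order": ["delivery costs"],
    "delivery costs": [],
}


def invert(links):
    """Build the reverse mapping: document -> documents directly preceding it."""
    reverse = {doc: [] for doc in links}
    for doc, successors in links.items():
        for nxt in successors:
            reverse.setdefault(nxt, []).append(doc)
    return reverse


def trace(start, links):
    """Breadth-first walk from `start`; returns every document reachable via `links`."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for neighbour in links.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append(neighbour)
    return found


key_document = "sales order"
backward = trace(key_document, invert(FOLLOWS))  # documents preceding the key document
forward = trace(key_document, FOLLOWS)           # documents succeeding the key document
```

In this sketch the same traversal serves both directions; only the direction of the links differs, which mirrors the use of the key process document as a single starting point for both tracing stages.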

[41] Once the documents have been identified, unique sets of data are generated based on the tracing of data processes, as depicted at step 106. Further, at step 108, a catalogue is generated comprising a comprehensive list of fields associated with the various data functions. The catalogue also comprises information related to various fields. Further, scenario generation and evaluation takes place based on the catalogue, as depicted at step 110.

[42] Fig. 2 depicts/illustrates, by way of an exemplary embodiment, the method in which data functions and relevant data processes are identified. A key process document 202 is identified and further classified as belonging to a particular data function. As aforementioned, a key process document may be any document comprising authorized information about a particular data function.

[43] In one embodiment of the invention, more than one document may be identified as key process documents.

[44] Initially, backward tracing of data processes takes place wherein the key process document 202 is used as a starting point for tracing. Subsequently, data processes and documents preceding the key process document are traced based on the type of key process document.

[45] In an exemplary embodiment, three data processes may have taken place before the data process in the key process document 202. Data process 1, data process 2, and data process 3 may have taken place before the data process in the key process document 202. In an exemplary embodiment, data process 1 further produces process document 1 and process document 2. Data process 3 may not be associated with any process document, and may not have any further process linked with it. Key process document 202 may be traced back to both data process 3 and data process 4, wherein there may be no further data process beyond data process 3; however, data process 4 may further be traced to process document 2.

[46] Further, with respect to figure 2, forward tracing may take place from the key process document 202, wherein the key process document 202 may be traced to data process 5 and data process 6. Both data process 5 and data process 6 may be further traced to process document 3.

[47] The foregoing explanation will be understood better with the help of an exemplary embodiment as illustrated in Fig. 3.

[48] Fig. 3 depicts/illustrates an exemplary embodiment of a billing document to elaborate on the method of backward tracing and forward tracing of data processes.

[49] In the exemplary embodiment, the billing document 302 is determined to be used as a key process document. Based on the billing document 302, data processes are further backward traced or forward traced depending upon the precedence with respect to the billing document 302. Since the key process document is a billing document, data processes that are traced in backward tracing are delivery 312, sales order 318 and the original quotation 324. Further, purchase order 320, shipment 314 and proforma 310 associated with the delivery 312 are also traced. Sales order 318 is further traced back to the purchase requisition 316. Documents related to contracts or agreements 326 associated with sales order 318 are also traced. Since quotation 324 can only be issued after an inquiry 322, no process is further traced after inquiry 322 or the quotation 324.

[50] Purchase requisition 316 may involve successful or unsuccessful transactions, and hence returns or credits or debits 328 are traced from purchase requisition 316.

[51] Further, with respect to Fig. 3, forward tracing from the billing document 302 takes place. Intercompany 308, in forward tracing, may be linked with returns or credits or debits 328. Cancellation 306 of the items that the billing document 302 has been generated for is directly traced from the billing document 302 and is not traced to any other process. Invoice list 304 is also traced in the forward tracing from the billing document 302. Further, the invoice list 304 is also traced to the processes between delivery 312 and sales order 318.

[52] In a preferred embodiment, links and documents related to every data process are exhausted as aforementioned, making the tracing system efficient and reliable.

[53] In a preferred embodiment, after tracing the data processes, a unique set of end-to-end data processes is generated. This set of data process flows is generated by identifying one or more unique sets of data process documents linked to each other. Unique sets of various processes are derived by comparing process data and metadata, along with other details related to the data functions.

[54] For example, in one embodiment of the invention, for a sales function, unique sets of data may be generated in the following manner:

#1 - Quotation: Sales Order - Delivery - Billing - Invoice List

#2 - Contract: Sales Order - Billing

#3 - Billing: Returns Order - Credit Memo
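As a non-limiting illustration, the derivation of unique end-to-end flows may be sketched as collapsing many traced document chains into their distinct sequences of document types. The Python sketch below is an assumption-laden example and not the filed implementation: the variable traced_chains, the document numbers, and the function unique_flows are hypothetical, and comparing only document types is a simplification of the comparison of process data and metadata described above.

```python
# Hypothetical traced chains, one per key process document instance.
# Each chain lists (document type, document number) pairs in process order.
traced_chains = [
    [("Quotation", "Q-1"), ("Sales Order", "SO-1"), ("Delivery", "D-1"),
     ("Billing", "B-1"), ("Invoice List", "IL-1")],
    [("Contract", "C-7"), ("Sales Order", "SO-2"), ("Billing", "B-2")],
    [("Quotation", "Q-2"), ("Sales Order", "SO-3"), ("Delivery", "D-2"),
     ("Billing", "B-3"), ("Invoice List", "IL-2")],
    [("Billing", "B-4"), ("Returns Order", "R-1"), ("Credit Memo", "CM-1")],
]


def unique_flows(chains):
    """Collapse chains to distinct sequences of document types, in first-seen order."""
    flows = []
    for chain in chains:
        flow = tuple(doc_type for doc_type, _number in chain)
        if flow not in flows:
            flows.append(flow)
    return flows


flows = unique_flows(traced_chains)
for number, flow in enumerate(flows, start=1):
    print(f"#{number} - " + " - ".join(flow))
```

Here the first and third chains involve different document numbers but the same sequence of document types, so they contribute a single flow.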

[55] In a preferred embodiment, within each data process, there may be a further variation based on the process data or the metadata. For example, in the aforementioned example, in case of Process #2, ‘contracts’ can be value-based or quantity-based contracts. This variation may further create two sub-processes under process #2.

[56] Further, in a preferred embodiment, a catalogue is generated comprising a list of fields associated with various data functions. Fig. 4 depicts/illustrates a catalogue 402 comprising related-fields information 404 and related-fields corresponding information 406.

[57] In a preferred embodiment, catalogue 402 is generated by tracing data processes and listing all fields associated with the various data functions. A combination of different data fields is further utilized in the method to generate scenarios and test cases to avoid data redundancy. Related-fields information 404 is elaborated upon in Fig. 5. Related-fields corresponding information 406 comprises information related to the various data fields associated with the various data functions.

[58] In a preferred embodiment, data mining is implemented to fetch information related to the various data fields.

[59] In one embodiment of the invention, there may be more than one catalogue in a data management system.

[60] Fig. 5 depicts/illustrates the constituents of related-fields information 404. In a preferred embodiment, the related-fields information 404 comprises data functions and corresponding fields associated with each mentioned data function. In a preferred embodiment, each data function may have ‘n’ number of fields associated with it.

[61] For example, in one embodiment of the invention, a list of relevant fields for a billing process may comprise fields such as billing party, accounting details, legal requirements, pricing and taxation information, and the like. Further, a delivery process may comprise fields such as, but not limited to, warehouse details, goods movement, storage location, and delivery schedule.

[62] In one embodiment of the invention, fields related to one data function may overlap with fields from another data function.
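As a non-limiting illustration, the related-fields information of a catalogue may be sketched as a mapping from each data function to its set of fields. In the Python sketch below, the billing and delivery field names are taken from the example above, whereas the shared field "customer", the variable catalogue, and the helper functions shared_fields and functions_using are hypothetical and introduced only to show how overlapping fields might be detected.

```python
# Hypothetical catalogue: data function -> set of associated fields.
# The field "customer" is an assumed overlap, added only for illustration.
catalogue = {
    "billing": {
        "billing party", "accounting details", "legal requirements",
        "pricing and taxation information", "customer",
    },
    "delivery": {
        "warehouse details", "goods movement", "storage location",
        "delivery schedule", "customer",
    },
}


def shared_fields(cat, first, second):
    """Return the fields that two data functions have in common."""
    return cat[first] & cat[second]


def functions_using(cat, field):
    """Return, in sorted order, the data functions that list the given field."""
    return sorted(name for name, fields in cat.items() if field in fields)


overlap = shared_fields(catalogue, "billing", "delivery")
users = functions_using(catalogue, "storage location")
```

Using sets for the fields makes the overlap between two data functions a simple intersection.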

[63] Fig. 6 depicts/illustrates an exemplary listing of data functions and a grouping of an exhaustive list of fields of respective data functions. The grouping of fields may be customized as required by selecting only desired fields to be considered for scenario generation, regardless of whether undesired or ignored fields are being utilized in data processes or not.

[64] In a preferred embodiment, after tracing data from the key process document 202, a list of data processes may be generated along with the corresponding fields. In one embodiment, a particular data process 1 may comprise ‘n’ number of fields, data process 2 may comprise ‘m’ fields of data and data process 3 may comprise ‘o’ fields of data.

[65] Fig. 7 illustrates a list of scenarios determined by each unique combination of fields and their respective values. As depicted in Fig. 7, Field-1 has o-number of values, Field-2 has p-number of values and Field-3 has q-number of values, which collectively form m-number of unique combinations and hence m-number of unique scenarios. The values of fields can be null or blank for a given combination, and such null or blank values form part of the uniqueness of a given combination.
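As a non-limiting illustration, the determination of scenarios from unique combinations of field values may be sketched as follows. The Python sketch assumes that scenarios are derived from combinations actually observed in the fetched data, in which case the number of scenarios is at most the product of the per-field value counts; the records, the field values, and the function unique_scenarios are hypothetical, and None is used here to stand for a null or blank value.

```python
# Hypothetical records fetched for three selected fields.
# None stands for a null or blank value and still counts towards uniqueness.
records = [
    {"Field-1": "A", "Field-2": "X", "Field-3": None},
    {"Field-1": "A", "Field-2": "X", "Field-3": None},   # repeats the first combination
    {"Field-1": "A", "Field-2": "Y", "Field-3": "10"},
    {"Field-1": "B", "Field-2": "X", "Field-3": "10"},
]
FIELDS = ("Field-1", "Field-2", "Field-3")


def unique_scenarios(rows, fields):
    """Return each distinct combination of values for `fields`, in first-seen order."""
    seen = []
    for row in rows:
        combination = tuple(row.get(field) for field in fields)
        if combination not in seen:
            seen.append(combination)
    return seen


scenarios = unique_scenarios(records, FIELDS)
```

In this sketch the four records yield three scenarios, because the first two records carry the same combination, including the blank value of Field-3.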

[66] As aforementioned, field groups can be customized by selecting desired fields and ignoring others. Similarly, particular field values associated with particular fields can either be selected or ignored from the data fetched. These customizations directly impact the derived unique combinations and hence the number of scenarios determined.
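As a non-limiting illustration, the effect of these customizations on the number of scenarios may be sketched as follows. In the Python sketch below, the records, the field names, the field values, and the function customised_scenarios are all hypothetical; ignoring a field value is interpreted here as dropping the records that carry that value, which is only one possible reading.

```python
# Hypothetical sales records; all field names and values are illustrative.
records = [
    {"Customer Country": "India", "Order Type": "Standard", "Currency": "INR"},
    {"Customer Country": "Singapore", "Order Type": "Standard", "Currency": "SGD"},
    {"Customer Country": "India", "Order Type": "Rush", "Currency": "INR"},
    {"Customer Country": "India", "Order Type": "Standard", "Currency": "INR"},
]


def customised_scenarios(rows, selected_fields, excluded_values=None):
    """Derive unique combinations over selected fields, skipping excluded values.

    `excluded_values` maps a field name to the set of values to ignore.
    """
    excluded_values = excluded_values or {}
    scenarios = []
    for row in rows:
        # Value-level customization: drop rows carrying an ignored value.
        if any(row.get(field) in values for field, values in excluded_values.items()):
            continue
        # Field-level customization: only the selected fields define a scenario.
        combination = tuple(row.get(field) for field in selected_fields)
        if combination not in scenarios:
            scenarios.append(combination)
    return scenarios


selected = ["Customer Country", "Order Type"]
field_level = customised_scenarios(records, selected)
value_level = customised_scenarios(records, selected, {"Customer Country": {"Singapore"}})
```

Selecting two of the three fields yields three scenarios in this sketch, and additionally ignoring one country value reduces them to two.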

[67] In a preferred embodiment, multiple different scenarios comprising overlapped fields may be generated.

[68] For example, the table below illustrates a sales function for a group of countries:

[69] In the aforementioned exemplary embodiment, a first level of customization is applied, i.e., all selected fields and respective values are chosen and the remaining fields and respective values are ignored. At the second level of customization, particular field values are also ignored; in Table-1 above, “Customer Country” - Singapore is ignored or excluded. The results of these customizations are illustrated in a table below:

[70] Fig. 8 depicts/illustrates different catalogues, i.e., the field-groups along with customizations and the results of the customizations into various scenarios which can be represented by a unique identity.

[71] The various scenarios originating from an original data source 802, which, in a preferred embodiment, may be a key process document (refer Fig. 2), are further classified into catalogues 402. In one embodiment of the invention, there may be more than one catalogue 402.

[72] As depicted in the figure, catalogue 402-1 and catalogue 402-2 comprise different combinations of fields. Catalogue 402-1 comprises fields A, B, and C, whereas catalogue 402-2 comprises fields B, C, and D.

[73] The various generated scenarios are assigned a unique identity to differentiate scenarios from each other. In one embodiment of the invention, scenarios may be grouped together based on similarity and may be assigned a similar unique identity. For example, as illustrated in Fig. 8, scenario 1 and scenario 2 may be assigned unique identity 1 based on similar fields that the scenario 1 and scenario 2 may be made of or on the basis of similar data processes. Similarly, scenario 3, scenario 4 and scenario 5 may be assigned a unique identity 2, and so on.

[74] In one embodiment, scenarios generated in a different catalogue may also be assigned unique identities in groups. In a preferred embodiment, the assigned unique identities are used to fetch scenarios on demand.

[75] In a preferred embodiment, the original data source 802 from where the data is fetched remains intact and data is not duplicated to represent or store various scenarios. The uniqueness of scenarios, maintained by the unique identity, thus reduces data redundancy caused by duplication.
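As a non-limiting illustration, representing scenarios by a unique identity without duplicating the source data may be sketched as an index from identities to positions in the original data source. The description does not specify how the unique identity is computed; the truncated SHA-256 digest used below, together with the names source, scenario_identity, build_index and fetch, is an assumption made only for this Python sketch.

```python
import hashlib

# Hypothetical original data source; it is read in place and never copied.
source = [
    {"Field-A": "X", "Field-B": "1"},
    {"Field-A": "X", "Field-B": "1"},
    {"Field-A": "Y", "Field-B": None},
]
FIELDS = ("Field-A", "Field-B")


def scenario_identity(fields, combination):
    """Assumed scheme: derive an identity from the field names and their values."""
    text = "|".join(f"{field}={value!r}" for field, value in zip(fields, combination))
    # Truncated for readability; a full digest would lower the collision risk further.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_index(rows, fields):
    """Map each scenario identity to the positions of matching rows in the source."""
    index = {}
    for position, row in enumerate(rows):
        combination = tuple(row.get(field) for field in fields)
        index.setdefault(scenario_identity(fields, combination), []).append(position)
    return index


def fetch(rows, index, identity):
    """Fetch a scenario's rows on demand by following positions into the source."""
    return [rows[position] for position in index.get(identity, [])]


index = build_index(source, FIELDS)
first_identity = scenario_identity(FIELDS, ("X", "1"))
matches = fetch(source, index, first_identity)
```

Because the index stores only positions, fetching a scenario returns the original records themselves rather than copies, which corresponds to the source remaining intact.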

[76] Fig. 9 depicts/illustrates in detail a complete method for data mining by tracing data processes and generating corresponding scenarios.

[77] Initially, a key process document is identified and further classified according to the particular function it belongs to, as depicted at step 902. The identified key process document then acts as a starting point for tracing the relevant data processes.

[78] Backward tracing of data processes takes place, wherein all the preceding data processes and relevant documents are identified, as depicted at step 904. Further, at step 906, after the backward tracing of the data processes, forward tracing of the data processes takes place. In this step, the succeeding data processes and corresponding documents are identified. Based on the traced data processes, a unique set of data is generated at step 908. This unique set of data comprises the data processes, the corresponding documents and information related to the data processes.

[79] At step 910, a catalogue is generated comprising a comprehensive list of fields associated with the various data functions in a system. Further, data mining is implemented at step 912 to fetch information related to each of the fields associated with the data functions, as per the catalogue.

[80] Based on step 912, unique scenarios are generated by the disclosed method at step 914 by combining various different fields. These scenarios also serve as test cases for evaluating the data functions in a system. As depicted at step 916, a unique identity is assigned to each of the scenarios that are generated and further, at step 918, the scenarios are linked to the data process that the scenarios originate from. The assigning of the unique identity to the scenarios and the linking to the data process of origination allow the system to identify redundant data and avoid duplication of data.
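As a non-limiting illustration, steps 916 and 918 may be sketched as a registry that assigns an identity to each scenario and records a link to the data process of origin. The Python sketch below is an assumption and not the filed implementation: the class names Scenario and ScenarioRegistry, the sequential integer identities, and the example process names and values are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Scenario:
    identity: int   # unique identity assigned on first registration
    process: str    # link to the data process the scenario originates from
    values: tuple   # combination of field values defining the scenario


class ScenarioRegistry:
    """Assigns identities and refuses to store the same scenario twice."""

    def __init__(self):
        self._next_identity = count(1)
        self._by_key = {}

    def register(self, process, values):
        """Return the existing scenario for this key, or create and store a new one."""
        key = (process, tuple(values))
        if key not in self._by_key:
            self._by_key[key] = Scenario(next(self._next_identity), process, tuple(values))
        return self._by_key[key]

    def by_process(self, process):
        """List the scenarios linked to a given data process."""
        return [s for s in self._by_key.values() if s.process == process]


registry = ScenarioRegistry()
first = registry.register("Billing", ("India", "Standard"))
again = registry.register("Billing", ("India", "Standard"))   # redundant registration
other = registry.register("Delivery", ("India", "Standard"))
```

In this sketch a repeated registration returns the already stored scenario, so a redundant scenario is recognized rather than stored a second time.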

[81] The disclosed method can be applied in any organization to efficiently identify and record different data functions. The recording of data functions can further enable the organization to efficiently manage data, test, replicate, or regenerate processes, and re-engineer data processes without hassle. It can also enable an organization to review, analyze and execute test cases for data processes. Another important feature of the method is that data management systems are free of data redundancy that may arise out of duplication, and the data management systems are thus efficient and reliable.

[82] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the scope of the embodiments as described herein.

Claims

[Claim 1] A method for data mining by identifying and generating data processes and associated scenarios, the method comprising:

identifying a key data process document (202) within a function in a data processing system;

tracing at least one preceding data process document from the key process document (202);

tracing at least one succeeding data process document from the key process document (202);

generating one or more end-to-end data process flows by identifying one or more unique sets of data process documents linked to each other;

generating one or more scenarios by generating one or more unique sets of data values for one or more data fields in the data process; and

organizing one or more scenarios in a repository by representing each scenario with the key data process document.

[Claim 2] The method as claimed in claim 1, wherein the tracing of preceding data process and the tracing of the succeeding data process further generates a catalogue for the said data process.

[Claim 3] The method as claimed in claim 1, wherein the said catalogue comprises at least one distinct field related to at least one function of the said data process.

[Claim 4] The method as claimed in claim 3, wherein data mining is implemented to fetch details related to the said functions of the said data process.

[Claim 5] The method as claimed in claim 1, wherein the said scenario is assigned a unique identity and a link.

[Claim 6] The method as claimed in claim 5, wherein the said link is a connection between the said scenario and the data process from which the said scenario is generated.

[Claim 7] The method as claimed in claim 5, wherein the said unique identity is used as a reference to organize the one or more said data processes and avoid data redundancy by eliminating copies of existing data.

[Claim 8] The method as claimed in claim 5, wherein the said unique identity is used to fetch one or more scenarios on demand.