WO2023111627A1 - System for information extraction and mining and method, and computer program product thereof


Info

Publication number
WO2023111627A1
Authority
WO
WIPO (PCT)
Prior art keywords
engine
information processing
data
information
processing engine
Application number
PCT/IB2021/061663
Other languages
French (fr)
Inventor
Mohanasankar SIVAPRAKASAM
Keerthi Ram S.S
Original Assignee
Indian Institute Of Technology Madras
Application filed by Indian Institute Of Technology Madras
Priority to PCT/IB2021/061663
Publication of WO2023111627A1


Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/9024 Graphs; Linked lists
    • G06F40/20 Natural language analysis
    • G06F40/30 Semantic analysis
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G06F9/451 Execution arrangements for user interfaces

Definitions

  • the information processing engine 105 may represent the response obtained from the query process so that the presented information, including but not limited to summaries, drilldowns, and visualizations, matches the cognitive level and detail demanded by the query.
  • the result is subjected to abstraction and transformation by the transformation engine 150 to generate visual elements, including but not limited to tabulations, summaries, and action keys, for further interaction with the result obtained in the output data.
  • the feedback mechanism engine 145 may be coupled with the rapid translation engine 130. Moreover, the feedback mechanism engine 145 may be coupled with the information processing engine 105. The output data may be displayed on the display engine 165. In an aspect, the feedback mechanism engine 145 may collect explicit feedback on the relevance of the result. In an aspect, the information processing engine 105 may provide a process to compose and store queries. Furthermore, the information processing engine 105 may fix schedules for periodic refreshing of the query in the background.
  • the result in the output data may be packaged as a report and stored with a record of timestamps.
  • the display engine 165 may display the output data and may further provide a means to select a collection of such stored queries and prepare a dashboard.
  • the display engine 165 may provide a means to enter rules or conditions that are cross-checked against the produced result to evaluate correctness, for instance, that percentages add up to 100 and that drilldowns add up to the total.
  • a confidence score is attributed to each stored result to assist in refining and correcting errors at the query level, or it may be considered as feedback for improving the query understanding engine.
  • the information processing engine 105 may consume structured data and perform automatic reorganization and understanding. Furthermore, the information processing engine 105 may interact with a corpus or repository of published research documents, specifically those related to the environmental impact of emissions, emission factors, and other statistics and reports.
  • the information processing engine 105 may provide quantifiers on the quality of a data source, gather complementary or contradictory information from other relevant sources, and indicate the credibility of the reported information, specifically related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions.
  • the information may be used for providing recommendations for reducing in-house emissions according to the Oxford offsetting principles.
  • FIG. 2 illustrates a flowchart of a method 200 for extracting and mining the information through an advanced information retrieval framework, in accordance with an aspect of the present disclosure.
  • the input data may be received by the data repository engine 135 and transmitted to the information processing engine 105.
  • a knowledge graph may be constructed in the data repository engine 135 by way of the information processing engine 105.
  • the query may be interpreted following the initial query analysis performed by way of the modern AI technique engine 120.
  • graph traversals refer to the process of visiting (checking and/or updating) each vertex in a graph.
  • the traversals may be classified by the order in which the vertices are visited.
  • the data may be synced by periodic synchronization through the information processing engine 105.
  • the association data may be recomputed to reflect the updated relations.
  • the relevant information may be extracted from the graph database 140 by way of the information processing engine 105.
  • the data may be transformed, abstracted, and loaded according to the operations inferred from the initial query analysis.
  • complex joins may be broken down into graph traversals.
  • the necessary information may be aggregated in parallel to facilitate multi-user access.
  • the output data may be displayed via the display engine 165.
  • a background method of inferring the correctness of the result may be performed, based on the usage patterns that follow result generation and on follow-up queries.
  • the information processing engine 105 may include a module to log frequently posed questions or analyses thereby reducing the initial keying effort.
  • FIG. 3 illustrates a block diagram of a system architecture 300 of the system 100, in accordance with an aspect of the present disclosure.
  • the system architecture 300 may include an input/output device 334, a graphics processing engine (GPU) 832, a display 330, a processor(s) 304, a main memory 308, a display interface 302, a communication infrastructure 306, an encryption/decryption processor 326, a communication interface 320, a communication path 326, a first removable storage engine 318, a second removable storage engine 322, and a secondary memory 810.
  • the secondary memory 810 may further include a hard disk drive 312, a removable storage drive 314 and an interface 320.
  • the system architecture 300 may include the input/output device 334 that may be coupled to the graphics processing engine (GPU) 832.
  • the graphics processing engine (GPU) 832 may be coupled with the display 330.
  • the display 330 may be coupled with the display interface 302.
  • the display interface 302 may be coupled with the communication infrastructure 306.
  • the processor(s) 304 may be coupled with the communication infrastructure 306.
  • the main memory 308 may be coupled with the communication infrastructure 306.
  • the encryption/decryption processor 326 may be coupled with the communication infrastructure 306.
  • the secondary memory 810 may be coupled with the communication infrastructure 306.
  • the secondary memory 810 may include, but not limited to the hard disk drive 312, the removable storage drive 314 and the interface 320.
  • the removable storage drive 314 may be coupled with the first removable storage engine 318.
  • the interface 320 may be coupled with the second removable storage engine 322.
  • the communication interface 320 may be coupled with the communication infrastructure 306 and the communication path 326.
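The rule-based correctness check and per-result confidence score described above can be sketched as follows. The disclosure does not say how the score is computed; this sketch assumes it is the fraction of user-entered rules that a stored result satisfies. The result layout (`percentages`, `drilldown`, `total`), the 0.5 tolerance, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Result = Dict[str, object]  # hypothetical layout of a stored result


@dataclass
class Rule:
    name: str
    check: Callable[[Result], bool]


def percentages_sum_to_100(result: Result, tol: float = 0.5) -> bool:
    # Allow a small tolerance for rounding in displayed percentages (assumption).
    return abs(sum(result["percentages"].values()) - 100.0) <= tol


def drilldowns_sum_to_total(result: Result) -> bool:
    # The parts of a drilldown must add up to the reported total.
    return sum(result["drilldown"].values()) == result["total"]


RULES = [Rule("percentages add up to 100", percentages_sum_to_100),
         Rule("drilldowns add up to total", drilldowns_sum_to_total)]


def confidence(result: Result, rules: List[Rule]) -> float:
    """Fraction of user-entered rules the stored result satisfies (0.0 to 1.0)."""
    passed = sum(1 for rule in rules if rule.check(result))
    return passed / len(rules)


report = {"percentages": {"OPD": 62.5, "IPD": 37.5},
          "drilldown": {"Ward 1": 40, "Ward 2": 55},
          "total": 100}
print(confidence(report, RULES))  # 0.5
```

Here the percentages rule passes and the drilldown rule fails (40 + 55 is 95, not 100), so the result would be flagged with a lower score for review.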

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed is a system 100 for information extraction and mining from the records that includes an information processing engine 105, a natural language understanding engine 115, a modern AI (Artificial Intelligence) technique engine 120, a semantic analysis engine 125, a rapid translation engine 130, a data repository engine 135, a graph database 140, a feedback mechanism engine 145, a transformation engine 150 and a display engine 165.

Description

SYSTEM FOR INFORMATION EXTRACTION AND MINING AND METHOD, AND COMPUTER PROGRAM PRODUCT THEREOF
FIELD OF INVENTION
The present disclosure generally relates to a system for information extraction and mining and, more particularly, to a system for information mining and extraction via knowledge graph construction and an advanced information retrieval framework that derives insights from data records and research documents.
BACKGROUND OF THE INVENTION
Healthcare administrative data are a well-structured, rich source of information on patient journeys, evidenced care pathways, capabilities and limitations of providers, access patterns, costs, disparities, trends, and insights. Unlike unstructured data such as medical imaging, signal data, and handwritten or dictated notes, which require expert interpretation or modern AI techniques, administrative data in healthcare are well structured and amenable to deriving insights for various stakeholders using business intelligence (BI) techniques.
However, the information and transactions are modelled and stored in tabular databases. These are structured for efficient entity storage and for consistency through normalization. While this suits administrative operations (create-read-update-delete), the structure imposes high runtime complexity on multi-relationship querying and on deriving insights with modern BI tools.
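To make the runtime-complexity point concrete, the following Python sketch contrasts a two-hop question answered with SQL joins over a normalized schema and the same question answered as an adjacency-list walk. The tables (`patient`, `provider`, `visit`) and their rows are hypothetical and not taken from the disclosure; the in-memory SQLite database from the standard library is used only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical normalized schema: patients, providers, and a visit table linking them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE provider (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visit    (patient_id INTEGER REFERENCES patient(id),
                       provider_id INTEGER REFERENCES provider(id));
INSERT INTO patient  VALUES (1, 'P1'), (2, 'P2');
INSERT INTO provider VALUES (10, 'Clinic A'), (20, 'Clinic B');
INSERT INTO visit    VALUES (1, 10), (1, 20), (2, 20);
""")

# Relational form: "other patients who share a provider with P1" needs a
# self-join on visit plus joins back to patient, i.e. several joins for two hops.
SQL = """
SELECT DISTINCT p2.name
FROM patient p1
JOIN visit v1   ON v1.patient_id = p1.id
JOIN visit v2   ON v2.provider_id = v1.provider_id
JOIN patient p2 ON p2.id = v2.patient_id
WHERE p1.name = ? AND p2.id <> p1.id
"""
relational = sorted(row[0] for row in conn.execute(SQL, ("P1",)))

# Graph form: the same question is a two-hop walk over an adjacency list.
adjacency = defaultdict(set)
for patient_id, provider_id in conn.execute("SELECT patient_id, provider_id FROM visit"):
    adjacency[("patient", patient_id)].add(("provider", provider_id))
    adjacency[("provider", provider_id)].add(("patient", patient_id))

names = dict(conn.execute("SELECT id, name FROM patient"))
start = ("patient", 1)
graph = sorted(
    names[pid]
    for provider in adjacency[start]          # hop 1: patient -> providers
    for (_, pid) in adjacency[provider]       # hop 2: provider -> patients
    if pid != 1
)
print(relational, graph)  # ['P2'] ['P2']
```

Both forms return the same answer; the relational version needs a join per hop, whereas the graph version follows stored links directly, which is the trade-off the passage describes.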
Graph databases provide an alternative semantic mechanism for storing relationship graphs, but programmatic access to the graph is cumbersome because it relies on non-standard declarative textual query syntaxes. Object-graph mappers provide a means to abstract the syntactic variation through annotations of entities and relationships in object-oriented programming. These are, however, programmatic tools that require specific syntactic knowledge, along with domain knowledge and application development skills, to implement analytics models and dashboards. Therefore, there remains a need for a system and a method that provide knowledge graph construction and an advanced information retrieval framework for extracting and mining information from the records.
SUMMARY
According to one aspect of the present disclosure, a system for information extraction and mining through an advanced information retrieval framework to perform query analysis is provided. The system includes an information processing engine configured with a natural language understanding engine to process, analyse, and perform various query operations upon data, wherein the information processing engine is configured with a data repository engine and a graph database to access and store the data. The system further includes a transformation engine communicating with the information processing engine to perform loading operations as inferred from the initial query analysis. The system further includes a modern artificial intelligence (AI) technique module communicating with the natural language understanding engine to perform access roles in the query interface along with the analysis of patterns in the query. The system further includes a semantic analysis module communicating with the modern AI technique module to capture the semantic level and detail required in the result. The system further includes a rapid translation module communicating with the semantic analysis module to analyse the entities involved in the query and their relationships and associations in the specific information schema. The system further includes a feedback mechanism engine communicating with the rapid translation module and the information processing engine, wherein the feedback mechanism engine returns the data to the information processing engine to collect explicit feedback on the relevance of the result. The system further includes a display engine configured with the feedback mechanism engine to display the output data, wherein the display engine further provides a means to select a collection of stored queries and prepare a dashboard.
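The rapid translation module is described only functionally. As a rough illustration, assuming a simple keyword lookup stands in for the natural language understanding and AI techniques the disclosure mentions, the sketch below maps a question onto entities and relations of a hypothetical information schema. `ENTITIES`, `RELATIONS`, `TranslatedQuery`, and `translate` are illustrative names, not part of the disclosure.

```python
import re
from dataclasses import dataclass, field

# Hypothetical information schema: surface words -> entity labels, and the
# relation that links each pair of entity labels.
ENTITIES = {"patient": "Patient", "patients": "Patient",
            "provider": "Provider", "providers": "Provider",
            "hospital": "Provider", "hospitals": "Provider",
            "diagnosis": "Diagnosis", "diagnoses": "Diagnosis"}
RELATIONS = {("Patient", "Provider"): "VISITED",
             ("Patient", "Diagnosis"): "HAS_DIAGNOSIS"}


@dataclass
class TranslatedQuery:
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def translate(question: str) -> TranslatedQuery:
    """Keyword-based stand-in for the rapid translation step (assumption)."""
    result = TranslatedQuery()
    for token in re.findall(r"[a-z]+", question.lower()):
        label = ENTITIES.get(token)
        if label and label not in result.entities:
            result.entities.append(label)
    # Keep every schema relation whose two endpoints were both mentioned.
    for (src, dst), rel in RELATIONS.items():
        if src in result.entities and dst in result.entities:
            result.relations.append((src, rel, dst))
    return result


q = translate("Which hospitals did patients with a diagnosis of E11 visit?")
print(q.entities)   # ['Provider', 'Patient', 'Diagnosis']
print(q.relations)  # [('Patient', 'VISITED', 'Provider'), ('Patient', 'HAS_DIAGNOSIS', 'Diagnosis')]
```

The resulting entity and relation list is the kind of intermediate form that a later step could turn into graph traversals; a real system would replace the keyword table with a trained language model.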
In some aspects of the present disclosure, the information processing engine further executes the relevant information extraction from the graph database. In some aspects of the present disclosure, the information processing engine further enables the natural language understanding engine, which interacts with the data repository engine by performing natural language understanding.
In some aspects of the present disclosure, the information processing engine sets schedules for periodic refreshing of the query in the background.
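A minimal sketch of background refreshing, assuming each stored query carries its own refresh interval and that a refreshed result is kept as a timestamped report. `StoredQuery`, `Report`, `due`, and `refresh` are hypothetical names; in practice the `refresh` tick would be invoked by whatever scheduler hosts the system, which the disclosure does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class StoredQuery:
    name: str
    refresh_every: timedelta
    last_run: Optional[datetime] = None


@dataclass
class Report:
    query_name: str
    result: object
    generated_at: datetime  # timestamp stored with the packaged report


def due(queries: List[StoredQuery], now: datetime) -> List[StoredQuery]:
    """Queries never run, or whose refresh interval has elapsed."""
    return [q for q in queries
            if q.last_run is None or now - q.last_run >= q.refresh_every]


def refresh(queries: List[StoredQuery], now: datetime,
            run: Callable[[StoredQuery], object]) -> List[Report]:
    """One background tick: re-run due queries and package timestamped reports."""
    reports = []
    for q in due(queries, now):
        reports.append(Report(q.name, run(q), now))
        q.last_run = now
    return reports


t0 = datetime(2021, 12, 1, 8, 0)
qs = [StoredQuery("daily_admissions", timedelta(days=1)),
      StoredQuery("hourly_bed_use", timedelta(hours=1), last_run=t0)]
first = refresh(qs, t0 + timedelta(minutes=30), run=lambda q: f"result of {q.name}")
print([r.query_name for r in first])   # ['daily_admissions']
second = refresh(qs, t0 + timedelta(hours=1), run=lambda q: f"result of {q.name}")
print([r.query_name for r in second])  # ['hourly_bed_use']
```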
In some aspects of the present disclosure, the information processing engine provides quantifiers on the quality of a data source, gathers complementary or contradictory information from other relevant sources, and indicates the credibility of the reported information related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions.
In some aspects of the present disclosure, a knowledge graph is constructed in the data repository engine by the information processing engine.
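The disclosure does not describe how the knowledge graph is derived from the records. One plausible approach, sketched here under that assumption, turns each tabular record into subject-relation-object triples and loads them into an adjacency list. The column names, relation labels, and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from record columns to (relation, target node type).
COLUMN_RELATIONS = {"provider": ("VISITED", "Provider"),
                    "diagnosis": ("HAS_DIAGNOSIS", "Diagnosis")}


def records_to_triples(records: Iterable[Dict[str, str]]) -> List[Triple]:
    """Emit one triple per non-empty mapped column of each record."""
    triples: List[Triple] = []
    for rec in records:
        subject = f"Patient:{rec['patient_id']}"
        for column, (relation, node_type) in COLUMN_RELATIONS.items():
            if rec.get(column):
                triples.append((subject, relation, f"{node_type}:{rec[column]}"))
    return triples


def build_graph(triples: Iterable[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Adjacency list node -> {(relation, neighbour)}; repeated facts collapse."""
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for s, r, o in triples:
        graph[s].add((r, o))
    return graph


rows = [{"patient_id": "1", "provider": "ClinicA", "diagnosis": "E11"},
        {"patient_id": "1", "provider": "ClinicA", "diagnosis": ""},
        {"patient_id": "2", "provider": "ClinicB", "diagnosis": "I10"}]
kg = build_graph(records_to_triples(rows))
print(sorted(kg["Patient:1"]))  # [('HAS_DIAGNOSIS', 'Diagnosis:E11'), ('VISITED', 'Provider:ClinicA')]
```

Using a set per node means a visit recorded twice in the source tables becomes a single edge, which keeps later traversals from double counting.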
In some aspects of the present disclosure, the transformation engine decomposes complex joins into graph traversals and aggregates the necessary information in parallel to facilitate multi-user access.
In some aspects of the present disclosure, the modern artificial intelligence (AI) technique module performs the initial query analysis.
In some aspects of the present disclosure, a method for extracting and mining the information through an advanced information retrieval framework to perform query analysis is provided.
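The sketch below illustrates, under stated assumptions, what decomposing a join into graph traversals and aggregating in parallel could look like. The question "which providers treated patients with diagnosis X, and how many such patients each?" becomes one traversal to find the patients and one independent traversal per patient to find providers, with the partial counts merged at the end. The edge list and function names are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directed edges (subject, relation, object) already in the graph store.
EDGES = [("Patient:1", "HAS_DIAGNOSIS", "Diagnosis:E11"),
         ("Patient:2", "HAS_DIAGNOSIS", "Diagnosis:E11"),
         ("Patient:3", "HAS_DIAGNOSIS", "Diagnosis:I10"),
         ("Patient:1", "VISITED", "Provider:A"),
         ("Patient:2", "VISITED", "Provider:A"),
         ("Patient:2", "VISITED", "Provider:B"),
         ("Patient:3", "VISITED", "Provider:B")]

out_edges = defaultdict(list)  # node -> [(relation, target)]
in_edges = defaultdict(list)   # node -> [(relation, source)]
for s, r, o in EDGES:
    out_edges[s].append((r, o))
    in_edges[o].append((r, s))


def patients_with(diagnosis: str) -> list:
    # Traversal 1 replaces the patient-diagnosis join: follow incoming edges.
    return [src for rel, src in in_edges[diagnosis] if rel == "HAS_DIAGNOSIS"]


def providers_of(patient: str) -> Counter:
    # Traversal 2 replaces the visit-provider join for one patient;
    # each distinct provider counts once per patient.
    return Counter({dst: 1 for rel, dst in out_edges[patient] if rel == "VISITED"})


def providers_for_diagnosis(diagnosis: str, workers: int = 4) -> Counter:
    """Run the per-patient traversals concurrently and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(providers_of, patients_with(diagnosis))
        return sum(partials, Counter())


print(providers_for_diagnosis("Diagnosis:E11"))  # Counter({'Provider:A': 2, 'Provider:B': 1})
```

Python threads are used here only to show the fan-out and merge pattern; for CPU-bound work in CPython the global interpreter lock limits any speed-up, and in a deployed system the parallelism would more likely come from concurrent requests to the graph database.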
A second aspect of the present disclosure provides a method in which input data is received from a user by the data repository engine and transmitted to an information processing engine. Thereafter, the data repository engine constructs a knowledge graph to perform query analysis via the information processing engine. Thereafter, the query is implemented using graph traversals over the data stored in the graph database. Thereafter, the data is synced through periodic synchronization by the information processing engine. Thereafter, the data is recomputed to reflect the updated relations. Thereafter, the relevant information is extracted from the graph database by the information processing engine. The data is transformed via the transformation engine. Furthermore, the data is abstracted, and loading operations are inferred from the initial query analysis. Complex joins are decomposed into graph traversals. Thereafter, the necessary information is aggregated in parallel to facilitate multi-user access. Furthermore, the output data is displayed to the user via the display engine.
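For the synchronization and recomputation steps, one possible reading is that each periodic sync applies a batch of added or removed relations and then recomputes only the derived associations those changes can affect. The sketch assumes a derived provider-provider association equal to the number of shared patients; `AssociationStore` and its methods are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Visit = Tuple[str, str]  # (patient, provider); hypothetical record shape


class AssociationStore:
    """Hypothetical store of patient-provider visit edges plus a derived
    provider-provider association (number of shared patients)."""

    def __init__(self) -> None:
        self.providers_of: Dict[str, Set[str]] = defaultdict(set)
        self.shared: Dict[FrozenSet[str], int] = {}

    def sync(self, added: Iterable[Visit], removed: Iterable[Visit] = ()) -> None:
        """Apply one periodic batch of changes, then recompute affected associations."""
        touched: Set[str] = set()
        for patient, provider in added:
            self.providers_of[patient].add(provider)
            touched.add(provider)
        for patient, provider in removed:
            self.providers_of[patient].discard(provider)
            touched.add(provider)
        self._recompute(touched)

    def _recompute(self, touched: Set[str]) -> None:
        # Pairs of untouched providers cannot have changed, so they are kept;
        # pairs involving a touched provider are rebuilt (this sketch still
        # scans all patients to do so).
        self.shared = {pair: n for pair, n in self.shared.items() if not pair & touched}
        counts: Dict[FrozenSet[str], int] = defaultdict(int)
        for providers in self.providers_of.values():
            for a, b in combinations(sorted(providers), 2):
                if a in touched or b in touched:
                    counts[frozenset((a, b))] += 1
        self.shared.update(counts)


store = AssociationStore()
store.sync(added=[("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"), ("p2", "C")])
store.sync(added=[], removed=[("p2", "B")])
print({"-".join(sorted(pair)): n for pair, n in store.shared.items()})  # {'A-C': 1, 'A-B': 1}
```

After the second sync only the pairs involving provider B are rebuilt: A-B drops from 2 to 1 shared patients, B-C disappears, and A-C is carried over unchanged.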
In some aspects of the present disclosure, the information processing engine further includes a module to log frequently posed questions or analyses thereby reducing the initial keying effort.
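A minimal sketch of the question log, assuming that "reducing the initial keying effort" means suggesting previously posed questions as the user types. `QuestionLog` is a hypothetical name, and ranking suggestions by frequency is an assumption rather than something the disclosure states.

```python
from collections import Counter
from typing import List


class QuestionLog:
    """Hypothetical log of posed questions used to suggest completions."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    @staticmethod
    def _normalise(question: str) -> str:
        # Case-fold and collapse whitespace so trivial variants share one entry.
        return " ".join(question.lower().split())

    def record(self, question: str) -> None:
        self.counts[self._normalise(question)] += 1

    def suggest(self, prefix: str, k: int = 3) -> List[str]:
        """Up to k logged questions starting with the typed prefix, most frequent first."""
        p = self._normalise(prefix)
        matches = [(q, n) for q, n in self.counts.items() if q.startswith(p)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:k]]


log = QuestionLog()
for q in ["How many admissions last month?", "how many  admissions last month?",
          "How many beds are free?", "Top diagnoses by cost"]:
    log.record(q)
print(log.suggest("how many"))  # ['how many admissions last month?', 'how many beds are free?']
```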
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects, features, and advantages of the present disclosure will be apparent from the following description when read with reference to the accompanying drawings, in which like reference numerals denote corresponding parts throughout the several views.
The drawings are for illustration only and thus are not a limitation of the present disclosure, wherein:
FIG. 1 illustrates a block diagram of a system for information extraction and mining from the records, in accordance with an aspect of the present disclosure;
FIG. 2 illustrates a flowchart of a method for extracting and mining the information through an advanced information retrieval framework, in accordance with an aspect of the present disclosure; and
FIG. 3 illustrates a block diagram of a system architecture of the system of FIG. 1, in accordance with an aspect of the present disclosure.
To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures.
DETAILED DESCRIPTION
The aspects herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting aspects that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the aspects herein. The examples used herein are intended merely to facilitate an understanding of ways in which the aspects herein may be practiced and to further enable those of skill in the art to practice the aspects herein. Accordingly, the examples should not be construed as limiting the scope of the aspects herein.
In the prior art, there remains a need for a system that provides a framework to monitor, assess, and analyse greenhouse gas emissions.
In an aspect, the term “input data” or “data” refers to input data including, but not limited to, medical imaging and signal data, handwritten or dictated notes, patient information, doctor information, and prescription slips.
Referring to FIG. 1, a block diagram of a system 100 for information extraction and mining from the records is illustrated, in accordance with an aspect of the present disclosure.
The system 100 may include an information processing engine 105, a natural language understanding engine 115, a modern AI (Artificial Intelligence) technique engine 120, a semantic analysis engine 125, a rapid translation engine 130, a data repository engine 135, a graph database 140, a feedback mechanism engine 145, a transformation engine 150, and a display engine 165. The input data of the system 100 may be coupled to the information processing engine 105. The natural language understanding engine 115 may be intercoupled with the information processing engine 105 and the modern AI technique engine 120. The semantic analysis engine 125 may be coupled with the modern AI technique engine 120. The rapid translation engine 130 may be coupled with the semantic analysis engine 125. The data repository engine 135 may be coupled with the information processing engine 105. The graph database 140 may be coupled with the data repository engine 135 and the information processing engine 105. The transformation engine 150 may be coupled to the information processing engine 105 and intercoupled with the graph database 140 and the data repository engine 135. Furthermore, the transformation engine 150 may be coupled with the rapid translation engine 130. The feedback mechanism engine 145 may be coupled with the rapid translation engine 130 and with the information processing engine 105. Additionally, the feedback mechanism engine 145 may be configured to provide output data. In an aspect, the output data may be displayed via the display engine 165.
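The couplings listed above can be read as an undirected graph, which also gives a small example of a graph traversal. The sketch below encodes them, treating "the output data may be displayed via the display engine 165" as a link between the feedback mechanism engine 145 and the display engine 165 (an interpretation), and uses breadth-first search to find a shortest chain of components from the input data to the display engine. `COUPLINGS`, `adjacency`, and `shortest_path` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional

# Couplings stated for FIG. 1, treated as undirected links between components.
COUPLINGS = [
    ("input data", "information processing engine 105"),
    ("natural language understanding engine 115", "information processing engine 105"),
    ("natural language understanding engine 115", "modern AI technique engine 120"),
    ("semantic analysis engine 125", "modern AI technique engine 120"),
    ("rapid translation engine 130", "semantic analysis engine 125"),
    ("data repository engine 135", "information processing engine 105"),
    ("graph database 140", "data repository engine 135"),
    ("graph database 140", "information processing engine 105"),
    ("transformation engine 150", "information processing engine 105"),
    ("transformation engine 150", "graph database 140"),
    ("transformation engine 150", "data repository engine 135"),
    ("transformation engine 150", "rapid translation engine 130"),
    ("feedback mechanism engine 145", "rapid translation engine 130"),
    ("feedback mechanism engine 145", "information processing engine 105"),
    ("feedback mechanism engine 145", "display engine 165"),  # interpretation
]


def adjacency(edges) -> Dict[str, List[str]]:
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node sequence, or None if unreachable."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


for hop in shortest_path(adjacency(COUPLINGS), "input data", "display engine 165"):
    print(hop)
```

Because breadth-first search visits vertices in order of distance from the start, the first time it reaches the display engine it has found a chain with the fewest couplings: input data, information processing engine 105, feedback mechanism engine 145, display engine 165.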
The system 100 may include the input data. The input data may include unstructured and structured data from the healthcare department and published research documents of organizations. In an aspect, the input data may include, but is not limited to, medical imaging and signal data and handwritten or dictated notes. The input data may be fed to the information processing engine 105. The information processing engine 105 may enable the natural language understanding engine 115, which interacts with the data repository engine 135 by performing natural language understanding. In an aspect, the information processing engine 105 may apply the data to the modern AI technique engine 120 for input understanding.
The semantic analysis engine 125 and the rapid translation engine 130 follow the modern AI technique engine 120. The information processing engine 105 may extract the relevant information from the graph database 140, followed by processing in the transformation engine 150. The transformation engine 150 may perform loading operations as inferred from the initial query analysis. In an aspect, the transformation engine 150 may break down complex joins into graph traversals. In another aspect, the transformation engine 150 may aggregate the necessary information in parallel to facilitate multi-user access.
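Breaking a complex join into graph traversals can be illustrated with a toy example. The sketch below is an illustration under stated assumptions rather than the disclosed implementation: the tables `PATIENTS`, `PRESCRIPTIONS`, and `DOCTORS` and their fields are hypothetical. It answers the same three-way question twice, once with a naive nested-loop join and once by materialising the foreign keys as edges and following them hop by hop.

```python
from collections import defaultdict

# Hypothetical relational rows; table and field names are illustrative only.
PATIENTS = [{"patient_id": 1, "name": "P1"}, {"patient_id": 2, "name": "P2"}]
PRESCRIPTIONS = [
    {"rx_id": 10, "patient_id": 1, "doctor_id": 100, "drug": "drug-a"},
    {"rx_id": 11, "patient_id": 1, "doctor_id": 101, "drug": "drug-b"},
    {"rx_id": 12, "patient_id": 2, "doctor_id": 100, "drug": "drug-c"},
]
DOCTORS = [{"doctor_id": 100, "name": "Dr X"}, {"doctor_id": 101, "name": "Dr Y"}]


def relational_join(patients, prescriptions, doctors):
    """Three-way join (patients JOIN prescriptions JOIN doctors) via nested loops."""
    rows = []
    for p in patients:
        for rx in prescriptions:
            if rx["patient_id"] != p["patient_id"]:
                continue
            for d in doctors:
                if d["doctor_id"] == rx["doctor_id"]:
                    rows.append((p["name"], rx["drug"], d["name"]))
    return rows


def build_graph(patients, prescriptions, doctors):
    """Materialise foreign keys once as edges: patient -> prescription -> doctor."""
    labels, edges = {}, defaultdict(list)
    for p in patients:
        labels[("patient", p["patient_id"])] = p["name"]
    for d in doctors:
        labels[("doctor", d["doctor_id"])] = d["name"]
    for rx in prescriptions:
        rx_node = ("prescription", rx["rx_id"])
        labels[rx_node] = rx["drug"]
        edges[("patient", rx["patient_id"])].append(rx_node)
        edges[rx_node].append(("doctor", rx["doctor_id"]))
    return labels, edges


def traversal_join(labels, edges, patient_ids):
    """Answer the same question by following edges instead of matching keys."""
    rows = []
    for pid in patient_ids:
        start = ("patient", pid)
        for rx_node in edges.get(start, []):         # hop 1: patient -> prescription
            for doc_node in edges.get(rx_node, []):  # hop 2: prescription -> doctor
                rows.append((labels[start], labels[rx_node], labels[doc_node]))
    return rows


labels, edges = build_graph(PATIENTS, PRESCRIPTIONS, DOCTORS)
assert traversal_join(labels, edges, [1, 2]) == relational_join(PATIENTS, PRESCRIPTIONS, DOCTORS)
```

Once the edges exist, each query touches only the prescriptions adjacent to the requested patients, whereas the naive nested-loop join compares every combination of rows; a production relational engine would typically use hash or index joins instead.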
In an aspect, the modern AI technique engine 120 may apply access roles in the query interface, alongside an analysis of the patterns in the query, so that the semantic analysis engine 125 can capture the semantic level and detail needed in the result.
In another aspect, the information processing engine 105 may apply modern artificial intelligence techniques by way of the modern AI technique engine 120 including, but not limited to, input query understanding, restructuring of the data, selection of an optimal query path, result extraction, transformation, visualization and response preparation, and iterative learning to refine and improve the correctness, quality, and relevance of the output presentation.
In another aspect, the information processing engine 105 may represent the response obtained from the query process such that the presented information, including but not limited to summaries, drilldowns, and visualizations, matches the cognitive level and detail demanded by the query. The result is subjected to abstraction and transformation by the transformation engine 150 to generate visual elements including, but not limited to, tabulations, summaries, and action keys for further interaction with the result obtained in the output data.
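As a minimal sketch of turning a raw result into tabulations, summaries, and drilldowns, assuming the result arrives as a list of dictionaries with hypothetical `department` and `diagnosis` fields, the functions below compute percentage shares and a per-value breakdown. Rounding percentages to one decimal place is also an assumption.

```python
from collections import Counter


def summarise(records, key):
    """Tabulation: count and percentage share for each value of `key`."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return [
        {"value": value, "count": n, "percent": round(100 * n / total, 1)}
        for value, n in counts.most_common()
    ]


def drilldown(records, key, sub_key):
    """Drilldown: break each value of `key` down by `sub_key`."""
    nested = {}
    for r in records:
        nested.setdefault(r[key], Counter())[r[sub_key]] += 1
    return {value: dict(sub) for value, sub in nested.items()}


visits = [  # hypothetical query result rows
    {"department": "cardiology", "diagnosis": "A"},
    {"department": "cardiology", "diagnosis": "B"},
    {"department": "neurology", "diagnosis": "A"},
    {"department": "cardiology", "diagnosis": "A"},
]
print(summarise(visits, "department"))
print(drilldown(visits, "department", "diagnosis"))
```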
The feedback mechanism engine 145 may be coupled with the rapid translation engine 130 and with the information processing engine 105. The output data may be displayed on the display engine 165. In an aspect, the feedback mechanism engine 145 may collect explicit feedback on the relevance of the result. In an aspect, the information processing engine 105 may provide a process for composing and storing queries. Furthermore, the information processing engine 105 may fix schedules for periodic refreshing of the query in the background.
In an aspect, the result produced as output data may be packaged as a report and stored with a record of timestamps.
The display engine 165 may display the output data, and may further provide a means to select a collection of such stored queries and prepare a dashboard.
In an aspect, the display engine 165 may provide a means to enter rules or conditions that are cross-checked against the produced result to evaluate its correctness, for example, that percentages add up to 100 and that drilldowns add up to the total.
In an aspect, a confidence score is attributed to each stored result, either to assist in refining and correcting errors at the query level or to serve as feedback for improving the query understanding engine.
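The composition, storage, scheduled background refresh, and timestamped reporting of queries described above could be organised as follows. This is a hypothetical sketch: `StoredQuery`, `background_refresh`, and the one-day interval are illustrative names and values, and a real deployment would invoke the refresh pass from a scheduler rather than by hand.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class StoredQuery:
    """A composed query with a refresh schedule (hypothetical structure)."""
    text: str
    run: Callable[[], object]           # executes the query and returns its result
    refresh_every: timedelta
    last_run: Optional[datetime] = None
    reports: List[dict] = field(default_factory=list)

    def is_due(self, now: datetime) -> bool:
        return self.last_run is None or now - self.last_run >= self.refresh_every

    def refresh(self, now: datetime) -> dict:
        """Run the query and store the result as a timestamped report."""
        report = {"query": self.text, "result": self.run(), "timestamp": now.isoformat()}
        self.reports.append(report)
        self.last_run = now
        return report


def background_refresh(queries: List[StoredQuery], now: datetime) -> List[dict]:
    """One background pass: refresh only the queries whose schedule has elapsed."""
    return [q.refresh(now) for q in queries if q.is_due(now)]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
daily = StoredQuery("admissions per department", lambda: {"cardiology": 3}, timedelta(days=1))
background_refresh([daily], t0)                       # first run
background_refresh([daily], t0 + timedelta(hours=1))  # not yet due, nothing stored
background_refresh([daily], t0 + timedelta(days=1))   # due again
print([r["timestamp"] for r in daily.reports])
# → ['2024-01-01T00:00:00+00:00', '2024-01-02T00:00:00+00:00']
```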
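A minimal sketch of such user-entered rules, assuming results shaped like `{"summary": [...], "drilldown": {...}}` (a hypothetical layout), represents each rule as a named predicate and reports which rules a result violates. The 0.5 percentage-point tolerance for rounding is an assumption, not a value from the disclosure.

```python
import math


def percentages_sum_to_100(result, tolerance=0.5):
    """Rule: percentage shares add up to 100, allowing for rounding."""
    total = sum(row["percent"] for row in result["summary"])
    return math.isclose(total, 100.0, abs_tol=tolerance)


def drilldowns_sum_to_total(result):
    """Rule: every drilldown adds up to the total it breaks down."""
    return all(
        sum(result["drilldown"].get(row["value"], {}).values()) == row["count"]
        for row in result["summary"]
    )


def evaluate(result, rules):
    """Return the names of the user-entered rules that the result violates."""
    return [name for name, rule in rules.items() if not rule(result)]


RULES = {
    "percentages add up to 100": percentages_sum_to_100,
    "drilldowns add up to the total": drilldowns_sum_to_total,
}

result = {  # hypothetical result whose neurology drilldown does not reconcile
    "summary": [
        {"value": "cardiology", "count": 3, "percent": 75.0},
        {"value": "neurology", "count": 1, "percent": 25.0},
    ],
    "drilldown": {"cardiology": {"A": 2, "B": 1}, "neurology": {"A": 2}},
}
print(evaluate(result, RULES))  # → ['drilldowns add up to the total']
```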
In another aspect, the information processing engine 105 may consume structured data and perform automatic reorganization and understanding of it. Furthermore, the information processing engine 105 may interact with a corpus or repository of published research documents, specifically those related to the environmental impact of emissions, emission factors, and other statistics and reports.
In another aspect, the information processing engine 105 may provide quantifiers on the quality of a data source, gather complementary or contradictory information from other relevant sources, and indicate the credibility of the reported information, specifically information related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions. This information may be used to provide recommendations for reducing in-house emissions according to the Oxford Offsetting Principles.
FIG. 2 illustrates a flowchart of a method 200 for extracting and mining the information through an advanced information retrieval framework, in accordance with an aspect of the present disclosure.
At step 205, the input data may be received from the data repository engine 135 and transmitted to the information processing engine 105.
At step 210, a knowledge graph may be constructed upon the data repository engine 135 by way of the information processing engine 105. In an aspect, the query may be interpreted following the initial query analysis performed by way of the modern AI technique engine 120.
At step 215, query execution using graph traversals may be implemented. In an aspect, graph traversal refers to the process of visiting (checking and/or updating) each vertex in a graph. Traversals may be classified by the order in which the vertices are visited.
At step 220, the data may be synchronized periodically through the information processing engine 105.
At step 225, the association data may be recomputed to reflect the updated relations.
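One common way to build such a knowledge graph is to convert records into (subject, relation, object) triples and index them in both directions. The sketch below assumes this triple representation; the record fields and the relation names `treated_by` and `prescribed` are hypothetical and stand in for whatever schema the data repository engine 135 actually holds.

```python
from collections import defaultdict


def records_to_triples(records):
    """Turn flat records into (subject, relation, object) triples.

    The field and relation names are hypothetical examples.
    """
    for r in records:
        yield (r["patient"], "treated_by", r["doctor"])
        yield (r["patient"], "prescribed", r["drug"])


def build_knowledge_graph(triples):
    """Index triples for forward (subject -> objects) and reverse (object -> subjects) lookups."""
    outgoing = defaultdict(lambda: defaultdict(set))
    incoming = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        outgoing[subj][rel].add(obj)
        incoming[obj][rel].add(subj)
    return outgoing, incoming


records = [
    {"patient": "P1", "doctor": "Dr X", "drug": "drug-a"},
    {"patient": "P1", "doctor": "Dr Y", "drug": "drug-b"},
    {"patient": "P2", "doctor": "Dr X", "drug": "drug-a"},
]
outgoing, incoming = build_knowledge_graph(records_to_triples(records))
print(sorted(incoming["Dr X"]["treated_by"]))  # patients treated by Dr X → ['P1', 'P2']
```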
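The two classic traversal orders, breadth-first and depth-first, can be contrasted on a small directed graph. This sketch uses adjacency lists keyed by hypothetical vertex names and shows how the same graph yields different visiting orders depending on the strategy.

```python
from collections import deque


def bfs_order(graph, start):
    """Breadth-first: visit all neighbours of a vertex before going deeper."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def dfs_order(graph, start):
    """Depth-first: follow one branch as far as possible before backtracking."""
    seen, order, stack = set(), [], [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        # Push in reverse so the first-listed neighbour is explored first.
        stack.extend(reversed(graph.get(vertex, [])))
    return order


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C', 'E']
```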
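Steps 220 and 225 can be sketched as an index that applies each synchronisation batch and then recomputes associations only for the entities the batch touched. This is an illustrative assumption: the `AssociationIndex` class, the record ids, and the `patient`/`doctor` fields are hypothetical, and the periodic trigger itself (for example, a timer) is left out.

```python
class AssociationIndex:
    """Keeps patient -> doctor associations consistent with synced records.

    Record ids and the field names "patient" and "doctor" are hypothetical.
    """

    def __init__(self):
        self.records = {}       # record id -> record
        self.associations = {}  # patient -> set of doctors

    def _recompute(self, patient):
        """Rebuild the associations of one patient from the current records."""
        doctors = {r["doctor"] for r in self.records.values() if r["patient"] == patient}
        if doctors:
            self.associations[patient] = doctors
        else:
            self.associations.pop(patient, None)

    def sync(self, upserts=None, deletions=()):
        """Apply one synchronisation batch and recompute only the touched patients."""
        touched = set()
        for rec_id, record in (upserts or {}).items():
            if rec_id in self.records:
                touched.add(self.records[rec_id]["patient"])  # old relation may change
            self.records[rec_id] = record
            touched.add(record["patient"])
        for rec_id in deletions:
            removed = self.records.pop(rec_id, None)
            if removed is not None:
                touched.add(removed["patient"])
        for patient in touched:
            self._recompute(patient)
        return touched


index = AssociationIndex()
index.sync({1: {"patient": "P1", "doctor": "Dr X"}, 2: {"patient": "P1", "doctor": "Dr Y"}})
index.sync({2: {"patient": "P2", "doctor": "Dr Y"}})  # record 2 now refers to P2
print(index.associations)
```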
At step 230, the relevant information may be extracted from the graph database 140 by way of the information processing engine 105.
At step 235, the transformation, abstraction, and loading operations inferred from the initial query analysis may be performed.
At step 240, complex joins may be broken down into graph traversals.
At step 245, the necessary information may be aggregated in parallel to facilitate multi-user access.
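One possible reading of parallel aggregation is to split the data into partitions, compute partial aggregates concurrently, and merge them. The sketch below assumes that reading and uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the partitioning and the `category` field are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_partition(rows):
    """Partial aggregate for one partition: count of rows per category."""
    return Counter(row["category"] for row in rows)


def parallel_aggregate(partitions, max_workers=4):
    """Aggregate partitions concurrently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(aggregate_partition, partitions):
            total.update(partial)  # Counter.update adds counts
    return total


partitions = [  # hypothetical rows, split for example by graph shard
    [{"category": "cardiology"}, {"category": "imaging"}],
    [{"category": "cardiology"}],
    [{"category": "neurology"}, {"category": "cardiology"}],
]
print(parallel_aggregate(partitions))
```

Threads suit partitions that are I/O-bound (for example, reads from separate shards). For CPU-bound aggregation in CPython, `ProcessPoolExecutor` offers the same `map` interface without contention on the global interpreter lock.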
At step 250, the output data may be displayed via the display engine 165. In an aspect, a background process may infer the correctness of the result based on the usage patterns following result generation and on follow-up queries. In another aspect, the information processing engine 105 may include a module to log frequently posed questions or analyses, thereby reducing the initial keying effort.
FIG. 3 illustrates a block diagram of a system architecture 300 of the system 100, in accordance with an aspect of the present disclosure.
The system architecture 300 may include an input/output device 334, a graphics processing engine (GPU) 832, a display 330, processor(s) 304, a main memory 308, a display interface 302, a communication infrastructure 306, an encryption/decryption processor 326, a communication interface 320, a communication path 326, a first removable storage engine 318, a second removable storage engine 322, and a secondary memory 810. The secondary memory 810 may further include a hard disk drive 312, a removable storage drive 314, and an interface 320.
The system architecture 300 may include the input/output device 334, which may be coupled to the graphics processing engine (GPU) 832. The graphics processing engine (GPU) 832 may be coupled with the display 330. The display 330 may be coupled with the display interface 302. The display interface 302 may be coupled with the communication infrastructure 306. The processor(s) 304, the main memory 308, the encryption/decryption processor 326, and the secondary memory 810 may each be coupled with the communication infrastructure 306. Furthermore, the secondary memory 810 may include, but is not limited to, the hard disk drive 312, the removable storage drive 314, and the interface 320. The removable storage drive 314 may be coupled with the first removable storage engine 318. The interface 320 may be coupled with the second removable storage engine 322. The communication interface may be coupled with the communication infrastructure 306 and the communication path 326.
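A module that logs frequently posed questions might be as simple as a normalised frequency count. In this hypothetical sketch, `QuestionLog` lowercases questions and collapses whitespace so that near-identical phrasings are counted together, and `frequent` returns the most common ones for re-use.

```python
from collections import Counter


class QuestionLog:
    """Logs posed questions so that frequent ones can be offered again."""

    def __init__(self):
        self._counts = Counter()

    @staticmethod
    def _normalise(question):
        # Ignore case and repeated whitespace so trivially different phrasings match.
        return " ".join(question.lower().split())

    def record(self, question):
        self._counts[self._normalise(question)] += 1

    def frequent(self, n=3):
        """Return up to n of the most frequently posed questions."""
        return [question for question, _ in self._counts.most_common(n)]


log = QuestionLog()
log.record("How many admissions last month?")
log.record("how many  admissions  last month?")
log.record("List doctors by department")
print(log.frequent(1))  # → ['how many admissions last month?']
```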
As will be readily apparent to those skilled in the art, the present aspect may easily be produced in other specific forms without departing from its essential characteristics. The present aspects are, therefore, to be considered as merely illustrative and not restrictive, the scope being indicated by the claims rather than the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

WE CLAIM
1. A system (100) for information extraction and mining through an advanced information retrieval framework to perform query analysis, the system (100) comprising: an information processing engine (105) configured with a natural language understanding engine (115) to process, analyse, and perform various query operations upon data, characterized in that the information processing engine (105) is configured with a data repository engine (135) and a graph database (140) to access and store the data; a transformation engine (150) coupled with the information processing engine (105) to perform loading operations as inferred from the initial query analysis; a modern artificial intelligence (AI) technique engine (120) coupled with the natural language understanding engine (115) to apply access roles in the query interface along with the analysis of the patterns in the query; a semantic analysis engine (125) coupled with the modern AI technique engine (120) to capture detailed data of the semantic-level result; a rapid translation engine (130) coupled with the semantic analysis engine (125) to analyse the entities involved in the query, their relationships, and their associations in the specific information schema; a feedback mechanism engine (145) coupled with the rapid translation engine (130) and the information processing engine (105), wherein the feedback mechanism engine (145) returns the data to the information processing engine (105) to collect explicit feedback on the relevance of the result; and a display engine (165) configured with the feedback mechanism engine (145) to display the output data, wherein the display engine (165) further provides a means to select a collection of stored queries and prepares a dashboard.
2. The system (100) as claimed in claim 1, wherein the information processing engine (105) further executes the relevant information extraction from the graph database (140).
3. The system (100) as claimed in claim 1, wherein the information processing engine (105) enables the natural language understanding engine (115), which interacts with the data repository engine (135) by performing natural language understanding.
4. The system (100) as claimed in claim 1, wherein the information processing engine (105) fixes schedules for periodic refreshing of the query in the background.
5. The system (100) as claimed in claim 1, wherein the information processing engine (105) provides quantifiers on the quality of a data source, gathers complementary or contradictory information from other relevant sources, and indicates the credibility of the reported information related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions.
6. The system (100) as claimed in claim 1, wherein a knowledge graph is constructed upon the data repository engine (135) by the information processing engine (105).
7. The system (100) as claimed in claim 1, wherein the transformation engine (150) breaks down complex joins into graph traversals and aggregates the necessary information in parallel to facilitate multi-user access.
8. The system (100) as claimed in claim 1, wherein the modern artificial intelligence (AI) technique engine (120) performs the initial query analysis.
9. A method (200) for extracting and mining information through an advanced information retrieval framework to perform query analysis, the method comprising: receiving, by a user, input data from the data repository engine (135) and transmitting the data to an information processing engine (105); constructing a knowledge graph upon the data repository engine (135) to perform query analysis via the information processing engine (105); implementing query execution using graph traversals, wherein a graph database (140) stores multiple graph traversals and the joins; syncing the data by periodic synchronization through the information processing engine (105); recomputing the association data to reflect the updated relations; executing extraction of the relevant information from the graph database (140) by the information processing engine (105); transforming, via a transformation engine (150), abstracting, and loading operations inferred by the initial query analysis; breaking down complex joins into graph traversals; aggregating the necessary information in parallel to facilitate multi-user access; and displaying the output data via a display engine (165) to the user.
10. The method (200) as claimed in claim 9, wherein the information processing engine (105) includes a module to log frequently posed questions or analyses, thereby reducing the initial keying effort.
PCT/IB2021/061663 2021-12-13 2021-12-13 System for information extraction and mining and method, and computer program product thereof WO2023111627A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2021/061663 WO2023111627A1 (en) 2021-12-13 2021-12-13 System for information extraction and mining and method, and computer program product thereof

Publications (1)

Publication Number Publication Date
WO2023111627A1 true WO2023111627A1 (en) 2023-06-22

Family

ID=86773682

Country Status (1)

Country Link
WO (1) WO2023111627A1 (en)

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21967971

Country of ref document: EP

Kind code of ref document: A1