WO2023111627A1

WO2023111627A1 - System for information extraction and mining and method, and computer program product thereof

Info

Publication number: WO2023111627A1
Application number: PCT/IB2021/061663
Authority: WO
Inventors: Mohanasankar SIVAPRAKASAM; Keerthi Ram S.S
Original assignee: Indian Institute Of Technology Madras
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2023-06-22

Abstract

Disclosed is a system 100 for information extraction and mining from the records that includes an information processing engine 105, a natural language understanding engine 115, a modern AI (Artificial Intelligence) technique engine 120, a semantic analysis engine 125, a rapid translation engine 130, a data repository engine 135, a graph database 140, a feedback mechanism engine 145, a transformation engine 150 and a display engine 165.

Description

SYSTEM FOR INFORMATION EXTRACTION AND MINING AND METHOD, AND COMPUTER PROGRAM PRODUCT THEREOF

FIELD OF INVENTION

The present aspect generally relates to a system for information extraction and mining, and more particularly to a system for information mining and extraction via knowledge graph construction and an advanced information retrieval framework for deriving insights from data records and research documents.

BACKGROUND OF THE INVENTION

Healthcare administrative data are a well-structured, rich source of information on patient journeys, evidenced care pathways, capabilities and limitations of providers, access patterns, costs, disparities, trends, and insights. Unlike unstructured data such as medical imaging, signal data, and handwritten or dictated notes, which require expert interpretation or modern AI techniques, administrative data in healthcare are well structured and amenable to deriving insights for various stakeholders using business intelligence (BI) techniques.

However, the information and transactions are modelled and stored in tabular databases. These are structured for efficiency of entity storage and consistency through normalization. While this is good for administrative operations (create-read-update-delete), this structure demands high runtime complexity for multi-relationship querying and deriving insights with modern BI tools.

Graph databases provide an alternative semantic mechanism for storing relationship graphs, but programmatic access to the graph is cumbersome, with alternative non-standard declarative textual query syntax. Object-graph-mappers provide means to abstract the syntactic variation through annotations of entities and relationships in object-oriented programming. These are, however, programmatic tools, requiring specific syntactic knowledge, along with domain knowledge and application development skills, to implement analytics models and dashboards. Therefore, there remains a need for a system and a method that incorporate knowledge graph construction and an advanced information retrieval framework while extracting and mining the information from the records.

SUMMARY

One aspect of the present disclosure provides a system for information extraction and mining through an advanced information retrieval framework to perform query analysis. The system includes an information processing engine configured with a natural language understanding engine to process, analyse and perform various query operations upon data, wherein the information processing engine is configured with a data repository engine and a graph database to access and store the data. The system further includes a transformation engine communicating with the information processing engine to perform loading operations as inferred from the initial query analysis. The system further includes a modern artificial intelligence (AI) technique module communicating with the natural language understanding engine to perform access roles in the query interface along with the analysis of the patterns in the query. The system further includes a semantic analysis module communicating with the modern AI technique module to capture detailed data of the semantic level result. The system further includes a rapid translation module communicating with the semantic analysis module to perform analysis of the entities involved in the query, relationships, and associations in the specific information schema. The system further includes a feedback mechanism engine communicating with the rapid translation module and the information processing engine, wherein the feedback mechanism engine reverts the data to the information processing engine to collect the explicit feedback on the relevance of the result. The system further includes a display engine configured with the feedback mechanism engine to display the output data, wherein the display engine further provides a means to select a collection of stored queries and prepares a dashboard.

In some aspects of the present disclosure, the information processing engine further executes the relevant information extraction from the graph database. In some aspects of the present disclosure, the information processing engine further enables the natural language understanding engine, which interacts with the data repository engine by performing natural language understanding.

In some aspects of the present disclosure, the information processing engine fixes schedules for periodic refreshing of the query in the background.

In some aspects of the present disclosure, the information processing engine provides quantifiers on the quality of a data source, gathers complementary or contradictory information from other relevant sources, and indicates the credibility of the reported information related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions.

In some aspects of the present disclosure, the data repository engine constructs a knowledge graph by the information processing engine.

In some aspects of the present disclosure, the transformation engine disintegrates the complex joins into graph traversals and aggregates the necessary information in parallel to facilitate multi-user access.

In some aspects of the present disclosure, the modern artificial intelligence (AI) technique module performs the initial query analysis.

In some aspects of the present disclosure, a method for extracting and mining the information through advanced information retrieval framework to perform query analysis is provided.

A second aspect of the present disclosure provides that the input data is received by the data repository engine from a user and transmitted to an information processing engine. Thereafter, the data repository engine constructs a knowledge graph to perform query analysis via the information processing engine. Thereafter, the query is implemented using graph traversals that are stored in the graph database. Thereafter, the data is synced by periodic synchronization through the information processing engine. Thereafter, the data is recomputed to replicate the updated relations. Thereafter, the relevant information is extracted from the graph database by the information processing engine. The data is transformed via the transformation engine. Furthermore, the data is abstracted and loading operations are inferred by the initial query analysis. Complex joins are disintegrated into graph traversals. Thereafter, the necessary information is aggregated in parallel to facilitate multi-user access. Furthermore, the output data is displayed via the display engine to the user.

In some aspects of the present disclosure, the information processing engine further includes a module to log frequently posed questions or analyses thereby reducing the initial keying effort.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects, features, and advantages of the aspect will be apparent from the following description when read with reference to the accompanying drawings. In the drawings, wherein like reference numerals denote corresponding parts throughout the several views:

The diagrams are for illustration only, which thus is not a limitation of the present disclosure, and wherein:

FIG. 1 illustrates a block diagram of a system for information extraction and mining from the records, in accordance with an aspect of the present disclosure;

FIG. 2 illustrates a flowchart of a method for extracting and mining the information through an advanced information retrieval framework, in accordance with an aspect of the present disclosure; and

FIG. 3 illustrates a block diagram of a system architecture of the system of FIG. 1, in accordance with an aspect of the present disclosure.

To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures.

DETAILED DESCRIPTION

The aspects herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting aspects that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the aspects herein. The examples used herein are intended merely to facilitate an understanding of ways in which the aspects herein may be practiced and to further enable those of skill in the art to practice the aspects herein. Accordingly, the examples should not be construed as limiting the scope of the aspects herein.

Across the prior art, there remains a need to develop a system to monitor, assess and analyse a framework for greenhouse gas emission monitoring.

In an aspect, the term “input data” or “data” refers to input data including, but not limited to, lists of medical imaging and signal data, handwritten or dictated notes, patient’s information, doctor’s information, and prescription slips.

Referring to FIG. 1, a block diagram of a system 100 for information extraction and mining from the records is illustrated, in accordance with an aspect of the present disclosure.

The system 100 may include an information processing engine 105, a natural language understanding engine 115, a modern AI (Artificial Intelligence) technique engine 120, a semantic analysis engine 125, a rapid translation engine 130, a data repository engine 135, a graph database 140, a feedback mechanism engine 145, a transformation engine 150 and a display engine 165. The system 100 may include the input data that may be coupled to the information processing engine 105. The natural language understanding engine 115 may be intercoupled with the information processing engine 105 and the modern AI technique engine 120. The semantic analysis engine 125 may be coupled with the modern AI technique engine 120. The rapid translation engine 130 may be coupled with the semantic analysis engine 125. The data repository engine 135 may be coupled with the information processing engine 105. The graph database 140 may be coupled with the data repository engine 135 and the information processing engine 105. The transformation engine 150 may be coupled to the information processing engine 105 and intercoupled with the graph database 140 and the data repository engine 135. Furthermore, the transformation engine 150 may be coupled with the rapid translation engine 130. The feedback mechanism engine 145 may be coupled with the rapid translation engine 130. Moreover, the feedback mechanism engine 145 may be coupled with the information processing engine 105. Additionally, the feedback mechanism engine 145 may be configured to provide output data. In an aspect, the output data may be displayed via the display engine 165.

The system 100 may include the input data. The input data may include unstructured/structured data of a healthcare department and published research documents of organizations. In an aspect, the input data may include, but is not limited to, medical imaging and signal data, and handwritten or dictated notes. The input data may be fed to the information processing engine 105. The information processing engine 105 may enable the natural language understanding engine 115, which interacts with the data repository engine 135 by performing natural language understanding. In an aspect, the information processing engine 105 may apply the data to the modern AI technique engine 120 for input understanding.

The semantic analysis engine 125 and the rapid translation engine 130 follow the modern AI technique engine 120. The information processing engine 105 may execute the relevant information extraction from the graph database 140, followed by the transformation engine 150. The transformation engine 150 may perform loading operations as inferred from the initial query analysis. In an aspect, the transformation engine 150 may break down the complex joins into graph traversals. In another aspect, the transformation engine 150 may aggregate the necessary information in parallel, to facilitate multi-user access.
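As an illustration only, the breaking down of a multi-table join into graph traversals, with per-entity aggregation performed in parallel, may be sketched as follows. The disclosure does not specify a data model or an API; the node labels, the relationship names `HAS_ENCOUNTER` and `BILLED_AS`, the cost figures and all function names are hypothetical, and a thread pool is merely one assumed way of aggregating in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: (source node, relationship) -> list of target nodes.
EDGES = defaultdict(list)

def add_edge(src, rel, dst):
    EDGES[(src, rel)].append(dst)

# In a normalized schema these facts would sit in three tables
# (patient, encounter, claim) that have to be joined at query time.
add_edge("patient:1", "HAS_ENCOUNTER", "encounter:1")
add_edge("patient:1", "HAS_ENCOUNTER", "encounter:2")
add_edge("patient:2", "HAS_ENCOUNTER", "encounter:3")
add_edge("encounter:1", "BILLED_AS", "claim:10")
add_edge("encounter:1", "BILLED_AS", "claim:11")
add_edge("encounter:2", "BILLED_AS", "claim:12")
add_edge("encounter:3", "BILLED_AS", "claim:13")

CLAIM_COST = {"claim:10": 100.0, "claim:11": 50.0,
              "claim:12": 25.0, "claim:13": 40.0}

def traverse(start, path):
    """Follow a sequence of relationship types hop by hop from a start node."""
    frontier = [start]
    for rel in path:
        frontier = [dst for node in frontier
                    for dst in EDGES.get((node, rel), [])]
    return frontier

def total_cost(patient):
    """Two-hop traversal standing in for a patient-encounter-claim join."""
    claims = traverse(patient, ["HAS_ENCOUNTER", "BILLED_AS"])
    return sum(CLAIM_COST[c] for c in claims)

def aggregate_in_parallel(patients):
    """Aggregate each patient's total independently on a thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(zip(patients, pool.map(total_cost, patients)))

if __name__ == "__main__":
    print(aggregate_in_parallel(["patient:1", "patient:2"]))
```

Each hop only follows edges leaving the current frontier, so the work scales with the neighbourhood actually visited rather than with the size of the joined tables.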

In an aspect, the modern AI technique engine 120 may perform access roles in the query interface that are used alongside the analysis of the patterns in the query, to capture the semantic level and detail needed in the result by the semantic analysis engine 125.

In another aspect, the information processing engine 105 may apply modern artificial intelligence techniques by way of the modern AI technique engine 120 to enable operations that include, but are not limited to, input query understanding, restructuring of the data, selecting an optimal query path, result extraction, transformation, visualization and response preparation, and iterative learning to refine and improve the correctness, quality, and relevance of the output presentation.

In another aspect, the information processing engine 105 may represent the response obtained from the query process such that the presented information, which includes, but is not limited to, summaries, drilldowns, and visualizations, matches the cognitive level and detail demanded by the query. The result is subjected to abstraction and transformation by the transformation engine 150 to generate visual elements that include, but are not limited to, tabulations, summaries, and action keys to further interact with the result obtained in the output data.

The feedback mechanism engine 145 may be coupled with the rapid translation engine 130. Moreover, the feedback mechanism engine 145 may be coupled with the information processing engine 105. The output data may be displayed on the display engine 165. In an aspect, the feedback mechanism engine 145 may collect explicit feedback on the relevance of the result. In an aspect, the information processing engine 105 may provide a process that includes, but is not limited to, composing and storing queries. Furthermore, the information processing engine 105 may fix schedules for periodic refreshing of the query in the background.
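By way of a non-limiting sketch, storing composed queries together with a refresh schedule and explicit relevance feedback could look like the following. The class `StoredQuery`, its fields and the function `refresh_due` are hypothetical names introduced for illustration and are not taken from the disclosure; the `run` callback stands in for whatever actually executes a query.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional

@dataclass
class StoredQuery:
    """A composed query kept for later re-execution (hypothetical structure)."""
    name: str
    text: str
    refresh_every: timedelta
    last_run: Optional[datetime] = None
    # Explicit relevance ratings collected from users, e.g. 1 (poor) to 5 (good).
    feedback: List[int] = field(default_factory=list)

    def is_due(self, now: datetime) -> bool:
        """A query is due if it never ran or its refresh interval has elapsed."""
        return self.last_run is None or now - self.last_run >= self.refresh_every

def refresh_due(queries: List[StoredQuery], now: datetime,
                run: Callable[[str], object]) -> List[str]:
    """Re-run every stored query that is due and return the refreshed names."""
    refreshed = []
    for query in queries:
        if query.is_due(now):
            run(query.text)
            query.last_run = now
            refreshed.append(query.name)
    return refreshed

if __name__ == "__main__":
    start = datetime(2021, 12, 13, 9, 0)
    stored = [StoredQuery("hourly", "claims per provider", timedelta(hours=1))]
    print(refresh_due(stored, start, print))
```

A background worker would call `refresh_due` on a timer; keeping the current time as a parameter makes the scheduling logic testable without waiting.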

In an aspect, the output data may produce the result that is packaged as a report and stored, with record of timestamps.

The display engine 165 may display the output data, may further provide a means to select a collection of such stored queries, and may prepare a dashboard.

In an aspect, the display engine 165 may provide means to enter rules or conditions that are cross-checked with the produced result to evaluate correctness, for instance, including, but not limited to, percentages adding up to 100 and drilldowns adding up to the total.
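The two example rules named above, percentages adding up to 100 and drilldowns adding up to the total, may be sketched as simple predicates applied to a produced result. This is a minimal illustration; the result layout (keys `shares`, `total`, `by_region`), the tolerance value and the function names are assumptions and not part of the disclosure.

```python
import math

def percentages_sum_to_100(percentages, tol=0.01):
    """True if the percentage breakdown adds up to 100 within a tolerance."""
    return math.isclose(sum(percentages), 100.0, abs_tol=tol)

def drilldowns_match_total(total, drilldowns, tol=0.01):
    """True if the drilldown values add up to the reported total."""
    return math.isclose(sum(drilldowns), total, abs_tol=tol)

def check_result(result, rules):
    """Return the names of the rules that the produced result violates."""
    return [name for name, rule in rules.items() if not rule(result)]

# Hypothetical produced result: the regional drilldown is 10 short of the total.
result = {"shares": [40.0, 35.5, 24.5], "total": 120, "by_region": [70, 40]}

rules = {
    "percentages": lambda r: percentages_sum_to_100(r["shares"]),
    "drilldowns": lambda r: drilldowns_match_total(r["total"], r["by_region"]),
}

if __name__ == "__main__":
    print(check_result(result, rules))
```

A tolerance is used because percentages are often rounded before display, so an exact comparison would reject otherwise valid results.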

In an aspect, a confidence score is attributed to each stored result, to assist in refining and correcting errors at the query level or considered as feedback for improvement of the query understanding engine.

In another aspect, the information processing engine 105 may consume structured data and perform automatic reorganization and understanding. Furthermore, the information processing engine 105 may interact with a corpus or repository of published research documents, specifically those related to the environmental impact of emissions, emission factors, and other statistics and reports.
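One possible reading of the reorganization of structured data into a knowledge graph is sketched below: each table row becomes a node and each foreign-key column becomes a labelled relationship. The disclosure does not describe the construction procedure, so the function `build_knowledge_graph`, the table layout and the relationship name `OF_PATIENT` are purely hypothetical.

```python
def build_knowledge_graph(tables, foreign_keys):
    """Turn tabular rows into nodes and foreign-key columns into edges.

    tables:       {table_name: [row dict containing an 'id' key, ...]}
    foreign_keys: {(table_name, column): (target_table, relationship_name)}
    Returns (nodes, edges) where nodes maps "table:id" to its attributes and
    edges is a list of (source node, relationship, target node) triples.
    """
    nodes = {}
    edges = []
    for table, rows in tables.items():
        for row in rows:
            node_id = f"{table}:{row['id']}"
            # Keep plain attributes on the node; foreign keys become edges.
            nodes[node_id] = {key: value for key, value in row.items()
                              if (table, key) not in foreign_keys}
            for (fk_table, column), (target, relation) in foreign_keys.items():
                if fk_table == table and row.get(column) is not None:
                    edges.append((node_id, relation, f"{target}:{row[column]}"))
    return nodes, edges

if __name__ == "__main__":
    tables = {
        "patient": [{"id": 1, "name": "P1"}],
        "visit": [{"id": 7, "patient_id": 1, "cost": 30}],
    }
    foreign_keys = {("visit", "patient_id"): ("patient", "OF_PATIENT")}
    print(build_knowledge_graph(tables, foreign_keys))
```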

In another aspect, the information processing engine 105 may provide quantifiers on the quality of a data source, gather complementary or contradictory information from other relevant sources, and indicate the credibility of the reported information, specifically related to indirect emissions, technology solutions, and alternative methods of estimating or measuring emissions. The information may be used for providing recommendations for reducing in-house emissions according to the Oxford offsetting principles.

Referring to FIG. 2, a flowchart of a method 200 for extracting and mining the information through an advanced information retrieval framework is illustrated, in accordance with an aspect of the present disclosure.

At step 205, the input data may be received from the data repository engine 135 to transmit the data to the information processing engine 105.

At step 210, a knowledge graph may be constructed upon the data repository engine 135 by way of the information processing engine 105. In an aspect, the query may be interpreted following the initial query analysis, which may be performed by way of the modern AI technique engine 120.

At step 215, query execution using graph traversals may be implemented. In an aspect, graph traversals refer to the process of visiting (checking and/or updating) each vertex in a graph. The traversals may be classified by the order in which the vertices are visited.

At step 220, the data may be synced by periodic synchronization through the information processing engine 105.

At step 225, the associations data may be recomputed to replicate the updated relations.

At step 230, the relevant information may be extracted from the graph database 140 by way of the information processing engine 105.

At step 235, operations inferred by way of the initial query analysis may be transformed, abstracted, and loaded.

At step 240, complex joins may be broken down into graph traversals.

At step 245, the necessary information may be aggregated in parallel to facilitate multi-user access.

At step 250, the output data may be displayed via the display engine 165. In an aspect, a background method of inferring the correctness of the result is performed, which is based on the usage patterns following the result generation and follow up queries. In another aspect, the information processing engine 105 may include a module to log frequently posed questions or analyses thereby reducing the initial keying effort.
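Step 215 of the method 200 describes graph traversals as visiting each vertex of a graph, classified by the order in which the vertices are visited. Two common orders, breadth-first and depth-first, are sketched below on a hypothetical adjacency-list graph; the disclosure does not state which order the system uses, and the function names and the example graph are assumptions made for illustration.

```python
from collections import deque

def bfs_order(graph, start):
    """Visit vertices level by level (breadth-first) from the start vertex."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

def dfs_order(graph, start):
    """Visit vertices along one branch as deep as possible (depth-first)."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse so the first listed one is visited first.
        stack.extend(reversed(graph.get(node, [])))
    return order

# Hypothetical graph: A points to B and C, and both point to D.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

if __name__ == "__main__":
    print(bfs_order(GRAPH, "A"))  # ['A', 'B', 'C', 'D']
    print(dfs_order(GRAPH, "A"))  # ['A', 'B', 'D', 'C']
```

Both functions visit every reachable vertex exactly once; only the visiting order differs.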

Referring to FIG. 3, a block diagram of a system architecture 300 of the system 100 is illustrated, in accordance with an aspect of the present disclosure.

The system architecture 300 may include an input/output device 334, a graphics processing engine (GPU) 832, a display 330, a processor(s) 304, a main memory 308, a display interface 302, a communication infrastructure 306, an encryption/decryption processor 326, a communication interface 320, a communication path 326, a first removable storage engine 318, a second removable storage engine 322, and a secondary memory 810. The secondary memory 810 may further include a hard disk drive 312, a removable storage drive 314 and an interface 320.

The system architecture 300 may include the input/output device 334 that may be coupled to the graphics processing engine (GPU) 832. The graphics processing engine (GPU) 832 may be coupled with the display 330. The display 330 may be coupled with the display interface 302. The display interface 302 may be coupled with the communication infrastructure 306. The processor(s) 304 may be coupled with the communication infrastructure 306. The main memory 308 may be coupled with the communication infrastructure 306. The encryption/decryption processor 326 may be coupled with the communication infrastructure 306. The secondary memory 810 may be coupled with the communication infrastructure 306. Furthermore, the secondary memory 810 may include, but is not limited to, the hard disk drive 312, the removable storage drive 314 and the interface 320. The removable storage drive 314 may be coupled with the first removable storage engine 318. The interface 320 may be coupled with the second removable storage engine 322. The communication interface may be coupled with the communication infrastructure 306 and the communication path 326.

As will be readily apparent to those skilled in the art, the present aspect may easily be produced in other specific forms without departing from its essential characteristics. The present aspect is, therefore, to be considered as merely illustrative and not restrictive, the scope being indicated by the claims rather than the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

WE CLAIM

1. A system (100) for information extraction and mining through advanced information retrieval framework to perform query analysis, the system (100) comprises: an information processing engine (105) configured with a natural language understanding engine (115) to process, analyse and perform various query operations upon a data, characterized in that the information processing engine (105) is configured with a data repository engine (135) and a graph database (140) to access and store the data; a transformation engine (150) coupled with the information processing engine (105) to perform loading operations as inferred from the initial query analysis; a modern artificial intelligence (AI) technique engine (120) coupled with the natural language understanding engine (115) to perform access roles in the query interface along with the analysis of the patterns in the query; a semantic analysis engine (125) coupled with the modern AI technique engine (120) to capture detailed data of semantic level result; a rapid translation engine (130) coupled with the semantic analysis engine (125) to perform analysis of the entities involved in the query, relationships, and associations in the specific information schema; a feedback mechanism engine (145) coupled with the rapid translation engine (130) and the information processing engine (105), wherein the feedback mechanism engine (145) reverts the data to the information processing engine (105) to collect the explicit feedback on the relevance of the result; and a display engine (165) configured with the feedback mechanism engine (145) to display the output data, wherein the display engine (165) further provides a means to select a collection of stored queries and prepares a dashboard.

2. The system (100) as claimed in claim 1, wherein the information processing engine (105) further executes the relevant information extraction from the graph database (140).

3. The system (100) as claimed in claim 1, wherein the information processing engine (105) enables the natural language understanding engine (115) that interacts with the data repository engine (135) by performing natural language understanding.

4. The system (100) as claimed in claim 1, wherein the information processing engine (105) fixes schedules for periodic refreshing of the query in the background.

5. The system (100) as claimed in claim 1, wherein the information processing engine (105) provides quantifiers on quality of data source, gathers complementary or contradictory information from other relevant sources, and indicates the credibility of the reported information related to indirect emissions and technology solutions and alternative methods of estimating or measuring emissions.

6. The system (100) as claimed in claim 1, wherein the data repository engine (135) constructs a knowledge graph by the information processing engine (105).

7. The system (100) as claimed in claim 1, wherein the transformation engine (150) disintegrates the complex joins into graph traversals and aggregates the necessary information in parallel to facilitate multi-user access.

8. The system (100) as claimed in claim 1, wherein the modern artificial intelligence (AI) technique engine (120) performs the initial query analysis.

9. A method (200) for extracting and mining the information through advanced information retrieval framework to perform query analysis, the method comprises: receiving input data from the data repository engine (135) by a user to transmit the data to an information processing engine (105); constructing a knowledge graph upon the data repository engine (135) to perform query analysis via the information processing engine (105); implementing query execution using graph traversals, wherein a graph database (140) stores multiple graph traversals and the joins; syncing the data by periodic synchronization through the information processing engine (105); recomputing the associations data to replicate the updated relations; executing the relevant information from the graph database (140) by the information processing engine (105); transforming via transformation engine (150), abstracting, and loading operations inferred by the initial query analysis; breaking down complex joins into graph traversals; aggregating the necessary information in parallel to facilitate multi-user access; and displaying the output data via the display engine (165) to the user.

10. The method (200) as claimed in claim 9, wherein the information processing engine (105) includes a module to log frequently posed questions or analyses thereby reducing the initial keying effort.