CN110019751B - Machine learning model modification and natural language processing - Google Patents

Machine learning model modification and natural language processing

Info

Publication number
CN110019751B
Authority
CN
China
Prior art keywords: triplet, mlm, input, triples, entry
Prior art date
Legal status: Active
Application number
CN201910012993.7A
Other languages: Chinese (zh)
Other versions: CN110019751A (en)
Inventor
D·巴卡雷拉
J·H·巴奈贝四世
N·劳伦斯
S·帕特尔
Current Assignee: International Business Machines Corp
Original Assignee: International Business Machines Corp
Priority date
Filing date
Publication date
Priority claimed from US15/866,706 external-priority patent/US10606958B2/en
Priority claimed from US15/866,702 external-priority patent/US10776586B2/en
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of CN110019751A publication Critical patent/CN110019751A/en
Application granted granted Critical
Publication of CN110019751B publication Critical patent/CN110019751B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system, computer program product, and method are provided that automate a framework for knowledge graph-based data persistence and resolve temporal variation and uncertainty in the knowledge graph. Natural language understanding, together with one or more machine learning models (MLMs), is used to extract data, including entities and entity relationships, from unstructured information. The extracted data is populated into a knowledge graph (KG). Because the KG is subject to change, the KG is used to create new MLMs and to retrain existing MLMs. A weight in the form of an accuracy value is applied to the populated data. Blockchain technology is applied to the populated data to ensure reliability of the data and to provide auditability so that changes to the data can be assessed.

Description

Machine learning model modification and natural language processing
Background
The present embodiments relate to natural language processing. More particularly, the present embodiments relate to an artificial intelligence platform that communicates with and utilizes memories in natural language processing.
In the field of artificial intelligence computer systems, natural language systems (e.g., the IBM Watson™ artificial intelligence computer system and other natural language question-answering systems) process natural language based on knowledge acquired by the system. To process natural language, the system may be trained with data derived from a database or corpus of knowledge, but the resulting outcomes can be incorrect or inaccurate for a variety of reasons relating to the particulars of language constructs and human reasoning, or to new training data.
Machine learning, a subset of artificial intelligence (AI), utilizes algorithms to learn from data and create predictions based on that data. AI refers to intelligence whereby a machine can make decisions based on information, maximizing the chance of success in a given topic. More specifically, AI is able to learn from a data set to solve problems and provide relevant recommendations. AI is a subset of cognitive computing, which refers to systems that learn at scale, reason with purpose, and interact with humans naturally. Cognitive computing is a mixture of computer science and cognitive science. Cognitive computing utilizes self-teaching algorithms that use data mining, visual recognition, and natural language processing to solve problems and optimize human processes.
The cognitive system is inherently non-deterministic. In particular, data output from the cognitive system is susceptible to the information provided to it and used as input. For example, when a new machine learning model is deployed, there is no guarantee that the system will extract the same entities as before; the new model may adversely affect the results of the previous model. Similarly, an error introduced through a document may result in incorrect data being extracted and provided as output. Accordingly, there is a need to create deterministic behavior in cognitive systems.
Disclosure of Invention
Embodiments include systems, computer program products, and methods for natural language processing of deterministic data for cognitive systems.
In one aspect, a system is provided with a processing unit operatively coupled to memory, and an artificial intelligence platform in communication with the processing unit and memory. A knowledge engine, in communication with the processing unit, is provided to train a machine learning model (MLM). More specifically, the knowledge engine identifies or otherwise selects a first MLM aligned with a knowledge domain expressed in a first knowledge graph, receives natural language input, queries the first knowledge graph to extract one or more triples, and applies the identified MLM to a second knowledge graph to extract one or more triples. Each triplet includes a subject, an object, and a relationship. For each triplet, a blockchain (BC) identifier is obtained and a triplet accuracy value is identified in the corresponding BC ledger. The knowledge engine detects a modification by comparing the extracted triples and evaluates the detected modification, including evaluating the accuracy of the modification using the obtained BC identifier. In response to the modification being a structural modification, the first MLM is dynamically modified.
In another aspect, a computer program product is provided for processing natural language. The computer program product comprises a computer readable storage device having program code executable by a processing unit. Program code is provided to select a first MLM from a library of MLMs, wherein the selected MLM is aligned with a knowledge domain expressed in a first knowledge graph. Program code is further provided to receive natural language input, query the input against the first knowledge graph, and extract one or more triples from the first knowledge graph. In addition, program code is provided to apply the selected MLM to a second knowledge graph and to extract one or more triples from the second knowledge graph. Each triplet includes a subject, an object, and a relationship between them. For each extracted triplet, program code obtains a BC identifier and identifies a triplet accuracy value from the corresponding BC ledger. Program code detects a modification of the first knowledge graph based on the extracted triples, evaluates the modification, and, in response to the modification being a structural modification, dynamically modifies the first MLM.
In another aspect, a method for processing natural language is provided. The method includes selecting a first MLM that is aligned with a knowledge domain expressed in a first knowledge graph. The first MLM is selected from a natural language processing library of two or more MLMs. After the MLM selection, natural language input is received, a query is processed against the first knowledge graph, and one or more triples are extracted from the first knowledge graph. The selected MLM is applied to a second knowledge graph that is different from the first knowledge graph, and one or more triples are extracted from the second knowledge graph. Each extracted triplet includes a subject, an object, and a relationship. For each extracted triplet, a BC identifier associated with the triplet is obtained and a triplet accuracy value from the corresponding BC ledger is identified. After application of the MLM, a modification of the first knowledge graph is detected, wherein the modification is a content and/or structure change. The detected modification is evaluated, including evaluating the accuracy of the modification using the BC identifier, and the first MLM is dynamically modified in response to a structural modification.
These and other features and advantages will become apparent from the following detailed description of the presently preferred embodiments, read in conjunction with the accompanying drawings.
Brief description of the drawings
The accompanying drawings form a part of the specification. Features shown in the drawings are intended to be illustrative of only some, and not all, embodiments unless explicitly stated otherwise.
FIG. 1 depicts a system diagram showing a schematic diagram of a natural language processing system.
FIG. 2 depicts a block diagram showing the NL processing tools shown in FIG. 1 and their associated APIs.
FIG. 3 depicts a flow chart showing a process for populating a knowledge graph (KG) from the natural language (NL) output of a natural language processing (NLP) system.
FIG. 4 depicts a flow chart showing a process for creating a new triplet from extracted data.
FIGS. 5A and 5B depict a flow chart illustrating a process for extracting triples from NLP output.
FIG. 6 depicts a flow chart showing a process for partitioning a KG.
FIG. 7 depicts a flow chart showing a process for linking two KGs.
FIGS. 8A and 8B depict a flow chart illustrating a process for enhancing query input using a machine learning model (MLM).
FIG. 9 depicts a flow chart showing a process for training an existing MLM.
FIG. 10 depicts a flow chart showing a process for progressive and adaptive MLM configuration.
Detailed Description
It will be readily understood that the components of the present embodiments, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, method, and computer program product of the present embodiments, as illustrated in the accompanying drawings, is not intended to limit the scope of the claimed embodiments, but is merely representative of selected embodiments.
Reference throughout this specification to "a select embodiment," "one embodiment," or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "a select embodiment," "in one embodiment," or "in an embodiment" in various places throughout this specification are not necessarily referring to the same embodiment.
The illustrated embodiments will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only as an example, and simply illustrates certain selected embodiments of devices, systems, and processes consistent with the embodiments claimed herein.
An ontology functions as a structural framework for organizing information and concepts. Natural language understanding (NLU) is a subset of natural language processing (NLP). NLU uses algorithms to transform speech into a structured ontology. In one embodiment, the ontology is constructed from classifications of the NLU output. NLU provides the definitions required to construct the ontology in terms of classes, subclasses, domain, range, data properties, and object properties. Individuals of the ontology are mapped to objects. Processing the same or similar documents provides the data needed to build the ontology, also referred to herein as the initial ontology. The ontology is defined by a machine learning model (MLM) that is applied to a data store by a knowledge graph (KG) manager; the ontology is constructed using the output of an associated NLP service. More specifically, the ontology is generated from the facts, or mentions, produced by the MLM, and those facts or mentions constitute the individuals of the ontology. In one embodiment, the ontology takes the form of a KG, with the facts or mentions represented as nodes in the graph. The structure of the KG may remain constant while information is added or deleted. Similarly, the ontology may be used to create new MLMs and to retrain existing MLMs. In one embodiment, as the KG is modified, new entities and relationships are realized and used to automate MLM training; the MLM becomes dynamic and progressive. Accordingly, the ontology represented by the KG and the MLM are interrelated.
Referring to FIG. 1, a schematic diagram of a natural language processing system (100) is depicted. As shown, a server (110) is provided that communicates with a plurality of computing devices (180), (182), (184), (186), and (188) over a network connection (105). The server (110) is configured with a processing unit (112) operatively coupled to memory (114) by a bus (116). A tool in the form of a knowledge engine (170) is shown local to the server (110) and operatively coupled to the processing unit (112) and/or memory (114). As shown, the knowledge engine (170) includes one or more tools (172)-(178). The tools (172)-(178) provide natural language processing over the network (105) from one or more computing devices (180), (182), (184), (186), and (188). More specifically, the computing devices (180), (182), (184), (186), and (188) communicate with each other and with other devices or components via one or more wired and/or wireless data communication links, where each communication link may comprise one or more of wires, routers, switches, transmitters, receivers, and the like. In this networked arrangement, the server (110) and the network connection (105) may enable natural language processing and resolution for one or more content users. Other embodiments of the server (110) may be used with components, systems, subsystems, and/or devices other than those described herein.
The tools, including the knowledge engine (170) or, in one embodiment, the tools embedded therein, including the KG manager (172), the accuracy manager (174), the BC manager (176), and the MLM manager (178), may be configured to receive input from a plurality of sources, including but not limited to input from the network (105), one or more knowledge graphs from the node-graph data store (160) operatively coupled via an interface (166) to a corpus of structured data (168), the BC network (150), and a library (140) of one or more machine learning models (MLMs). As shown, the node-graph data store (160) functions as a library (162) of knowledge graphs, with a plurality of KGs, including KG0 (164A), KG1 (164B), ..., and KGN (164N). The quantity of KGs shown herein should not be considered limiting. Each KG is an expression of an ontology of concepts. More specifically, each KG (164A)-(164N) includes a plurality of related subjects and objects. In one embodiment, related KGs are stored in an associated KG container, with the corpus (160) storing one or more KG containers. In one embodiment, KGs may also be acquired from other sources, and as such the data store depicted should not be considered limiting.
The plurality of computing devices (180), (182), (184), (186), and (188) in communication with the network (105) demonstrate access points for content creators and content users. Some of the computing devices may include devices for a database storing a corpus of data as a body of information used by the knowledge engine (170), and in one embodiment the tools (172)-(178), to embed deterministic behavior into the system. The network (105) may include local network connections and remote connections in various embodiments, such that the knowledge engine (170) and the embedded tools (172)-(178) may operate in environments of any size, including local and global, e.g., the Internet. Additionally, the server (110) and the knowledge engine (170) serve as a front-end system that can make available a variety of knowledge extracted from or represented in documents, network-accessible sources, and/or structured data sources. In this manner, some processes populate the server (110), with the server (110) also including an input interface to receive requests and respond accordingly. Content creators and content users may also be available in data stores, such as, but not limited to, (140) and (160), and the list of access points shown herein should not be considered limiting.
As shown, the node-graph data store (160) is operatively coupled to the server (110). The node-graph data store (160) includes a KG library (162) with one or more KGs (164A)-(164N) for use by the server (110). Content users may access the system via an API administration or orchestration platform, as shown and described in FIG. 2, with natural language input received via an NLU input path.
As described in detail below, the server (110) and the knowledge engine (170) process natural language queries using one or more machine learning models (hereinafter MLMs) to extract or store content in one or more KGs stored in the node-graph data store (160). Blockchain technology, hereinafter "BC", is utilized with the content to effectively provide authenticity, e.g., of the source from which data is stored or received. The MLM manager (178) functions as a tool, or in one embodiment an API, within the knowledge engine (170), and is used to create, link, and/or modify an associated MLM. As described further below, an MLM is generated, created, or modified for a specific knowledge domain. MLMs are created to extract entities and relationships from unstructured data; these models are specifically created to understand a particular field of knowledge (e.g., biographical information, stock market, astronomy, etc.).
BC is represented herein as a BC network (150) in the form of a decentralized and distributed digital ledger for recording transaction history. More specifically, BC refers to a data structure capable of digitally identifying and tracking transactions across a distributed computer network and sharing that information. BC effectively creates a distributed trust network by transparently and securely tracking ownership. As shown and described herein, BC is utilized together with the MLM manager (178), the accuracy manager (174), and the KG manager (172) to integrate knowledge and natural language processing.
The server (110) may be the IBM Watson™ system available from International Business Machines Corporation of Armonk, New York, augmented with the mechanisms of the illustrative embodiments described below. The IBM Watson™ knowledge manager system imports knowledge into natural language processing (NLP). In particular, and as described in detail below, as data is received, organized, and/or stored, the data may be authentic or inauthentic; the server (110) cannot by itself distinguish between the two or, more specifically, verify the accuracy of the data. As shown herein, the server (110) receives input content (102), which it then evaluates to extract features of the content (102) that are in turn applied to the node-graph data store (160). Specifically, the received content (102) may be processed by the IBM Watson™ server (110), which performs analysis using one or more inference algorithms to evaluate or inform on the authenticity of the input content (102).
To process natural language, the server (110) supports NLP with an information handling system in the form of the knowledge engine (170) and the associated tools (172)-(178). Although shown as being included in or integrated with the server (110), the information handling system may be implemented as a separate computing system (e.g., 190) connected to the server (110) across the network (105). Wherever embodied, one or more MLMs are used to manage and process data, and more specifically, to detect and identify natural language and to create or utilize deterministic output. As shown, the tools include the KG manager (172), the accuracy manager (174), the BC manager (176), and the MLM manager (178). The MLM manager (178) is shown operatively coupled to the MLM library (140), shown herein with a plurality of MLMs, including MLM0 (142), MLM1 (144), and MLMN (146), although the quantity of MLMs shown and described should not be considered limiting. It should be understood that, in one embodiment, an MLM is an algorithm employed or adapted to support NLP. Although shown local to the server (110), the tools (172)-(178) may collectively or individually be embedded in memory (114).
One or more of the MLMs (142)-(146) function to manage data, including data stored in a KG. As understood, a KG is a structured ontology and does not merely store data. Specifically, the knowledge engine (170) extracts data and one or more data relationships from unstructured data, creates an entry for the extracted data and data relationship(s) in the KG, and stores the data and data relationship(s) in the KG entry. In one embodiment, data in the KG is stored or represented as a node, and a relationship between two data elements is represented as an edge connecting two nodes. Similarly, in one embodiment, each node has a node-level accuracy value and each relationship has a relationship accuracy value, the relationship accuracy value being calculated from the accuracy values of the two interconnected nodes. In addition to data extraction and storage, the MLM(s), e.g., MLM0 (142), assign or otherwise designate an accuracy value for the data stored in the KG. In one embodiment, the accuracy value is a composite score comprised of stability, source reliability, and human feedback, as described in detail below. In one embodiment, the accuracy value may include other factors or a subset of these factors, and as such should not be considered limiting. The assigned accuracy value is stored in the KG. The assigned accuracy value is also stored in an entry of an identified BC ledger. Each entry in the BC ledger has a corresponding identifier, referred to herein as a BC identifier, that identifies the ledger and the address of the ledger entry. The BC identifier is stored in the KG together with the identified data and identifies the location of the corresponding BC ledger and the stored accuracy value. In one embodiment, the KG manager (172) manages storage of the BC identifiers in the KGs. Accordingly, the assigned or created accuracy value is stored in the BC as a duplicate copy of the accuracy value held in the KG in the node-graph data store (160).
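The node and edge representation described above can be pictured with the following minimal sketch; this is an illustration rather than the patented implementation, and deriving the relationship accuracy as the mean of the two node-level values, along with the names Node and Edge, are assumptions made here only for clarity:

# Minimal sketch: nodes carry node-level accuracy values; an edge derives its
# relationship accuracy from the two nodes it connects (mean is an assumption).
class Node:
    def __init__(self, label: str, accuracy: float):
        self.label = label
        self.accuracy = accuracy          # node-level accuracy value, 0..1

class Edge:
    def __init__(self, source: Node, relation: str, target: Node):
        self.source, self.relation, self.target = source, relation, target

    @property
    def accuracy(self) -> float:
        # relationship accuracy calculated from the two interconnected nodes
        return (self.source.accuracy + self.target.accuracy) / 2.0

subject = Node("George Washington", 0.9)
obj = Node("February 22, 1732", 0.7)
born = Edge(subject, "born on", obj)
print(born.accuracy)                      # average of the two node accuracies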
It should be understood that each KG organizes and provides structure to a large amount of data. A KG may be a single body, or in one embodiment, a KG or KG container may include a plurality of KGs linked together to reveal their relationship or association. The KG manager (172) functions to manage the structure and organization of the KGs. For example, managing a large KG may become too cumbersome or expensive. In such a scenario, the KG manager (172) may partition the KG, effectively creating at least two partitions, e.g., a first KG partition and a second KG partition. The KG may be partitioned based on one or more factors. For example, in one embodiment, the KG may be divided by topic or sub-topic. Similarly, each fact expressed in the KG has an associated accuracy value that is a combination of factors including, but not limited to, a stability indicator, a source reliability metric, and a human feedback factor. The KG manager (172) may divide the KG based on the accuracy value or, in one embodiment, based on one or more of the factors comprising the accuracy value. In one embodiment, after the KG has been partitioned into at least first and second partitions, the KG manager (172) may assign one or more components of the accuracy value to each node or edge represented in a partition. For example, following the KG partition, the KG manager (172) may populate and assign a first reliability value to data in the first partition, and in one embodiment the KG manager (172) may further populate and assign a second reliability value, different from the first reliability value, to data in the second partition. Modification of one or more components of the accuracy value effectively changes the accuracy value. It should be understood, however, that the value of one or more components of the accuracy value may change over time, and as such the change is reflected in, or by, the associated data. Accordingly, the KG manager (172) is configured to manage the data and to provide structure and values for the data.
One function of the KG manager (172) is to link or merge two or more KGs. Merging or linking KGs is effectively the opposite of partitioning a KG. The merging or linking functionality requires the KG manager (172) to compare one or more data elements in one KG with one or more data elements in a second KG in order to eliminate, or at least mitigate, the presence of duplicate data. As described above, each data element represented in a KG has an associated composite score. The KG manager (172) may use one component, multiple components, or the accuracy value itself as a factor in the data comparison and evaluation. Once the KGs are merged or linked, deletion of a duplicate data item may be feasible or warranted. The KG manager (172) selectively removes data in the linked KGs determined to be duplicate data. One characteristic of removing duplicate data is the ability to keep the structure of the KGs constant. Accordingly, the KG manager (172) is configured to manage the structure of the KGs by managing the data represented therein.
The BC manager (176) has a plurality of functions related to the machine learning environment. As described above, the BC manager (176) may be used together with the MLMs to maintain the authenticity of associated data. The BC manager (176) generates contracts for BC network interactions, provides provenance, retrieves BC information, and manages all BC interactions of the system.
An MLM, e.g., MLM0 (142), manages the evaluation of NL input. Query results from the KG generated by the NL input, and more specifically a ranking of the query results, identify a conflict or error associated with the NL input. When there is a conflict between a query result and the NL input, and the query result has a strong accuracy value, this is an indication that the NL input may be inaccurate. The accuracy manager (174) corrects the NL input by replacing the language of the NL input with an identified or selected triplet in the generated list. A triplet, also referred to herein as a memory, is based on two or more nodes in the KG and the relationship between those nodes. In one embodiment, as captured from the KG, the triplet is a subject-verb-object relationship. In one embodiment, the identification or selection may be based on a highest accuracy value, and in one embodiment it is selected by a user. Similarly, in another embodiment, the identification or selection may be based on one or more of the factors that comprise the composite accuracy value. Another form of conflict may arise when the knowledge engine (170) identifies an immutable factor associated with one or more entries in the list and further identifies a conflict between the immutable factor and the NL input. The accuracy manager (174) resolves this conflict by correcting the NL input, replacing the language of the NL input with the triplet associated with the entry having the immutable factor. In addition to conflicts, another resolution may be the identification by the accuracy manager (174) of a partial match between the NL input and an entry in the ranked list. The partial match enables or directs the KG manager (172) and the BC manager (176) to create new entries for the NL input in the KG and the corresponding BC ledger, respectively. In addition, the KG manager (172) connects the new entry with the existing KG entry corresponding to the partial match. It should also be understood that the NL input may not produce any matches, e.g., an empty set. If there is no match, the KG manager (172) and the BC manager (176) create a new KG entry and a new BC ledger entry, respectively, corresponding to the NL input. Accordingly, the NL input is qualified by the MLM, MLM0 (142), in view of the data organized in the KG, and in one embodiment by the accuracy manager (174).
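A minimal sketch of this conflict handling follows, assuming the NL input has already been parsed into a (subject, relation, value) candidate and the KG query returned memories carrying a composite accuracy score; the 0.8 threshold, the immutable flag, and all field names are assumptions made for illustration only:

# Minimal sketch of NL-input correction against ranked memories (illustrative).
def correct_input(candidate, memories):
    subject, relation, value = candidate
    matches = [m for m in memories
               if m["subject"] == subject and m["relation"] == relation]
    if not matches:
        return candidate, "no_match"       # caller creates new KG and ledger entries
    best = max(matches, key=lambda m: m["accuracy"])
    if best["value"] != value:
        # conflict: prefer the stored memory when it is immutable or strongly
        # accurate (the 0.8 cutoff is an assumed illustration, not the patent's)
        if best.get("immutable") or best["accuracy"] >= 0.8:
            return (subject, relation, best["value"]), "corrected"
        return candidate, "conflict_unresolved"
    return candidate, "match"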
As shown and described herein, the MLM library (140) is operatively coupled to the server (110) and includes a plurality of MLMs to support natural language processing on the AI platform. One or more of the MLMs may be dynamic and trained to adapt to new entities or relationships. Different KGs may be associated with different knowledge domains. For example, a first MLM, e.g., MLM0 (142), may be identified or selected from the library (140) based on KG0 (164A). Responsive to processing NL input, MLM0 (142) may be applied to KG0 (164A) and separately applied to a second KG, KG1 (164B). The MLM manager (178) processes the results from the two KGs and their corresponding accuracy values and, based on that processing, identifies a modification of one of the KGs. In one embodiment, the accuracy values are evaluated to identify the authenticity of the modification. Upon authentication, the MLM manager (178) dynamically modifies the associated MLM, MLM0 (142). In one embodiment, the identified modification may be an expansion of the associated data set to include additional fields. Similarly, in one embodiment, the MLM manager (178) may determine whether the modification pertains to co-time or duration data and use that classification as an element in supervising the modification. In one embodiment, the modification of MLM0 (142) results in the creation of a new MLM, e.g., MLMN (146), and in one embodiment the original MLM, MLM0 (142), is retained. Accordingly, the MLM library (140) may expand in accordance with dynamic modification of the MLMs.
The types of information handling systems that can utilize the system (110) range from small handheld devices (e.g., a handheld computer/mobile telephone (180)) to large mainframe systems (e.g., a mainframe computer (182)). Examples of handheld computers (180) include personal digital assistants (PDAs), personal entertainment devices (e.g., MP4 players), portable televisions, and compact disc players. Other examples of information handling systems include a pen or tablet computer (184), a laptop or notebook computer (186), a personal computer system (188), and a server (190). As shown, the various information handling systems can be networked together using the computer network (105). Types of computer networks (105) that can be used to interconnect the various information handling systems include local area networks (LANs), wireless local area networks (WLANs), the Internet, the public switched telephone network (PSTN), other wireless networks, and any other network topology that can be used to interconnect the information handling systems. Many of the information handling systems include nonvolatile data stores, such as hard drives and/or nonvolatile memory. Some of the information handling systems may use separate nonvolatile data stores (e.g., the server (190) utilizes nonvolatile data store (190a), and the mainframe computer (182) utilizes nonvolatile data store (182a)). The nonvolatile data store (182a) can be a component external to the various information handling systems or can be internal to one of the information handling systems.
The information handling system may take many forms, some of which are shown in FIG. 1. For example, an information handling system may take the form of a desktop, server, portable, laptop, notebook, or other form factor computer or data processing system. In addition, an information handling system may take other form factors, such as a personal digital assistant (PDA), a gaming device, an ATM machine, a portable telephone device, a communication device, or other devices that include a processor and memory.
In the art, an application program interface (API) is understood to be a software intermediary between two or more applications. With respect to the NL processing system shown and described in FIG. 1, one or more APIs may be utilized to support one or more of the tools (172)-(178) and their associated functionality. Referring to FIG. 2, a block diagram (200) is provided showing the NL processing tools and their associated APIs. As shown, a plurality of tools are embedded within the knowledge engine (205), with the tools including the accuracy manager (210) associated with API0 (212), the KG manager (220) associated with API1 (222), the BC manager (230) associated with API2 (232), and the MLM manager (240) associated with API3 (242). Each API may be implemented in one or more languages and interface specifications. API0 (212) provides asset comparison, accuracy determination, accuracy decisions, and accuracy assignment; API1 (222) provides KG creation, update, and deletion; API2 (232) provides BC contract creation, block creation, network communication, and block addition; API3 (242) provides MLM creation, update, and deletion. As shown, each of the APIs (212), (222), (232), and (242) is operatively coupled to an API orchestrator (250), otherwise known as an orchestration layer, which is understood in the art to function as an abstraction layer that transparently threads the separate APIs together. In one embodiment, the functionality of the separate APIs may be joined or combined. As such, the configuration of the APIs shown herein should not be considered limiting. Accordingly, as shown herein, the functionality of the tools may be embodied or supported by their respective APIs.
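As one way to picture the orchestration layer, the following sketch routes calls to the four manager APIs; the class, the method names, and the stub are assumptions and do not reflect an actual interface of the described system:

# Minimal sketch of an orchestration layer threading the four manager APIs.
class Orchestrator:
    def __init__(self, accuracy_api, kg_api, bc_api, mlm_api):
        self.routes = {
            "accuracy": accuracy_api,   # API0: compare, determine, assign accuracy
            "kg": kg_api,               # API1: KG create/update/delete
            "bc": bc_api,               # API2: contract/block creation, appends
            "mlm": mlm_api,             # API3: MLM create/update/delete
        }

    def call(self, api_name, operation, **kwargs):
        return getattr(self.routes[api_name], operation)(**kwargs)

class StubKGAPI:                        # hypothetical stand-in for API1 (222)
    def create(self, name):
        return f"created {name}"

router = Orchestrator(None, StubKGAPI(), None, None)
print(router.call("kg", "create", name="KG0"))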
To provide additional details for an improved understanding of select embodiments of the present disclosure, reference is now made to FIG. 3, which illustrates a form of KG initialization. When the system is initialized, the KG is empty. An MLM is created or utilized to extract entities and relationships from unstructured data. The MLM is created to understand a specific knowledge domain, e.g., biographical information, financial markets, a scientific domain, etc. Representative data is used to teach the system which text identifies the entities and relationships defined in the model. Referring to FIG. 3, a flow chart (300) is provided describing a process for populating the KG from the natural language output of an NLP system. As part of the KG initialization and population process, accuracy values of the extracted triples are designated. The accuracy value includes a stability indicator, a source reliability indicator, and a human feedback indicator. In one embodiment, each indicator comprising the accuracy value is a numeric value on a scale between 0 and 1. The stability indicator reflects the certainty of the underlying fact. In one embodiment, a stability value of 1 reflects that the fact is certainly true, a value of 0 reflects that the fact is certainly false, and a value between 0 and 1 represents a level of certainty or uncertainty about the fact. The source reliability factor is associated with the source of the fact, including but not limited to the date and time at which the fact was determined. The human feedback indicator tracks the number of affirmations and refutations of the fact. In one embodiment, this factor tracks the quantity of responses. Accordingly, as the KG is initialized and populated with data, components of the accuracy value are selected or set to be assigned to the triples extracted via the NLP system.
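A minimal sketch of the composite accuracy value follows; the equal-weight average is an assumption made here for illustration, since the description only states that the composite is built from the three indicators, each on a 0-to-1 scale:

# Minimal sketch: composite accuracy from stability, source reliability, and
# human feedback, each expressed on a 0..1 scale (equal weighting assumed).
def composite_accuracy(stability: float, reliability: float, feedback: float) -> float:
    return (stability + reliability + feedback) / 3.0

# Supervised initialization per the example below: stability=1, reliability=1,
# human feedback=0; unsupervised initialization discussed later: 0.5, 0.5, 0.
print(composite_accuracy(1.0, 1.0, 0.0))
print(composite_accuracy(0.5, 0.5, 0.0))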
Classifications, such as co-time and duration information, are used to describe data that remains constant or data that may change over time, respectively. In the supervised training example, the stability value is set to 1, the source reliability value is set to 1, and the human feedback value is set to 0. These values are merely examples and may vary in one embodiment. In one embodiment, a KG application program interface (API) provides the platform for designating the accuracy values. As shown, an MLM is created by defining entities and relationships (302). The MLM is trained using the representative data (304). Following step (304), the MLM is used with NLP to extract triples from the training data (306). The extracted triples may be saved to a file or streamed. In one embodiment, the extracted triples are subject-verb-object relationships. Following step (306), the KG is populated with the extracted triples (308). In one embodiment, the KG API is used to read and parse the triples from the NLU output. In one embodiment, the triples populated into the KG are referred to as memories. The MLM is created with training, after which the MLM is applied to data to populate the KG. Accordingly, the MLM, together with NLP, extracts triples from the data and populates the previously empty KG.
For each subject-entity extracted from the NLP output (310), it is determined whether the subject-entity is present in the associated KG (312). Following a positive response to the determination at step (312), it is determined whether there is a known relationship associated with the extracted subject-entity (314). If the response to the determination at step (314) is positive, it is determined whether the subject-entity is present in the KG together with the relationship and the assigned accuracy value (316). A positive response to the determination at step (316) is an indication that the subject-entity relationship is present in the KG, and the process concludes. However, following a negative response to any of the determinations shown at steps (312), (314), and (316), a new triplet and an entry for the new triplet are created in the KG (318). Accordingly, as shown, the MLM is used to extract data from NLP documents, and the KG manager is accessed to selectively populate the KG with the extracted data.
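The checks at steps (310)-(318) can be sketched as follows, with the KG represented as a plain list of dictionaries; the container layout is an assumption for illustration, not the patent's data model:

# Minimal sketch of FIG. 3 population: a new entry is created only when no
# matching subject-entity / relationship / value triple already exists.
def populate(kg_entries, extracted_triples, default_accuracy):
    for subject, relation, value in extracted_triples:
        exists = any(e["subject"] == subject and e["relation"] == relation
                     and e["value"] == value for e in kg_entries)   # steps 312-316
        if not exists:                                              # step 318
            kg_entries.append({"subject": subject, "relation": relation,
                               "value": value, "accuracy": default_accuracy})
    return kg_entries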
Referring to FIG. 4, a flow chart (400) is provided describing a process for creating a new triplet from extracted data. As described in FIG. 3, the accuracy value components of the extracted data are created or assigned. In one embodiment, the accuracy value components are created based on the supervision associated with KG initialization. For each new triplet, e.g., subject-verb-object relationship, an accuracy value is assigned to the triplet (402). In one embodiment, the accuracy value is assigned via the KG API. Following step (402), an entry is created in the corresponding or designated BC ledger (404). More specifically, at step (404), the BC entry stores the triplet accuracy value, and an identifier, referred to herein as the BC identifier, is created and then retrieved. In one embodiment, the retrieved BC identifier is a uniform resource identifier (URI) or other unique asset identifier. Following step (404), the new triplet is inserted into the KG together with the associated BC identifier (406). In one embodiment, at step (406), the KG API effectuates the insertion of the triplet and the associated BC identifier. Accordingly, as shown, the accuracy value of each new triplet is stored in the corresponding BC ledger, and the associated BC identifier is stored with, or otherwise associated with, the triplet in the KG entry.
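Steps (402)-(406) can be sketched as follows; the URI format, the dictionary-based ledger, and the field names are assumptions used only to make the flow concrete:

# Minimal sketch of FIG. 4: store the accuracy value in a ledger entry, then
# store the returned BC identifier with the new triplet in the KG.
import uuid

ledger = {}          # stand-in for the BC ledger
kg_entries = []      # stand-in for the KG

def add_triple(subject, relation, value, accuracy):
    bc_id = f"urn:ledger:{uuid.uuid4()}"        # step 404: ledger entry + identifier
    ledger[bc_id] = {"accuracy": accuracy}
    kg_entries.append({"subject": subject, "relation": relation,
                       "value": value, "bc_id": bc_id})   # step 406
    return bc_id

add_triple("George Washington", "born on", "February 22, 1732",
           {"stability": 1.0, "reliability": 1.0, "feedback": 0.0})   # step 402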
The processes shown and described in FIGS. 3 and 4 may also be used to populate the KG from the natural language output of an NLP system using unsupervised training, e.g., where the data may not be authentic, as well as using supervised training. As shown and described in FIGS. 3 and 4, the KG API is used to set the accuracy values for data extracted from the NLP output. Depending on the source, the accuracy value may be set to indicate uncertainty. For example, in one embodiment, the stability indicator may be set to 0.5, the source reliability may be determined to be 0.5, and the human feedback may be set to 0. Accordingly, unsupervised training may be reflected in a different set of accuracy values.
During processing of non-training data, if no exact triplet match is found, a new memory is created and stored in the corresponding or identified KG. This may occur, for example, when multiple documents on the same topic are processed. One document may identify a fact with a first date, and a second document may identify the same fact with a second date; however, only one of the dates is factually correct. As shown in FIGS. 3 and 4, each triplet entered into the KG has a corresponding accuracy value that serves as an indicator of the correctness of the stored memory. These accuracy scores may be used to determine the accuracy and/or correctness of conflicting facts populated into the KG.
Referring to FIGS. 5A and 5B, a flow chart (500) is provided describing a process for extracting triples from NLP output. As shown, a query or statement is submitted to the KG by the accuracy manager (502). The submission may be for a variety of reasons, including but not limited to fact checking. The MLM is used with NLP to extract triples from the KG (504), and the KG API is used to read and parse the triples from the NLP output (506). The following table illustrates one example triplet:
Subject-Entity        Relationship    Subject-Entity-Value
George Washington     born on         February 22, 1832

TABLE 1
Following step (506), a variable X_Total is assigned to the quantity of parsed triples (508). It is then determined whether X_Total is greater than zero (510). A negative response to the determination at step (510) concludes the extraction process (512), as this is an indication that the query produced an empty set. However, following a positive response to the determination at step (510), the parsed triples are processed (514). A triplet counting variable is initialized to 1 (516), and for each triplet X, the KG is queried to obtain all triples with the same subject-entity and relationship (518). As shown in FIGS. 3 and 4, each triplet has an associated BC identifier. The BC identifier is used to access the corresponding BC ledger and obtain the stored triplet accuracy value (520). Following step (520), the triplet counting variable is incremented (522). It is then determined whether each identified triplet has been processed (527). A negative response to the determination at step (527) returns the process to step (518). Similarly, a positive response concludes the process of querying the KG and the corresponding BC ledger entries (526), and the extracted and processed triples are sorted (528). The sorting at step (528) is used to place the triples in a sequence. For example, in one embodiment, the triples may be sorted in ascending order by stability indicator, source reliability, and human feedback. Similarly, the sort order may be customizable to accommodate a particular use case. For example, in one embodiment, the human feedback indicator may be prioritized. Accordingly, triplet extraction utilizes the KG to obtain or identify the triples, and the associated BC identifiers are used to obtain the associated accuracy values, which are then used as features to classify the triples.
Table 2 below is an extension of Table 1, showing an example ordering of two triples:

(TABLE 2 is reproduced as an image in the original publication.)
In the example of Table 2, there are two triplet entries, each associated with a different subject-entity value. As shown, the entries are ordered in ascending order by the stability indicator or the source reliability indicator. The ordering factor should not be considered limiting. In one embodiment, the ordering may be reversed into descending order, or may be based on a different component of the accuracy value. The first triplet entry in this example, defined by the subject-entity and the relationship, is deemed to have the greatest accuracy value, e.g., accuracy score.
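The query-and-rank loop of FIG. 5A/5B can be sketched as follows; the container layouts and the particular ranking key (ascending by stability, then source reliability, then human feedback, as in the example above) are assumptions made for illustration:

# Minimal sketch of steps (518)-(528): gather KG triples sharing the parsed
# subject-entity and relationship, resolve each accuracy value through its BC
# identifier, and sort the results.
def rank_matches(parsed, kg_entries, ledger):
    matches = [e for e in kg_entries
               if e["subject"] == parsed["subject"]
               and e["relation"] == parsed["relation"]]        # step 518
    for e in matches:
        e["accuracy"] = ledger[e["bc_id"]]                     # step 520
    return sorted(matches, key=lambda e: (e["accuracy"]["stability"],
                                          e["accuracy"]["reliability"],
                                          e["accuracy"]["feedback"]))   # step 528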
The business use case drives the interpretation of the query results. For example, if a triplet with a higher accuracy score is realized, the system may be configured to automatically replace the original subject-entity value with the value having the higher accuracy score. The stability indicator reflects the accuracy of the returned information. As shown, following step (528), the business use case is applied to the search results (530). Following the application at step (530), the KG and the appropriate or identified BC ledgers associated with the corresponding BC identifiers in the KG are queried (532). The query at step (532) obtains all of the associated relationships and subject-entity values. More specifically, this enables an analytical review of all data for the subject-entity. Following step (532), the NLP input or output data is enhanced (534). Examples of enhancement include, but are not limited to: correction, analysis, enhancement, and masking. Correction includes replacing a subject-entity value with data from a memory. In one embodiment, the replacement is local, e.g., limited to the query, and is not reflected in the KG or BC. Analysis includes adding a list of subject-relationship-values with their accuracies. Enhancement includes supplementing the results with all known subject-relationship-values having the highest confidence level, e.g., one value per subject-relationship pair. Masking includes removing one or more triples from the NLP output. Following step (534), the enhanced data is returned. Accordingly, different use cases may drive the interpretation of the search results, which may also be enhanced to return one or more appropriate data elements from the NLP input.
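The four enhancement modes named above can be sketched as follows; the record layout, the best-first ordering of the ranked list, and the scalar accuracy field are assumptions used only to make the modes concrete:

# Minimal sketch of the enhancement step (534): correction, analysis,
# enhancement, or masking of an NLP triple using ranked memories (best first).
def enhance(nlp_triple, ranked, mode):
    if mode == "correction":      # replace the value with the top-ranked memory
        return {**nlp_triple, "value": ranked[0]["value"]}
    if mode == "analysis":        # attach the full subject-relation-value list
        return {**nlp_triple, "candidates": ranked}
    if mode == "enhancement":     # keep one highest-confidence value per pair
        best = {}
        for r in ranked:
            key = (r["subject"], r["relation"])
            if key not in best or r["accuracy"] > best[key]["accuracy"]:
                best[key] = r
        return {**nlp_triple, "supplements": list(best.values())}
    if mode == "masking":         # drop the triple from the NLP output entirely
        return None
    raise ValueError(mode)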
As shown and described in FIGS. 5A and 5B, one or more queries may be processed against the created KG. It is understood that the KG operates as a tool to organize data, with each triplet reflected in the graph representing or otherwise associated with the accuracy score components, e.g., stability, reliability, and feedback. It is further understood that one or more of the accuracy score components may be dynamic, e.g., the value may change over time. Within a select KG, such a change may be uniform, affecting each triplet represented in the KG, or it may be non-uniform, selectively affecting one or more triples in the KG.
Referring to FIG. 6, a flow chart (600) is provided describing a process for partitioning one or more KGs. The partitioning example shown here is based on a change in the reliability factor; this is merely an example, and in one embodiment the partitioning may be based on a change in the stability or feedback factor. The reliability factor reflects a measurement of the reliability of the source of the data. A reliability factor value is received (602). In one embodiment, the reliability factor value is part of NL input and feedback via the KG API. The KG is queried to identify entries associated with the received reliability value (604). It is then determined whether any KG entries were identified (606). A negative response to the determination at step (606) concludes the partitioning process (616), since there is no basis for partitioning the KG on the received reliability factor. However, following a positive response to the determination at step (606), a partition is created within the KG (608), and the created partition is populated with the KG entries having the identified reliability value (610). The partition creation at step (608) effectively creates a second partition populated with the remaining entries of the original KG (612).
It should be understood that the entries in the first and second partitions of the KG have different reliability factor values. As described above, the accuracy value operates as a combination of the stability, reliability, and feedback values. A change to any single component value will have an effect on the combination, which may affect any query results. Following step (612), an accuracy evaluation within the KG, including the first and second partitions, is conducted (614). The evaluation at step (614) includes a comparison of the data populated in the first KG partition, e.g., first data, with the data populated in the second partition, e.g., second data. In one embodiment, the accuracy evaluation takes place automatically after the partitioning. It is understood that the data populated in the first partition will have a different accuracy value than the data in the second partition. The partitioning shown here is based on a change to one component represented in the accuracy value. In one embodiment, the partitioning may take place with respect to two or more components of the accuracy value, or to changes in those components. Accordingly, a change to any one component comprising the accuracy value may include creating one or more partitions of the associated KG.
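The FIG. 6 flow can be sketched as follows; operating on in-memory dictionaries and re-assigning the reliability component directly are assumptions for illustration:

# Minimal sketch of steps (604)-(612): split the KG entries on the received
# source-reliability value, then optionally re-populate a partition's component.
def partition_by_reliability(kg_entries, reliability_value):
    first = [e for e in kg_entries
             if e["accuracy"]["reliability"] == reliability_value]    # steps 604-610
    second = [e for e in kg_entries
              if e["accuracy"]["reliability"] != reliability_value]   # step 612
    return first, second

def assign_reliability(partition, new_value):
    for e in partition:                              # changing one component
        e["accuracy"]["reliability"] = new_value     # changes the composite value
    return partition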
As shown in FIG. 6, a KG may be partitioned. The opposite operation may take place by linking or otherwise joining two or more KGs and the associated BC ledgers. Referring to FIG. 7, a flow chart (700) is provided illustrating a process for linking two KGs and the associated BC ledgers. In one embodiment, KGs that are at least tangentially related may be joined. The relationship may be based on content or on a relationship represented in the KGs. As shown, a query is submitted to the knowledge base (702), and two or more KGs are identified (704). In one embodiment, the KG API recognizes that two KGs contain data related to the query. Similarly, in one embodiment, the KG API may recognize more than two KGs, and as such the quantity of recognized KGs should not be considered limiting. A link is established between the identified KGs (706). The linking of two or more KGs maintains the structure of the individual KGs, i.e., the structures remain unchanged.
It should be understood that the relationship between the KGs, and specifically the data represented therein, may yield query results, e.g., memories, with conflicting triples. To resolve a potential conflict, an evaluation of the linked KGs is conducted to compare the data elements (708). More specifically, the comparison includes an evaluation of the data represented in each linked KG, including their corresponding accuracy value components (710). Identified conflicting data elements are selectively replaced based on at least one accuracy value component, e.g., stability, reliability, or feedback (712). The replacement respects the structure of the individual KGs. In other words, nodes are not deleted from, and links are not added to, the KGs; rather, the data represented in an identified node may be replaced. Accordingly, replacing conflicting entries in the linked KGs mitigates conflicting query results.
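The linking and conflict handling of FIG. 7 can be sketched as follows; keying entries on (subject, relation) and comparing a scalar composite accuracy are assumptions, and the sketch deliberately replaces values in place so that neither graph's structure changes:

# Minimal sketch of steps (706)-(712): link two KGs and reconcile conflicting
# values by keeping the one backed by the stronger accuracy.
def link_and_reconcile(kg_a, kg_b):
    index = {(e["subject"], e["relation"]): e for e in kg_a}
    for e in kg_b:
        twin = index.get((e["subject"], e["relation"]))
        if twin and twin["value"] != e["value"]:       # conflicting memory
            if e["accuracy"] > twin["accuracy"]:
                twin["value"] = e["value"]             # data replaced in place;
            else:                                      # no nodes added or removed
                e["value"] = twin["value"]
    return kg_a, kg_b                                  # structures unchanged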
Referring to FIGS. 8A and 8B, a flow chart (800) is provided illustrating use of the MLM to enhance query input. More specifically, the results of a query submission may indicate an error directed at the query input. As shown, natural language input is received and processed (802). The received input is queried against a context (804), including one or more designated KGs and, in one embodiment, the corresponding BC ledgers. The query processing generates results, e.g., memories, in the form of one or more triples extracted or identified from the designated KGs (806). As described above, each triplet includes a subject, an object, and an associated relationship. A variable X_Total is assigned to the quantity of triples extracted or identified from the KG (808). It is then determined whether the quantity at step (808) includes at least one triplet (810). Following a positive response to the determination at step (810), an associated triplet counting variable is initialized (812). Each triplet has a BC identifier corresponding to a BC ledger entry that includes the accuracy value associated with, or assigned to, the triplet. For each extracted or identified triplet, e.g., triplet X, the BC identifier is obtained (814), from which the BC ledger is queried and the corresponding accuracy value is identified (816). Following step (816), the triplet counting variable is incremented (818), and an evaluation is conducted to determine whether each extracted or identified triplet has been evaluated (820). A negative response to the determination at step (820) returns the process to step (814), and a positive response concludes the triplet extraction and identification. Accordingly, for each triplet determined to be associated with the query input, an associated accuracy value is identified.
Following a negative response to the determination at step (810), a new triplet is created for entry in the associated KG (822). The new triplet corresponds to the received natural language input, e.g., the query submission, and an accuracy score is assigned to the new triplet (824). Furthermore, an entry is created in the BC ledger corresponding to the KG (826). A BC identifier associated with the BC ledger entry is created and stored with the new triplet in the KG (828), and the assigned accuracy score is stored in the corresponding ledger entry (830). Accordingly, an empty set of triples returned from the query input results in growth of the KG and the corresponding BC ledger.
It should be understood that the query submission may return a response in the form of one or more triples from the associated KG, as identified by a positive response to the determination at step (820). After the identified triples have been processed and sorted (832), the MLM enhances the natural language input to correspond to the sorting of the identified triples (834). The enhancement may take one or more forms. For example, in one embodiment, the enhancement is directed at a conflict between the natural language input and the triples originating from the sorting (836). When a conflict is identified, the enhancement by the MLM is to identify the correct triplet from the sorting (838) and to modify the NL input to correspond to the form of the identified triplet (840). The identification at step (838) may take different forms. For example, in one embodiment, the identification may be based on the associated accuracy value, which, as described above, is a composite score. Similarly, in one embodiment, one or more components comprising the accuracy value may be employed as a sorting factor to sort the triplet list. In another embodiment, the sorting may be based on an immutable factor associated with a triplet entry, with the triples sorted on that basis. Accordingly, the enhancement may be based on an identified conflict.
It should be understood that the enhancement may take other forms in response to a match or, in one embodiment, a partial match. When the enhancement is directed at a match between the natural language input and at least one triplet originating from the sorting (842), an entry for the natural language input and a BC ledger entry are created in the respective KG (844). Similarly, when the enhancement results from a partial match between the natural language input and at least one identified triplet (846), a new triplet entry is created in the relevant KG (848). The new triplet corresponds to the received NL input, e.g., the query submission, and an accuracy score is assigned to the new triplet (848). Furthermore, an entry is created in the BC ledger corresponding to the KG (850). The BC identifier associated with the BC ledger entry is created and stored in the KG together with the new triplet (852), and the assigned accuracy score is stored in the corresponding ledger entry (854). In addition, the new triplet entry in the KG is connected to the triplet identified by the partial match (856). Accordingly, as shown, enhancement for a match or partial match includes creating an entry in the corresponding KG and the associated BC ledger.
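The no-match and partial-match branches of FIG. 8A/8B can be sketched as follows; the URI-style identifier, the "links" field used to connect entries, and the container layouts are assumptions made only for illustration:

# Minimal sketch of steps (822)-(830) and (846)-(856): record the NL input as a
# new triplet plus ledger entry, and link it to any partially matching entries.
import uuid

def record_input(nl_triple, partial_matches, kg_entries, ledger, accuracy):
    bc_id = f"urn:ledger:{uuid.uuid4()}"
    ledger[bc_id] = accuracy                       # accuracy kept in the ledger
    entry = {**nl_triple, "bc_id": bc_id,
             "links": [m["bc_id"] for m in partial_matches]}   # connect on partial match
    kg_entries.append(entry)                       # new triplet kept in the KG
    return entry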
As shown and described in FIGS. 3-8B, the MLM is used to support natural language processing in the form of query submissions, to identify data stored in the KGs, and, in one embodiment, to enhance the query submission. It should be understood that the MLM is dynamic and subject to change. The KGs may be used to create one or more new MLMs and/or to retrain existing MLMs. As the ontology is modified, new entities and relationships are realized. This new information can then be utilized to automate training of the MLM, to support a dynamic and progressive MLM, to create a new MLM, or to augment an existing MLM.
Referring to FIG. 9, a flow chart (900) provides a process for training an existing MLM. In the process shown here, there is an NLP library of MLMs. An MLM in the library, referred to herein as the first MLM, is identified or selected based on its alignment with a knowledge domain expressed in a KG, referred to herein as the first KG (902). In response to receiving natural language input for a first KG query, the identified or selected first MLM processes the query input and extracts one or more triples from the first KG (904). In addition, a second KG is identified (906); in one embodiment, the second KG is related to the first KG. The MLM processes the same query using the second KG and extracts one or more triples from the second KG (908). Each triplet extracted at steps (904) and (908), also referred to herein as a memory, includes a subject, an object, and a relationship. As described above, each triplet has an associated BC identifier that indicates the BC ledger storing the corresponding accuracy value. Following step (908), each extracted triplet is processed to identify the associated accuracy value stored in its corresponding BC ledger entry (910). The triples of the first KG and the triples of the second KG are evaluated and compared (912). More specifically, the evaluation at step (912) assesses whether the content and/or structure of the first KG has been subject to modification, as reflected in the second KG (914). For the MLM to be dynamically modified, it is determined whether the two subject KGs have related structure and content. The modification may be evidenced by comparing the triples returned from the first and second KGs. A negative response to the evaluation at step (914) concludes the MLM modification process (922). However, a positive response to the evaluation at step (914) is followed by identification of the content and/or structural change (916). Furthermore, the corresponding accuracy values are evaluated to verify the authenticity of the change (918). Based on the verification at step (918), the structure of the MLM is subject to dynamic modification (920).
The modification of step (920) may take different forms. For example, in one embodiment, the modification of the MLM may conform to a validated change reflected in a second KG entry as compared to the corresponding first KG entry. In another embodiment, the modification may be based on an evaluation of the corresponding accuracy values of the extracted data. Thus, changes demonstrated in the KG drive corresponding changes in the MLM.
Further, it should be understood that the data and relationships represented in the KG may be co-time or duration information, and this classification may be incorporated into the evaluation at step (912). Data that should not change, yet is shown to have been modified, should not be reflected in the MLM modification. Thus, the data classification can be incorporated into the data evaluation and the associated MLM evaluation.
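One possible way to fold such a classification into the evaluation of step (912) is a simple filter, sketched below. The classification attribute and the notion that certain labels mark data as not expected to change are assumptions made for illustration; which class (co-time or duration) is treated as immutable is left to the embodiment.

    def filter_changes(changes, immutable_labels):
        # Drop changes to data whose classification marks it as data that should not change;
        # such apparent modifications are excluded from the MLM modification.
        return [t for t in changes if t.classification not in immutable_labels]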
Referring to FIG. 10, a flow chart (1000) is provided that illustrates a process for progressive and adaptive MLM configuration. The KG API periodically searches the associated or identified KG for new entities, relationships, and data (1002). The identification in step (1002) may be accomplished by examining the date and/or time of the entries within the KG, or by comparing the entities and relationships in the existing MLM with the data contained in the KG. A list of entities and relationships that are present in the KG but not present in the MLM of interest is generated (1004). The list is generated in a format that can be consumed by the training tools used to generate the MLM, and the consumable data is streamed to update the structure of the existing MLM (1006). In one embodiment, the KG API generates a language statement from the KG expressing each triplet, which can then be fed to the MLM for training. After step (1006), the updated MLM is stored as a new MLM in the MLM library (1008). In one embodiment, the progressive MLM configuration is incremental, in that it represents an incremental change to the existing MLM. The incremental machine learning function may synchronize the structure of the MLM and the KG. Successive or incremental changes are applied to the target MLM such that, with each incremental change, the ability of the MLM to extract data from the KG increases and the MLM effectively adapts.
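By way of non-limiting illustration, the periodic synchronization of FIG. 10 may be pictured as a diff-and-train loop such as the Python sketch below. The kg_api, mlm, and library objects and their method names are hypothetical, and the statement template stands in for whatever language generation the KG API performs.

    def incremental_update(kg_api, mlm, library):
        # Sketch of FIG. 10: synchronize an existing MLM with new KG entities and relationships.
        kg_entities, kg_relations = kg_api.entities_and_relations()   # step (1002)
        new_entities = kg_entities - mlm.known_entities                # step (1004)
        new_relations = kg_relations - mlm.known_relations

        # Step (1006): express each new triple as a language statement the training tool can consume.
        statements = [
            f"{subject} {relationship} {obj}."
            for (subject, relationship, obj) in kg_api.triples_for(new_entities, new_relations)
        ]
        updated = mlm.train_incrementally(statements)

        library.store(updated)                                         # step (1008): stored as a new MLM
        return updated

Repeating this loop on a schedule yields the successive, incremental changes described above, with each pass narrowing the gap between the MLM and the KG.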
The system and flow charts shown herein may also take the form of a computer program device for use with an intelligent computer platform to facilitate NL processing. The device has program code embodied therewith, and the program code is executable by a processing unit to support the described functions.
As shown and described, in one embodiment, the processing unit supports functionality to search the corpus for evidence across the existing KGs, the corresponding MLMs, and the corresponding BC ledgers and associated entries. The composite accuracy score defines and/or quantifies the relevant data and provides a weight for making one or more evaluations, and the recording of the accuracy score and its associated components in the corresponding BC ledger lends authenticity to the data. Each entry in a result set is evaluated based on its respective accuracy score. As described herein, the KG is subject to modification, including partitioning and linking, with an accuracy score component assigned to the data represented in or assigned to one or more select KGs. Similarly, as described herein, the MLM may be dynamically adjusted to reflect structural changes to one or more KGs. More specifically, the MLM accommodates new entities and entity relationships.
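Purely as an illustration of how a composite accuracy score could provide such a weight, the sketch below combines named score components into a single value. The component names and weights are hypothetical; the embodiments do not mandate a particular formula.

    def composite_accuracy(components, weights=None):
        # Weighted combination of accuracy-score components into a single composite score.
        weights = weights or {}
        total = sum(weights.get(name, 1.0) for name in components)
        return sum(value * weights.get(name, 1.0) for name, value in components.items()) / total

    # e.g. composite_accuracy({"source_trust": 0.9, "recency": 0.6, "corroboration": 0.8})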
It should be appreciated that a system, method, apparatus, and computer program product for dynamic MLM generation and enhancement using memory and external learning are disclosed herein. As disclosed, the system, method, apparatus, and computer program product apply NL processing to support the MLM, and apply the MLM to support KG persistence.
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects. It is therefore intended that the appended claims cover all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. As a non-limiting example, and as an aid to understanding, the following appended claims contain usage of the introductory phrases "at least one" and "one or more" to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an"; the same holds true for the use in the claims of definite articles.
The present invention may be a system, a method, and/or a computer program product. Furthermore, selected aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and/or hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." In addition, aspects of the present invention may take the form of a computer program product embodied in a computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. As implemented and disclosed, the systems, methods, and/or computer program products are operable to improve the function and operation of machine learning models based on accuracy values and utilizing BC techniques.
A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, dynamic or static random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a magnetic storage device, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to respective computing/processing devices, or to an external computer or external storage device, over a network such as the internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives the computer readable program instructions from the network and forwards them for storage in a computer readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the internet using an internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can then execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. In particular, natural language processing may be performed by different computing platforms or across multiple devices. Furthermore, the data store and/or corpus may be localized, remote, or distributed across multiple systems. Accordingly, the scope of the invention is to be limited only by the following claims and equivalents thereof.

Claims (27)

1. A computer system, comprising:
a processing unit operably coupled to the memory;
an artificial intelligence platform in communication with the processing unit and the memory;
a knowledge engine operably coupled to the processing unit to train a machine learning model, MLM, comprising:
selecting a first MLM from a natural language NL processing library of machine learning models MLMs, wherein the first MLM is aligned with a knowledge domain expressed in a first knowledge graph KG;
receiving an NL input as a query input directed against the first KG, and extracting one or more triples from the first KG;
applying the selected MLM to a second KG different from the first KG, and extracting one or more triples from the second KG, wherein each triplet includes a subject, an object, and a relationship;
For each extracted triplet:
obtaining a blockchain BC identifier associated with each triplet; and
identifying a triplet accuracy value from the corresponding BC ledger;
detecting a modification of the first KG from the one or more triples extracted from the second KG, wherein the modification is selected from the group consisting of: content, structure, and combinations thereof;
evaluating the detected modification, including using the obtained BC identifier to evaluate the accuracy of the detected modification; and
dynamically enhancing the first MLM in response to the received NL input.
2. The system of claim 1, wherein the detected modification is content, and further comprising the knowledge engine to categorize the detected modification, wherein the categorization is selected from the group consisting of: co-time and duration.
3. The system of claim 2, wherein the detected modifications are classified as conflicting data and further comprising the knowledge engine to utilize the estimated accuracy values of the first and second KG and to limit modifications of the first MLM according to the estimated accuracy values.
4. The system of claim 2, further comprising using the classification as a contribution factor to the modification estimate.
5. The system of claim 1, wherein dynamically enhancing the first MLM comprises the MLM creating a new MLM.
6. A method for processing natural language, comprising:
selecting a first MLM from a natural language NL processing library of machine learning models MLMs, wherein the first MLM is aligned with a knowledge domain expressed in a first knowledge graph KG;
receiving an NL input as a query input directed against the first KG, and extracting one or more triples from the first KG;
applying the selected MLM to a second KG different from the first KG, and extracting one or more triples from the second KG, wherein each triplet includes a subject, an object, and a relationship, and for each extracted triplet:
obtaining a blockchain BC identifier associated with each triplet; and
identifying a triplet accuracy value from the corresponding BC ledger;
detecting a modification of the first KG from the one or more triples extracted from the second KG, wherein the modification is selected from the group consisting of: content, structure, and combinations thereof;
evaluating the detected modification, including using the obtained BC identifier to evaluate the accuracy of the detected modification; and
dynamically enhancing the first MLM in response to the received NL input.
7. The method of claim 6, wherein the detected modification is content, and further comprising:
Classifying the detected modification, wherein the classification is selected from the group consisting of: co-time and duration.
8. The method of claim 7, further comprising using the classification as a contribution factor to the modification estimate.
9. The method of claim 7, wherein the detected modification is classified as conflicting data, and further comprising:
and using the estimated accuracy values of the first and second KGs and limiting modification of the first MLM according to the estimated accuracy values.
10. The method of claim 6, wherein dynamically enhancing the first MLM comprises the MLM creating a new MLM.
11. A computer system, comprising:
a processing unit operatively connected to the memory;
an artificial intelligence platform in communication with the processing unit and the memory;
a knowledge engine in communication with the processing unit to utilize a machine learning model, MLM, manager, comprising:
receiving a natural language NL input and querying the input against a context, wherein the context comprises a particular knowledge graph KG and a corresponding blockchain BC ledger;
extracting one or more triples from the particular KG, wherein each triplet includes a subject, an object, and a relationship;
acquiring a BC identifier;
identifying a corresponding accuracy value in the BC ledger;
generating a triplet list using the identified accuracy values and ordering the generated triplet list based on the factors; and
wherein the MLM manager augments one or more MLMs with the received natural language input.
12. The system of claim 11, further comprising: the knowledge engine to identify conflicts between the NL input and entries in the generated list, and further includes the knowledge engine to correct the received NL input by replacing with the identified triples in the generated list.
13. The system of claim 11, further comprising: the knowledge engine to identify a match between the NL input and at least one triplet in the generated list, and further includes a knowledge engine to create an entry for the NL input in the KG and a corresponding BC ledger.
14. The system of claim 11, further comprising: the knowledge engine to identify conflicts between the NL input and entries in the generated list, and further includes the knowledge engine to rank the generated list with a selection component of the identified accuracy values, and return triples in the ranked list corresponding to the selected accuracy value component.
15. The system of claim 11, wherein the knowledge engine identifies an immutable factor, and a conflict between the NL input and at least one entry in the list associated with the immutable factor, and further comprising the knowledge engine to return a related triplet from a list entry having the immutable factor and a corresponding BC identifier for the returned triplet.
16. The system of claim 11, further comprising the knowledge engine to identify a partial match between the NL input and at least one triplet in the generated list, and further comprising the knowledge engine to create a new entry and a corresponding BC ledger in the KG, and to connect the created new entry with an entry corresponding to the partial match.
17. The system of claim 11, wherein the generated triplet list is empty, and further comprising:
the knowledge engine to create a new triplet corresponding to the received NL input, register a certainty score for the created triplet, create an entry for the new triplet in the KG, and create a corresponding entry for the new triplet in the BC ledger.
18. The system of claim 17, further comprising: the knowledge engine to store the BC identifier associated with the BC ledger entry together with the new triplet in the KG, and to store the assigned accuracy score with the BC ledger entry.
19. A method of processing natural language NL, comprising:
receiving natural language input and querying the input for a context, wherein the context comprises a particular knowledge graph KG and a corresponding blockchain BC ledger;
extracting one or more triples from the particular KG, wherein each triplet includes a subject, an object, and a relationship;
for each extracted triplet, obtaining a BC identifier and identifying a corresponding accuracy value in the BC ledger;
generating a triplet list according to the identified accuracy value, and sorting the generated triplet list based on the factors; and
enhancing one or more MLMs with the received natural language NL input.
20. The method of claim 19, wherein enhancing the NL input identifies a conflict between the NL input and an entry in the generated list, and further comprising:
sorting the generated list with the selection component of the identified accuracy values and returning triples in the sorted list corresponding to the selected accuracy values; and
replacing the received NL input with the identified triplet in the sorted list.
21. The method of claim 19, wherein enhancing the NL input identifies a match between the NL input and at least one triplet in the generated list, and further comprising:
creating an entry for the NL input in the KG and the corresponding BC ledger.
22. The method of claim 19, wherein enhancing the NL input identifies a conflict between the NL input and at least one entry in the list associated with an immutable factor, and further comprising:
returning the associated triples from the list entry having the immutable factor, together with the corresponding BC identifiers of the returned triples.
23. The method of claim 19, wherein enhancing the NL input identifies a partial match between the NL input and at least one triplet in the generated list, and further comprising:
creating a new entry in the KG and the corresponding BC ledger, and connecting the created new entry with the entry corresponding to the partial match.
24. The method of claim 19, wherein the generated triplet list is empty, and further comprising:
creating a new triplet corresponding to the received natural language input;
assigning a certainty score to the created triplet; and
creating an entry for the new triplet in the KG and a corresponding entry for the new triplet in the BC ledger.
25. A computer system, comprising:
A processing unit;
a computer readable storage device coupled to the processing unit, the computer readable storage device comprising instructions which, when executed by the processing unit, implement the method of any one of claims 6-10, 19-24.
26. A computer readable storage medium having program code stored thereon, the program code being executable by a processing unit to implement the method of any one of claims 6-10, 19-24.
27. A system for processing natural language, the system comprising modules for performing the steps of the method of any one of claims 6-10, 19-24, respectively.
CN201910012993.7A 2018-01-10 2019-01-07 Machine learning model modification and natural language processing Active CN110019751B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15/866,706 US10606958B2 (en) 2018-01-10 2018-01-10 Machine learning modification and natural language processing
US15/866,702 US10776586B2 (en) 2018-01-10 2018-01-10 Machine learning to integrate knowledge and augment natural language processing
US15/866,702 2018-01-10
US15/866,706 2018-01-10

Publications (2)

Publication Number Publication Date
CN110019751A CN110019751A (en) 2019-07-16
CN110019751B true CN110019751B (en) 2023-06-02

Family

ID=67188742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910012993.7A Active CN110019751B (en) 2018-01-10 2019-01-07 Machine learning model modification and natural language processing

Country Status (1)

Country Link
CN (1) CN110019751B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210042628A1 (en) * 2019-08-09 2021-02-11 International Business Machines Corporation Building a federated learning framework
CN111046241B (en) * 2019-11-27 2023-09-26 中国人民解放军国防科技大学 Graph storage method and device for flow graph processing
TWI798513B (en) * 2019-12-20 2023-04-11 國立清華大學 Training method of natural language corpus for the decision making model of machine learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912559A (en) * 2015-02-20 2016-08-31 国际商业机器公司 Extracting complex entities and relationships from unstructured data
CN107038257A (en) * 2017-05-10 2017-08-11 浙江大学 A kind of city Internet of Things data analytical framework of knowledge based collection of illustrative plates
US9767094B1 (en) * 2016-07-07 2017-09-19 International Business Machines Corporation User interface for supplementing an answer key of a question answering system using semantically equivalent variants of natural language expressions

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9704104B2 (en) * 2015-02-20 2017-07-11 International Business Machines Corporation Confidence weighting of complex relationships in unstructured data
CH711033B1 (en) * 2015-05-04 2022-07-15 Kiodia Sarl Relational search engine.
US20170193393A1 (en) * 2016-01-04 2017-07-06 International Business Machines Corporation Automated Knowledge Graph Creation
CN107368468B (en) * 2017-06-06 2020-11-24 广东广业开元科技有限公司 Operation and maintenance knowledge map generation method and system


Also Published As

Publication number Publication date
CN110019751A (en) 2019-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant