CN110019751A - Machine learning model modification and natural language processing - Google Patents

Machine learning model modification and natural language processing

Info

Publication number
CN110019751A
Authority
CN
China
Prior art keywords
triple
mlm
input
entry
data
Legal status
Granted
Application number
CN201910012993.7A
Other languages
Chinese (zh)
Other versions
CN110019751B (en)
Inventor
D·巴卡雷拉
J·H·巴奈贝四世
N·劳伦斯
S·帕特尔
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority claimed from US15/866,706 (US10606958B2)
Priority claimed from US15/866,702 (US10776586B2)
Application filed by International Business Machines Corp
Publication of CN110019751A
Application granted
Publication of CN110019751B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system, computer program product, and method are provided to automate a framework for knowledge-graph-based persistence of data and to resolve temporal changes and uncertainties in the knowledge graph (KG). Natural language understanding, together with one or more machine learning models (MLMs), is used to extract data, including entities and entity relationships, from unstructured information. The extracted data is populated into a knowledge graph. Because the KG is subject to change, the KG is used to create new MLMs and to retrain existing MLMs. Weighting is applied to the populated data in the form of an accuracy value. Blockchain technology is applied to the populated data to ensure the reliability of the data and to provide auditability for assessing changes to the data.

Description

Machine learning model modification and natural language processing
Background
The present embodiments relate to natural language processing. More specifically, the embodiments relate to an artificial intelligence platform that conveys and utilizes memory in natural language processing.
In the field of artificially intelligent computer systems, natural language systems (such as the IBM Watson™ artificially intelligent computer system and other natural language question answering systems) process natural language based on knowledge acquired by the system. To process natural language, the system may be trained with data derived from a database or corpus of knowledge, but the resulting outcome can be incorrect or inaccurate for a variety of reasons relating to the peculiarities of language constructs and human reasoning, or to new training data.
Machine learning, which is a subset of artificial intelligence (AI), uses algorithms to learn from data and to create foresight based on that data. AI refers to the intelligence by which a machine, based on information, is able to make decisions that maximize the chance of success in a given topic. More specifically, AI is able to learn from a data set to solve problems and provide relevant recommendations. AI is a subset of cognitive computing, which refers to systems that learn at scale, reason with purpose, and interact naturally with humans. Cognitive computing is a mixture of computer science and cognitive science. Cognitive computing uses self-teaching algorithms that apply data mining, visual recognition, and natural language processing to solve problems and optimize human processes.
Cognitive systems are inherently non-deterministic. Specifically, data output from a cognitive system is susceptible to the information provided and used as input. For example, when a new machine learning model is deployed, there is no guarantee that the system will extract the same entities as it did previously. A new model may adversely affect prior model results. Similarly, an error introduced through a document may result in extracting incorrect data and providing that incorrect data as output. Accordingly, there is a need to create deterministic behavior in cognitive systems.
Summary of the Invention
The embodiments include a system, computer program product, and method for natural language processing directed at deterministic data for cognitive systems.
In one aspect, a system is provided with a processing unit operatively coupled to memory, and an artificial intelligence platform in communication with the processing unit and the memory. A knowledge engine in communication with the processing unit is provided to train a machine learning model (MLM). More specifically, the knowledge engine identifies or otherwise selects a first MLM aligned with a knowledge domain expressed in a first knowledge graph, receives natural language input to query against the first knowledge graph and extract triples, and applies the identified MLM to a second knowledge graph, from which one or more triples are extracted. Each triple includes a subject, an object, and a relationship. For each triple, a blockchain (BC) identifier is obtained, and a triple accuracy value is identified in the corresponding BC ledger. The knowledge engine detects a modification by comparing the extracted triples and evaluates the detected modification, including using the obtained BC identifiers to assess the accuracy of the modification. The first MLM is dynamically modified when the modification is a structural modification.
In another aspect, a computer program product is provided to process natural language. The computer program product includes a computer readable storage device having program code executable by a processing unit. Program code is provided to select a first MLM from an MLM library, where the selected MLM is aligned with a knowledge domain expressed in a first knowledge graph. Program code is further provided to receive natural language input, query the input against the first knowledge graph, and extract one or more triples from the first knowledge graph. In addition, program code is provided to apply the selected MLM to a second knowledge graph and to extract one or more triples from the second knowledge graph. Each triple includes a subject, an object, and the relationship between them. For each extracted triple, the program code obtains a blockchain (BC) identifier and identifies a triple accuracy value from the corresponding BC ledger. The program code detects a modification of the first knowledge graph based on the extracted triples, evaluates the modification, and dynamically modifies the first MLM when the modification is a structural modification.
In yet another aspect, a method is provided for processing natural language. The method includes selecting a first MLM aligned with a knowledge domain expressed in a first knowledge graph. The first MLM is selected from a natural language processing library of two or more MLMs. Following the MLM selection, natural language input is received, a query is processed against the first knowledge graph, and one or more triples are extracted from the first knowledge graph. The selected MLM is applied to a second knowledge graph that is different from the first knowledge graph, and one or more triples are extracted from the second knowledge graph. Each extracted triple includes a subject, an object, and an associated relationship. For each extracted triple, a blockchain (BC) identifier associated with the triple is obtained, and the triple accuracy value is identified from the corresponding BC ledger. After the MLM is applied, a modification of the first knowledge graph is detected, where the modification is a change in content and/or structure. The detected modification is evaluated, which includes using the BC identifier to assess the accuracy of the modification, and the first MLM is dynamically modified in response to a structural modification.
These and other features and advantages will become apparent from the following detailed description of the presently preferred embodiments, taken in conjunction with the accompanying drawings.
Brief Description of the Drawings
The drawings referenced herein form a part of the specification. Unless otherwise explicitly indicated, features shown in the drawings are illustrative of only some embodiments, and not of all embodiments.
Fig. 1 depicts a system diagram illustrating a schematic of a natural language processing system.
Fig. 2 depicts a block diagram illustrating the NL processing tools shown in Fig. 1 and their associated APIs.
Fig. 3 depicts a flow chart illustrating a process for populating a knowledge graph (KG) from the natural language (NL) output of a natural language processing (NLP) system.
Fig. 4 depicts a flow chart illustrating a process for creating new triples from the extracted data.
Figs. 5A and 5B depict a flow chart illustrating a process for extracting triples from NLP output.
Fig. 6 depicts a flow chart illustrating a process for partitioning a KG.
Fig. 7 depicts a flow chart illustrating a process for linking two KGs.
Figs. 8A and 8B depict a flow chart illustrating a process for using a machine learning model (MLM) to augment query input.
Fig. 9 depicts a flow chart illustrating a process for training an existing MLM.
Fig. 10 depicts a flow chart illustrating a process for progressive and adaptive MLM configuration.
Detailed Description
It will be readily understood that the components of the present embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, method, and computer program product, as presented in the figures, is not intended to limit the scope of the embodiments as claimed, but is merely representative of selected embodiments.
Reference throughout this specification to "a select embodiment," "one embodiment," or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "a select embodiment," "in one embodiment," or "in an embodiment" in various places throughout this specification do not necessarily refer to the same embodiment.
The illustrated embodiments will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the embodiments as claimed herein.
An ontology functions as a structural framework for organizing information and concepts. Natural language understanding (NLU) is a subset of natural language processing (NLP). NLU uses algorithms to transform speech into a structured ontology. In one embodiment, the ontology is constructed from the taxonomy of the NLU output. NLU provides the definitions required to construct the ontology in terms of classes, subclasses, domain, range, data properties, and object properties. Ontology individuals are mapped to objects. Processing the same or similar documents provides the data required to build the ontology, also referred to as the initial ontology. The ontology is defined by a machine learning model (MLM) that the knowledge graph (KG) manager applies to a data store; the ontology is constructed using the output of an associated NLP service. More specifically, the ontology is generated from the facts or mentions that the MLM has produced. The facts or mentions make up the individuals of the ontology. In one embodiment, the ontology takes the form of a KG, with the facts or mentions represented as nodes in the graph. The structure of the KG may be kept constant while information is added or removed. Similarly, the ontology can be used to create new MLMs and to retrain existing MLMs. In one embodiment, when the KG is modified, new entities and relationships are realized and used to automate training of the MLM, so that the MLM becomes dynamic and progressive. Accordingly, the ontology as represented by the KG and the MLM are inter-related.
Referring to Fig. 1, a schematic diagram of a natural language processing system (100) is depicted. As shown, a server (110) is provided in communication with a plurality of computing devices (180), (182), (184), (186), and (188) across a network connection (105). The server (110) is configured with a processing unit (112) operatively coupled to memory (114) across a bus (116). A tool in the form of a knowledge engine (170) is shown local to the server (110), and is operatively coupled to the processing unit (112) and/or the memory (114). As shown, the knowledge engine (170) contains one or more tools (172)-(178). The tools (172)-(178) provide natural language processing over the network (105) from one or more of the computing devices (180), (182), (184), (186), and (188). More specifically, the computing devices (180), (182), (184), (186), and (188) communicate with each other and with other devices or components via one or more wired and/or wireless data communication links, where each communication link may comprise one or more of wires, routers, switches, transmitters, receivers, or the like. In this networked arrangement, the server (110) and the network connection (105) may enable natural language processing and resolution for one or more content users. Other embodiments of the server (110) may be used with components, systems, sub-systems, and/or devices other than those depicted herein.
The tools, including the knowledge engine (170), or in one embodiment the tools embedded therein, including the KG manager (172), the accuracy manager (174), the BC manager (176), and the MLM manager (178), may be configured to receive input from various sources, including but not limited to input from the network (105), one or more knowledge graphs from a node-graph data store (160) operatively coupled to a corpus of structured data (168) via an interface (166), a BC network (150), and a library (140) of one or more machine learning models (MLMs). As shown, the node-graph data store (160) functions as a library (162) of knowledge graphs, with a plurality of KGs, including KG0 (164A), KG1 (164B), and KGN (164N). The quantity of KGs shown herein should not be considered limiting. Each KG is a representation of an ontology of concepts. More specifically, each KG (164A), (164B), and (164N) includes a plurality of related subjects and objects. In one embodiment, related KGs are stored in an associated KG container, with the corpus (160) storing one or more KG containers. In one embodiment, KGs may also be acquired from other sources, and as such, the data store depicted should not be considered limiting.
The various computing devices (180), (182), (184), (186), and (188) in communication with the network (105) represent access points for content creators and content users. Some of the computing devices may include devices for a database that stores the corpus of data as a body of information used by the knowledge engine (170), and in one embodiment by the tools (172)-(178), to embed deterministic behavior into the system. The network (105) may include local network connections and remote connections in various embodiments, such that the knowledge engine (170) and the embedded tools (172)-(178) may operate in environments of any size, including local and global environments, e.g. the Internet. Additionally, the server (110) and the knowledge engine (170) serve as a front-end system that can make available a variety of knowledge extracted from or represented in documents, network-accessible sources, and/or structured data sources. In this manner, some processes populate the server (110), and the server (110) also includes input interfaces to receive requests and respond accordingly. Content creators and content users may also be available in data repositories, such as but not limited to (140) and (160), and the list of access points depicted herein should not be considered limiting.
As shown, the node-graph data store (160) is operatively coupled to the server (110). The node-graph data store (160) includes a KG library (162) with one or more KGs (164A)-(164N) for use by the server (110). Content users may access the system via API administration or orchestration platforms, as shown and described in Fig. 2, and natural language input is received via an NLU input path.
As described in detail below, the server (110) and the knowledge engine (170) process natural language queries through one or more machine learning models (hereinafter MLMs) to extract content from, or store content in, one or more KGs stored in the node-graph data store (160). Blockchain technology (hereinafter "BC") is applied to the content to effectively establish the authenticity, e.g. the provenance, of stored or received data. The MLM manager (178) functions as a tool, or in one embodiment as an API within the knowledge engine (170), and is used to create, link, and/or modify an associated MLM. As further described below, MLMs are generated, created, or modified for a specific knowledge domain. The MLMs are created to extract entities and relationships from unstructured data. These models are created specifically to understand a particular domain of knowledge (e.g. biographical information, the stock market, astronomy, etc.).
The BC is represented herein as a BC network (150) in the form of a decentralized and distributed digital ledger for recording the history of transactions. More specifically, the BC refers to a type of data structure that enables transactions to be digitally identified and tracked, and that information to be shared across a distributed network of computers. The BC effectively creates a distributed trust network by transparently and securely tracking ownership. As shown and described herein, the BC is used together with the MLM manager (178), the accuracy manager (174), and the KG manager (172) to integrate knowledge and natural language processing.
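The ledger idea described above can be illustrated with a minimal, single-process sketch. This is not the BC network (150) of the embodiments: there is no distribution, consensus, or contract handling here. The `Ledger` class, its method names, and the `ledger:index` identifier format are assumptions introduced only for illustration. The sketch shows how chaining each entry to the hash of the previous entry makes later edits detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical append-only ledger; each entry is chained to the previous hash."""
    name: str
    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        """Record a payload and return an identifier of the assumed form 'ledger:index'."""
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        self.entries.append({"payload": payload, "prev": prev_hash, "hash": digest})
        return f"{self.name}:{len(self.entries) - 1}"

    def verify(self) -> bool:
        """Recompute every hash; an edited entry breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = json.dumps(entry["payload"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

In this sketch, the identifier returned by `append` plays the role of a BC identifier that could be stored alongside the corresponding data in a KG.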
The server (110) may be the IBM Watson™ system available from International Business Machines Corporation of Armonk, New York, augmented with the mechanisms of the illustrative embodiments described below. The IBM Watson™ knowledge manager system imports knowledge into natural language processing (NLP). Specifically, as described in detail below, when data is received, organized, and/or stored, the data may be true or false. The server (110) alone cannot differentiate between the two, or more specifically, cannot verify the accuracy of the data. As shown herein, the server (110) receives input content (102), evaluates the input content (102) to extract its features, and then applies those features to the node-graph data store (160). In particular, the received content (102) may be processed by the IBM Watson™ server (110), which performs analysis using one or more reasoning algorithms to evaluate or inform the authenticity of the input content (102).
To process natural language, the server (110) uses an information handling system in the form of the knowledge engine (170) and the associated tools (172)-(178) to support NLP. Although shown as being embodied in or integrated with the server (110), the information handling system may be implemented in a separate computing system (e.g. 190) that is connected to the server (110) across the network (105). Wherever embodied, one or more MLMs are used to manage and process data, and more specifically, to detect and identify natural language and to create or use deterministic output. As shown, the tools include the KG manager (172), the accuracy manager (174), the BC manager (176), and the MLM manager (178). The MLM manager (178) is shown operatively coupled to the MLM library (140), shown herein with a plurality of MLMs, including MLM0 (142), MLM1 (144), and MLMN (146), although the quantity of MLMs shown and described should not be considered limiting. It is understood that in one embodiment, an MLM is an algorithm employed or adapted to support NLP. Although shown local to the server (110), the tools (170)-(178) may collectively or individually be embedded in memory (114).
One or more of the MLMs (142)-(146) function to manage data, including storing data in a KG. As understood, a KG is a structured ontology and does not merely store data. Specifically, the knowledge engine (170) extracts data and one or more data relationships from unstructured data, creates an entry in the KG for the extracted data and data relationships, and stores the data and data relationships in the KG entry. In one embodiment, data in the KG is stored or represented as a node, and a relationship between two data elements is represented as an edge connecting two nodes. Similarly, in one embodiment, each node has a node-level accuracy value and each relationship has a relationship accuracy value, where the relationship accuracy value is calculated from the accuracy values of the two inter-connected nodes. In addition to data extraction and storage, the MLM, MLM0 (142), assigns or otherwise designates an accuracy value to the data stored in the KG. In one embodiment, as described in detail below, the accuracy value is a composite score made up of steadfastness, source reliability, and human feedback. In one embodiment, the accuracy value may include additional factors or a subset of the factors, and as such should not be considered limiting. The assigned accuracy value is stored in the KG. The assigned accuracy value is also stored in an entry of an identified BC ledger. Each entry in the BC ledger has a corresponding identifier, referred to herein as a BC identifier, which identifies the ledger and the address of the ledger entry. The BC identifier is stored in the KG together with the identified data, and identifies the corresponding BC ledger and the location of the stored accuracy value. In one embodiment, the KG manager (172) manages storage of the BC identifier in the KG. Accordingly, the assigned or created accuracy value is stored in the BC and is a duplicate copy of the accuracy value held in the KG in the node-graph data store (160).
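The node, edge, and accuracy-value structure described above can be sketched as follows. The text does not give a formula for the composite score or for the relationship accuracy value, so two assumptions are made and marked in the code: the composite is an unweighted mean of the three components, and an edge takes the lower accuracy of its two nodes. The class and field names, and the `bc_id` string format, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A KG node carrying the three components named in the text."""
    label: str
    steadfastness: float
    source_reliability: float
    human_feedback: float
    bc_id: str = ""  # hypothetical identifier of the ledger entry holding the same value

    @property
    def accuracy(self) -> float:
        # Assumption: unweighted mean of the three components.
        return (self.steadfastness + self.source_reliability + self.human_feedback) / 3


@dataclass
class Edge:
    """A relationship between two nodes, i.e. one triple."""
    subject: Node
    relation: str
    obj: Node

    @property
    def accuracy(self) -> float:
        # Assumption: an edge is only as trustworthy as its weaker endpoint.
        return min(self.subject.accuracy, self.obj.accuracy)
```

Other aggregation rules (weighted means, products) would fit the description equally well; the point of the sketch is only that the edge value is derived from the node values rather than stored independently.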
It is understood that each KG organizes and provides structure to large quantities of data. A KG may be a single ontology, or in one embodiment, a KG or KG container may comprise a plurality of KGs that are linked together to show their relationship or association. The KG manager (172) functions to manage the structure and organization of the KGs. For example, a large KG may be too cumbersome or expensive to manage. In this scenario, the KG manager (172) may partition the KG, effectively creating at least two partitions, e.g. a first KG partition and a second KG partition. The KG may be partitioned based on one or more factors. For example, in one embodiment, the KG may be partitioned by topic or sub-topic. Similarly, each fact represented in the KG has an associated accuracy value that is a composite of a plurality of factors, including but not limited to a steadfastness indicator, a source reliability measure, and a human feedback factor. The KG manager (172) may partition the KG based on the accuracy value, or in one embodiment, based on one or more of the factors that make up the accuracy value. In one embodiment, after the KG has been partitioned into at least a first and a second partition, the KG manager (172) may assign one or more components of the accuracy value to each node or edge represented in a partition. For example, following the KG partition, the KG manager (172) may populate and assign a first reliability value to data in the first partition, and in one embodiment, the KG manager (172) may further populate and assign a second reliability value, different from the first reliability value, to data in the second partition. Modifying one or more components of the accuracy value effectively changes the accuracy value. It is understood, however, that the values of one or more components of the accuracy value may change over time, and that this change is reflected or embodied in the associated data. Accordingly, the KG manager (172) functions to manage data and to provide structure and value to the data.
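A minimal sketch of partitioning by accuracy value, followed by assigning a component value to every fact in a partition, is shown below. The `Fact` class, the component key names, the threshold, and the unweighted-mean composite are all assumptions made for illustration; the text leaves these choices open.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    components: dict  # hypothetical keys: steadfastness, source_reliability, human_feedback

    @property
    def accuracy(self) -> float:
        # Assumption: unweighted mean of the components.
        return sum(self.components.values()) / len(self.components)


def partition(kg, threshold):
    """Split a KG (a list of facts) into two partitions around an accuracy threshold."""
    first = [fact for fact in kg if fact.accuracy >= threshold]
    second = [fact for fact in kg if fact.accuracy < threshold]
    return first, second


def assign_component(part, name, value):
    """Assign one component value to every fact in a partition."""
    for fact in part:
        fact.components[name] = value
```

Because `accuracy` is computed from the components, assigning a new reliability value to a partition changes the accuracy value of each fact in it, which mirrors the statement that modifying a component effectively changes the accuracy value.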
One of the functions of the KG manager (172) is to link or join two or more KGs. Joining or linking KGs is the inverse of partitioning a KG. To join or link KGs, the KG manager (172) compares one or more data elements in one KG with one or more data elements in a second KG in order to eliminate, or at least reduce, the appearance of duplicate data. As described above, each data element represented in a KG has an associated composite score. The KG manager (172) may use one component, multiple components, or the accuracy value itself as a factor for comparing and evaluating data. Once the KGs are joined or linked, it may be feasible or warranted to remove duplicate data items. The KG manager (172) selectively removes data in the linked KGs that is determined to be duplicate data. One characteristic of removing duplicate data is the ability to maintain a constant KG structure. Accordingly, the KG manager (172) functions to manage the structure of the KG by managing the data represented in the KG.
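The join-and-deduplicate step can be sketched as below. This is an illustrative simplification: each KG is a list of `(subject, relation, object, accuracy)` tuples, two entries count as duplicates only when all three triple fields match exactly, and the copy with the higher accuracy value is retained. The function name and the duplicate rule are assumptions, not taken from the source.

```python
def merge_kgs(kg_a, kg_b):
    """Join two KGs, keeping one copy of each duplicate triple.

    Each KG is a list of (subject, relation, obj, accuracy) tuples.
    Assumption: when a triple appears in both KGs, the higher accuracy wins.
    """
    merged = {}
    for subject, relation, obj, accuracy in list(kg_a) + list(kg_b):
        key = (subject, relation, obj)
        if key not in merged or accuracy > merged[key]:
            merged[key] = accuracy
    return [(s, r, o, a) for (s, r, o), a in merged.items()]
```

A fuller treatment would also compare individual components of the accuracy value, as the text allows, and would handle near-duplicates rather than exact matches.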
The BC manager (176) has multiple functions with respect to the machine learning environment. As described above, the BC manager (176) may function together with the MLM to maintain the authenticity of associated data. The BC manager (176) produces contracts for BC network interactions, provides provenance, retrieves BC information, and manages all BC interactions for the system.
The evaluation of NL input is managed by the MLM, MLM0 (142). A conflict or error associated with the NL input is identified from the results of the KG query generated from the NL input, and more specifically from the sorting of those query results. When there is a conflict between the query results and the NL input, and the query result has a strong accuracy value, this indicates that the NL input may be incorrect. The accuracy manager (174) corrects the NL input by replacing the language of the NL input with a triple identified or selected from the generated list. The triple, also referred to herein as a memory, is based on two or more nodes in the KG and the relationship between those nodes. In one embodiment, the triple is a subject-verb-object relationship as captured from the KG. In one embodiment, the identification or selection may be based on the highest accuracy value, which in one embodiment is selected by the user. Similarly, in another embodiment, the identification or selection may be based on one or more of the factors that make up the composite accuracy value. Another form of conflict may arise when the knowledge engine (170) identifies an immutable factor associated with one or more entries in the list, and further identifies a conflict between the immutable factor and the NL input. The accuracy manager (174) resolves this conflict by replacing the language of the NL input with the triple associated with the entry that has the immutable factor. In addition to conflicts, another resolution may arise when the accuracy manager (174) identifies a partial match between the NL input and the entries of the sorted list. A partial match enables or directs the KG manager (172) and the BC manager (176) to create a new entry for the NL input in the KG and in the corresponding BC ledger, respectively. In addition, the KG manager (172) connects the new entry to the existing KG entry that corresponds to the partial match. It is also understood that the NL input may not produce any matches, e.g. an empty set. If there is no match, the KG manager (172) and the BC manager (176) create a new KG entry and a new BC ledger entry, respectively, corresponding to the NL input. Accordingly, the NL input is processed by the MLM, MLM0 (142), and in one embodiment by the accuracy manager (174), in view of the data organized in the KG.
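The resolution cases described above (immutable conflict, strong conflict, partial match, and no match) can be sketched as a single decision function. The sketch assumes the NL input has already been reduced to one `(subject, relation, object)` triple, treats KG entries that share the subject and relation as the query results, and uses a hypothetical `strong` threshold of 0.8 to decide when an accuracy value is strong. The action names and the definition of a partial match are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    subject: str
    relation: str
    obj: str
    accuracy: float
    immutable: bool = False  # marks an entry carrying an immutable factor


def resolve(nl_triple, kg, strong=0.8):
    """Return (action, triple) for an NL input already parsed into a triple."""
    subject, relation, obj = nl_triple
    # Query results: entries sharing subject and relation, sorted by accuracy.
    ranked = sorted(
        (e for e in kg if e.subject == subject and e.relation == relation),
        key=lambda e: e.accuracy,
        reverse=True,
    )
    if not ranked:
        return ("create", nl_triple)  # empty result set: new KG and ledger entries
    for entry in ranked:
        if entry.immutable and entry.obj != obj:
            # Conflict with an immutable factor: the stored triple replaces the input.
            return ("replace", (entry.subject, entry.relation, entry.obj))
    best = ranked[0]
    if best.obj == obj:
        return ("keep", nl_triple)  # input agrees with the top-ranked result
    if best.accuracy >= strong:
        # Conflict with a strongly supported result: correct the input.
        return ("replace", (best.subject, best.relation, best.obj))
    # Assumed partial match: store the input and link it to the existing entry.
    return ("create_linked", nl_triple)
```

In a fuller sketch, the `create` and `create_linked` actions would each also write a ledger entry, and `create_linked` would add an edge between the new entry and the partially matching one.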
As shown and described herein, the MLM library (140) is operatively coupled to the server (110) and contains a plurality of MLMs to support natural language processing on the AI platform. One or more of the MLMs may be dynamic and trained to adapt to new entities or relationships. Different KGs may be associated with different knowledge domains. For example, a first MLM, MLM0 (142), may be identified or selected from the library (140) based on its alignment with KG0 (164A). In response to processing NL input, MLM0 (142) may be applied against KG0 (164A) and separately applied against a second KG, KG1 (164B). The MLM manager (178) processes the results from both KGs together with their corresponding accuracy values, and based on that processing identifies a modification of one of the KGs. In one embodiment, the accuracy value is evaluated to establish the authenticity of the modification. Subject to that authentication, the MLM manager (178) dynamically modifies the associated MLM, MLM0 (142). In one embodiment, the identified modification may be an expansion of an associated data set to include an additional field. Similarly, in one embodiment, the MLM manager (178) may determine whether the modification is synchronic or diachronic, and use this classification as an element in overseeing the modification. In one embodiment, the modification of MLM0 (142) results in the creation of a new MLM, e.g. MLMN (146), and in one embodiment the original MLM, MLM0 (142), is retained. Accordingly, the MLM library (140) may expand as the MLMs are dynamically modified.
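The compare-then-modify flow described above can be sketched under heavy simplification. Here an MLM is reduced to the set of relation types it recognizes, a structural modification is taken to mean a relation type that is absent from the first KG's triples (analogous to an additional field), and a content modification is a changed or new object for a known relation. The `min_accuracy` threshold stands in for the authenticity check. All names and these definitions are assumptions, and no model training takes place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLM:
    name: str
    relations: frozenset  # relation types the model is assumed to recognize


def detect_modifications(first, second):
    """Compare triples keyed by (subject, relation) -> (obj, accuracy)."""
    known_relations = {relation for (_, relation) in first}
    mods = []
    for (subject, relation), (obj, accuracy) in second.items():
        if relation not in known_relations:
            mods.append(("structural", (subject, relation, obj), accuracy))
        elif first.get((subject, relation), (None, 0.0))[0] != obj:
            mods.append(("content", (subject, relation, obj), accuracy))
    return mods


def update_library(library, mlm, mods, min_accuracy=0.7):
    """Add a modified MLM to the library for trusted structural changes; keep the original."""
    new_relations = {
        triple[1]
        for kind, triple, accuracy in mods
        if kind == "structural" and accuracy >= min_accuracy
    }
    if not new_relations:
        return mlm
    updated = MLM(f"{mlm.name}+", mlm.relations | new_relations)
    library.append(updated)
    return updated
```

Appending the updated model rather than overwriting the original reflects the embodiment in which the original MLM is retained and the library expands.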
The types of information handling systems that can use the system (110) range from small handheld devices, such as a handheld computer/mobile telephone (180), to large mainframe systems, such as a mainframe computer (182). Examples of the handheld computer (180) include personal digital assistants (PDAs), personal entertainment devices such as MP4 players, portable televisions, and compact disc players. Other examples of information handling systems include a pen or tablet computer (184), a laptop or notebook computer (186), a personal computer system (188), and a server (190). As shown, the various information handling systems can be networked together using the computer network (105). Types of computer network (105) that can be used to interconnect the various information handling systems include Local Area Networks (LANs), Wireless Local Area Networks (WLANs), the Internet, the Public Switched Telephone Network (PSTN), other wireless networks, and any other network topology that can be used to interconnect information handling systems. Many of the information handling systems include nonvolatile data stores, such as hard drives and/or nonvolatile memory. Some of the information handling systems may use separate nonvolatile data stores (e.g. the server (190) uses nonvolatile data store (190a), and the mainframe computer (182) uses nonvolatile data store (182a)). The nonvolatile data store (182a) can be a component that is external to the various information handling systems, or can be internal to one of the information handling systems.
Information processing systems can take many forms, some of which are shown in Fig. 1. For example, an information processing system may take the form of a desktop computer, server, portable computer, laptop, notebook, or other form factor computer or data processing system. In addition, an information processing system may take other form factors, such as a personal digital assistant (PDA), a gaming device, an ATM machine, a portable telephone device, a communication device, or other devices that include a processor and memory.
An application program interface (API) is understood in the art as a software intermediary between two or more applications. With respect to the NL processing system shown and described in Fig. 1, one or more APIs may be utilized to support one or more of the tools (172)-(178) and their associated functionality. Referring to Fig. 2, a block diagram (200) is provided illustrating the NL processing tools and their associated APIs. As shown, a plurality of tools are embedded within the knowledge engine (205), the tools including the accuracy manager (210) associated with API0 (212), the KG manager (220) associated with API1 (222), the BC manager (230) associated with API2 (232), and the MLM manager (240) associated with API3 (242). Each API may be implemented in one or more languages and interface specifications. API0 (212) provides asset comparison, accuracy determination, accuracy decision, and accuracy assignment; API1 (222) provides KG creation, update, and deletion; API2 (232) provides MLM creation, update, and deletion; and API3 (242) provides BC contract creation, block creation, network communication, and block addition. As shown, each of the APIs (212), (222), (232), and (242) is operatively coupled to an API orchestrator (250), otherwise known as an orchestration layer, which is understood in the art to function as an abstraction layer that transparently threads the separate APIs together. In one embodiment, the functionality of the separate APIs may be joined or combined. As such, the configuration of the APIs shown herein should not be considered limiting. Accordingly, as shown herein, the functionality of the tools may be embodied or supported by their respective APIs.
To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to Fig. 3, which illustrates a process for initializing a KG. When the system is initialized, the KG is empty. An MLM is created, or utilized, to extract entities and relationships from unstructured data. The MLM is created to understand a specific knowledge domain, e.g., biographical information, financial markets, scientific fields, etc. Representative data is used to teach the system to identify text for the entities and relationships defined in the model. Referring to Fig. 3, a flow chart (300) is provided depicting a process for populating the KG from the natural language output of an NLP system. As part of the KG initialization and population process, an accuracy value is specified for each extracted triple. The accuracy value comprises a steadfastness indicator, a source reliability indicator, and a human feedback indicator. In one embodiment, each indicator of the accuracy value is a numerical value on a scale between 0 and 1. The steadfastness indicator reflects the certainty of the underlying fact. In one embodiment, a steadfastness value of 1 reflects that the fact is definitely true, a value of 0 reflects that the fact is definitely false, and values between 0 and 1 express a level of certainty or uncertainty about the fact. The source reliability indicator is associated with the source of the fact, including, but not limited to, the data and time at which the fact was ascertained. The human feedback indicator tracks the quantity of affirmations and refutations of the fact. In one embodiment, this indicator tracks the number of responses. Accordingly, when the KG is initialized and populated with data, the accuracy value components are selected or set for assignment to the triples extracted via the NLP system.
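The three-part accuracy value described above can be pictured as a small record type. The following is a minimal sketch only; the class name `AccuracyValue` and its field names are hypothetical and are not taken from the disclosure, and the [0, 1] range check follows the one embodiment in which each indicator lies on a scale between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyValue:
    """Hypothetical container for the three indicators of one triple."""
    steadfastness: float       # 1.0 = definitely true, 0.0 = definitely false
    source_reliability: float  # reliability attributed to the source of the fact
    human_feedback: float      # feedback indicator, assumed scaled to [0, 1]

    def __post_init__(self) -> None:
        # Enforce the 0-to-1 scale used in the described embodiment.
        for name in ("steadfastness", "source_reliability", "human_feedback"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


# Illustrative values only: a fact held with full certainty from a fully
# trusted source, with no human feedback recorded yet.
example = AccuracyValue(steadfastness=1.0, source_reliability=1.0, human_feedback=0.0)
```

Making the record immutable is a design choice of this sketch: a changed indicator would be written as a new value rather than edited in place.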
Classifications such as synchronic and diachronic are used to describe data that remains constant or that can change over time, respectively. In an exemplary case of supervised training, the steadfastness value is set to 1, the source reliability value is set to 1, and the human feedback is set to 0. These values are merely an example and may vary in one embodiment. In one embodiment, a KG application program interface (API) provides a platform to specify the accuracy values. As shown, an MLM is created by defining entities and relationships (302). The MLM is trained using representative data (304). Following step (304), the MLM is used with NLP to extract triples from the training data (306). The extracted triples may be saved to a file or streamed. In one embodiment, each extracted triple is a subject-verb-object relationship. Following step (306), the extracted triples are used to populate the KG (308). In one embodiment, the KG API is used to read and parse the triples from the NLU output. In one embodiment, the triples populated into the KG are referred to as memories. The MLM is created through training, after which the MLM is applied to data to populate the KG. Accordingly, the MLM together with NLP extracts triples from data and populates the previously empty KG.
For each subject-entity extracted from the NLP output (310), it is determined whether the subject-entity exists in the associated KG (312). Following a positive response to the determination at step (312), it is determined whether there is a known relationship associated with the extracted subject-entity (314). If the response to the determination at step (314) is positive, it is determined whether the subject-entity, together with the associated relationship and an assigned accuracy value, is present in the KG (316). A positive response to the determination at step (316) is an indication that the subject-entity relationship already exists in the KG, and the process concludes. However, following a negative response to any one of the determinations shown at steps (312), (314), and (316), a new triple is created together with an entry for the new triple in the KG (318). Accordingly, as shown, the MLM is used to extract data from NLP documents and to access the KG manager to selectively populate the KG with the extracted data.
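The three determinations at steps (312), (314), and (316) amount to a nested membership test before insertion. The sketch below is illustrative only: it assumes the KG is held as a nested dictionary (subject, then relationship, then a set of values), omits the accuracy value for brevity, and the function name `populate` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject-entity, relationship, value)
KG = Dict[str, Dict[str, Set[str]]]    # subject -> relationship -> values


def populate(kg: KG, triples: List[Triple]) -> int:
    """Insert triples that are not yet in the KG; return how many were created."""
    created = 0
    for subject, relation, value in triples:
        relations = kg.get(subject)                        # step (312)
        values = None
        if relations is not None:
            values = relations.get(relation)               # step (314)
        if values is not None and value in values:         # step (316)
            continue                                       # already present
        # Any negative determination leads to a new entry, step (318).
        kg.setdefault(subject, {}).setdefault(relation, set()).add(value)
        created += 1
    return created
```

Called on an empty dictionary, the function fills the KG from scratch; called again with the same triples, it creates nothing.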
Referring to Fig. 4, a flow chart (400) is provided depicting a process for creating new triples from the extracted data. As described in Fig. 3, accuracy value components are created or assigned for the extracted data. In one embodiment, the accuracy value components are established based on supervision associated with KG initialization. For each new triple, e.g., subject-verb-object relationship, an accuracy value is assigned to the triple (402). In one embodiment, the accuracy value is assigned via the KG API. Following step (402), an entry is created in a corresponding or designated BC ledger (404). More specifically, at step (404) the BC entry stores the triple accuracy value, and an identifier, referred to herein as a BC identifier, is created and subsequently retrieved. In one embodiment, the retrieved BC identifier is a uniform resource identifier (URI) or other unique asset identifier. Following step (404), the new triple is inserted into the KG together with the associated BC identifier (406). In one embodiment, at step (406) the KG API implements the insertion of the triple and the associated BC identifier. Accordingly, as shown, the accuracy value of each new triple is stored in the corresponding BC ledger, and the associated BC identifier is stored with, or otherwise associated with, the triple in the KG entry.
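Steps (402) through (406) can be sketched with a toy append-only ledger. This is an illustration under stated assumptions, not the disclosed BC implementation: the `Ledger` class, the use of a SHA-256 digest as the BC identifier, and the `insert_triple` helper are all hypothetical, and a real blockchain would add consensus, contracts, and network communication.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class Ledger:
    """Toy hash-chained, append-only ledger standing in for a BC ledger."""

    def __init__(self) -> None:
        self.blocks: List[dict] = []

    def append(self, payload: dict) -> str:
        # Each block commits to the previous block's identifier.
        prev = self.blocks[-1]["id"] if self.blocks else "0" * 64
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        block_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.blocks.append({"id": block_id, "prev": prev, "payload": payload})
        return block_id

    def get(self, block_id: str) -> dict:
        for block in self.blocks:
            if block["id"] == block_id:
                return block["payload"]
        raise KeyError(block_id)


def insert_triple(kg: Dict[Triple, str], ledger: Ledger,
                  triple: Triple, accuracy: Dict[str, float]) -> str:
    # Steps (402)-(404): store the accuracy value in a new ledger entry.
    bc_id = ledger.append({"triple": list(triple), "accuracy": accuracy})
    # Step (406): store the triple in the KG together with its BC identifier.
    kg[triple] = bc_id
    return bc_id
```

Keeping only the identifier in the KG means the accuracy value is always read back from the ledger, which is what gives the audit trail its value.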
The processes shown and described in Figs. 3 and 4 may also be used to populate the KG from the natural language output of an NLP system trained with unsupervised training, e.g., where the data may be unreliable, or with supervised training. As shown and described in Figs. 3 and 4, the KG API is used to set the accuracy values for the data extracted from the NLP output. Depending on the source, the accuracy values may be set to indicate uncertainty. For example, in one embodiment the steadfastness indicator may be set to 0.5, the source reliability may be set to 0.5, and the human feedback may be set to 0. Accordingly, unsupervised training may be reflected in a different set of accuracy values.
During processing of non-training data, if an exact triple match is not found, a new memory is created and stored in the corresponding or identified KG. This may arise when multiple documents on the same subject are processed. For example, one document may identify a fact with a first date, and a second document may identify the same fact with a second date. However, only one of the dates is factually correct. As shown in Figs. 3 and 4, each triple entered into the KG has a corresponding accuracy value, which serves as an indicator of the correctness of the stored memory. These accuracy scores may be used to determine the accuracy and/or correctness of conflicting facts populated into the KG.
Referring to Figs. 5A and 5B, a flow chart (500) is provided depicting a process for extracting triples from NLP output. As shown, a query or statement is presented to the KG through the accuracy manager (502). The presentation may be for various reasons, including, but not limited to, fact checking. The MLM is used with NLP to extract triples from the KG (504), and the KG API is used to read and parse the triples from the NLP output (506). The following table illustrates an example triple:
Subject-Entity | Relationship | Subject-Entity-Value
George Washington | Born on | February 22, 1832
Table 1
Following step (506), the variable XTotal is assigned to the quantity of parsed triples (508). It is then determined whether XTotal is greater than zero (510). A negative response to the determination at step (510) concludes the extraction process (512), as this is an indication that the query produced an empty set. However, a positive response to the determination at step (510) is followed by processing the parsed triples (514). A triple counting variable is set to 1 (516), and for each triple X, the KG is queried to obtain all triples with the same subject-entity and relationship (518). As shown in Figs. 3 and 4, each triple has an associated BC identifier. The BC identifier is used to access the corresponding BC ledger and obtain the stored triple accuracy value (520). Following step (520), the triple counting variable is incremented (522). It is then determined whether each of the identified triples has been processed (524). A negative response to the determination at step (524) is followed by a return to step (518). Similarly, a positive response to the determination concludes the process of querying the KG and the corresponding BC ledger entries (526), and the extracted and processed triples are sorted (528). The sorting at step (528) places the triples in an order. For example, in one embodiment, the triples may be sorted in ascending order by steadfastness indicator, source reliability, and human feedback. Similarly, the sort order may be customized to accommodate a particular use case. For example, in one embodiment, the human feedback indicator may be prioritized. Accordingly, triple extraction utilizes the KG to obtain or identify triples and their associated BC identifiers, which are used to obtain the associated accuracy values, which are then used as a characteristic for sorting the triples.
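The lookup and sort at steps (518) through (528) can be sketched as follows. This is a minimal illustration: the KG is assumed to map each triple to its BC identifier, the ledger is reduced to a dictionary from BC identifier to a three-component accuracy tuple, and the name `rank_matches` is hypothetical. Sorting on the tuple compares the first indicator first, then the source reliability, then the human feedback.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
# (first indicator, source reliability, human feedback)
Accuracy = Tuple[float, float, float]


def rank_matches(query: Triple, kg: Dict[Triple, str],
                 ledger: Dict[str, Accuracy],
                 descending: bool = False) -> List[Tuple[Triple, Accuracy]]:
    """Collect triples sharing the query's subject and relationship, then sort."""
    subject, relation, _ = query
    matches = [
        (triple, ledger[bc_id])              # step (520): read accuracy via BC id
        for triple, bc_id in kg.items()      # step (518): same subject + relation
        if triple[0] == subject and triple[1] == relation
    ]
    # Step (528): lexicographic sort on the accuracy components.
    return sorted(matches, key=lambda match: match[1], reverse=descending)
```

A use case that prioritizes human feedback could pass a different `key`, for example one that reorders the tuple so the feedback component is compared first.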
Table 2 below extends Table 1 and shows an example sorting of two triples:
Table 2
In the example of Table 2, there are two triple entries, each associated with a different subject-entity value. As shown, the entries are sorted in ascending order by steadfastness indicator or source reliability indicator. The sort factor should not be considered limiting. In one embodiment, the sort may be reversed and in descending order, or based on a different component of the accuracy value. The first triple entry in this example, as defined by subject-entity and relationship, is considered to have the greatest accuracy value, e.g., accuracy score.
The business use case drives the interpretation of the query results. For example, if a triple with a higher confidence score materializes, the system may be configured to automatically replace the original subject-entity value with the value having the higher accuracy score. The steadfastness indicator reflects the accuracy of the returned information. As shown, following step (528), the business use case is applied to the search results (530). Following the application at step (530), the KG and the appropriate or identified BC ledger associated with the corresponding BC identifiers in the KG are queried (532). The query at step (532) obtains all of the associated relationships and subject-entity values. More specifically, this enables an analytical review of all of the data for the subject-entity. Following step (532), the NLP input or output data is augmented (534). Examples of augmentation include, but are not limited to: correction, analysis, enhancement, and masking. Correction includes replacing the subject-entity value with data from memory. In one embodiment, the replacement is local, e.g., local to the query, and is not reflected in the KG or BC. Analysis includes adding a list of subject-relationship-values with their accuracy. Enhancement includes supplementing the results with all of the known subject-relationship-values having the highest confidence level, e.g., one value per subject-relationship pair. Masking includes removing one or more triples from the NLP output. Following step (534), the augmented data is returned. Accordingly, different use cases may optionally be used to drive the interpretation of the search results, which may also be augmented to return one or more appropriate data elements from the NLP input.
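The four augmentation examples can be sketched as modes of a single function. The sketch is illustrative and rests on assumptions not stated in the text: the accuracy value is collapsed to one score per memory, the function name `augment` and the mode strings are hypothetical, and masking is interpreted here as dropping NLP triples that disagree with the best-scored memory for the same subject and relationship.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Scored = Tuple[Triple, float]  # (memory triple, assumed single accuracy score)


def augment(nlp_triples: List[Triple], memories: List[Scored], mode: str) -> list:
    # Best-scored memory per (subject, relationship) pair.
    best: Dict[Tuple[str, str], Scored] = {}
    for triple, score in memories:
        key = (triple[0], triple[1])
        if key not in best or score > best[key][1]:
            best[key] = (triple, score)

    def key_of(t: Triple) -> Tuple[str, str]:
        return (t[0], t[1])

    if mode == "correction":
        # Replace the value with the best memory; the KG itself is untouched.
        return [best[key_of(t)][0] if key_of(t) in best else t for t in nlp_triples]
    if mode == "analysis":
        # Attach every known value with its score to each NLP triple.
        return [(t, [m for m in memories if key_of(m[0]) == key_of(t)])
                for t in nlp_triples]
    if mode == "enhancement":
        # Add the best known value per pair when it is not already present.
        extra = [m[0] for m in best.values() if m[0] not in nlp_triples]
        return list(nlp_triples) + extra
    if mode == "masking":
        # Assumed rule: drop triples contradicted by the best memory.
        return [t for t in nlp_triples
                if key_of(t) not in best or best[key_of(t)][0] == t]
    raise ValueError(f"unknown augmentation mode: {mode}")
```

Because every mode returns a new list, the augmentation stays local to the query, matching the embodiment in which the replacement is not reflected in the KG or BC.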
As shown and described in Figs. 5A and 5B, one or more queries may be processed against a created KG. It is understood that the KG functions as a tool to organize data, with each triple reflected in the graph representing, or otherwise being associated with, accuracy score components, e.g., steadfastness, reliability, and feedback. It is understood that one or more of the accuracy score components may be dynamic, e.g., the values may change over time. Within a selected KG, such a change may be uniform, affecting each triple represented in the KG, or it may be non-uniform and selectively affect one or more triples in the KG.
Referring to Fig. 6, a flow chart (600) is provided depicting a process for partitioning one or more KGs. The example of partitioning shown here is based on a change in the reliability factor. This is merely an example, and in one embodiment the partitioning may be based on a change in the steadfastness or feedback factor. The reliability factor reflects a measurement of the reliability of the source of the data. A reliability factor value is received (602). In one embodiment, the reliability factor value is part of the NL input and feedback via the KG API. The KG is queried to identify entries associated with the received reliability value (604). It is then determined whether any KG entries have been identified (606). A negative response to the determination at step (606) concludes the partitioning process, as there is no basis for partitioning the KG on the received reliability factor (616). However, following a positive response to the determination at step (606), a partition is created within the KG (608) and the created partition is populated with the entries in the KG having the identified reliability value (610). The partition creation at step (608) effectively creates a second partition (612) populated with the remaining entries of the original KG.
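The partitioning of Fig. 6 can be sketched as a split of KG entries on the received reliability value. This is a minimal sketch under assumptions: entries are plain dictionaries with a hypothetical `source_reliability` key, matching is by exact equality, and returning `None` stands in for concluding the process at step (616).

```python
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, object]  # e.g. {"triple": (...), "source_reliability": 0.5}


def partition_by_reliability(
    entries: List[Entry], reliability: float
) -> Optional[Tuple[List[Entry], List[Entry]]]:
    """Split entries into (matching, remaining); None when nothing matches."""
    # Steps (604)-(606): find entries carrying the received reliability value.
    first = [e for e in entries if e["source_reliability"] == reliability]
    if not first:
        return None  # step (616): no basis for a partition
    # Steps (608)-(612): the remaining entries form the second partition.
    second = [e for e in entries if e["source_reliability"] != reliability]
    return first, second
```

A production version would likely match on a tolerance or a range rather than exact floating-point equality.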
It is understood that the entries in the first and second partitions of the KG have different reliability factor values. As described above, the accuracy value functions as a composite of the steadfastness, reliability, and feedback values. A change in any individual component value affects the composite, which may affect any query results. Following step (612), an accuracy evaluation is conducted within the KG, including the first and second partitions (614). The evaluation at step (614) includes a comparison of the data populated in the first KG partition, e.g., first data, with the data populated in the second partition, e.g., second data. In one embodiment, the accuracy evaluation is performed automatically following the partitioning. It is understood that the data populated in the first partition will have a different accuracy value than the data in the second partition. The partitioning shown herein is based on a change in one component represented in the accuracy value. In one embodiment, the partitioning may take place on two or more accuracy value components or changes to those components. Accordingly, a change in any one of the components comprising the accuracy value may include creating one or more partitions of the associated KG.
As shown in Fig. 6, a KG may be subject to partitioning. The opposite concept may take place by linking or otherwise joining two or more KGs and their associated BC ledgers. Referring to Fig. 7, a flow chart (700) is provided illustrating a process for linking two KGs and their associated BC ledgers. In one embodiment, KGs that are at least tangentially related may be joined. The relationship may be based on content or relationships represented in the KGs. As shown, a query is presented to a knowledge base (702), and two or more KGs are identified (704). In one embodiment, the KG API identifies that the two KGs contain data relevant to the query. Similarly, in one embodiment, the KG API may identify more than two KGs, and as such, the quantity of KGs identified should not be considered limiting. A link is established between the identified KGs (706). The linking of two or more KGs maintains the structure of the separate KGs, i.e., the structure remains constant.
It is understood that the relationship between the KGs, and specifically the data represented therein, may provide query results with conflicting triples, e.g., memories. To resolve potential conflicts, an evaluation of the linked KGs is conducted to compare data elements (708). More specifically, the comparison includes an evaluation of the data represented in each of the linked KGs (710), including their corresponding accuracy value components. Identified conflicting data elements are selectively replaced based on at least one of the accuracy value components, e.g., steadfastness, reliability, and feedback (712). The replacement follows the structure of the separate KGs. In other words, nodes in the KGs are not removed, nor are links added. Rather, the data represented in an identified node may be replaced. Accordingly, replacing conflicting entries in the linked KGs mitigates conflicting query results.
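Conflict replacement across two linked KGs, steps (708) through (712), can be sketched as below. The representation is an assumption for illustration: each KG maps a (subject, relationship) node to a value and an accuracy tuple, the function name `resolve_conflicts` is hypothetical, and the entry with the greater accuracy tuple is taken to win. No keys are added or removed, so the structure of each KG is preserved.

```python
from typing import Dict, Tuple

Node = Tuple[str, str]                      # (subject, relationship)
Accuracy = Tuple[float, float, float]
Graph = Dict[Node, Tuple[str, Accuracy]]    # node -> (value, accuracy components)


def resolve_conflicts(kg_a: Graph, kg_b: Graph) -> int:
    """Overwrite the weaker of two conflicting values; return replacements made."""
    replaced = 0
    for node in kg_a.keys() & kg_b.keys():   # only nodes present in both KGs
        value_a, acc_a = kg_a[node]
        value_b, acc_b = kg_b[node]
        if value_a == value_b:
            continue                         # no conflict at this node
        # Step (712): keep the data with the higher accuracy components.
        if acc_a >= acc_b:
            kg_b[node] = (value_a, acc_a)
        else:
            kg_a[node] = (value_b, acc_b)
        replaced += 1
    return replaced
```

Only node data is rewritten; a node that exists in one KG but not the other is left alone, which keeps each graph's shape unchanged.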
Referring to Figs. 8A and 8B, a flow chart (800) is provided illustrating the use of an MLM to augment a query input. More specifically, the results of the query submission may indicate an error directed at the query input. As shown, natural language input is received and processed (802). The received input is queried against a context (804), including one or more designated KGs and, in one embodiment, the corresponding BC ledger(s). The query processing produces results in the form of one or more triples, e.g., memories, that are extracted or identified from the designated KG(s) (806). As described above, each triple includes a subject, an object, and an associated relationship. The variable XTotal is assigned to the quantity of triples extracted or identified from the KG (808). It is then determined whether the quantity extracted at step (808) includes at least one triple (810). Following a positive response to the determination at step (810), an associated triple counting variable is initialized (812). Each triple has a BC identifier corresponding to a BC ledger entry that contains the accuracy value associated with or assigned to the triple. For each extracted or identified triple, e.g., triple X, the BC identifier is obtained (814), from which the BC ledger is consulted and the corresponding accuracy value is identified (816). Following step (816), the triple counting variable is incremented (818), and it is determined whether each of the triples extracted or identified from the KG has been evaluated (820). A negative response to the determination at step (820) is followed by a return to step (814), and a positive response concludes the process of triple extraction and identification. Accordingly, for each triple determined to be associated with the query input, the associated accuracy value is identified.
Following a negative response to the determination at step (810), a new triple is created for entry in the associated KG (822). The new triple corresponds to the received natural language input, e.g., the query submission, and an accuracy score is assigned to the new triple (824). In addition, an entry is created in the BC ledger corresponding to the KG (826). A BC identifier associated with the BC ledger entry is created and stored in the KG with the new triple (828), and the assigned accuracy score is stored in the corresponding ledger entry (830). Accordingly, an empty set of triples returned from the query input results in growth of the KG and the corresponding BC ledger.
It is understood that the query submission may return a response in the form of one or more triples from the associated KG, as identified by a positive response to the determination at step (820). After the identified triples have been processed and sorted (832), the MLM augments the natural language input to correspond to the sorting of the identified triples (834). The augmentation may take one or more forms. For example, in one embodiment, the augmentation stems from a conflict between the natural language input and the sorted triples (836). When a conflict is identified, the augmentation by the MLM takes the form of identifying the correct triple from the sorting (838) and amending the NL input to correspond to the identified triple (840). The identification at step (838) may take different forms. For example, in one embodiment, the identification may be based on the associated accuracy value, which, as described above, is a composite score. Similarly, in one embodiment, the list of triples may be sorted using one or more of the components comprising the accuracy value as a sort factor. In another embodiment, the sorting may be based on an immutable factor associated with the triple entries, with the triples sorted on that immutable factor. Accordingly, the augmentation may be based on an identified conflict.
It is understood that the augmentation may take other forms in response to a match or, in one embodiment, a partial match. When the augmentation stems from a match between the natural language input and at least one of the triples in the sorting (842), an entry for the natural language input is created in the corresponding KG together with a BC ledger entry (844). Similarly, when the augmentation stems from a partial match between the natural language input and at least one of the identified triples (846), a new triple entry is created in the associated KG (848). The new triple corresponds to the received NL input, e.g., the query submission, and an accuracy score is assigned to the new triple (848). In addition, an entry is created in the BC ledger corresponding to the KG (850). A BC identifier associated with the BC ledger entry is created and stored in the KG with the new triple (852), and the assigned accuracy score is stored in the corresponding ledger entry (854). In addition, the new triple entry in the KG is connected to the triple identified by the partial match (856). Accordingly, as shown, augmentation for a match or partial match includes creating an entry in the corresponding KG and the associated BC ledger.
As shown and described in Figs. 3-8B, the MLM is used to support natural language processing in the form of query submissions to identify data stored in the KG and, in one embodiment, to augment the query submission. It is also understood that the MLM is dynamic and subject to change. The KG may be used to create one or more new MLMs and/or to retrain an existing MLM. When the ontology is modified, new entities and relationships are realized. This new information can then be utilized to automate training of the MLM, thereby supporting a dynamic and progressive MLM, creating a new MLM, or augmenting an existing MLM.
Referring to Fig. 9, a flow chart (900) is provided illustrating a process for training an existing MLM. In the process shown herein, there is an NLP library of MLMs. An MLM in the library, referred to herein as the first MLM, is identified or selected based on its alignment with the knowledge domain expressed in a KG, referred to herein as the first KG (902). In response to receiving natural language input queried against the first KG, the identified or selected first MLM processes the query input and extracts one or more triples from the first KG (904). In addition, a second KG is identified (906), and in one embodiment the second KG is related to the first KG. The MLM processes the same query with the second KG and extracts one or more triples from the second KG (908). Each triple extracted at steps (904) and (908) is also referred to herein as a memory and includes a subject, an object, and a relationship. As described above, each triple has an associated BC identifier that indicates the BC ledger storing the corresponding accuracy value. Following step (908), each extracted triple is processed to identify the associated accuracy value stored in its corresponding BC ledger entry (910). The triples of the first KG and the triples of the second KG are evaluated and compared (912). More specifically, the evaluation at step (912) assesses whether the content and/or structure of the first KG has undergone a modification, as reflected in the second KG (914). For the MLM to be dynamically modified, it is determined whether the two subject KGs have related structure and content. The modification may be evidenced by comparing the triples returned from the first and second KGs. A negative response to the evaluation at step (914) concludes the MLM modification (922). However, a positive response to the evaluation at step (914) is followed by identification of the content and/or structural change (916). In addition, the corresponding accuracy values are evaluated to verify the authenticity of the change (918). Based on the verification at step (918), the structure of the MLM is subjected to a dynamic modification (920).
The modification at step (920) may take different forms. For example, in one embodiment, the modification of the MLM may conform to the verified change reflected in the second KG entries as compared to the first KG entries. In another embodiment, the modification may be based on an evaluation of the corresponding accuracy values of the extracted data. Accordingly, the MLM is shown to be subject to change based on changes in the KGs.
Furthermore, it is understood that the data and associated relationships represented in the KGs may be synchronic or diachronic information. This classification may be imported into the evaluation at step (912). Data that should not change, yet has been shown to be modified, should not be reflected in the MLM modification. Accordingly, the data classification may be imported into the data evaluation and the associated MLM evaluation.
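The comparison of Fig. 9, together with the classification of data that should not change, can be sketched as a filter over the triples returned from the two KGs. Everything named here is an assumption for illustration: the function `verified_changes`, the single accuracy score per entry, the 0.5 threshold used to stand in for the authenticity check at step (918), and the set of relationships treated as constant over time.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[str, str]                 # (subject, relationship)
Graph = Dict[Node, Tuple[str, float]]  # node -> (value, assumed accuracy score)
Change = Tuple[Node, Optional[str], str]


def verified_changes(first: Graph, second: Graph,
                     constant_relations: Set[str],
                     threshold: float = 0.5) -> List[Change]:
    """List changes in the second KG that may drive an MLM modification."""
    changes: List[Change] = []
    for (subject, relation), (value, score) in second.items():
        previous = first.get((subject, relation))
        if previous is not None and previous[0] == value:
            continue  # step (914): no modification for this entry
        if previous is not None and relation in constant_relations:
            continue  # data that should not change is not propagated
        if score >= threshold:  # step (918): assumed authenticity check
            old_value = None if previous is None else previous[0]
            changes.append(((subject, relation), old_value, value))
    return changes
```

An empty result corresponds to concluding the process at step (922); a non-empty result is the input to the dynamic modification at step (920).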
Referring to Fig. 10, a flow chart (1000) is provided illustrating a process for progressive and adaptive MLM configuration. The KG API periodically searches an associated or identified KG for new entities, relationships, and data (1002). The identification at step (1002) may be accomplished by checking the data and/or time of entries in the KG, or by comparing the entities and relationships in an existing MLM with the data contained in the KG. A list of entities and relationships that are present in the KG and not present in the MLM of interest is produced (1004). The list is produced in a format consumable by the training tools used to generate the MLM. The consumable data is streamed to update the structure of the existing MLM (1006). In one embodiment, the KG API generates linguistic statements from the KG that express each triple, which can then be fed to the MLM for training. Following step (1006), the updated MLM is stored in the MLM library as a new MLM (1008). In one embodiment, the progressive MLM configuration is incremental, in that it represents an incremental change to an existing MLM. Incremental machine learning functions to synchronize the structure of the MLM and the KG. Continuous or incremental changes are performed on the target MLM so that, with each incremental change, the ability of the MLM to extract data from the KG increases and the MLM effectively adapts.
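Steps (1002) through (1006) can be sketched as a set difference followed by statement generation. The sketch assumes, purely for illustration, that the MLM's known vocabulary is available as two sets, that only triple subjects are treated as entities, and that a training statement is the triple rendered as a plain sentence; the name `training_delta` is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def training_delta(kg_triples: List[Triple], known_entities: Set[str],
                   known_relations: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return entities and relationships new to the MLM, plus training statements."""
    # Step (1004): present in the KG, absent from the MLM of interest.
    new_entities = sorted({s for s, _, _ in kg_triples} - known_entities)
    new_relations = sorted({r for _, r, _ in kg_triples} - known_relations)
    # Step (1006): express each affected triple as a linguistic statement.
    statements = [
        f"{s} {r} {o}."
        for s, r, o in kg_triples
        if s in new_entities or r in new_relations
    ]
    return new_entities, new_relations, statements
```

Running the function on each periodic search yields only the increment since the last update, which fits the incremental configuration described above.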
The systems and flow charts shown herein may also be in the form of a computer program device for use with an intelligent computer platform to facilitate NL processing. The device has program code embodied therewith. The program code is executable by a processing unit to support the described functionality.
As shown and described, in one embodiment the processing unit supports functionality to search the corpus for evidence of existing KGs and the corresponding MLMs, together with the corresponding BC ledgers and associated entries. The composite accuracy score qualifies and/or quantifies the associated data and provides a weight for conducting one or more evaluations. Recording the accuracy score and its associated components in the corresponding BC ledger provides authenticity to the data. Each entry in a result set is evaluated based on its corresponding accuracy score. As described herein, the KGs are subject to modification, including partitioning and linking, and to assignment of accuracy score components to data represented in, or assigned to, one or more selected KGs. Similarly, as described herein, the MLM may be dynamically adjusted to reflect structural changes to one or more of the KGs. More specifically, the MLM adapts to new entities and entity relationships.
It will be appreciated that disclosed herein are a system, method, apparatus, and computer program product for dynamic MLM generation and augmentation through the use of memory and external learning. As disclosed, the system, method, apparatus, and computer program product apply NL processing to support the MLM, and the MLM supports KG persistence.
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from the invention and its broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of the invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those skilled in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. As a non-limiting example, and as an aid to understanding, the following appended claims contain usage of the introductory phrases "at least one" and "one or more" to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an"; the same holds true for the use in the claims of definite articles.
The present invention may be a system, a method, and/or a computer program product. In addition, selected aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and/or hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in a computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. Thus embodied, the disclosed system, method, and/or computer program product is operative to improve the functionality and operation of machine learning models based on accuracy values and leveraging BC technology.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a dynamic or static random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a magnetic storage device, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device (such as punch cards or raised structures in a groove having instructions recorded thereon), and any suitable combination of the foregoing.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer, server, or cluster of servers. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. In particular, the natural language processing may be carried out by different computing platforms or across multiple devices. Furthermore, the data storage and/or corpus may be localized, remote, or distributed across multiple systems. Accordingly, the scope of protection of the invention is limited only by the appended claims and their equivalents.

Claims (27)

1. A computer system, comprising:
a processing unit operatively coupled to memory;
an artificial intelligence platform in communication with the processing unit and the memory;
a knowledge engine, operatively coupled to the processing unit, to train a machine learning model (MLM), including to:
select a first MLM from a library of natural language (NL) processing MLMs, the first MLM aligned with a knowledge domain expressed in a first knowledge graph (KG);
receive NL input and query the input against the first KG, and extract one or more triples from the first KG;
apply the selected MLM to a second KG different from the first KG, and extract one or more triples from the second KG, wherein each triple includes a subject, an object, and a relationship;
for each extracted triple:
obtain a blockchain (BC) identifier associated with the triple; and
identify a triple accuracy value from the corresponding BC ledger;
detect a modification to the first KG from the one or more triples extracted from the second KG, wherein the modification is selected from: content, structure, and combinations thereof;
evaluate the detected modification, including using the obtained BC identifier to assess the accuracy of the detected modification; and
dynamically augment the first MLM in response to the received NL input.
2. The system of claim 1, wherein the detected modification is content, and further comprising the knowledge engine to classify the detected modification, wherein the classification is selected from: synchronic and diachronic.
3. The system of claim 2, wherein the detected modification is classified as conflicting data, and further comprising the knowledge engine to assess accuracy values using the first and second data, and to limit modification of the first MLM according to the assessed accuracy values.
4. The system of claim 2, further comprising using the classification as a contributing factor in the evaluation of the modification.
5. The system of claim 1, wherein the dynamic modification of the first MLM includes the MLM creating a new MLM.
6. A method for processing natural language, comprising:
selecting a first MLM from a library of natural language (NL) processing machine learning models (MLMs), the first MLM aligned with a knowledge domain expressed in a first knowledge graph (KG);
receiving NL input and querying the input against the first KG, and extracting one or more triples from the first KG;
applying the selected MLM to a second KG different from the first KG, and extracting one or more triples from the second KG, wherein each triple includes a subject, an object, and a relationship, and for each extracted triple:
obtaining a blockchain (BC) identifier associated with the triple; and
identifying a triple accuracy value from the corresponding BC ledger;
detecting a modification to the first KG from the one or more triples extracted from the second KG, wherein the modification is selected from: content, structure, and combinations thereof;
evaluating the detected modification, including using the obtained BC identifier to assess the accuracy of the detected modification; and
dynamically augmenting the first MLM in response to the received NL input.
7. The method of claim 6, wherein the detected modification is content, and further comprising:
classifying the detected modification, wherein the classification is selected from: synchronic and diachronic.
8. The method of claim 7, further comprising using the classification as a contributing factor in the evaluation of the modification.
9. The method of claim 7, wherein the detected modification is classified as conflicting data, and further comprising: assessing accuracy values using the first and second data, and limiting modification of the first MLM according to the assessed accuracy values.
10. The method of claim 6, wherein the dynamic modification of the first MLM includes the MLM creating a new MLM.
11. A computer system, comprising:
a processing unit operatively coupled to memory;
an artificial intelligence platform in communication with the processing unit and the memory;
a knowledge engine, in communication with the processing unit, to utilize a machine learning model (MLM) manager, including to:
receive natural language (NL) input and query the input against a context, wherein the context includes a specific knowledge graph (KG) and a corresponding blockchain (BC) ledger;
extract one or more triples from the specific KG, wherein each triple includes a subject, an object, and a relationship;
obtain a BC identifier;
identify the corresponding accuracy value in the BC ledger;
generate a list of triples using the identified accuracy values, and sort the generated list of triples based on a factor; and
the MLM manager to augment one or more MLMs with the received natural language input.
12. The system of claim 11, further comprising the knowledge engine to identify a conflict between the NL input and an entry in the generated list, and to correct the received NL input by replacing it with a triple identified in the generated list.
13. The system of claim 11, further comprising the knowledge engine to identify a match between the NL input and at least one triple in the generated list, and to create an entry for the NL input in the KG and the corresponding BC ledger.
14. The system of claim 11, further comprising the knowledge engine to identify a conflict between the NL input and an entry in the generated list, to sort the generated list by a selected component of the identified accuracy values, and to return the triple in the sorted list corresponding to the selected accuracy value component.
15. The system of claim 11, wherein the knowledge engine identifies an immutable factor and a conflict between the NL input and at least one entry in the list associated with the immutable factor, and further comprising the knowledge engine to return, from the list of entries, the related triple having the immutable factor and the corresponding BC identifier of the returned triple.
16. The system of claim 11, further comprising the knowledge engine to identify a partial match between the NL input and at least one triple in the generated list, to create a new entry in the KG and the corresponding BC ledger, and to connect the created new entry with the entry corresponding to the partial match.
17. The system of claim 11, wherein the generated list of triples is empty, and further comprising:
the knowledge engine to create a new triple corresponding to the received NL input, assign an accuracy score to the created triple, create an entry for the new triple in the KG, and create a corresponding entry for the new triple in the BC ledger.
18. The system of claim 17, further comprising the knowledge engine to store the BC identifier associated with the BC ledger entry together with the new triple in the KG, and to store the assigned accuracy score together with the BC ledger entry.
19. A method for processing natural language (NL), comprising:
receiving natural language input and querying the input against a context, wherein the context includes a specific knowledge graph (KG) and a corresponding blockchain (BC) ledger;
extracting one or more triples from the specific KG, wherein each triple includes a subject, an object, and a relationship;
for each extracted triple, obtaining a BC identifier that identifies the corresponding accuracy value in the BC ledger;
generating a list of triples according to the identified accuracy values, and sorting the generated list of triples based on a factor; and
augmenting one or more MLMs with the received natural language (NL) input.
20. The method of claim 19, wherein augmenting the NL input identifies a conflict between the NL input and an entry in the generated list, and further comprising:
sorting the generated list by a selected component of the identified accuracy values, and returning the triple in the sorted list corresponding to the selected accuracy value; and
replacing the received NL input with the identified triple in the sorted list.
21. The method of claim 19, wherein augmenting the NL input identifies a match between the NL input and at least one triple in the generated list, and further comprising:
creating an entry for the NL input in the KG and the corresponding BC ledger.
22. The method of claim 19, wherein augmenting the NL input identifies a conflict between the NL input and at least one entry in the list associated with an immutable factor, and further comprising:
returning the associated triple from the list of entries, the returned triple having the immutable factor, together with the corresponding BC identifier of the returned triple.
23. The method of claim 19, wherein augmenting the NL input identifies a partial match between the NL input and at least one triple in the generated list, and further comprising:
creating a new entry in the KG and the corresponding BC ledger, and connecting the created new entry with the entry corresponding to the partial match.
24. The method of claim 19, wherein the generated list of triples is empty, and further comprising:
creating a new triple corresponding to the received natural language input;
assigning an authenticity score to the created triple; and
creating an entry for the new triple in the KG, and creating a corresponding entry for the new triple in the BC ledger.
25. A computer system, comprising:
a processing unit;
a computer-readable storage device coupled to the processing unit, the computer-readable storage device including instructions that, when executed by the processing unit, implement the method of any one of claims 6-10 and 19-24.
26. A computer program product comprising a computer-readable storage device having program code, the program code executable by a processing unit to implement the method of any one of claims 6-10 and 19-24.
27. A system for processing natural language, the system comprising modules respectively configured to perform each step of the method of any one of claims 6-10 and 19-24.
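
For illustration only, the flow recited in claims 19, 20, and 24 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the patent: the names `KGEntry`, `Ledger`, `add_triple`, and `resolve` are hypothetical, the blockchain ledger is reduced to an in-memory dictionary, the NL input is assumed to have already been parsed into a (subject, relationship, object) triple, and the accuracy value is assumed to be a single number used as the sorting factor.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


@dataclass
class KGEntry:
    """A knowledge-graph entry: a triple plus the ID of its ledger entry."""
    triple: Triple
    bc_id: str


class Ledger:
    """Hypothetical stand-in for a BC ledger: maps BC identifiers to accuracy values."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}

    def append(self, accuracy: float) -> str:
        # Record an accuracy value and return the identifier of the new entry.
        bc_id = f"bc-{len(self._values)}"
        self._values[bc_id] = accuracy
        return bc_id

    def accuracy(self, bc_id: str) -> float:
        return self._values[bc_id]


def add_triple(kg: List[KGEntry], ledger: Ledger, triple: Triple, accuracy: float) -> KGEntry:
    """Create the ledger entry first, then store its identifier with the triple in the KG."""
    entry = KGEntry(triple, ledger.append(accuracy))
    kg.append(entry)
    return entry


def resolve(nl_triple: Triple, kg: List[KGEntry], ledger: Ledger,
            default_accuracy: float = 0.5) -> Triple:
    """Query the KG with an input triple and return the triple to use downstream."""
    subject, relationship, _ = nl_triple
    # Extract KG triples that share the input's subject and relationship.
    candidates = [e for e in kg if e.triple[:2] == (subject, relationship)]
    if not candidates:
        # Empty list (cf. claim 24): create a new triple with an assumed default score.
        add_triple(kg, ledger, nl_triple, default_accuracy)
        return nl_triple
    # Sort the list by the accuracy value looked up through each BC identifier.
    ranked = sorted(candidates, key=lambda e: ledger.accuracy(e.bc_id), reverse=True)
    if any(e.triple == nl_triple for e in ranked):
        return nl_triple  # match: the input is kept unchanged (a simplification)
    return ranked[0].triple  # conflict (cf. claim 20): replace with the top-ranked triple


kg: List[KGEntry] = []
ledger = Ledger()
add_triple(kg, ledger, ("Earth", "orbits", "Sun"), 0.95)
add_triple(kg, ledger, ("Earth", "orbits", "Moon"), 0.10)
corrected = resolve(("Earth", "orbits", "Mars"), kg, ledger)
```

In this sketch a conflicting input is replaced by the stored triple with the highest accuracy value, while an input with no related triples is added to both the KG and the ledger. A real ledger would be append-only and tamper-evident, which the dictionary used here does not model.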
CN201910012993.7A 2018-01-10 2019-01-07 Machine learning model modification and natural language processing Active CN110019751B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15/866,706 2018-01-10
US15/866,706 US10606958B2 (en) 2018-01-10 2018-01-10 Machine learning modification and natural language processing
US15/866,702 US10776586B2 (en) 2018-01-10 2018-01-10 Machine learning to integrate knowledge and augment natural language processing
US15/866,702 2018-01-10

Publications (2)

Publication Number Publication Date
CN110019751A true CN110019751A (en) 2019-07-16
CN110019751B CN110019751B (en) 2023-06-02

Family

ID=67188742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910012993.7A Active CN110019751B (en) 2018-01-10 2019-01-07 Machine learning model modification and natural language processing

Country Status (1)

Country Link
CN (1) CN110019751B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160247088A1 (en) * 2015-02-20 2016-08-25 International Business Machines Corporation Confidence weighting of complex relationships in unstructured data
CN105912559A (en) * 2015-02-20 2016-08-31 国际商业机器公司 Extracting complex entities and relationships from unstructured data
CH711033A2 (en) * 2015-05-04 2016-11-15 Kiodia Sàrl relational search engine.
US20170193393A1 (en) * 2016-01-04 2017-07-06 International Business Machines Corporation Automated Knowledge Graph Creation
US9767094B1 (en) * 2016-07-07 2017-09-19 International Business Machines Corporation User interface for supplementing an answer key of a question answering system using semantically equivalent variants of natural language expressions
CN107038257A (en) * 2017-05-10 2017-08-11 浙江大学 A kind of city Internet of Things data analytical framework of knowledge based collection of illustrative plates
CN107368468A (en) * 2017-06-06 2017-11-21 广东广业开元科技有限公司 A kind of generation method and system of O&M knowledge mapping

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347754A (en) * 2019-08-09 2021-02-09 国际商业机器公司 Building a Joint learning framework
CN111046241A (en) * 2019-11-27 2020-04-21 中国人民解放军国防科技大学 Graph storage method and device for stream graph processing
CN111046241B (en) * 2019-11-27 2023-09-26 中国人民解放军国防科技大学 Graph storage method and device for flow graph processing
TWI798513B (en) * 2019-12-20 2023-04-11 國立清華大學 Training method of natural language corpus for the decision making model of machine learning

Also Published As

Publication number Publication date
CN110019751B (en) 2023-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant