WO2021054512A1 - System and method for reinforcing a knowledge base - Google Patents

System and method for reinforcing a knowledge base

Info

Publication number
WO2021054512A1
Authority
WO
WIPO (PCT)
Prior art keywords
predicate
vector
knowledge base
entity
context
Application number
PCT/KR2019/013640
Other languages
English (en)
Korean (ko)
Inventor
장희원
이경일
Original Assignee
주식회사 솔트룩스 (Saltlux Co., Ltd.)
Application filed by 주식회사 솔트룩스 (Saltlux Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The technical idea of the present invention relates to a knowledge base and, more specifically, to a system and method for reinforcing a knowledge base.
  • The present invention is derived from research conducted by Saltlux Co., Ltd. as part of the SW Computing Source Technology Development Project (SW) of the Ministry of Science, ICT and Future Planning. [Research period: 2019.01.01 ~ 2019.12.31; research management institution: Information and Communication Technology Promotion Center; research project name: WiseKB: Development of self-learning knowledge base and reasoning technology based on understanding big data; project serial number: 2013-0-00109]
  • A knowledge base may be built to store knowledge data and provide the stored knowledge data.
  • The knowledge base may include structured knowledge data, which may be generated in various ways. Because the amount of knowledge is vast, building a knowledge base through manual curation may be limited.
  • Predicates, which represent relationships between entities, may be relatively few compared with entities. Even so, verifying and adding through curation both equivalent predicates (that is, multiple expressions representing the same meaning) and new predicates may not be practically easy.
  • The technical idea of the present invention provides a knowledge reinforcement system and method for reinforcing the predicates included in a knowledge base.
  • A system for reinforcing knowledge data of a knowledge base may include: a predicate extraction unit that extracts a predicate from input data; an entity extraction unit that extracts an entity from the input data; a context extraction unit that extracts, from the input data, a context of the entity that includes the predicate; a predicate evaluation unit that obtains a predicate score vector from the predicate, the entity, and the context based on a trained artificial neural network; and a knowledge base update unit that determines, based on the predicate score vector, whether to update the knowledge base and updates the knowledge base based on the predicate according to the determination result.
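The composition above can be pictured as a pipeline of pluggable units. The following Python sketch is a hypothetical wiring only: the names `Extraction`, `KnowledgeBaseReinforcer`, `extract`, `evaluate`, `should_update`, and `apply_update` are assumptions for illustration, and each unit is injected as a plain function rather than being the patent's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Extraction:
    """Output of the (hypothetical) predicate, entity and context extraction units."""
    predicate: str
    entities: List[str]
    context: List[str]


@dataclass
class KnowledgeBaseReinforcer:
    """Hypothetical wiring of the units; each unit is injected as a callable."""
    extract: Callable[[str], Extraction]                     # extraction units
    evaluate: Callable[[Extraction], List[float]]            # predicate evaluation unit
    should_update: Callable[[List[float]], bool]             # update decision
    apply_update: Callable[[Extraction, List[float]], None]  # knowledge base update

    def process(self, text: str) -> bool:
        """Run one input text through the pipeline.

        Returns True if the knowledge base was updated, False otherwise.
        """
        extraction = self.extract(text)
        scores = self.evaluate(extraction)  # predicate score vector
        if not self.should_update(scores):
            return False
        self.apply_update(extraction, scores)
        return True
```

Because every unit is a callable, a stub such as a fixed score vector can stand in for the trained network, which keeps the decision and update steps testable in isolation.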
  • The predicate evaluation unit may obtain the predicate score vector from a plurality of entities extracted from the input data and a plurality of contexts, each of which corresponds to one of the entities and includes the predicate.
  • The predicate evaluation unit may obtain vectors corresponding to the predicate, the entity, and the context by referring to a word vector model, and may provide the obtained vectors to the artificial neural network.
  • The predicate evaluation unit may obtain a semantic vector from the entity and the context based on a trained first artificial neural network, and may obtain the predicate score vector from the semantic vector and the predicate based on a trained second artificial neural network.
  • The predicate score vector may include a first vector indicating the degree of matching between the predicate and the regular predicates included in the knowledge base, and a second vector indicating the likelihood that the predicate should be generated as a new predicate.
  • When the maximum value among the values included in the predicate score vector is greater than or equal to a predefined reference value and is included in the first vector, the knowledge base update unit may add the predicate as an expression type of the regular predicate corresponding to the maximum value.
  • When the maximum value among the values included in the predicate score vector is greater than or equal to the predefined reference value and is included in the second vector, the knowledge base update unit may add the predicate to the knowledge base as a new regular predicate.
  • When the maximum value among the values included in the predicate score vector is less than the predefined reference value, the knowledge base update unit may terminate the update of the knowledge base based on the predicate.
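The three update rules above reduce to a small decision function. This Python sketch is illustrative only: the function name `decide_update`, the string action labels, and the tie-breaking (the lowest index wins) are assumptions. The layout of the score vector, n elements for the first vector followed by one element for the second vector, follows the description of the predicate score vector given elsewhere in this document.

```python
from typing import Optional, Sequence, Tuple


def decide_update(
    scores: Sequence[float], n_regular: int, threshold: float
) -> Tuple[str, Optional[int]]:
    """Apply the maximum-value rule to a predicate score vector.

    scores[:n_regular] is the first vector V1 (one element per regular
    predicate); scores[n_regular] is the single-element second vector V2.
    """
    if len(scores) != n_regular + 1:
        raise ValueError("expected n_regular + 1 scores")
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return ("terminate", None)              # below the reference value: no update
    if best < n_regular:
        return ("add_expression", best)         # new expression of regular predicate `best`
    return ("add_new_regular_predicate", None)  # the maximum lies in V2
```

For example, with two regular predicates and a reference value of 0.5, the scores `[0.7, 0.2, 0.1]` select the first regular predicate, while `[0.3, 0.3, 0.4]` ends the update.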
  • A method for reinforcing knowledge data of a knowledge base may include: extracting, from input data, a predicate, an entity, and a context of the entity that includes the predicate; obtaining a predicate score vector from the predicate, the entity, and the context based on a trained artificial neural network; determining whether to update the knowledge base based on the predicate score vector; and updating the knowledge base based on the predicate according to the determination result.
  • The predicate score vector may include a first vector indicating the degree of matching between the predicate and the regular predicates included in the knowledge base, and a second vector indicating the likelihood that the predicate should be generated as a new predicate.
  • Obtaining the predicate score vector may include obtaining a semantic vector from the entity and the context based on a trained first artificial neural network, and obtaining the predicate score vector from the semantic vector and the predicate based on a trained second artificial neural network.
  • A new predicate may be easily verified and added to a knowledge base using machine learning.
  • By adding highly reliable predicate expressions and new predicates, the knowledge base may be effectively reinforced, and its reliability and utilization may be remarkably increased.
  • The effects obtainable in the exemplary embodiments of the present invention are not limited to those mentioned above; other effects not mentioned may be clearly derived and understood from the following description by those having ordinary skill in the technical field to which the exemplary embodiments belong. That is, unintended effects of implementing the exemplary embodiments of the present invention may also be derived from the exemplary embodiments by a person having ordinary skill in the art.
  • FIG. 1 is a block diagram showing a system and its input/output relationships according to an exemplary embodiment of the present invention.
  • FIGS. 2A, 2B and 2C show examples in which a predicate is added to a knowledge base according to exemplary embodiments of the present invention.
  • FIG. 3 is a block diagram showing a predicate evaluation unit according to an exemplary embodiment of the present invention.
  • FIG. 4 is a block diagram showing a vector generator according to an exemplary embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of an operation of a predicate evaluation unit according to an exemplary embodiment of the present invention.
  • FIG. 6 is a flowchart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.
  • FIG. 7 is a flowchart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.
  • FIG. 8 is a flowchart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.
  • A component shown or described as a single block may be a hardware block or a software block.
  • Each component may be an independent hardware block that exchanges signals with the other components, or a software block executed by at least one processor.
  • A software block may include a series of instructions executable by at least one processor and/or source code from which such instructions can be generated through compilation, and may be stored in a computer-readable non-transitory storage medium such as an optical storage medium (e.g., CD, DVD), a semiconductor memory device (e.g., flash memory, EPROM), or a magnetic disk device (e.g., a hard disk drive, magnetic tape).
  • As used herein, a "system" or "database" may refer to a computing system including at least one processor and a memory accessed by the processor.
  • The knowledge base reinforcement system 100 may receive input data 200 and may be communicatively connected to the knowledge base 300.
  • The knowledge base reinforcement system 100 may communicate with the knowledge base 300 through a network, or through a dedicated channel for one-to-one communication.
  • In some embodiments, the knowledge base reinforcement system 100 may include the knowledge base 300.
  • The input data 200 may refer to data containing various types of information.
  • For example, the knowledge base reinforcement system 100 may collect the input data 200 through the Internet.
  • For example, the input data 200 may be an information document provided by Wikipedia.org, an article provided on a press company's homepage, or a document created on a social networking service (SNS).
  • The knowledge base reinforcement system 100 may also receive the input data 200 through a local network, or may access a storage medium to receive the input data 200 stored in it.
  • The input data 200 may include text, and the knowledge base reinforcement system 100 may determine whether to add a predicate included in the text of the input data 200 to the knowledge base 300.
  • The knowledge base 300 may include knowledge (or knowledge data) structured based on an ontology.
  • An ontology is a representation, in a form a computer can handle, of things that exist or that humans can recognize.
  • Ontology components may include, for example, entities (E) (or instances), classes (C), attributes (properties; P), and values (V).
  • The ontology components may further include relationships, function terms, restrictions, rules, events, and the like.
  • As non-limiting examples, a relationship may represent an entity-entity relationship, an entity-class relationship, an entity-attribute relationship, or an entity-value (number, text, etc.) relationship.
  • Outside the knowledge base 300 (e.g., in the real world), a relationship may be expressed as a predicate. For example, in the sentence "Suwon and Daegu meet in the final," "meet" may be a predicate representing the relationship between the entities "Suwon" and "Daegu".
  • The knowledge base 300 may store vast amounts of knowledge data based on an ontology.
  • The knowledge base 300 may include knowledge data expressed using the Resource Description Framework (RDF).
  • A triple may be used as the unit of knowledge data, and the knowledge base 300 may return triples in response to a query, such as a SPARQL Protocol and RDF Query Language (SPARQL) query.
  • A triple may consist of a "subject-predicate-object"; an entity may be not only the subject of a triple but also its object and, in some embodiments, its predicate.
  • The knowledge data stored in the knowledge base 300 may be referred to as a knowledge graph.
  • Entities and predicates may each have a unique identifier, such as a Uniform Resource Identifier (URI), and may be accessed through that URI.
  • Entities and predicates (or relationships) may have various expressions, as described later with reference to FIGS. 2A, 2B and 2C, and including in the knowledge base 300 the expressions that correspond to the same entity or the same predicate can be critical to increasing its usefulness.
  • For example, when the knowledge base 300 is used in a question answering system, user queries may take a vast range of formats and expressions; to recognize such expressions and respond to them, the knowledge base 300 may contain multiple expressions corresponding to one object, that is, an entity or a predicate.
  • The knowledge base 300 may associate these expressions with an entity or a predicate using a predicate (or relationship) such as "label".
  • For example, as described later with reference to FIG. 2B, the knowledge base 300 may include triples such as "eat-label-eat" and "eat-label-take" in relation to the predicate "eat".
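To make the "label" mechanism concrete, here is a minimal in-memory sketch in Python. It is a toy stand-in and not an RDF store: the class `TripleStore`, its methods, and the identifier `kb:eat` (standing in for a real URI) are hypothetical. It shows how label triples let several surface expressions resolve to one regular predicate.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory set of triples (a toy stand-in, not an RDF store)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def labels_of(self, subject: str) -> Set[str]:
        """All expressions attached to `subject` through "label" triples."""
        return {o for s, p, o in self.triples if s == subject and p == "label"}

    def resolve(self, expression: str) -> Set[str]:
        """All subjects (e.g. regular predicates) that carry `expression` as a label."""
        return {s for s, p, o in self.triples if p == "label" and o == expression}
```

With the triples `("kb:eat", "label", "eat")` and `("kb:eat", "label", "take")` stored, resolving "take" returns `{"kb:eat"}`, which is how a question answering front end could map a user's wording back to one predicate.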
  • The knowledge base reinforcement system 100 may reinforce the knowledge base 300 by automatically verifying a predicate based on the input data 200 and adding it to the knowledge base 300, which may remarkably improve the usefulness of the knowledge base 300.
  • As shown in FIG. 1, the knowledge base reinforcement system 100 may include a preprocessor 110, a predicate evaluation unit 120, and a knowledge base update unit 130.
  • The preprocessor 110 may extract an entity (ENT), a predicate (PRE), and a context (CTX) from the input data 200.
  • For example, an entity extraction unit 112 may extract the entity (ENT), a predicate extraction unit 114 may extract the predicate (PRE), and a context extraction unit 116 may extract the context (CTX) from the input data 200.
  • Unlike what is shown in FIG. 1, in some embodiments the entity (ENT), the predicate (PRE), and the context (CTX) may be extracted in a common process of processing the input data 200 rather than individually.
  • The context (CTX) is a unit of the input data 200 that includes the entity (ENT) and the predicate (PRE), and its length may be determined according to a predefined window size. For example, in the sentence "Suwon and Daegu meet in the final" described above, "Suwon" and "Daegu" may be extracted as entities (ENT), "meet" may be extracted as the predicate (PRE), and "Finals/In/Suwon/Daegu/Meet" may be extracted as the context (CTX).
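One way to read "a length determined according to a predefined window size" is sketched below in Python. This is an assumption, not the patent's stated rule: the context is taken as the span running from the first entity or predicate token to the last one, widened by `window` tokens on each side. The function name `extract_context` and the extra token "Sunday" in the example are hypothetical; the remaining tokens follow the "Finals/In/Suwon/Daegu/Meet" example.

```python
from typing import List, Sequence


def extract_context(
    tokens: Sequence[str], anchor_indices: Sequence[int], window: int
) -> List[str]:
    """Return the span covering every anchor token (entities and predicate),
    widened by `window` tokens on each side and clipped to the sentence."""
    if not anchor_indices:
        raise ValueError("at least one anchor index is required")
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(0, min(anchor_indices) - window)
    end = min(len(tokens), max(anchor_indices) + window + 1)
    return list(tokens[start:end])
```

With the tokens `["Sunday", "Finals", "In", "Suwon", "Daegu", "Meet"]`, the anchors at indices 3, 4 and 5, and a window of 2, joining the result with "/" gives "Finals/In/Suwon/Daegu/Meet".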
  • The entity (ENT), the predicate (PRE), and the context (CTX) may be extracted from the input data 200 in any manner.
  • For example, the preprocessor 110 may process the text included in the input data 200 based on natural language processing, including morpheme analysis and syntax analysis.
  • The preprocessor 110 may extract a triple by analyzing the sentences in the text of the input data 200 based on dependency parsing and/or Semantic Role Labeling (SRL).
  • For example, the preprocessor 110 may include at least one of a morpheme analysis unit, a syntax analysis unit, an entity name analysis unit, a filtering analysis unit, an intention analysis unit, a domain analysis unit, and a Semantic Role Labeling (SRL) unit.
  • The entity (ENT) and the predicate (PRE) may be extracted from the extracted triple, and the context (CTX) including the entity (ENT) and the predicate (PRE) may be extracted from the input data 200.
  • In some embodiments, the operation of extracting a triple from the input data 200 may be performed with reference to the knowledge base 300.
  • The extracted entity (ENT) may not match an entity included in the knowledge base 300.
  • Because the predicate (PRE) is extracted from the input data 200 based on the dependencies between the words of a sentence, that is, the roles of the words, the relationship between the predicate (PRE) and the predicates included in the knowledge base 300 (referred to herein as regular predicates) may be unknown.
  • For example, the predicate (PRE) may be one of the existing expression types of a regular predicate included in the knowledge base 300, may be a new expression type of such a regular predicate, or may correspond to a predicate that needs to be added to the knowledge base 300 as a new regular predicate.
  • The predicate (PRE) may also correspond to a predicate that should not be added to the knowledge base 300 because of an error in the input data 200 and/or the preprocessor 110.
  • The predicate evaluation unit 120 may receive from the preprocessor 110 not only the predicate (PRE) but also the entity (ENT) and the context (CTX) related to it, and may evaluate the predicate (PRE) using an artificial neural network (ANN).
  • An artificial neural network may refer to a structure in which artificial neurons (or neuron models) are interconnected. An artificial neuron may generate an output by performing a simple operation on its input data, and that output may be passed as an input to other artificial neurons.
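The neuron description above can be illustrated with a generic example rather than anything specific to this invention. The following Python sketch uses a weighted sum plus bias followed by a sigmoid as the "simple operation"; the sigmoid and the function names `neuron` and `layer` are choices made for illustration only. The output of one layer is then fed to another neuron.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One artificial neuron: weighted sum of the inputs plus bias, then sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Several neurons reading the same inputs; their outputs can feed the next layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]
```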
  • The predicate evaluation unit 120 may evaluate the predicate (PRE) based on machine learning and is not limited to a model that goes by the name artificial neural network (ANN).
  • For example, the artificial neural network may be referred to as a deep learning network, and may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), or a deep Q-network.
  • The predicate evaluation unit 120 may generate a predicate score vector (SCR) by evaluating the predicate (PRE), and the predicate score vector (SCR) may represent the relationship between the predicate (PRE) and the regular predicates included in the knowledge base 300. An example of the predicate evaluation unit 120 will be described later with reference to FIGS. 3 and 5.
  • The knowledge base update unit 130 may receive the predicate score vector (SCR) from the predicate evaluation unit 120 and, based on the predicate score vector (SCR), update the knowledge base 300 based on the predicate (PRE).
  • For example, the knowledge base update unit 130 may determine, based on the predicate score vector (SCR), whether to update the knowledge base 300 and how to update it, and may add the predicate (PRE) to the knowledge base 300 according to the determined update method.
  • FIGS. 2A, 2B and 2C show examples in which a predicate is added to a knowledge base according to exemplary embodiments of the present invention.
  • FIGS. 2A and 2B show examples in which a predicate is added as an expression type of a regular predicate, and FIG. 2C shows an example in which a predicate is added as a new regular predicate.
  • Descriptions overlapping with the above will be omitted, and FIGS. 2A, 2B and 2C will be described with reference to FIG. 1.
  • Referring to FIG. 2A, the knowledge base 300 may include "accept" as a regular predicate.
  • A regular predicate may have a plurality of expression types; for example, as shown in FIG. 2A, the regular predicate "accept" may have the expression types "accept", "accept", and so on.
  • The regular predicate "accept" may form a triple with each of its expression types via the predicate "label".
  • The preprocessor 110 may extract "accept" as a predicate (PRE) from the input data 200, and the predicate evaluation unit 120 may provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating that the predicate best matches "accept" among the existing regular predicates. Accordingly, the knowledge base update unit 130 may obtain the URI of the regular predicate "accept" from the knowledge base 300, generate a new triple by connecting the obtained URI and the extracted "accept" with the predicate "label", and add the triple to the knowledge base 300.
  • Referring to FIG. 2B, the knowledge base 300 may include "lift" and "eat" as regular predicates.
  • The preprocessor 110 may extract "lift" as a predicate (PRE) from the input data 200. Although "lift" already exists as an expression type of the regular predicate "lift", the predicate evaluation unit 120 may determine, based on the entity (ENT) and the context (CTX), that "lift" here means to eat food or the like, and may provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating that it best matches the regular predicate "eat".
  • The knowledge base update unit 130 may obtain the URI of the regular predicate "eat" from the knowledge base 300, generate a new triple by connecting the obtained URI and "lift" with the predicate "label", and add the triple to the knowledge base 300.
  • Referring to FIG. 2C, the preprocessor 110 may extract "follow" (in either of two expression forms) from the input data 200.
  • "Follow" is a term used in social network services and the like to mean establishing a relationship with another user, and the knowledge base 300 may not include a corresponding regular predicate.
  • The predicate evaluation unit 120 may determine that the predicate (PRE) "follow" does not match any of the regular predicates currently included in the knowledge base 300, and may provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating this mismatch.
  • The knowledge base update unit 130 may generate a new URI, "follow1", for a new regular predicate, or may request the knowledge base 300 to create a new URI; it may then create new triples by connecting the new URI with each "follow" expression via the predicate "label" and add them to the knowledge base 300.
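The two update paths of FIGS. 2A/2B and FIG. 2C can be sketched over a plain set of triples. This Python example is a hypothetical illustration: the function names, the `kb:` prefix standing in for real URIs, the counter-based naming that yields identifiers like "kb:follow1", and the second expression "follows" used in the test are all assumptions.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def add_expression(kb: Set[Triple], predicate_uri: str, expression: str) -> None:
    """Attach a new expression to an existing regular predicate (FIGS. 2A, 2B)."""
    kb.add((predicate_uri, "label", expression))


def mint_uri(base: str, existing: Set[str]) -> str:
    """Create an unused identifier such as 'kb:follow1' by appending a counter."""
    i = 1
    while f"kb:{base}{i}" in existing:
        i += 1
    return f"kb:{base}{i}"


def add_new_regular_predicate(kb: Set[Triple], expressions: List[str]) -> str:
    """Create a new regular predicate and label it with every expression (FIG. 2C)."""
    if not expressions:
        raise ValueError("at least one expression is required")
    uri = mint_uri(expressions[0], {s for s, _, _ in kb})
    for expression in expressions:
        kb.add((uri, "label", expression))
    return uri
```

Minting the identifier locally is only one of the two options mentioned above; the other, asking the knowledge base 300 to create the URI, would replace `mint_uri` with a call to the store.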
  • In some embodiments, when the knowledge base update unit 130 receives from the predicate evaluation unit 120 a predicate score vector (SCR) indicating that the predicate does not match any of the regular predicates included in the knowledge base 300, information including the predicate (PRE), the entity (ENT), and the context (CTX) may be output externally or recorded separately for curation by an administrator.
  • The predicate evaluation unit 120 may receive the entity (ENT), the predicate (PRE), and the context (CTX) from the preprocessor 110 and output a predicate score vector (SCR).
  • The predicate evaluation unit 120 may include a vector generation unit 121 and an artificial neural network (ANN).
  • The vector generation unit 121 may generate vectors corresponding to the entity (ENT), the predicate (PRE), and the context (CTX), respectively, by referring to a word vector model 400.
  • The word vector model 400 may refer to a multidimensional space in which each meaningful word (or token, etc.) is expressed as a single coordinate, that is, a word vector, or to a system that contains word vectors and updates them.
  • For example, the word vector model 400 may include an artificial neural network and may be trained by machine learning. Semantically similar words may be placed near each other in the multidimensional space, so the word vectors of semantically similar words may have similar values.
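"Similar words have similar vectors" is usually measured with cosine similarity. The Python sketch below assumes that measure; the patent does not name one. The three-dimensional vectors and the word "dine" are made up for illustration, since a real word vector model is learned and far higher-dimensional.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def nearest(word: str, vectors: Dict[str, Sequence[float]]) -> str:
    """Return the other word whose vector is most similar to `word`'s vector."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))


# Hypothetical toy vectors; a trained model would supply these.
TOY_VECTORS = {
    "eat": [0.9, 0.1, 0.0],
    "dine": [0.8, 0.2, 0.1],
    "follow": [0.0, 0.2, 0.9],
}
```

Here `nearest("eat", TOY_VECTORS)` picks "dine", because its vector points in almost the same direction as "eat" while "follow" is nearly orthogonal.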
  • In some embodiments, the word vector model 400 may be included in the knowledge base reinforcement system 100; alternatively, the predicate evaluation unit 120 may access a word vector model 400 outside the knowledge base reinforcement system 100.
  • An example of the vector generator 121 will be described later with reference to FIG. 4.
  • The artificial neural network (ANN) may receive vectors that correspond to the entity (ENT), the predicate (PRE), and the context (CTX) and carry their respective meanings. For example, the vector generation unit 121 may refer to the word vector model 400 and output an entity vector (ENT'), a predicate vector (PRE'), and a context vector (CTX') corresponding to the entity (ENT), the predicate (PRE), and the context (CTX), respectively.
  • The artificial neural network (ANN) may be in a state trained on samples of entity vectors (ENT'), predicate vectors (PRE'), and context vectors (CTX').
  • In some embodiments, the artificial neural network may be trained based on reinforcement learning.
  • The predicate score vector (SCR) output from the artificial neural network (ANN) may indicate which regular predicate the predicate (PRE) matches when it matches one, may indicate that the predicate (PRE) corresponds to a new regular predicate, or may indicate that it matches none of the regular predicates, including the new regular predicate.
  • The vector generator 40 may generate an entity vector (ENT'), a context vector (CTX'), and a predicate vector (PRE') from an entity (ENT), a context (CTX), and a predicate (PRE).
  • The vector generator 40 may include an entity vector generation unit 41, a context vector generation unit 42, and a predicate vector generation unit 43, and will be described below also with reference to FIG. 3.
  • The word vector model 400 of FIG. 3 may include a first word vector model 410 and a second word vector model 420, and the vector generator 40 may refer to both the first word vector model 410 and the second word vector model 420.
  • The first word vector model 410 may provide word vectors corresponding to the words included in the text of the input data 200. Accordingly, the context vector generation unit 42 and the predicate vector generation unit 43 may refer to the first word vector model 410 to generate the context vector (CTX') and the predicate vector (PRE'), respectively.
  • The first word vector model 410 may be referred to as a word vector model for expression types.
  • The second word vector model 420 may provide entity vectors representing the entities included in the knowledge base 300.
  • For example, in the sentence described above, "Suwon" and "Daegu" may refer to sports teams associated with the cities rather than to the cities themselves.
  • The entity vector generation unit 41 may refer to the second word vector model 420 to generate an entity vector (ENT') such that the entity extracted by the preprocessor 110 corresponds to one of the entities included in the knowledge base 300.
  • For example, the entity vector generation unit 41 may obtain a word vector corresponding to the entity ENT by referring to the first word vector model 410, and may generate the entity vector (ENT') from the obtained word vector and the context vector (CTX') by referring to the second word vector model 420.
  • For example, for the exemplary entities "Suwon" and "Daegu" described above, because of the context vector (CTX') generated from a context (CTX) that includes "final/in", the entity vector generation unit 41 may generate entity vectors ENT' corresponding to entities in the knowledge base 300 that represent sports teams rather than cities.
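The Suwon example can be reproduced with a toy entity-linking rule. In this Python sketch the query is a weighted average of the word vector and the context vector, and the knowledge base entity with the highest cosine similarity to the query is chosen. The averaging rule, the weight `ctx_weight`, the two-dimensional vectors (read as "city-ness" and "sports-ness"), and the identifiers `kb:Suwon_city` and `kb:Suwon_team` are all assumptions; the patent only states that the entity vector is generated from the word vector and the context vector with reference to the second word vector model.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entity(word_vec: Sequence[float], ctx_vec: Sequence[float],
                kb_entities: Dict[str, Sequence[float]], ctx_weight: float = 0.5) -> str:
    """Pick the knowledge base entity closest to a context-shifted word vector.

    The query is a weighted average of the word vector (first model) and the
    context vector; the candidates are entity vectors from the second model.
    """
    query = [(1.0 - ctx_weight) * w + ctx_weight * c for w, c in zip(word_vec, ctx_vec)]
    return max(kb_entities, key=lambda uri: cosine(query, kb_entities[uri]))
```

Without context (`ctx_weight=0.0`), the word vector `[0.6, 0.4]` links to the city entity `[1.0, 0.0]`; adding a sports-like context vector `[0.0, 1.0]` shifts the choice to the team entity `[0.3, 1.0]`.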
  • In some embodiments, while the artificial neural network (ANN) is being trained, the entity vector ENT' may be provided directly by a user through curation.
  • The second word vector model 420 may be referred to as an entity linking model.
  • The predicate evaluation unit 120 may receive an entity vector (ENT'), a predicate vector (PRE'), and a context vector (CTX') corresponding to the entity (ENT), the predicate (PRE), and the context (CTX), respectively, and may output a predicate score vector (SCR).
  • In some embodiments, the predicate evaluation unit 120 may receive a plurality of pairs, each consisting of an entity (ENT) and a context (CTX) that includes it, and the vector generation unit 121 may refer to the word vector model 400 to generate a plurality of corresponding vector pairs.
  • For example, the vector generation unit 121 may generate a first vector pair (PAIR1) consisting of a first context vector (CTX1') and a first entity vector (ENT1'), and a second vector pair (PAIR2) consisting of a second context vector (CTX2') and a second entity vector (ENT2').
  • The first context vector CTX1' and the second context vector CTX2' may both include the predicate vector PRE' or a vector corresponding to an expression similar to it.
  • The predicate evaluation unit 120 may include a first artificial neural network ANN1 and a second artificial neural network ANN2.
  • The first artificial neural network ANN1 may receive the first vector pair PAIR1 and the second vector pair PAIR2 and generate a semantic vector SEM' from them.
  • The semantic vector (SEM') may represent the meaning of the entities (ENT1, ENT2) and the predicate (PRE) in the contexts (CTX1, CTX2); because it is derived from two or more vector pairs, it may carry a meaning that covers two or more usages of the predicate (PRE). As shown in FIG.
  • The second artificial neural network ANN2 may receive the semantic vector SEM' and the predicate vector PRE' and output the predicate score vector SCR.
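The two-stage evaluation described above can be sketched as a forward pass in plain Python. Many details here are assumptions and not taken from the patent: ANN1 encodes each concatenated (context vector, entity vector) pair with one ReLU layer and mean-pools the results into SEM'; ANN2 is one linear layer over the concatenation of SEM' and PRE', followed by a softmax over n regular predicates plus one "new predicate" slot. The layer sizes and the random, untrained weights are hypothetical, and training (propagating errors from SCR through ANN2 into ANN1) is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float],
          act: Callable[[float], float]) -> Vector:
    """Fully connected layer: act(W x + b)."""
    return [act(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]


def relu(z: float) -> float:
    return max(0.0, z)


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def ann1(pairs: Sequence[Tuple[Vector, Vector]], w: Matrix, b: Vector) -> Vector:
    """Encode each (context vector, entity vector) pair, then mean-pool into SEM'."""
    encoded = [dense(ctx + ent, w, b, relu) for ctx, ent in pairs]
    return [sum(col) / len(encoded) for col in zip(*encoded)]


def ann2(sem: Vector, pre: Vector, w: Matrix, b: Vector) -> Vector:
    """Map [SEM'; PRE'] to a score vector over n regular predicates + 1 new slot."""
    return softmax(dense(sem + pre, w, b, lambda z: z))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

Mean pooling is used because it accepts any number of vector pairs, which matches the idea that SEM' summarizes two or more usages of the predicate.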
  • In some embodiments, the first vector pair (PAIR1), the second vector pair (PAIR2), and the predicate vector (PRE') may be provided to a single artificial neural network (e.g., the ANN of FIG. 3).
  • In a training step, the first artificial neural network (ANN1) may be trained by propagating the errors derived from the predicate score vector (SCR) back to the first artificial neural network (ANN1) through the second artificial neural network (ANN2).
  • The predicate score vector SCR may include a first vector V1 representing the degree of matching with the regular predicates included in the knowledge base 300 and a second vector V2 corresponding to a new regular predicate.
  • For example, the first vector V1 may include n elements (RP_1, RP_2, RP_3, ..., RP_n) respectively corresponding to n regular predicates, and the second vector V2 may include one element (RP_{n+1}).
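The layout described above, n elements for V1 followed by one element for V2, can be unpacked as shown in the following Python sketch. The function name `split_score_vector` and the example predicate names are hypothetical; the sketch only splits the vector and attaches names, and leaves the update decision to the knowledge base update unit.

```python
from typing import Dict, List, Sequence, Tuple


def split_score_vector(scr: Sequence[float], regular_predicates: List[str]
                       ) -> Tuple[Dict[str, float], float]:
    """Split SCR into V1 (RP_1..RP_n, one per regular predicate) and V2 (RP_{n+1})."""
    n = len(regular_predicates)
    if len(scr) != n + 1:
        raise ValueError(f"expected {n + 1} scores for {n} regular predicates")
    v1 = dict(zip(regular_predicates, scr[:n]))
    v2 = scr[n]
    return v1, v2
```

For the scores `[0.1, 0.6, 0.2, 0.1]` over the regular predicates "accept", "eat" and "lift", V1 is highest for "eat" and V2 (the new-predicate score) is 0.1.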
  • FIG. 6 is a flowchart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.
  • The method for reinforcing the knowledge base may include a plurality of steps S20, S40, S60, and S80.
  • In some embodiments, the method of FIG. 6 may be performed by the knowledge base reinforcement system 100 of FIG. 1, and FIG. 6 will be described below with reference to FIG. 1.
  • In step S20, an operation of extracting a predicate (PRE), an entity (ENT), and a context (CTX) may be performed.
  • For example, the preprocessor 110 may perform natural language processing on the text included in the input data 200.
  • In some embodiments, the predicate (PRE), the entity (ENT), and the context (CTX) may be extracted with reference to the knowledge base 300.
  • In step S40, an operation of obtaining a predicate score vector (SCR) may be performed.
  • For example, the predicate evaluation unit 120 may use the trained artificial neural network (ANN) to obtain the predicate score vector (SCR) from the predicate (PRE), the entity (ENT), and the context (CTX) extracted in step S20.
  • step S60 an operation of determining whether to update the knowledge base 300 may be performed.
  • the knowledge base updater 130 may determine whether to reflect the predicate PRE to the knowledge base 300 based on values included in the predicate score vector SCR.
  • the knowledge base updater 130 may determine whether to reflect the predicate PRE to the knowledge base 300 based on a maximum value among values included in the predicate score vector SCR.
  • An example of step S60 will be described later with reference to FIG. 8. As shown in FIG. 6, when it is determined that the knowledge base 300 is to be updated, step S80 may follow; otherwise, the method of FIG. 6 may end.
  • In step S80, an operation of adding the predicate PRE to the knowledge base 300 may be performed.
  • the knowledge base update unit 130 may add the predicate (PRE) as an expression (surface form) of one of the regular predicates included in the knowledge base 300, as described above with reference to FIGS. 2A and 2B.
  • alternatively, as described above with reference to FIG. 2C, a new regular predicate different from the regular predicates included in the knowledge base 300 may be generated, and the predicate (PRE) may be added as an expression of the new regular predicate.
  • An example of step S80 will be described later with reference to FIG. 8.
  • Fig. 7 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention. Specifically, the flowchart of FIG. 7 shows an example of step S40 of FIG. 6. As described above with reference to FIG. 6, an operation of obtaining a predicate score vector (SCR) may be performed in step S40' of FIG. 7. As shown in FIG. 7, step S40' may include steps S42 and S44. In some embodiments, step S40' may be performed by the predicate evaluating unit 120, and FIG. 7 will be described below with reference to FIGS. 3 and 5.
  • In step S42, an operation of obtaining a semantic vector (SEM') from the entity (ENT) and the context (CTX) may be performed.
  • the vector generator 121 may generate an entity vector ENT' and a context vector CTX' from the entity ENT and the context CTX.
  • the entity vector ENT' and the context vector CTX' may be provided to the first artificial neural network ANN1, and the first artificial neural network ANN1 may output a semantic vector SEM'.
  • the vector generator 121 may receive a plurality of entities and a plurality of contexts, and may generate a plurality of vector pairs.
  • a plurality of vector pairs may be provided to the first artificial neural network ANN1, and the first artificial neural network ANN1 may output a semantic vector SEM'.
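Step S42 can be sketched as follows, under loudly stated assumptions. The vector generator 121 is replaced by a hypothetical hashed bag-of-words embedding, and ANN1 is replaced by one tanh layer applied to each concatenated (ENT', CTX') pair followed by mean pooling, which is one simple way for a variable number of vector pairs to yield a single semantic vector SEM'. Neither the embedding nor the pooling is specified by the source, and `embed`, `ann1`, and `DIM` are illustrative names.

```python
import hashlib
import math

DIM = 8  # hypothetical embedding size


def embed(text, dim=DIM):
    """Hypothetical vector generator: signed, hashed bag-of-words, L2-normalised."""
    v = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        v[h % dim] += 1.0 if (h >> 8) & 1 else -1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def ann1(pairs, W):
    """Stand-in for ANN1: one tanh layer per (ENT', CTX') pair, then mean
    pooling so that any number of pairs yields a single vector SEM'."""
    outs = []
    for ent_vec, ctx_vec in pairs:
        x = ent_vec + ctx_vec  # concatenate ENT' and CTX'
        outs.append([math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W])
    return [sum(col) / len(outs) for col in zip(*outs)]


if __name__ == "__main__":
    # Fixed illustrative weights: 4 output units over 2 * DIM inputs.
    W = [[((3 * i + 5 * j) % 7 - 3) / 10 for j in range(2 * DIM)] for i in range(4)]
    sentence = "Seoul is the capital of South Korea"
    pairs = [(embed("Seoul"), embed(sentence)), (embed("South Korea"), embed(sentence))]
    print(ann1(pairs, W))  # SEM' with 4 elements
```

Mean pooling makes SEM' independent of the order in which the entity/context pairs are supplied; an attention or recurrent layer would be an equally plausible reading of the text.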
  • In step S44, an operation of obtaining a predicate score vector SCR from the semantic vector SEM' and the predicate PRE may be performed.
  • the vector generator 121 may generate a predicate vector PRE' from the predicate PRE.
  • the second artificial neural network ANN2 may receive a semantic vector SEM' and a predicate vector PRE', and may output a predicate score vector SCR.
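A minimal sketch of step S44, assuming that ANN2 is a single linear layer over the concatenation [SEM'; PRE'] followed by a softmax, so that the n+1 scores are non-negative and sum to 1. The softmax and the parameter names `W2` and `b2` are assumptions; the source only states that ANN2 maps SEM' and PRE' to SCR.

```python
import math


def ann2(sem, pre_vec, W2, b2):
    """Stand-in for ANN2: linear layer over [SEM'; PRE'] followed by softmax.

    W2 must have n+1 rows: rows 0..n-1 score the n existing regular
    predicates (first vector V1) and the last row scores the new regular
    predicate (second vector V2).
    """
    x = sem + pre_vec  # concatenate SEM' and PRE'
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W2, b2)]
    m = max(logits)    # subtract the maximum for numerical stability
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]  # predicate score vector SCR


if __name__ == "__main__":
    sem, pre_vec = [0.2, -0.4], [0.5, 0.1]
    W2 = [[0.3, 0.0, 1.0, 0.2],    # RP_1
          [-0.5, 0.1, 0.0, 0.0],   # RP_2
          [0.0, 0.0, -0.3, 0.4],   # RP_3
          [0.1, 0.1, 0.1, 0.1]]    # RP_4 = RP_{n+1}, the new regular predicate
    b2 = [0.0, 0.0, 0.0, -0.2]
    print(ann2(sem, pre_vec, W2, b2))
```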
  • Fig. 8 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention. Specifically, the flowchart of FIG. 8 shows examples of steps S60 and S80 of FIG. 6. As described above with reference to FIG. 6, whether to update the knowledge base 300 may be determined in step S60' of FIG. 8, and an operation of adding a predicate (PRE) to the knowledge base 300 may be performed in step S80'. As shown in FIG. 8, step S60' may include steps S62 and S64, and step S80' may include steps S82, S84, and S86. In some embodiments, the method of FIG. 8 may be performed by the knowledge base update unit 130 of FIG. 1, and FIG. 8 will be described below with reference to FIGS. 1 and 5.
  • In step S62, an operation of detecting a maximum value among the elements included in the predicate score vector SCR may be performed.
  • the predicate score vector (SCR) may include n+1 elements (RP_1, ..., RP_{n+1}), and the knowledge base update unit 130 may detect the maximum value RP_k among the n+1 elements RP_1, ..., RP_{n+1} (1 ≤ k ≤ n+1).
  • the knowledge base update unit 130 may detect not only the largest value (which may be referred to as the first maximum value) but also the second largest value (which may be referred to as the second maximum value).
  • In step S64, an operation of comparing the maximum value among the elements of the predicate score vector SCR with a predefined criterion may be performed.
  • the knowledge base update unit 130 may compare the maximum value RP k detected in step S62 with the threshold value THR.
  • when the maximum value RP_k is greater than or equal to the threshold value THR, that is, when the predicate PRE has a high degree of matching with a specific regular predicate or with the new regular predicate, step S82 of step S80' may be performed subsequently.
  • when the maximum value RP_k is less than the threshold value THR, step S80' may not be performed.
  • in some embodiments, the knowledge base update unit 130 may compare the difference between the first maximum value and the second maximum value detected in step S62 with a predefined criterion. When the difference between the first maximum value and the second maximum value exceeds the predefined criterion, step S80' may be performed subsequently; otherwise, step S80' may not be performed.
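Steps S62 and S64 can be sketched as a single function. It finds the first and second maximum values of SCR, applies the threshold test against THR and, optionally, the margin test between the first and second maximum values. Combining both tests when both are supplied is an assumption of this sketch; the text presents the margin test as an alternative criterion. The function and parameter names are hypothetical.

```python
def detect_and_compare(scr, thr=None, margin=None):
    """Sketch of steps S62/S64.

    scr:    predicate score vector with n+1 elements
    thr:    threshold THR for the first maximum value (optional)
    margin: required gap between first and second maximum values (optional)
    Returns (update, k) with k 1-based, as in the text (1 <= k <= n+1).
    """
    if thr is None and margin is None:
        raise ValueError("need thr and/or margin")
    # S62: indices sorted by score, largest first.
    order = sorted(range(len(scr)), key=lambda i: scr[i], reverse=True)
    k = order[0] + 1
    first = scr[order[0]]
    # S64: threshold test (RP_k >= THR).
    if thr is not None and first < thr:
        return False, k
    # Alternative S64 criterion: the first maximum must clearly beat the second.
    if margin is not None:
        second = scr[order[1]] if len(scr) > 1 else float("-inf")
        if first - second <= margin:
            return False, k
    return True, k


if __name__ == "__main__":
    print(detect_and_compare([0.1, 0.7, 0.15, 0.05], thr=0.5))
```

A margin test guards against ambiguous cases where two regular predicates score almost equally, which a plain threshold on the maximum cannot detect.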
  • In step S82, an operation of determining whether the detected maximum value belongs to the first vector V1 of the predicate score vector SCR may be performed.
  • the knowledge base update unit 130 may check whether the index k corresponding to the maximum value RP_k is between 1 and n (1 ≤ k ≤ n).
  • when the predicate PRE corresponds to one of the regular predicates of the knowledge base 300 (that is, the regular predicate corresponding to RP_k), step S84 may be performed subsequently.
  • when the predicate (PRE) corresponds to the new regular predicate (that is, k = n+1), step S86 may be performed subsequently.
  • In step S84, an operation of adding the predicate PRE as an expression of the kth regular predicate may be performed.
  • the knowledge base update unit 130 may add the predicate (PRE) as an expression of the kth regular predicate, which corresponds to the maximum value RP_k among the n regular predicates. Accordingly, a triple in which the kth regular predicate is connected to the predicate PRE by the predicate “label” may be added to the knowledge base 300.
  • In step S86, an operation of adding the predicate PRE as an (n+1)th regular predicate may be performed.
  • the knowledge base update unit 130 may add the predicate (PRE) as a new, (n+1)th regular predicate distinct from the existing n regular predicates.
  • a URI of an entity corresponding to the (n+1)th regular predicate may be generated, and a triple in which that entity is connected to the predicate PRE by the predicate “label” may be added to the knowledge base 300.
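Steps S82 to S86 can be sketched with the knowledge base modelled as a Python set of (subject, predicate, object) triples. The use of `rdfs:label` for the “label” predicate, the `http://example.org/predicate/` namespace, and the `p{n+1}` naming of the minted URI are all hypothetical; the text only states that a URI is generated and that a “label” triple is added.

```python
LABEL = "rdfs:label"  # assumption: the text only names the predicate "label"


def update_knowledge_base(kb, regular_predicates, k, pre,
                          base_uri="http://example.org/predicate/"):
    """Sketch of steps S82-S86.

    kb:                 set of (subject, predicate, object) triples
    regular_predicates: list of URIs of the n existing regular predicates
                        (extended in place when a new one is created)
    k:                  1-based index of the maximum element of SCR
    pre:                surface text of the extracted predicate PRE
    Returns the URI of the regular predicate that PRE was attached to.
    """
    n = len(regular_predicates)
    if 1 <= k <= n:                      # S82: the maximum lies in V1
        uri = regular_predicates[k - 1]  # S84: PRE becomes an expression of RP_k
    elif k == n + 1:                     # the maximum lies in V2
        uri = f"{base_uri}p{n + 1}"      # S86: mint a URI for the new predicate
        regular_predicates.append(uri)
    else:
        raise ValueError("k must satisfy 1 <= k <= n+1")
    kb.add((uri, LABEL, pre))            # connect it to PRE by "label"
    return uri


if __name__ == "__main__":
    kb = set()
    preds = ["http://example.org/predicate/capitalOf"]
    update_knowledge_base(kb, preds, 1, "is the capital of")
    update_knowledge_base(kb, preds, 2, "was founded by")
    for triple in sorted(kb):
        print(triple)
```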
  • unlike the example shown in FIG. 8, the knowledge base updater 130 may determine whether and how to update the knowledge base 300 using a plurality of threshold values.
  • the knowledge base update unit 130 may set two different threshold values: one used when determining whether to add the predicate to one of the existing regular predicates, against which the elements of the first vector V1 are compared, and another used when determining whether to add it as a new regular predicate, against which the element of the second vector V2 is compared.
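One possible reading of the two-threshold variant is sketched below: the maximum of SCR is compared against `thr_existing` when it falls in V1 and against `thr_new` when it is the single element of V2. This selection rule and all names are assumptions made for illustration.

```python
def decide_with_two_thresholds(scr, thr_existing, thr_new):
    """Sketch: compare the maximum of SCR against a threshold that depends on
    whether it falls in V1 (existing regular predicates) or V2 (new predicate).

    Returns (action, k) where action is "existing", "new" or "none" and k is
    the 1-based index of the maximum element.
    """
    n = len(scr) - 1                                   # V1 = scr[:n], V2 = scr[n]
    k = max(range(len(scr)), key=lambda i: scr[i]) + 1
    in_v1 = k <= n
    thr = thr_existing if in_v1 else thr_new           # pick the relevant threshold
    if scr[k - 1] < thr:
        return "none", k
    return ("existing" if in_v1 else "new"), k


if __name__ == "__main__":
    print(decide_with_two_thresholds([0.2, 0.1, 0.7], thr_existing=0.5, thr_new=0.6))
```

For example, setting `thr_new` above `thr_existing` would make creating a new regular predicate more conservative than reusing an existing one; the source leaves the actual values open.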
  • the knowledge base update unit 130 may calculate not only the maximum value of the elements of the predicate score vector (SCR) but also statistical characteristics such as the mean and variance, and may determine whether and how to update the knowledge base by comparing the calculated values with at least one threshold value.
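As one hypothetical way to use the mean and variance of SCR, the sketch below accepts the maximum only when it stands out from the mean by at least `z_thr` population standard deviations. The z-score criterion is an assumption; the source names mean and variance but does not say how they are combined with the threshold.

```python
import statistics


def decide_by_statistics(scr, z_thr):
    """Sketch: update only when the maximum of SCR stands out from the other
    scores by at least z_thr population standard deviations.

    Returns (update, k) with k the 1-based index of the maximum element.
    """
    mean = statistics.fmean(scr)
    std = statistics.pstdev(scr)  # square root of the population variance
    k = max(range(len(scr)), key=lambda i: scr[i]) + 1
    if std == 0.0:
        return False, k  # all scores are equal: no predicate stands out
    z = (scr[k - 1] - mean) / std
    return z >= z_thr, k


if __name__ == "__main__":
    print(decide_by_statistics([0.1, 0.1, 0.1, 0.7], z_thr=1.5))
```

Unlike a fixed threshold on the maximum, this criterion adapts to how spread out the scores are for a given input.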

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a system for reinforcing knowledge data of a knowledge base. According to an exemplary embodiment of the present invention, the system may comprise: a predicate extraction unit for extracting a predicate from input data; an entity extraction unit for extracting an entity from the input data; a context extraction unit for extracting, from the input data, a context of an entity including the predicate; a predicate evaluation unit for acquiring a predicate score vector from the predicate, the entity, and the context on the basis of a trained artificial neural network; and a knowledge base update unit for determining whether to update the knowledge base on the basis of the predicate score vector and for updating the knowledge base on the basis of the predicate according to a result of the determination.
PCT/KR2019/013640 2019-09-18 2019-10-17 Système et procédé destinés au renforcement de base de connaissances WO2021054512A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190114967A KR102324196B1 (ko) 2019-09-18 2019-09-18 지식 베이스 보강을 위한 시스템 및 방법
KR10-2019-0114967 2019-09-18

Publications (1)

Publication Number Publication Date
WO2021054512A1 (fr) 2021-03-25

Family

ID=74883198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/013640 WO2021054512A1 (fr) 2019-09-18 2019-10-17 Système et procédé destinés au renforcement de base de connaissances

Country Status (2)

Country Link
KR (1) KR102324196B1 (fr)
WO (1) WO2021054512A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128689A (zh) * 2021-04-27 2021-07-16 中国电力科学研究院有限公司 一种调控知识图谱的实体关系路径推理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7496561B2 (en) * 2001-01-18 2009-02-24 Science Applications International Corporation Method and system of ranking and clustering for document indexing and retrieval
KR20090112157A (ko) * 2008-04-23 2009-10-28 재단법인서울대학교산학협력재단 시맨틱 웹 자원의 랭킹처리방법
WO2012040676A1 (fr) * 2010-09-24 2012-03-29 International Business Machines Corporation Utilisation d'informations ontologiques dans une coercition de type à domaine ouvert
KR20160108886A (ko) * 2015-03-09 2016-09-21 포항공과대학교 산학협력단 개방형 정보 추출을 이용한 지식베이스 확장 방법 및 장치
KR20170089142A (ko) * 2016-01-26 2017-08-03 경북대학교 산학협력단 트리플 데이터의 생성 방법 및 시스템

Also Published As

Publication number Publication date
KR102324196B1 (ko) 2021-11-11
KR20210033345A (ko) 2021-03-26

Similar Documents

Publication Publication Date Title
US11042794B2 (en) Extensible validation framework for question and answer systems
CN105701253B (zh) 中文自然语言问句语义化的知识库自动问答方法
WO2018192269A1 (fr) Procédé pour ordinateur simulant un cerveau humain en vue d'apprendre des connaissances, machine d'inférence logique et plateforme de service d'intelligence artificielle de type cérébral
Pandita et al. Inferring method specifications from natural language API descriptions
US10545999B2 (en) Building features and indexing for knowledge-based matching
WO2021049706A1 (fr) Système et procédé de réponse aux questions d'ensemble
CN107430612A (zh) 查找描述对计算问题的解决方案的文档
WO2014069779A1 (fr) Appareil d'analyse syntaxique fondée sur un prétraitement syntaxique, et son procédé
KR20220028038A (ko) 자연어 이해 프레임워크에서 발화에 대한 복수의 의미 표현들의 도출
WO2020111314A1 (fr) Appareil et procédé d'interrogation-réponse basés sur un graphe conceptuel
CN110581864B (zh) 一种sql注入攻击的检测方法及装置
Fernandez-Álvarez et al. Automatic extraction of shapes using sheXer
Sellam et al. Deepbase: Deep inspection of neural networks
US20220245353A1 (en) System and method for entity labeling in a natural language understanding (nlu) framework
KR102143157B1 (ko) 온톨로지 기반 패러프레이즈 문장 생성을 위한 시스템 및 방법
US20220245361A1 (en) System and method for managing and optimizing lookup source templates in a natural language understanding (nlu) framework
WO2018088664A1 (fr) Dispositif de détection automatique d'erreur de corpus d'étiquetage morphosyntaxique au moyen d'ensembles approximatifs, et procédé associé
WO2022121146A1 (fr) Procédé et appareil de détermination de l'importance d'un segment de code
Feng et al. Probing and fine-tuning reading comprehension models for few-shot event extraction
Zhao et al. Knowledge-enhanced self-supervised prototypical network for few-shot event detection
WO2021054512A1 (fr) Système et procédé destinés au renforcement de base de connaissances
Mezghanni et al. Deriving ontological semantic relations between Arabic compound nouns concepts
Imtiaz Malik et al. Extraction of use case diagram elements using natural language processing and network science
Ashfaq et al. An intelligent analytics approach to minimize complexity in ambiguous software requirements
Heres Source code plagiarism detection using machine learning

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19945937; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19945937; Country of ref document: EP; Kind code of ref document: A1