WO2021054512A1 - System and method for reinforcing knowledge base

Info

Publication number: WO2021054512A1
Application number: PCT/KR2019/013640
Authority: WO
Inventors: 장희원; 이경일
Original assignee: 주식회사 솔트룩스
Priority date: 2019-09-18
Filing date: 2019-10-17
Publication date: 2021-03-25
Also published as: KR102324196B1; KR20210033345A

Abstract

A system for reinforcing knowledge data of a knowledge base according to an exemplary embodiment of the present invention may comprise: a predicate extraction unit for extracting a predicate from input data; an entity extraction unit for extracting an entity from the input data; a context extracting unit for extracting a context of an entity including a predicate from the input data; a predicate evaluation unit for acquiring a predicate score vector from a predicate, an entity, and a context on the basis of a learned artificial neural network; and a knowledge base update unit for determining whether to update the knowledge base on the basis of the predicate score vector, and updating the knowledge base on the basis of the predicate according to a result of the determination.

Description

Systems and methods for reinforcing knowledge base

The technical idea of the present invention relates to a knowledge base, and in detail, to a system and method for reinforcing a knowledge base.

The present invention is derived from research conducted by Saltlux Co., Ltd. as part of the SW Computing Source Technology Development Project (SW) of the Ministry of Science, ICT and Future Planning. [Research period: 2019.01.01~2019.12.31, Research management institution: Information and Communication Technology Promotion Center, Research project name: WiseKB: Development of self-learning knowledge base and reasoning technology based on understanding big data, Project serial number: 2013-0-00109]

A knowledge base that stores knowledge data and provides the stored knowledge data can be built. For example, the knowledge base may include structured knowledge data, and the knowledge data may be generated in various ways. Because the amount of knowledge is vast, human curation alone may be limited as a means of building a knowledge base. For example, predicates, which represent relationships between entities in the knowledge base, may be relatively few in number compared to entities, but it may not be practical to verify and add, through curation, multiple expressions of the same predicate (that is, multiple expressions representing the same meaning) or new predicates.

The technical idea of the present invention provides a knowledge reinforcement system and method for reinforcing predicates included in a knowledge base.

In order to achieve the above object, according to the technical idea of the present invention, a system for reinforcing knowledge data of a knowledge base may include a predicate extraction unit that extracts a predicate from input data, an entity extraction unit that extracts an entity from the input data, a context extraction unit that extracts, from the input data, a context of the entity that includes the predicate, a predicate evaluation unit that obtains a predicate score vector from the predicate, the entity, and the context based on a learned artificial neural network, and a knowledge base update unit that determines whether to update the knowledge base based on the predicate score vector and updates the knowledge base based on the predicate according to the determination result.

According to an exemplary embodiment of the present invention, the predicate evaluation unit may obtain the predicate score vector from a plurality of entities extracted from the input data and a plurality of contexts that respectively correspond to the plurality of entities and include the predicate.

According to an exemplary embodiment of the present invention, the predicate evaluating unit may obtain vectors corresponding to the predicate, entity, and context with reference to the word vector model, and provide the obtained vectors to the artificial neural network.

According to an exemplary embodiment of the present invention, the predicate evaluation unit may obtain a semantic vector from the entity and the context based on a learned first artificial neural network, and may obtain the predicate score vector from the semantic vector and the predicate based on a learned second artificial neural network.

According to an exemplary embodiment of the present invention, the predicate score vector may include a first vector indicating the degree of matching between the predicate and the regular predicates included in the knowledge base, and a second vector indicating the possibility that the predicate is to be newly created.

According to an exemplary embodiment of the present invention, when the maximum value among the values included in the predicate score vector is greater than or equal to a predefined reference value and is included in the first vector, the knowledge base update unit may add the predicate as an expression form of the regular predicate corresponding to the maximum value.

According to an exemplary embodiment of the present invention, when the maximum value among the values included in the predicate score vector is greater than or equal to a predefined reference value and is included in the second vector, the knowledge base update unit may add the predicate to the knowledge base as a new regular predicate.

According to an exemplary embodiment of the present invention, the knowledge base update unit may terminate the update of the knowledge base based on the predicate when the maximum value among values included in the predicate score vector is less than a predefined reference value.
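The three cases described above (maximum value in the first vector, maximum value in the second vector, maximum value below the reference value) can be sketched as follows. This is a minimal illustration only: the names `UpdateDecision` and `decide_update`, the action labels, and the assumption that the first vector is ordered like a list of regular predicates are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class UpdateDecision:
    # "add_expression", "add_new_predicate", or "skip" (hypothetical labels)
    action: str
    regular_predicate: Optional[str] = None


def decide_update(
    first_vector: Sequence[float],
    second_vector: Sequence[float],
    regular_predicates: List[str],
    reference_value: float,
) -> UpdateDecision:
    """Choose an update method from a predicate score vector.

    Assumption: first_vector[i] scores the match with regular_predicates[i],
    and second_vector scores the possibility of a new regular predicate.
    """
    scores = list(first_vector) + list(second_vector)
    if not scores:
        return UpdateDecision("skip")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    if scores[best_index] < reference_value:
        # Maximum value below the reference value: terminate the update.
        return UpdateDecision("skip")
    if best_index < len(first_vector):
        # Maximum value lies in the first vector: add as an expression.
        return UpdateDecision("add_expression", regular_predicates[best_index])
    # Maximum value lies in the second vector: add as a new regular predicate.
    return UpdateDecision("add_new_predicate")
```

For instance, under these assumptions, scores of 0.1 and 0.8 for the regular predicates "accept" and "eat", a new-predicate score of 0.05, and a reference value of 0.5 would select "eat" as the regular predicate to which the expression is added.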

According to an aspect of the technical idea of the present invention, a method for reinforcing knowledge data of a knowledge base may include extracting, from input data, a predicate, an entity, and a context of the entity that includes the predicate; obtaining a predicate score vector from the predicate, the entity, and the context based on a learned artificial neural network; determining whether to update the knowledge base based on the predicate score vector; and updating the knowledge base based on the predicate according to the determination result. The predicate score vector may include a first vector indicating a degree of matching between the predicate and regular predicates included in the knowledge base, and a second vector indicating the possibility that the predicate is to be newly created.

According to an exemplary embodiment of the present invention, the obtaining of the predicate score vector may include obtaining a semantic vector from the entity and the context based on a learned first artificial neural network, and obtaining the predicate score vector from the semantic vector and the predicate based on a learned second artificial neural network.

According to the system and method according to the technical idea of the present invention, expressions corresponding to the same predicate can be easily verified and added to a knowledge base using machine learning.

Further, according to the system and method according to the technical idea of the present invention, a new predicate can be easily verified and added to a knowledge base using machine learning.

In addition, according to the system and method according to the technical idea of the present invention, the knowledge base can be effectively reinforced by adding expressions of predicates and new predicates with high reliability, and the reliability and utilization of the knowledge base can be remarkably increased.

The effects obtainable in the exemplary embodiments of the present invention are not limited to the above-mentioned effects, and other effects not mentioned can be clearly derived and understood from the following description by those having ordinary knowledge in the technical field to which the exemplary embodiments of the present invention belong. That is, unintended effects of implementing the exemplary embodiments of the present invention may also be derived from the exemplary embodiments of the present invention by a person having ordinary skill in the art.

Fig. 1 is a block diagram showing a system and an input/output relationship thereof according to an exemplary embodiment of the present invention.

Figs. 2A, 2B and 2C show examples in which a predicate is added to a knowledge base according to exemplary embodiments of the present invention.

Fig. 3 is a block diagram showing a predicate evaluating unit according to an exemplary embodiment of the present invention.

Fig. 4 is a block diagram showing a vector generator according to an exemplary embodiment of the present invention.

Fig. 5 is a diagram illustrating an example of an operation of a predicate evaluation unit according to an exemplary embodiment of the present invention.

Fig. 6 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.

Fig. 7 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.

Fig. 8 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Embodiments of the present invention are provided to describe the present invention more completely to those with average knowledge in the art. Various modifications may be made to the present invention and it may take various forms, and specific embodiments will be illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to the specific forms disclosed, and the present invention should be understood to include all changes, equivalents, and substitutes included in its spirit and scope. In describing each drawing, similar reference numerals are used for similar elements. In the accompanying drawings, the dimensions of the structures are enlarged or reduced compared to the actual dimensions for clarity of the present invention.

The terms used in the present application are only used to describe specific embodiments, and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to designate the presence of features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and it is to be understood that they do not preclude in advance the presence or possible addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein including technical or scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms as defined in a commonly used dictionary should be interpreted as having a meaning consistent with their meaning in the context of the related technology, and should not be interpreted in an ideal or excessively formal sense unless explicitly defined in this application.

In the following drawings and description, a component indicated or described as a single block may be a hardware block or a software block. For example, each of the components may be an independent hardware block that transmits signals to and receives signals from the others, or may be a software block executed by at least one processor. The software block may include a series of instructions executable by at least one processor and/or source code from which such instructions may be generated through compilation, and may be stored in a computer-readable non-transitory storage medium such as an optical storage medium (e.g., CD, DVD, etc.), a semiconductor memory device (e.g., flash memory, EPROM, etc.), or a magnetic disk device (e.g., a hard disk drive, a magnetic tape, etc.). In addition, in the present specification, “system” or “database” may refer to a computing system including at least one processor and a memory accessed by the processor.

FIG. 1 is a block diagram showing a system and an input/output relationship thereof according to an exemplary embodiment of the present invention. As shown in FIG. 1, the knowledge base reinforcement system 100 may receive input data 200 and may be communicatively connected with the knowledge base 300. In some embodiments, the knowledge base reinforcement system 100 may communicate with the knowledge base 300 through a network, or through a dedicated channel for one-to-one communication. In some embodiments, the knowledge base reinforcement system 100 may include the knowledge base 300.

The input data 200 may refer to data including various types of information. In some embodiments, the knowledge base reinforcement system 100 may collect the input data 200 through the Internet. For example, the input data 200 may be an information document provided by Wikipedia.org, an article provided on the homepage of a press company, or a document created on a social networking service (SNS). In addition, in some embodiments, the knowledge base reinforcement system 100 may receive the input data 200 through a local network, or may receive the input data 200 stored in a storage medium by accessing the storage medium. The input data 200 may include text, and the knowledge base reinforcement system 100 may determine whether to add a predicate included in the text of the input data 200 to the knowledge base 300.

The knowledge base 300 may include knowledge (or knowledge data) structured based on an ontology. An ontology is a representation of things that exist or can be recognized by humans in a form that can be handled by a computer. Ontology components may include, for example, an entity (E) (or instance), a class (C), a property (P), and a value (V). Additionally, the ontology components may further include a relationship, a function term, a restriction, a rule, an event, and the like. As non-limiting examples, the relationship may represent an entity-entity relationship, an entity-class relationship, an entity-property relationship, or an entity-value (number, text, etc.) relationship. Outside of the knowledge base 300 (e.g., in the real world), the relationship may be expressed as a predicate. For example, in "Suwon and Daegu meet in the final", "meet" may be a predicate representing a relationship between the entities "Suwon" and "Daegu".

The knowledge base 300 may store vast amounts of knowledge data based on an ontology. For example, the knowledge base 300 may include knowledge data expressed using the Resource Description Framework (RDF). In one embodiment, a triple may be used as a unit of knowledge data, and the knowledge base 300 may return a triple in response to a query, such as a SPARQL Protocol and RDF Query Language (SPARQL) query. A triple may be composed of "subject-predicate-object", and an entity may be not only the subject of a triple but also the object, and in some embodiments the predicate. Accordingly, the knowledge data stored in the knowledge base 300 may be referred to as a knowledge graph. The entity and the predicate may each have a unique identifier, such as a Uniform Resource Identifier (URI), and may be accessed by the URI.

Entities and predicates (or relationships) may have various expressions, for example, as will be described later with reference to FIGS. 2A, 2B and 2C, and the expressions corresponding to the same entity or the same predicate may be critical in increasing the usefulness of the knowledge base 300. For example, when the knowledge base 300 is used in a question answering system, a user's query may take a vast range of formats and expressions, and in order to recognize such expressions and provide a response, the knowledge base 300 may contain multiple expressions corresponding to one object, that is, an entity or a predicate. The knowledge base 300 may associate these expressions with an entity or a predicate using a predicate (or relationship) such as “label”. For example, as will be described later with reference to FIG. 2B, the knowledge base 300 may include triples such as "eat-label-eat" and "eat-label-take" in relation to the predicate "eat".
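The association of expressions with a predicate through "label" triples can be illustrated with a small in-memory sketch. The `Triple` type, the helper names, and the identifier `kb:eat` are hypothetical stand-ins for illustration; an actual knowledge base would typically be an RDF store queried through SPARQL.

```python
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


LABEL = "label"  # predicate linking an identifier to one of its expressions


def expressions_of(triples: Iterable[Triple], uri: str) -> Set[str]:
    """Return every expression attached to `uri` through a label triple."""
    return {t.obj for t in triples if t.subject == uri and t.predicate == LABEL}


def identifiers_for(triples: Iterable[Triple], expression: str) -> Set[str]:
    """Return every identifier that carries `expression` as a label."""
    return {t.subject for t in triples if t.predicate == LABEL and t.obj == expression}


# Hypothetical identifier "kb:eat" with the two expressions from the example.
kb = {
    Triple("kb:eat", LABEL, "eat"),
    Triple("kb:eat", LABEL, "take"),
}
```

With this toy data, looking up the expression "take" resolves to the identifier of the regular predicate "eat", which is the kind of lookup a question answering system would need when a query uses a less common expression.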

As will be described later with reference to the drawings, the knowledge base reinforcement system 100 can reinforce the knowledge base 300 by automatically verifying the predicate based on the input data 200 and adding it to the knowledge base 300, Accordingly, the usefulness of the knowledge base 300 may be remarkably improved. The knowledge base reinforcement system 100 may include a preprocessor 110, a predicate evaluation unit 120, and a knowledge base update unit 130, as shown in FIG. 1.

The preprocessor 110 may extract an entity (ENT), a predicate (PRE), and a context (CTX) from the input data 200. For example, as shown in FIG. 1, the entity extracting unit 112 may extract an entity (ENT) from the input data 200, the predicate extracting unit 114 may extract a predicate (PRE) from the input data 200, and the context extracting unit 116 may extract a context (CTX) from the input data 200. In some embodiments, unlike the individual extraction shown in FIG. 1, the entity (ENT), the predicate (PRE), and the context (CTX) may be extracted together in a common process of handling the input data 200. The context (CTX) is a unit of the input data 200 that includes an entity (ENT) and a predicate (PRE), and its length may be determined according to a predefined window size. For example, from the above-described example "Suwon and Daegu meet in the final", "Suwon" and "Daegu" may be extracted as entities (ENT), "meet" may be extracted as a predicate (PRE), and "Finals/In/Suwon/Daegu/Meet" may be extracted as a context (CTX).
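One way a window-bounded context could be cut out of a tokenized sentence is sketched below. The specification does not state how the window is applied, so the rule used here (the span covering the entity and the predicate, widened by `window_size` tokens on each side) and the function name `extract_context` are assumptions made for illustration.

```python
from typing import List


def extract_context(
    tokens: List[str], entity: str, predicate: str, window_size: int
) -> List[str]:
    """Return the tokens around the first occurrence of entity and predicate.

    Assumed rule: take the span from the earlier of the two tokens to the
    later one, then extend it by `window_size` tokens in both directions.
    """
    if entity not in tokens or predicate not in tokens:
        return []  # nothing to extract when either token is absent
    entity_pos = tokens.index(entity)
    predicate_pos = tokens.index(predicate)
    start = max(0, min(entity_pos, predicate_pos) - window_size)
    end = min(len(tokens), max(entity_pos, predicate_pos) + window_size + 1)
    return tokens[start:end]


tokens = ["Suwon", "and", "Daegu", "meet", "in", "the", "final"]
context = extract_context(tokens, "Daegu", "meet", window_size=1)
# context is ["and", "Daegu", "meet", "in"]
```

A larger window keeps more of the sentence, which gives the later evaluation more evidence about the sense in which the predicate is used, at the cost of longer inputs.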

The entity (ENT), predicate (PRE), and context (CTX) may be extracted from the input data 200 in any manner. For example, the preprocessor 110 may process text included in the input data 200 based on natural language processing including morpheme analysis and syntax analysis. In some embodiments, the preprocessor 110 may extract a triple by analyzing sentences included in the text of the input data 200 based on dependency parsing and/or Semantic Role Labeling (SRL). For example, like the "natural language understanding unit" described in Korean Patent Application No. 10-2018-0150093, filed by the same applicant as the present application and incorporated herein by reference in its entirety, the preprocessor 110 may include at least one of a morpheme analysis unit, a syntax analysis unit, an entity name analysis unit, a filtering analysis unit, an intention analysis unit, a domain analysis unit, and a Semantic Role Labeling (SRL) unit. An entity (ENT) and a predicate (PRE) may be extracted from the extracted triple, and a context (CTX) including the entity (ENT) and the predicate (PRE) may be extracted from the input data 200. In some embodiments, as indicated by a dotted arrow in FIG. 1, the operation of extracting a triple from the input data 200 may be performed with reference to the knowledge base 300. In some embodiments, as described later with reference to FIG. 4, the extracted entity (ENT) may not match the entity included in the knowledge base 300.

Since the predicate (PRE) is extracted from the input data 200 based on the dependencies between words in a given sentence, that is, the roles of the words, the relationship between the predicate (PRE) and the predicates included in the knowledge base 300 (which may be referred to as regular predicates in this specification) may be unknown. For example, the predicate (PRE) may be one of the existing expression forms of a regular predicate included in the knowledge base 300, may be a new expression form of a regular predicate included in the knowledge base 300, or may correspond to a predicate that needs to be added to the knowledge base 300 as a new regular predicate. In addition, the predicate (PRE) may correspond to a predicate that should not be added to the knowledge base 300 because of an error in the input data 200 and/or the preprocessor 110.

The predicate evaluation unit 120 may receive, from the preprocessor 110, not only the predicate (PRE) but also an entity (ENT) and a context (CTX) related to the predicate (PRE), and may evaluate the predicate (PRE) using an artificial neural network (ANN). An artificial neural network may refer to a structure in which artificial neurons (or neuron models) are interconnected. An artificial neuron may generate an output by performing simple operations on its inputs, and the output may be passed as an input to other artificial neurons. The predicate evaluation unit 120 may evaluate the predicate (PRE) based on machine learning, and is not limited by the name artificial neural network (ANN). For example, the artificial neural network (ANN) may be referred to as a deep learning network, and may include, as non-limiting examples, a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a Restricted Boltzmann Machine (RBM), a Deep Belief Network (DBN), and a Deep Q-Network. As shown in FIG. 1, the predicate evaluation unit 120 may generate a predicate score vector (SCR) by evaluating the predicate (PRE), and the predicate score vector (SCR) may represent the relationship between the predicate (PRE) and the regular predicates included in the knowledge base 300. An example of the predicate evaluation unit 120 will be described later with reference to FIGS. 3 and 5.

The knowledge base update unit 130 may receive the predicate score vector (SCR) from the predicate evaluation unit 120, and may update the knowledge base 300 with the predicate (PRE) based on the predicate score vector (SCR). For example, the knowledge base update unit 130 may determine whether to update the knowledge base 300 and how to update it based on the predicate score vector (SCR), and may add the predicate (PRE) to the knowledge base 300 according to the determined update method. An example of the operation of the knowledge base update unit 130 will be described later with reference to FIG. 8.

FIGS. 2A, 2B and 2C show examples in which a predicate is added to a knowledge base according to exemplary embodiments of the present invention. Specifically, FIGS. 2A and 2B show an example in which a predicate is added as an expression form of a regular predicate, and FIG. 2C shows an example in which a predicate is added as a new regular predicate. Hereinafter, in the description of FIGS. 2A, 2B, and 2C, overlapping contents will be omitted, and FIGS. 2A, 2B, and 2C will be described with reference to FIG. 1.

Referring to FIG. 2A, the knowledge base 300 may include “accept” as a regular predicate. As described above with reference to FIG. 1, a regular predicate may have a plurality of expression forms; for example, as shown in FIG. 2A, the regular predicate “accept” may have the expression forms “accept”, “accept”, and so on. In the knowledge base 300, the regular predicate "accept" may form triples with each of the expression forms "accept", "accept", and the like through the predicate "label". The preprocessor 110 may extract "accept" as a predicate (PRE) from the input data 200, and the predicate evaluation unit 120 may provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating that the extracted predicate best matches "accept" among the existing regular predicates. Accordingly, the knowledge base update unit 130 may obtain the URI of the entity “accept” from the knowledge base 300, generate a new triple by connecting the obtained URI and "accept" with the predicate "label", and add the new triple to the knowledge base 300.

Referring to FIG. 2B, the knowledge base 300 may include “lift” and “eat” as regular predicates. When the preprocessor 110 extracts "lift" as a predicate (PRE) from the input data 200, although "lift" already exists as an expression form of the regular predicate "lift", the predicate evaluation unit 120 may, based on the entity (ENT) and the context (CTX), provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating that the extracted "lift" is used in the sense of eating food or the like and therefore best matches the regular predicate "eat". Accordingly, the knowledge base update unit 130 may obtain the URI of the entity "eat" from the knowledge base 300, generate a new triple by connecting the obtained URI and "lift" with the predicate "label", and add the new triple to the knowledge base 300.

Referring to FIG. 2C, the preprocessor 110 may extract "follow" or "follow" from the input data 200. "Follow" is a term used in social network services and the like, where it may mean establishing a relationship with another user, and the knowledge base 300 may not include a regular predicate corresponding to it. Based on the entity (ENT) and the context (CTX), the predicate evaluation unit 120 may provide the knowledge base update unit 130 with a predicate score vector (SCR) indicating that the predicate (PRE) “follows” or “follows” does not match any of the regular predicates included in the current knowledge base 300. Accordingly, the knowledge base update unit 130 may generate a new URI, "follow1", for a new regular predicate, or may request the knowledge base 300 to create a new URI, and may create new triples by connecting the new URI with each of "follow" and "follow" through the predicate "label" and add them to the knowledge base 300. In some embodiments, when the knowledge base update unit 130 receives, from the predicate evaluation unit 120, a predicate score vector (SCR) indicating that the predicate does not match any of the regular predicates included in the knowledge base 300, the information including the predicate (PRE), the entity (ENT), and the context (CTX) may be provided externally or separately recorded for curation by an administrator.
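The two kinds of update illustrated in FIGS. 2A to 2C, adding an expression to an existing regular predicate and creating a new regular predicate, can be sketched against a plain set of tuples. The function names, the `kb:` prefix, and the numbering scheme used to mint an identifier such as "follow1" are assumptions made for illustration; the specification only states that a new URI may be generated or requested from the knowledge base.

```python
from typing import Iterable, Set, Tuple

TripleT = Tuple[str, str, str]
LABEL = "label"


def add_expression(triples: Set[TripleT], predicate_uri: str, expression: str) -> None:
    """Attach `expression` to an existing regular predicate (FIGS. 2A and 2B)."""
    triples.add((predicate_uri, LABEL, expression))


def add_new_predicate(
    triples: Set[TripleT], base_name: str, expressions: Iterable[str]
) -> str:
    """Mint an unused identifier and label it with each expression (FIG. 2C).

    Assumed scheme: append the smallest positive integer that is not yet
    used as a subject, e.g. "kb:follow1", then "kb:follow2".
    """
    used = {subject for subject, _, _ in triples}
    suffix = 1
    while f"kb:{base_name}{suffix}" in used:
        suffix += 1
    uri = f"kb:{base_name}{suffix}"
    for expression in expressions:
        triples.add((uri, LABEL, expression))
    return uri


store: Set[TripleT] = {("kb:eat", LABEL, "eat")}
add_expression(store, "kb:eat", "take")
new_uri = add_new_predicate(store, "follow", ["follow"])
```

Because the store is a set, adding an expression that is already present leaves the data unchanged, which matches the case where the extracted predicate is an existing expression of a regular predicate.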

FIG. 3 is a block diagram showing a predicate evaluation unit 120 according to an exemplary embodiment of the present invention. As described above with reference to FIG. 1, the predicate evaluation unit 120 may receive an entity (ENT), a predicate (PRE), and a context (CTX) from the preprocessor 110, and may output a predicate score vector (SCR). As shown in FIG. 3, the predicate evaluation unit 120 may include a vector generation unit 121 and an artificial neural network (ANN). In the following, FIG. 3 will be described with reference to FIG. 1.

The vector generator 121 may generate vectors corresponding to each of the entity (ENT), the predicate (PRE), and the context (CTX) by referring to the word vector model 400. The word vector model 400 may refer to a multidimensional space in which a meaningful word (or token, etc.) is expressed as a single coordinate, that is, a word vector, or to a system that contains word vectors and updates them. In some embodiments, the word vector model 400 may include an artificial neural network, and may be trained by machine learning. Semantically similar words may be placed adjacent to each other in the multidimensional space, and accordingly, word vectors corresponding to semantically similar words may have similar values. In some embodiments, the word vector model 400 may be included in the knowledge base reinforcement system 100, or the predicate evaluation unit 120 may access a word vector model 400 located outside the knowledge base reinforcement system 100. An example of the vector generator 121 will be described later with reference to FIG. 4.
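The property that semantically similar words have similar word vectors is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; the values in `toy_vectors` are not taken from any trained model, and the specification does not name a particular similarity measure.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical word vectors: "eat" and "take" are placed close together,
# while "follow" points in a different direction.
toy_vectors: Dict[str, List[float]] = {
    "eat": [0.9, 0.1, 0.0],
    "take": [0.8, 0.2, 0.1],
    "follow": [0.0, 0.1, 0.9],
}

similar = cosine_similarity(toy_vectors["eat"], toy_vectors["take"])
dissimilar = cosine_similarity(toy_vectors["eat"], toy_vectors["follow"])
```

In this toy space the first similarity is close to 1 and the second is close to 0, which is the behavior the description attributes to words that are, or are not, semantically similar.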

In order for the artificial neural network (ANN) to properly output the predicate score vector (SCR), the artificial neural network (ANN) may receive vectors that carry the meanings of the entity (ENT), the predicate (PRE), and the context (CTX), respectively. For example, as shown in FIG. 3, the vector generation unit 121 may refer to the word vector model 400 and output an entity vector (ENT'), a predicate vector (PRE'), and a context vector (CTX') corresponding to the entity (ENT), the predicate (PRE), and the context (CTX), respectively. The artificial neural network (ANN) may be in a trained state based on samples of the entity vector (ENT'), the predicate vector (PRE'), and the context vector (CTX'). In some embodiments, the artificial neural network (ANN) may be trained based on reinforcement learning. As described above with reference to FIG. 1, the predicate score vector (SCR) output from the artificial neural network (ANN) may indicate which regular predicate the predicate (PRE) matches when it matches one of the regular predicates, may indicate that the predicate (PRE) corresponds to a new regular predicate, or may indicate that the predicate (PRE) matches no regular predicate, including a new regular predicate.

FIG. 4 is a block diagram showing a vector generator 40 according to an exemplary embodiment of the present invention. As described above with reference to FIG. 3, the vector generator 40 may generate an entity vector (ENT'), a context vector (CTX'), and a predicate vector (PRE') from an entity (ENT), a context (CTX), and a predicate (PRE). As shown in FIG. 4, the vector generator 40 may include an entity vector generation unit 41, a context vector generation unit 42, and a predicate vector generation unit 43. In the following, FIG. 4 will be described with reference to FIG. 3.

In some embodiments, the word vector model 400 of FIG. 3 may include a first word vector model 410 and a second word vector model 420, and the vector generator 40 may refer to the first word vector model 410 and the second word vector model 420. The first word vector model 410 may provide a word vector corresponding to a word included in the text of the input data 200. Accordingly, the context vector generation unit 42 and the predicate vector generation unit 43 may generate the context vector (CTX') and the predicate vector (PRE'), respectively, with reference to the first word vector model 410. In some embodiments, the first word vector model 410 may be referred to as a word vector model for expression forms.

The second word vector model 420 may provide an entity vector representing an entity included in the knowledge base 300. For example, in the above-described example "Suwon and Daegu meet in the final", "Suwon" and "Daegu" may refer to sports teams associated with the cities rather than the cities themselves. The entity vector generator 41 may refer to the second word vector model 420 to generate an entity vector (ENT') such that the entity extracted by the preprocessor 110 corresponds to one of the entities included in the knowledge base 300. For example, the entity vector generator 41 may obtain a word vector corresponding to the entity (ENT) with reference to the first word vector model 410, and may generate the entity vector (ENT') from the obtained word vector and the context vector (CTX') with reference to the second word vector model 420. Accordingly, for the above-described exemplary entities "Suwon" and "Daegu", owing to the context vector (CTX') generated from the context (CTX) including "final/in", the entity vector generator 41 may generate entity vectors (ENT') corresponding to entities representing sports teams, rather than entities representing cities, among the entities included in the knowledge base 300. In some embodiments, while the artificial neural network (ANN) is being trained, the entity vector (ENT') may be provided directly by a user through curation. In some embodiments, the second word vector model 420 may be referred to as an entity linking model.
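The disambiguation described above can be illustrated with a deliberately simple stand-in for an entity linking model. The scoring rule (the dot product between each candidate entity vector and the sum of the word vector and the context vector), the candidate identifiers, and the two-dimensional toy vectors are all assumptions for illustration; the specification does not disclose how the second word vector model 420 combines its inputs.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def link_entity(
    word_vector: Sequence[float],
    context_vector: Sequence[float],
    candidates: Dict[str, List[float]],
) -> Tuple[str, List[float]]:
    """Pick the candidate entity that best agrees with the word and its context.

    Returns the chosen identifier and its vector, which plays the role of ENT'.
    """
    query = [w + c for w, c in zip(word_vector, context_vector)]
    best = max(candidates, key=lambda uri: dot(query, candidates[uri]))
    return best, candidates[best]


# Toy axes: [city-related, sports-related]. All values are hypothetical.
suwon_word = [0.5, 0.5]          # the bare word is ambiguous
final_context = [0.0, 1.0]       # a context mentioning a final
candidates = {
    "kb:Suwon_city": [1.0, 0.0],
    "kb:Suwon_team": [0.0, 1.0],
}

linked_uri, entity_vector = link_entity(suwon_word, final_context, candidates)
```

In this toy setting the sports-related context tips the choice toward the team entity, whereas a city-related context vector such as [1.0, 0.0] would select the city entity for the same word.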

FIG. 5 is a diagram illustrating an example of an operation of the predicate evaluating unit 120 according to an exemplary embodiment of the present invention. As described above with reference to FIG. 3, the predicate evaluating unit 120 may receive an entity vector (ENT'), a predicate vector (PRE'), and a context vector (CTX') respectively corresponding to an entity (ENT), a predicate (PRE), and a context (CTX), and may output a predicate score vector (SCR). In the following, FIG. 5 will be described with reference to FIG. 3.

In some embodiments, the predicate evaluation unit 120 may receive a plurality of pairs each composed of an entity (ENT) and a context (CTX) including the entity, and the vector generation unit 121 may generate a plurality of vector pairs corresponding to the plurality of pairs with reference to the word vector model 400. For example, as shown in FIG. 5, the vector generator 121 may generate a first vector pair (PAIR1) consisting of a first context vector (CTX1') and a first entity vector (ENT1'), and a second vector pair (PAIR2) consisting of a second context vector (CTX2') and a second entity vector (ENT2'). The first context vector CTX1' and the second context vector CTX2' may commonly include the predicate vector PRE' or a vector corresponding to an expression similar to the predicate vector PRE'.

In some embodiments, the predicate evaluation unit 120 may include a first artificial neural network ANN1 and a second artificial neural network ANN2. As shown in FIG. 5, the first artificial neural network ANN1 may receive the first vector pair PAIR1 and the second vector pair PAIR2, and may generate a semantic vector SEM' therefrom. The semantic vector (SEM') may represent the meaning of the entities (ENT1, ENT2) and the predicate (PRE) in the contexts (CTX1, CTX2) and, because it is derived from two or more vector pairs, may reflect two or more usage patterns of the predicate (PRE). As shown in FIG. 5, the second artificial neural network ANN2 may receive the semantic vector SEM' and the predicate vector PRE', and may output the predicate score vector SCR. In some embodiments, unlike what is shown in FIG. 5, the first vector pair (PAIR1), the second vector pair (PAIR2), and the predicate vector (PRE') may be provided to a single artificial neural network (e.g., the ANN of FIG. 3). In some embodiments, in a training step, the first artificial neural network (ANN1) may be trained by propagating errors generated from the predicate score vector (SCR) to the first artificial neural network (ANN1) through the second artificial neural network (ANN2).

The predicate score vector SCR may include a first vector V1 representing a degree of matching with the regular predicates included in the knowledge base 300 and a second vector V2 corresponding to a new regular predicate. For example, as shown in FIG. 5, when the knowledge base 300 includes n regular predicates (n is an integer greater than 1), the first vector V1 may include n elements (RP₁, RP₂, RP₃, ..., RPₙ) respectively corresponding to the n regular predicates, and the second vector V2 may include one element (RPₙ₊₁). An example of an operation of updating the knowledge base 300 according to the value of the predicate score vector SCR will be described later with reference to FIG. 8.
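As a small illustration of this layout, the following sketch splits a score vector of n+1 elements into the first vector V1 and the second vector V2. The function name and the sample scores are hypothetical and serve only to make the indexing concrete.

```python
from typing import List, Sequence, Tuple

def split_score_vector(scr: Sequence[float], n: int) -> Tuple[List[float], List[float]]:
    """Split SCR into V1 (n matching scores) and V2 (one new-predicate score)."""
    if len(scr) != n + 1:
        raise ValueError(f"expected {n + 1} elements, got {len(scr)}")
    return list(scr[:n]), list(scr[n:])

# Hypothetical scores for a knowledge base holding n = 3 regular predicates.
scr = [0.05, 0.70, 0.10, 0.15]
v1, v2 = split_score_vector(scr, n=3)
print(v1)  # [0.05, 0.7, 0.1]
print(v2)  # [0.15]
```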

FIG. 6 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention. As shown in FIG. 6, the method for reinforcing the knowledge base may include a plurality of steps S20, S40, S60, and S80. In some embodiments, the method of FIG. 6 may be performed by the knowledge base augmentation system 100 of FIG. 1, and FIG. 6 will be described below with reference to FIG. 1.

In step S20, an operation of extracting a predicate (PRE), an entity (ENT), and a context (CTX) may be performed. For example, the preprocessor 110 may perform natural language processing on text included in the input data 200, and may extract the predicate (PRE), the entity (ENT), and the context (CTX) with reference to the knowledge base 300.

In step S40, an operation of obtaining a predicate score vector (SCR) may be performed. For example, the predicate evaluation unit 120 may use the learned artificial neural network (ANN) to obtain the predicate score vector (SCR) from the predicate (PRE), the entity (ENT), and the context (CTX) extracted in step S20. An example of step S40 will be described later with reference to FIG. 7.

In step S60, an operation of determining whether to update the knowledge base 300 may be performed. For example, the knowledge base updater 130 may determine whether to reflect the predicate PRE in the knowledge base 300 based on values included in the predicate score vector SCR. In some embodiments, the knowledge base updater 130 may determine whether to reflect the predicate PRE in the knowledge base 300 based on a maximum value among the values included in the predicate score vector SCR. An example of step S60 will be described later with reference to FIG. 8. As shown in FIG. 6, when it is determined that the knowledge base 300 is to be updated, step S80 may follow; otherwise, the method of FIG. 6 may end.

In step S80, an operation of adding the predicate PRE to the knowledge base 300 may be performed. For example, as described above with reference to FIGS. 2A and 2B, the knowledge base update unit 130 may add the predicate (PRE) as an expression type of a regular predicate included in the knowledge base 300. Alternatively, as described above with reference to FIG. 2C, the knowledge base update unit 130 may generate a new regular predicate different from the regular predicates included in the knowledge base 300, and may add the predicate (PRE) as an expression type of the new regular predicate. An example of step S80 will be described later with reference to FIG. 8.
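The control flow of steps S20 through S80 can be sketched as a single function that chains four pluggable stages. This is only an outline under stated assumptions: the stage signatures, the `reinforce` name, and the stub stages in the usage example (including the 0.6 threshold and the fixed scores) are hypothetical and stand in for the preprocessor 110, the predicate evaluation unit 120, and the knowledge base update unit 130.

```python
from typing import Callable, List, Optional, Tuple

Extraction = Tuple[str, str, str]  # (PRE, ENT, CTX)

def reinforce(
    text: str,
    extract: Callable[[str], Extraction],
    score: Callable[[str, str, str], List[float]],
    decide: Callable[[List[float]], Optional[int]],
    update: Callable[[str, int], None],
) -> bool:
    """Run S20 -> S40 -> S60 -> S80 once; return True if the knowledge base changed."""
    pre, ent, ctx = extract(text)   # S20: extract predicate, entity, context
    scr = score(pre, ent, ctx)      # S40: obtain predicate score vector
    k = decide(scr)                 # S60: 1-based index to update, or None
    if k is None:
        return False                # the method ends without an update
    update(pre, k)                  # S80: add the predicate to the knowledge base
    return True

# Stub stages, purely illustrative.
log: List[Tuple[str, int]] = []
changed = reinforce(
    "Suwon and Daegu meet in the final",
    extract=lambda t: ("meet", "Suwon", t),
    score=lambda pre, ent, ctx: [0.1, 0.8, 0.1],
    decide=lambda scr: (scr.index(max(scr)) + 1) if max(scr) >= 0.6 else None,
    update=lambda pre, k: log.append((pre, k)),
)
```

Passing the stages as callables keeps the flow of FIG. 6 separate from any particular extraction or scoring model.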

FIG. 7 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention. Specifically, the flowchart of FIG. 7 shows an example of step S40 of FIG. 6. As described above with reference to FIG. 6, an operation of obtaining a predicate score vector (SCR) may be performed in step S40' of FIG. 7. As shown in FIG. 7, step S40' may include steps S42 and S44. In some embodiments, step S40' may be performed by the predicate evaluating unit 120, and FIG. 7 will be described below with reference to FIGS. 3 and 5.

In step S42, an operation of obtaining a semantic vector (SEM') from the entity (ENT) and the context (CTX) may be performed. For example, the vector generator 121 may generate an entity vector ENT' and a context vector CTX' from the entity ENT and the context CTX. The entity vector ENT' and the context vector CTX' may be provided to the first artificial neural network ANN1, and the first artificial neural network ANN1 may output a semantic vector SEM'. In addition, as described above with reference to FIG. 5, the vector generator 121 may receive a plurality of entities and a plurality of contexts, and may generate a plurality of vector pairs. A plurality of vector pairs may be provided to the first artificial neural network ANN1, and the first artificial neural network ANN1 may output a semantic vector SEM'.

In step S44, an operation of obtaining a predicate score vector SCR from the semantic vector SEM' and the predicate PRE may be performed. For example, the vector generator 121 may generate a predicate vector PRE' from the predicate PRE. The second artificial neural network ANN2 may receive a semantic vector SEM' and a predicate vector PRE', and may output a predicate score vector SCR.
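Steps S42 and S44 can be sketched as two small feed-forward stages. The sketch below is an assumption-laden toy, not the disclosed networks: it uses a single tanh layer with mean pooling for ANN1, a single softmax layer for ANN2, random untrained weights, and made-up dimensions and input vectors. The specification does not state the layer types, the pooling rule, or that the scores are normalized.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax (an assumed output normalization)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init(rows, cols, rng):
    """Random weights and zero biases for an untrained layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return weights, [0.0] * rows

def ann1(pairs, w, b):
    """S42: encode each (CTX', ENT') pair, then average into one semantic vector SEM'."""
    encoded = [[math.tanh(v) for v in dense(ctx + ent, w, b)] for ctx, ent in pairs]
    return [sum(col) / len(encoded) for col in zip(*encoded)]

def ann2(sem, pre, w, b):
    """S44: score SEM' and PRE' against n regular predicates plus one 'new' slot."""
    return softmax(dense(sem + pre, w, b))

rng = random.Random(0)
dim, sem_dim, n = 4, 3, 5                      # hypothetical sizes
w1, b1 = init(sem_dim, 2 * dim, rng)
w2, b2 = init(n + 1, sem_dim + dim, rng)

pair1 = ([0.1, 0.3, 0.0, 0.2], [0.5, 0.1, 0.4, 0.0])   # (CTX1', ENT1')
pair2 = ([0.2, 0.1, 0.3, 0.0], [0.0, 0.6, 0.1, 0.2])   # (CTX2', ENT2')
pre = [0.3, 0.3, 0.1, 0.0]                              # PRE'

sem = ann1([pair1, pair2], w1, b1)   # semantic vector SEM'
scr = ann2(sem, pre, w2, b2)         # predicate score vector SCR, n + 1 elements
```

Because the weights are random, the resulting scores carry no meaning; the sketch only shows how the vector pairs, SEM', PRE', and SCR fit together dimensionally.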

FIG. 8 is a flow chart showing a method for reinforcing a knowledge base according to an exemplary embodiment of the present invention. Specifically, the flowchart of FIG. 8 shows examples of steps S60 and S80 of FIG. 6. As described above with reference to FIG. 6, whether to update the knowledge base 300 may be determined in step S60' of FIG. 8, and an operation of adding a predicate (PRE) to the knowledge base 300 may be performed in step S80'. As shown in FIG. 8, step S60' may include steps S62 and S64, and step S80' may include steps S82, S84, and S86. In some embodiments, the method of FIG. 8 may be performed by the knowledge base update unit 130 of FIG. 1, and FIG. 8 will be described below with reference to FIGS. 1 and 5.

In step S62, an operation of detecting a maximum value among the elements included in the predicate score vector SCR may be performed. For example, as described above with reference to FIG. 5, the predicate score vector (SCR) may include n+1 elements (RP₁, ..., RPₙ₊₁), and the knowledge base update unit 130 may detect a maximum value RPₖ among the n+1 elements RP₁, ..., RPₙ₊₁ (1 ≤ k ≤ n+1). In some embodiments, the knowledge base update unit 130 may detect not only the largest value (which may be referred to as the first maximum value) but also the second largest value (which may be referred to as the second maximum value).
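Step S62 amounts to finding the largest and second-largest elements together with the 1-based index k of the largest. A minimal sketch follows; the function name and sample scores are hypothetical, and ties are resolved in favor of the lowest index, which is a choice the specification does not address.

```python
from typing import Sequence, Tuple

def top_two(scr: Sequence[float]) -> Tuple[int, float, float]:
    """Return (k, first maximum, second maximum), where k is the 1-based index of the maximum."""
    if len(scr) < 2:
        raise ValueError("score vector needs at least two elements")
    # sorted() is stable, so equal scores keep their original (lowest-index-first) order.
    order = sorted(range(len(scr)), key=lambda i: scr[i], reverse=True)
    first, second = order[0], order[1]
    return first + 1, scr[first], scr[second]

scr = [0.05, 0.70, 0.10, 0.15]       # hypothetical scores, n = 3
k, first_max, second_max = top_two(scr)
print(k, first_max, second_max)      # 2 0.7 0.15
```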

In step S64, an operation of comparing the maximum value among the elements of the predicate score vector SCR with a predefined criterion may be performed. For example, as shown in FIG. 8, the knowledge base update unit 130 may compare the maximum value RPₖ detected in step S62 with a threshold value THR. As shown in FIG. 8, when the maximum value RPₖ is greater than or equal to the threshold value THR, that is, when the predicate PRE has a high degree of matching with a specific regular predicate or with a new regular predicate, step S82 of step S80' may follow. On the other hand, when the maximum value RPₖ is less than the threshold value THR, that is, when the predicate PRE matches neither any of the n regular predicates included in the knowledge base 300 nor the new regular predicate to a significant degree, step S80' may not be performed. In some embodiments, unlike what is shown in FIG. 8, the knowledge base update unit 130 may compare the difference between the first maximum value and the second maximum value detected in step S62 with a predefined criterion. Accordingly, when the difference between the first maximum value and the second maximum value exceeds the predefined criterion, step S80' may be subsequently performed; otherwise, step S80' may not be performed.
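The two criteria of step S64 can be written as simple predicates. In this sketch the function names and the numeric thresholds are hypothetical; the comparison operators follow the text (greater than or equal to THR for the maximum value, strictly exceeding the criterion for the difference between the two largest values).

```python
def passes_threshold(max_value: float, thr: float) -> bool:
    """Criterion of FIG. 8: proceed to S80' when the maximum value is at least THR."""
    return max_value >= thr

def passes_margin(first_max: float, second_max: float, min_gap: float) -> bool:
    """Alternative criterion: proceed when the gap between the two largest values exceeds min_gap."""
    return (first_max - second_max) > min_gap

# Hypothetical values: a clear winner versus two close candidates.
print(passes_threshold(0.70, thr=0.6))            # True
print(passes_margin(0.70, 0.15, min_gap=0.3))     # True
print(passes_margin(0.45, 0.40, min_gap=0.3))     # False
```

The margin variant rejects a predicate whose best and second-best scores are nearly equal, even when the best score alone is high.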

In step S82, an operation of determining whether the detected maximum value belongs to the first vector V1 of the predicate score vector SCR may be performed. For example, the knowledge base update unit 130 may check whether the index k corresponding to the maximum value RPₖ is greater than or equal to 1 and less than or equal to n. As shown in FIG. 8, when the maximum value RPₖ belongs to the first vector V1, that is, when the predicate PRE corresponds to one of the regular predicates of the knowledge base 300 (namely, the regular predicate corresponding to RPₖ), step S84 may be performed subsequently. On the other hand, when the maximum value RPₖ does not belong to the first vector V1 (that is, when the maximum value RPₖ belongs to the second vector V2), meaning that the predicate PRE corresponds to a new regular predicate, step S86 may be performed subsequently.

In step S84, an operation of adding the predicate PRE as an expression type of the kth regular predicate may be performed. For example, the knowledge base update unit 130 may add the predicate (PRE) as an expression type of the kth regular predicate, which corresponds to the maximum value RPₖ, among the n regular predicates. Accordingly, a triple including the kth regular predicate and the predicate PRE connected by the predicate "label" may be added to the knowledge base 300.

On the other hand, in step S86, an operation of adding the predicate PRE as an (n+1)th regular predicate may be performed. For example, the knowledge base update unit 130 may add the predicate (PRE) as a new regular predicate, that is, an (n+1)th regular predicate different from the existing n regular predicates. Accordingly, a URI of an entity corresponding to the (n+1)th regular predicate may be generated, and a triple including the corresponding entity and the predicate PRE connected by the predicate "label" may be added to the knowledge base 300.
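Steps S82, S84, and S86 can be sketched as one routing function over a set of triples. The in-memory set, the `apply_update` name, the `kb:` identifiers, and the URI naming scheme for the new regular predicate are all hypothetical; the specification only states that a URI is generated and that a triple with the predicate "label" is added.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def apply_update(kb: Set[Triple], regular_predicates: List[str], predicate: str, k: int) -> str:
    """Add `predicate` under the k-th regular predicate (S84) or as a new one (S86)."""
    n = len(regular_predicates)
    if 1 <= k <= n:                              # S82: maximum lies in V1 -> S84
        subject = regular_predicates[k - 1]
    elif k == n + 1:                             # S82: maximum lies in V2 -> S86
        subject = f"kb:predicate_{n + 1}"        # hypothetical URI scheme
        regular_predicates.append(subject)
    else:
        raise ValueError(f"k must be in 1..{n + 1}, got {k}")
    kb.add((subject, "label", predicate))
    return subject

kb: Set[Triple] = set()
regular = ["kb:meet", "kb:defeat", "kb:join"]    # n = 3, illustrative names

existing = apply_update(kb, regular, "face off", k=1)   # S84 path
created = apply_update(kb, regular, "sponsor", k=4)     # S86 path
```

After these two calls, `kb` holds one triple attached to an existing regular predicate and one attached to a newly created fourth regular predicate.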

In some embodiments, unlike what is shown in FIG. 8, the knowledge base updater 130 may use a plurality of threshold values to determine whether to update the knowledge base 300 and how to update it. For example, the knowledge base update unit 130 may set differently a threshold value used when determining whether to add the predicate to one of the existing regular predicates, that is, a threshold value with which elements included in the first vector V1 are compared, and a threshold value used when determining whether to add the predicate as a new regular predicate, that is, a threshold value with which elements included in the second vector V2 are compared. In addition, in some embodiments, the knowledge base update unit 130 may calculate not only the maximum value of the elements of the predicate score vector (SCR) but also statistical characteristics such as the average and the variance, and may determine whether and how to update by comparing the calculated values with at least one threshold value.
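One way to read this variant is sketched below, with a separate threshold for V1 and V2 and an optional variance check. The function name, the parameter names, the threshold values, and in particular the use of the population variance as a "flatness" gate are assumptions made for illustration; the specification does not say how the statistical characteristics enter the decision.

```python
from statistics import pvariance
from typing import Optional, Sequence

def decide(
    scr: Sequence[float],
    n: int,
    thr_existing: float,
    thr_new: float,
    min_variance: float = 0.0,
) -> Optional[int]:
    """Return the 1-based index k to update, or None when no update should occur."""
    k = max(range(len(scr)), key=lambda i: scr[i]) + 1
    thr = thr_existing if k <= n else thr_new    # V1 and V2 use different thresholds
    if scr[k - 1] < thr:
        return None
    # Assumed extra check: a nearly flat score vector is treated as inconclusive.
    if pvariance(scr) < min_variance:
        return None
    return k

scores = [0.05, 0.70, 0.10, 0.15]                # hypothetical, n = 3
print(decide(scores, n=3, thr_existing=0.6, thr_new=0.8))                  # 2
print(decide([0.1, 0.1, 0.1, 0.7], n=3, thr_existing=0.6, thr_new=0.8))    # None
```

Setting `thr_new` higher than `thr_existing`, as in the example, makes the creation of a new regular predicate stricter than the addition of an expression type to an existing one; whether that ordering is desirable is a design choice left open by the text.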

As described above, exemplary embodiments have been disclosed in the drawings and the specification. Although the embodiments have been described herein using specific terms, these terms are used only to describe the technical idea of the present invention and are not used to limit the meaning or the scope of the present invention described in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Accordingly, the true technical scope of the present invention should be determined by the technical spirit of the appended claims.

Claims

1. A system for reinforcing knowledge data of a knowledge base, the system comprising:

a predicate extraction unit configured to extract a predicate from input data;

an entity extracting unit configured to extract an entity from the input data;

a context extracting unit configured to extract a context of the entity including the predicate from the input data;

a predicate evaluation unit configured to obtain a predicate score vector from the predicate, the entity, and the context based on a learned artificial neural network; and

a knowledge base update unit configured to determine whether to update the knowledge base based on the predicate score vector, and to update the knowledge base based on the predicate according to a determination result.

2. The system according to claim 1,

wherein the predicate evaluating unit is configured to obtain the predicate score vector from a plurality of entities extracted from the input data and a plurality of contexts each corresponding to the plurality of entities and including the predicate.

3. The system according to claim 1,

wherein the predicate evaluation unit is configured to obtain vectors corresponding to the predicate, the entity, and the context by referring to a word vector model, and to provide the obtained vectors to the artificial neural network.

4. The system according to claim 1,

wherein the predicate evaluation unit is configured to:

obtain a semantic vector from the entity and the context based on a learned first artificial neural network; and

obtain the predicate score vector from the semantic vector and the predicate based on a learned second artificial neural network.

5. The system according to claim 1,

wherein the predicate score vector includes:

a first vector indicating a degree of matching between the predicate and regular predicates included in the knowledge base; and

a second vector indicating a possibility of newly generating the predicate.

6. The system of claim 5,

wherein the knowledge base update unit is configured to add the predicate to an expression type of a regular predicate corresponding to a maximum value when the maximum value among values included in the predicate score vector is greater than or equal to a predefined reference value and is included in the first vector.

7. The system of claim 5,

wherein the knowledge base update unit is configured to add the predicate to the knowledge base as a new regular predicate when a maximum value among values included in the predicate score vector is greater than or equal to a predefined reference value and is included in the second vector.

8. The system of claim 5,

wherein the knowledge base update unit is configured to terminate updating of the knowledge base based on the predicate when a maximum value among values included in the predicate score vector is less than a predefined reference value.

9. A method for reinforcing knowledge data of a knowledge base, the method comprising:

extracting, from input data, a predicate, an entity, and a context of the entity including the predicate;

obtaining a predicate score vector from the predicate, the entity, and the context based on a learned artificial neural network;

determining whether to update the knowledge base based on the predicate score vector; and

updating the knowledge base based on the predicate according to a determination result,

wherein the predicate score vector includes:

a first vector indicating a degree of matching between the predicate and regular predicates included in the knowledge base; and

a second vector indicating a possibility of newly generating the predicate.

10. The method of claim 9,

wherein the obtaining of the predicate score vector includes:

obtaining a semantic vector from the entity and the context based on a learned first artificial neural network; and

obtaining the predicate score vector from the semantic vector and the predicate based on a learned second artificial neural network.