KR101739540B1 - System and method for building integration knowledge base based - Google Patents


Info

Publication number
KR101739540B1
Authority
KR
South Korea
Prior art keywords
integrated
data
entity
knowledge
knowledge data
Application number
KR1020160010071A
Other languages
Korean (ko)
Inventor
이경일
정교성
Original Assignee
주식회사 솔트룩스 (Saltlux Co., Ltd.)
Application filed by 주식회사 솔트룩스
Priority to KR1020160010071A
Application granted
Publication of KR101739540B1

Classifications

    • G06F17/30286
    • G06F17/30289
    • G06F17/30345
    • G06F17/30604

Abstract

An integrated knowledge base construction system according to the present invention includes: an individual knowledge base building module that converts external data received from a first data server to generate first internal knowledge data; and a knowledge base integration module that integrates integrated knowledge data and the first internal knowledge data. The knowledge base integration module includes: an entity similarity analysis unit that compares entities of the integrated knowledge data with a target entity of the first internal knowledge data to search the integrated knowledge data for integrated entity candidates, generates a similarity between the target entity and each integrated entity candidate, and selects an integrated entity from among the integrated entity candidates based on the similarity; and an integrated entity conversion unit that adds data related to the target entity among the first internal knowledge data to the integrated knowledge data, using integrated information including the selection result for the integrated entity.


Description

SYSTEM AND METHOD FOR BUILDING INTEGRATION KNOWLEDGE BASE BASED

Technical aspects of the present invention relate to an integrated knowledge base building system, and more particularly, to a system and method for building an integrated knowledge base including a knowledge base integration module.

The present invention is derived from research conducted by Saltlux Co., Ltd. as part of the SW Computing Industry Source Technology Development Project (SW) of the Ministry of Science, ICT and Future Planning [Research period: 2015.03.01 to 2016.02.29, Research institute: Institute for Information and Communications Technology Promotion, Research title: WiseKB: Development of self-learning knowledge base and reasoning technology based on big data understanding, 15-0054].

Korean-language knowledge LOD (Linked Open Data) is limited to certain specialized knowledge. Knowledge bases that include general knowledge, such as DBpedia, do exist, but they are not built by integrating knowledge from a variety of data sources. Therefore, there is a need for a means of integrating data from various data sources into one set of integrated knowledge data and building an integrated knowledge base.

An object of the technical idea of the present invention is to provide a system and method for building an integrated knowledge base.

An integrated knowledge base construction system according to the present invention includes: an individual knowledge base building module that converts external data received from a first data server to generate first internal knowledge data; and a knowledge base integration module that integrates integrated knowledge data and the first internal knowledge data. The knowledge base integration module includes: an entity similarity analysis unit that compares entities of the integrated knowledge data with a target entity of the first internal knowledge data to search the integrated knowledge data for integrated entity candidates, generates a similarity between the target entity and each integrated entity candidate, and selects an integrated entity from among the integrated entity candidates based on the similarity; and an integrated entity conversion unit that adds data related to the target entity among the first internal knowledge data to the integrated knowledge data, based on integrated information including the selection result for the integrated entity.

In addition, the individual knowledge base building module may convert external data received from a second data server to separately generate second internal knowledge data, and the knowledge base integration module may integrate the integrated knowledge data and the second internal knowledge data.

The entity similarity analysis unit may compare a predetermined attribute (or relationship) of the target entity of the first internal knowledge data and its corresponding value (or entity) with the same predetermined attribute (or relationship) of the entities of the integrated knowledge data and their corresponding values (or entities), and may select, as integrated entity candidates, the entities that have the same value (or entity) for the predetermined attribute (or relationship) as the target entity.

The entity similarity analysis unit may generate the similarity between the target entity and each integrated entity candidate based on at least one of graph information, ontology information, and syntax information corresponding to the target entity and the integrated entity candidate, and may select the integrated entity from among the integrated entity candidates based on the generated similarity.

The integrated entity conversion unit may convert the identifier of the target entity into the integrated identifier of the integrated entity.

Also, when the entity similarity analysis unit fails to select an integrated entity candidate of the integrated knowledge data, or fails to select the integrated entity from among the integrated entity candidates, the integrated entity conversion unit may generate a new integrated identifier and convert the identifier of the target entity into the new integrated identifier.

The integrated knowledge base construction system may further include a curation module that, when a plurality of integrated entities are selected, selects one of them based on input data received from outside.

In addition, the integrated knowledge base construction system may further include a curation module that changes the integrated knowledge data by adding or deleting entity-related data of the integrated knowledge data based on input data received from outside, and that detects the changed integrated knowledge data and generates change data information.

The knowledge base integration module may further include an integrated knowledge data updating unit that stores the change data information received from the curation module and updates the integrated knowledge data based on the change information.

The integrated knowledge data updating unit compares the changed data information with the selected integrated entity related data, and determines whether to add the integrated entity related data based on the comparison result.

According to the technical idea of the present invention, the integrated knowledge data can be continuously updated based on input data, and external data received from a new data server can be added to the integrated knowledge data, thereby securing more data.

FIG. 1 is a block diagram illustrating an integrated knowledge base building system and its input/output relationships according to an embodiment of the present invention.
FIGS. 2A and 2B are block diagrams illustrating an individual knowledge base building module according to an embodiment of the present invention.
FIG. 2C is a view for explaining an operation method of an individual knowledge base building module according to an embodiment of the present invention.
FIG. 3A is a block diagram illustrating a knowledge base integration module according to an embodiment of the present invention.
FIG. 3B is a diagram illustrating an integrated knowledge base including integrated knowledge data generated by a knowledge base integration module according to an embodiment of the present invention.
FIG. 4A is a block diagram illustrating an entity similarity analysis unit according to an embodiment of the present invention.
FIG. 4B is a view for explaining the operation of the integrated entity candidate search unit according to an embodiment of the present invention.
FIG. 4C is a block diagram illustrating an entity similarity calculation unit according to an embodiment of the present invention.
FIGS. 4D and 4E are views for explaining the operation of the integrated entity selection unit according to an embodiment of the present invention.
FIG. 5A is a block diagram illustrating an example of an integrated knowledge base building system according to an embodiment of the present invention.
FIGS. 5B and 5C are views for explaining the operation of the curation module according to an embodiment of the present invention.
FIG. 6A is a block diagram illustrating an exemplary implementation of a knowledge base integration module and a curation module according to an embodiment of the present invention.
FIG. 6B is a view for explaining the operation of the curation module and the knowledge base integration module according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating an operation of building an integrated knowledge base according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Embodiments of the present invention are provided to more fully describe the present invention to those skilled in the art. The present invention is capable of various modifications and various forms, and specific embodiments are illustrated and described in detail in the drawings. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for similar elements in describing each drawing. In the accompanying drawings, the dimensions of the structures are enlarged or reduced from the actual dimensions for the sake of clarity of the present invention.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprises" and "having" are intended to specify the presence of a feature, number, step, operation, element, part, or combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Also, the terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms may be used for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

FIG. 1 is a block diagram illustrating an integrated knowledge base building system and its input/output relationships according to an embodiment of the present invention.

In an integrated knowledge base construction system on a network, Linked Open Data (LOD) may refer to structured data published on the network in a format for describing resources, such as RDF (Resource Description Framework). Such LOD may include a plurality of entities, each of which can be accessed and used via an identifier such as a Uniform Resource Identifier (URI) and can be represented on the web using a protocol such as HTTP. LOD enables data sharing between systems, making it possible to implement a knowledge base containing a vast amount of data. Hereinafter, knowledge data may refer to LOD, that is, data in which knowledge information is formalized.
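As a concrete illustration of triples addressed by URIs, the following is a minimal sketch that reads a small subset of the W3C N-Triples syntax into (subject, predicate, object) tuples. It is a toy under stated assumptions, not a conformant parser: it accepts only IRIs in angle brackets and plain string literals without escapes, language tags, or datatypes, and the `http://example.org/` IRIs are hypothetical.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# <subject> <predicate> <object-IRI> .   or   <subject> <predicate> "literal" .
_LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"\\]*)")\s*\.$')


def parse_ntriples_subset(text: str) -> List[Triple]:
    """Parse a restricted subset of N-Triples into (s, p, o) tuples.

    Blank lines and '#' comment lines are skipped; anything outside the
    supported subset raises ValueError instead of being silently dropped.
    """
    triples: List[Triple] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line {lineno}: {raw!r}")
        subj, pred, obj_iri, obj_literal = match.groups()
        # The object is either an IRI (an entity) or a literal (an attribute value).
        triples.append((subj, pred, obj_iri if obj_iri is not None else obj_literal))
    return triples


if __name__ == "__main__":
    doc = """
    <http://example.org/A1> <http://example.org/P1> "N1" .
    <http://example.org/A1> <http://example.org/R1> <http://example.org/B1> .
    """
    for triple in parse_ntriples_subset(doc):
        print(triple)
```

A production system would use a full RDF library instead; the point here is only that every statement is a triple whose subject is reachable through its identifier.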

Referring to FIG. 1, the integrated knowledge base construction system 100 may include an individual knowledge base building module 120, a knowledge base integration module 140, and a curation module 160. The integrated knowledge base construction system 100 can exchange data with a database (or a system or server) 300 through a network 400. The network 400 may be a communication network using an Internet protocol, such as an extranet or an intranet.

The database 300 may represent sources in which data is generated, maintained, and distributed, such as the Internet, crowdsourcing, and social networks. The database 300 may include a plurality of data servers 300-1 to 300-n, which may contain different unstructured, semi-structured, and structured data. Unstructured data is data that is not stored in a fixed form, in contrast to structured data, whose contents are stored in corresponding fields. For example, a database or a spreadsheet may be structured data, while a text document, voice data, image data, and the like may be unstructured data. XML or HTML is not stored in fixed fields but may include metadata or a schema, so it may be classified as semi-structured data; however, this classification is merely an example, and the present invention is not limited thereto.

The individual knowledge base building module 120 can receive a plurality of external data from the database 300 via the network 400. The individual knowledge base building module 120 can convert the external data received from each of the data servers 300-1 to 300-n into internal knowledge data on a per-server basis. For example, the individual knowledge base building module 120 may convert first external data received from the first data server 300-1 into first internal knowledge data, and convert second external data received from the second data server 300-2 into second internal knowledge data. However, this is merely an embodiment, and each of the data servers 300-1 to 300-n may include a plurality of data sources; for example, a single data server may contain a separate data source for each language. The detailed conversion process is described with reference to FIGS. 2A and 2B. In an embodiment, the individual knowledge base building module 120 may include a plurality of internal knowledge bases, generate internal knowledge data converted per data server 300-1 to 300-n, and provide the internal knowledge data to the knowledge base integration module 140. The internal knowledge data may include first internal knowledge data (generated by converting the first external data of the first data server 300-1) through n-th internal knowledge data (generated by converting the n-th external data of the n-th data server 300-n).
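The per-server conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions: each server's external data is taken to be already parsed into a mapping of entity name to attribute/value pairs, and the identifier scheme `URI_<n>_<entity>` is hypothetical, chosen only to mirror labels such as 'URI_1_A1'.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
ExternalRecords = Dict[str, Dict[str, str]]  # entity name -> {attribute: value}


def to_internal_knowledge(server_index: int, records: ExternalRecords) -> List[Triple]:
    """Convert one data server's records into internal knowledge triples.

    Each entity gets a server-specific identifier (hypothetical scheme
    'URI_<n>_<entity>'), so data from different servers cannot collide
    before the integration step.
    """
    triples: List[Triple] = []
    for entity, attributes in records.items():
        identifier = f"URI_{server_index}_{entity}"
        for attribute, value in attributes.items():
            triples.append((identifier, attribute, value))
    return triples


def build_individual_knowledge_bases(
    servers: Dict[int, ExternalRecords],
) -> Dict[int, List[Triple]]:
    """Build one internal knowledge base per data server index (1..n)."""
    return {n: to_internal_knowledge(n, recs) for n, recs in servers.items()}
```

Keeping one knowledge base per server defers every cross-source decision to the integration module, which is where the document places it.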

The knowledge base integration module 140 may integrate the received internal knowledge data with the integrated knowledge data stored in the integrated knowledge base 200. The target entity of the internal knowledge data, referred to below, may be any one entity selected, when the integration operation is performed, from the plurality of entities of the internal knowledge data to be integrated. The plurality of entities of the internal knowledge data may be selected sequentially as the target entity, so that a plurality of integration operations are performed, one per selected entity. In one embodiment, the knowledge base integration module 140 may compare a plurality of entities of the integrated knowledge data with a target entity of the first internal knowledge data to select integrated entity candidates of the integrated knowledge data. Thereafter, the knowledge base integration module 140 may calculate the similarity between the target entity and each selected integrated entity candidate, and select an integrated entity from among the integrated entity candidates based on the similarity. Based on integrated information including the selection result for the integrated entity, the knowledge base integration module 140 may add the data related to the target entity among the first internal knowledge data to the integrated knowledge data as data related to the selected integrated entity, thereby performing an integration operation for building the integrated knowledge base. In addition, when a given entity of the first internal knowledge data is selected as the target entity but no integrated entity candidate or integrated entity of the integrated knowledge data is selected, the knowledge base integration module 140 may generate a new integrated identifier in order to integrate the data related to that entity among the first internal knowledge data into the integrated knowledge data. The integration operation can then be performed by converting the identifier of that entity into the new integrated identifier.
Such an integration operation may also be applied to the internal knowledge data of internal knowledge bases other than the first internal knowledge base, as described in detail later.
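The integration operation described above can be sketched as a loop over target entities. This is a minimal sketch under stated assumptions, not the patented implementation: candidate search and similarity are passed in as functions, `threshold` and the `CURI_NEW_` prefix for newly minted integrated identifiers are hypothetical, and when several candidates qualify this sketch simply keeps the highest-scoring one (the curation step is left out).

```python
import itertools
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def integrate(
    integrated: Set[Triple],
    internal: List[Triple],
    find_candidates: Callable[[str], List[str]],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
    new_prefix: str = "CURI_NEW_",
) -> Dict[str, str]:
    """Merge internal knowledge triples into the integrated knowledge data.

    Returns the 'integrated information' as a mapping from each target
    identifier to its integrated identifier; `integrated` is updated in place.
    """
    counter = itertools.count(1)
    mapping: Dict[str, str] = {}
    # Every entity of the internal knowledge data becomes the target in turn.
    for target in sorted({s for s, _, _ in internal}):
        scored = [(similarity(target, c), c) for c in find_candidates(target)]
        best = max(scored, default=None)
        if best is not None and best[0] >= threshold:
            mapping[target] = best[1]  # reuse the selected integrated entity
        else:
            mapping[target] = f"{new_prefix}{next(counter)}"  # mint a new identifier
    for s, p, o in internal:
        # Rewrite subjects, and objects that refer to converted entities.
        integrated.add((mapping[s], p, mapping.get(o, o)))
    return mapping
```

Because the integrated data is held as a set, a triple that the integrated entity already has is not duplicated when the target's data is added.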

The knowledge base integration module 140 may activate the curation module 160 based on the number of selected integrated entities. In one embodiment, when two or more integrated entities are selected, the knowledge base integration module 140 may stop the integration operation and activate the curation module 160. The curation module 160 may receive input data 500 through a communication channel 5. The communication channel 5 may be a communication network using an Internet protocol, or a one-to-one channel using a serial or parallel interface. Based on the received input data 500, the curation module 160 can select one integrated entity from the two or more selected integrated entities. After the curation module 160 completes the selection, the knowledge base integration module 140 may resume the integration operation and add the target entity-related data to the integrated knowledge data as data related to the one integrated entity selected by the curation module 160. This is described in detail later.
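The curation trigger can be sketched as a small decision function. In this minimal sketch, assumed for illustration, the curation module is modeled as a synchronous callback `curate` that receives the ambiguous candidates and returns one of them; calling it stands in for pausing the integration operation and resuming once a choice arrives. The function name is hypothetical.

```python
from typing import Callable, List, Optional


def resolve_integrated_entity(
    selected: List[str],
    curate: Callable[[List[str]], str],
) -> Optional[str]:
    """Return the single integrated entity, asking a curator when ambiguous.

    - 0 selected: None (the caller mints a new integrated identifier)
    - 1 selected: that entity, no curation needed
    - 2 or more:  integration waits while `curate` picks exactly one
    """
    if not selected:
        return None
    if len(selected) == 1:
        return selected[0]
    choice = curate(selected)
    if choice not in selected:
        raise ValueError(f"curator chose {choice!r}, which is not among {selected}")
    return choice
```

Validating the curator's answer against the candidate list keeps a faulty input from attaching data to an entity that was never a candidate.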

In one embodiment, the curation module 160 may change the integrated knowledge data by adding or deleting entity-related information in the integrated knowledge data extracted from the integrated knowledge base 200 based on the input data 500, and may then detect the changed integrated knowledge data and generate change data information. The knowledge base integration module 140 may update the integrated knowledge data based on the change data information. In one embodiment, the input data 500 may include selection criterion information (e.g., similarity-related reference values) for selecting one or more integrated entities from among a plurality of selected integrated entities, and change criterion information for changing the integrated knowledge data (e.g., censorship-related information, fashion data information, etc.).
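Detecting changes and updating from them can be sketched as a set difference over triples. This assumes, for illustration only, that the curator edits a copy of the integrated knowledge data and that "change data information" is simply the triples added and removed; the `ChangeData` shape and function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class ChangeData(NamedTuple):
    """Hypothetical shape for 'change data information'."""
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


def detect_changes(before: Set[Triple], after: Set[Triple]) -> ChangeData:
    """Diff the curated copy against the original integrated knowledge data."""
    return ChangeData(added=frozenset(after - before), removed=frozenset(before - after))


def apply_changes(integrated: Set[Triple], change: ChangeData) -> None:
    """Update the integrated knowledge data in place from change data."""
    integrated.difference_update(change.removed)
    integrated.update(change.added)
```

Shipping only the difference, rather than the whole edited copy, lets the integration module review each added triple against the related entity data before accepting it.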

With the integrated knowledge base construction system 100 according to the present invention, the integrated knowledge data can be continuously updated based on the input data, and external data received from a new data server can be added to the integrated knowledge data, thereby securing more data.

FIGS. 2A and 2B are block diagrams illustrating an individual knowledge base building module according to an embodiment of the present invention, and FIG. 2C is a diagram illustrating an operation method of the individual knowledge base building module.

As shown in FIG. 2A, the individual knowledge base building module 120 may include an internal knowledge base conversion unit 122 and an internal knowledge base 124. The internal knowledge base conversion unit 122 can convert the external data received from each of the data servers 300-1 to 300-n into internal knowledge data on a per-server basis. Referring to FIG. 2B, the internal knowledge base conversion unit 122 may include a data normalization unit 122a, a data refinement unit 122b, a knowledge data conversion unit 122c, and a conversion rule data storage unit 122d. In one embodiment, the data normalization unit 122a may normalize the first external data received from the first data server 300-1 (which may include unstructured or structured data) into a format that can be processed. The data refinement unit 122b can classify and refine data from the normalized first external data through natural language processing and semantic annotation. The knowledge data conversion unit 122c may convert the first external data into first internal knowledge data by performing screening and semantic integration on the normalized and refined first external data. The conversion rule data storage unit 122d may store the conversion rule information that the knowledge data conversion unit 122c refers to when performing the data conversion operation. In this manner, the internal knowledge base conversion unit 122 can convert the first to n-th external data received from the data servers 300-1 to 300-n into first to n-th internal knowledge data, respectively.
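The normalize, refine, and convert stages can be sketched as a chain of small functions. This is a deliberately simplified sketch under stated assumptions: records are flat dictionaries, the rule table `CONVERSION_RULES` and its field names are hypothetical, "refinement" only drops empty values (the natural language processing and semantic annotation are not modeled), and "screening" keeps only fields that have a conversion rule.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical conversion rules (stand-in for conversion rule data storage
# unit 122d): normalized source field name -> internal attribute name.
CONVERSION_RULES: Dict[str, str] = {"name": "label", "born": "birthDate"}


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Normalization stand-in (122a): lower-case field names, trim whitespace."""
    return {k.strip().lower(): v.strip() for k, v in record.items()}


def refine(record: Dict[str, str]) -> Dict[str, str]:
    """Refinement stand-in (122b): drop empty values only.

    The natural language processing and semantic annotation described in
    the text are not modeled here.
    """
    return {k: v for k, v in record.items() if v}


def convert(entity_id: str, record: Dict[str, str], rules: Dict[str, str]) -> List[Triple]:
    """Conversion stand-in (122c): screen out fields without a rule, emit triples."""
    return [(entity_id, rules[k], v) for k, v in record.items() if k in rules]


def run_conversion_pipeline(
    entity_id: str, raw: Dict[str, str], rules: Dict[str, str] = CONVERSION_RULES
) -> List[Triple]:
    """Normalize, then refine, then convert one external record."""
    return convert(entity_id, refine(normalize(raw)), rules)
```

Keeping the rules in a separate table, rather than hard-coding them in `convert`, mirrors the separate rule storage unit and allows different rule sets per data server.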

The internal knowledge base conversion unit 122 may provide the first to n-th internal knowledge data to the knowledge base integration module 140, or the first to n-th internal knowledge data may be stored in different areas within the internal knowledge base 124, with flag information indicating the data server corresponding to the stored internal knowledge data stored in each area. The knowledge base integration module 140 can then selectively extract the internal knowledge data corresponding to each data server from the internal knowledge base 124 by referring to the flag information.
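The flagged storage areas can be sketched as a dictionary keyed by the flag. In this minimal model, assumed for illustration, each area holds one server's triples, the flag is simply a string naming the data server (for example "300-1"), and the class name `InternalKnowledgeBase` is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triple = Tuple[str, str, str]


class InternalKnowledgeBase:
    """Hypothetical model of internal knowledge base 124.

    Each data server's internal knowledge data lives in its own area, and
    the area is keyed by flag information naming the source server.
    """

    def __init__(self) -> None:
        self._areas: DefaultDict[str, List[Triple]] = defaultdict(list)

    def store(self, server_flag: str, triples: List[Triple]) -> None:
        """Append triples to the area tagged with `server_flag`."""
        self._areas[server_flag].extend(triples)

    def extract(self, server_flag: str) -> List[Triple]:
        """Selectively extract one server's data by its flag (copy, may be empty)."""
        return list(self._areas.get(server_flag, []))

    def flags(self) -> List[str]:
        """All flags that currently have a storage area."""
        return sorted(self._areas)
```

Note that `extract` uses `dict.get`, so asking for an unknown flag does not create an empty area as a side effect.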

FIG. 2C illustrates internal knowledge bases 10 and 20 containing internal knowledge data into which the individual knowledge base building module 120 has converted external data. In one embodiment, the first internal knowledge base 10 may store first internal knowledge data generated by converting first external data received from the first data server 300-1, and the second internal knowledge base 20 may store second internal knowledge data generated by converting second external data received from the second data server 300-2.

The internal knowledge data included in each internal knowledge base can be expressed in triple form, as in RDF. For example, the first internal knowledge data may be expressed as triples of the form (first entity, relationship, second entity), where the relationship links the first entity to the second entity, or of the form (first entity, attribute (or data type), value). The identifier of the internal knowledge data may include path information for accessing an entity included in such a triple, so that through the identifier it is possible to access not only the entity but also the knowledge data related to it, that is, its triple data. In addition, an entity may have a text value for an attribute called a label, so the knowledge data may include triple data of the form (entity, label, value). Although the identifier and the entity are shown separately in the drawings for convenience of explanation, the identifier is one means of expressing the identity of the entity, and the identifier may be regarded as the same as the entity. For example, the identifier 'URI_1_A1' of the first internal knowledge data of the first internal knowledge base 10 and the entity 'A1' that it indicates may be regarded as the same.
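Access to all of an entity's triple data through its identifier can be sketched as an index keyed by subject. This is a minimal sketch assuming triples are plain string tuples; the predicate name `"label"` is a hypothetical stand-in (in RDF vocabularies this role is commonly played by `rdfs:label`).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Index = Dict[str, List[Tuple[str, str]]]


def index_by_identifier(triples: List[Triple]) -> Index:
    """Group triples by subject identifier, so one identifier reaches all of
    the entity's knowledge data as (attribute or relation, value or entity) pairs."""
    index: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return dict(index)


def label_of(index: Index, identifier: str, label_predicate: str = "label") -> Optional[str]:
    """Return the text value of the entity's label attribute, if it has one."""
    for p, o in index.get(identifier, []):
        if p == label_predicate:
            return o
    return None
```

An entity such as 'URI_1_B1' that appears only as an object has no entry in the index, which is why its label lookup yields `None`.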

The first internal knowledge data of the first internal knowledge base 10 and the second internal knowledge data of the second internal knowledge base 20 may each include a plurality of triple data. The triple data of the first internal knowledge data can be accessed through first-type identifiers, and the triple data of the second internal knowledge data can be accessed through second-type identifiers. The internal knowledge base conversion unit 122 may generate the first-type and second-type identifiers based on different conversion rules.

In one embodiment, the first internal knowledge data of the first internal knowledge base 10 may include data corresponding to five triples about entity 'A1', accessed by the identifier 'URI_1_A1'. For example, the first internal knowledge data may include a plurality of triple data, such as the triple (A1-P1-N1), meaning that the value of attribute 'P1' of entity 'A1' is 'N1'. Likewise, the second internal knowledge data of the second internal knowledge base 20 may include data corresponding to five triples about entity 'A2', accessed by the identifier 'URI_2_A2'. For example, the second internal knowledge data may include a plurality of triple data, such as the triple (A2-P1-N2), meaning that the value of attribute 'P1' of entity 'A2' is 'N2'. However, the first internal knowledge base 10 and the second internal knowledge base 20 shown are merely examples and may include any number of triple data.

FIG. 3A is a block diagram illustrating a knowledge base integration module according to an embodiment of the present invention, and FIG. 3B is a diagram illustrating an integrated knowledge base including integrated knowledge data generated by the knowledge base integration module according to an embodiment of the present invention.

As shown in FIG. 3A, the integrated knowledge base construction system 100 may include an individual knowledge base building module 120 and a knowledge base integration module 140. The knowledge base integration module 140 may include an entity similarity analysis unit 142, an integrated information storage unit 144, and an integrated entity conversion unit 146. The knowledge base integration module 140 can build an integrated knowledge base by integrating the plurality of internal knowledge data stored in the individual internal knowledge bases generated by the individual knowledge base building module 120 into integrated knowledge data. In one embodiment, the knowledge base integration module 140 may select one of the plurality of internal knowledge bases and integrate the internal knowledge data of the selected internal knowledge base with the integrated knowledge data of the integrated knowledge base. However, the present invention is not limited thereto, and the internal knowledge data of a plurality of internal knowledge bases can be integrated with the integrated knowledge data at the same time. The integration is described in detail later.

Referring to FIG. 3B, the knowledge base integration module 140 may generate an integrated knowledge base 30 including integrated knowledge data.

For example, when the knowledge base integration module 140 performs the integration operation on internal knowledge data for the first time, integrated entity candidates of the integrated knowledge data corresponding to the target entity of the internal knowledge data cannot be selected. In this case, the integrated entity conversion unit 146 may select at least one internal knowledge base from among the plurality of internal knowledge bases and generate new integrated identifiers for converting the internal knowledge data of the selected internal knowledge base into integrated knowledge data. In one embodiment, the integrated entity conversion unit 146 may select the first internal knowledge base 10 shown in FIG. 2C and generate new integrated identifiers to convert the first internal knowledge data into integrated knowledge data. However, the present invention is not limited thereto; the integrated entity conversion unit 146 may select a plurality of internal knowledge bases and generate new integrated identifiers to convert a plurality of internal knowledge data into integrated knowledge data at the same time, and the integrated knowledge data generated in this way can be stored in the integrated knowledge base 30.

Like the internal knowledge data of the internal knowledge bases described above, the integrated knowledge data stored in the integrated knowledge base 30 can be expressed in triple form, as in RDF.

To integrate the knowledge data of a plurality of internal knowledge bases, however, the triple data of the integrated knowledge data can be accessed through integrated identifiers. Accordingly, the integrated entity conversion unit 146 first generates new integrated identifiers and converts each of the first-type identifiers of the first internal knowledge data of the first internal knowledge base 10 into a new integrated identifier. As described above, the identifier and the entity may be the same concept, so converting a first-type identifier into an integrated identifier may be the same as converting an entity of the first internal knowledge data into an integrated entity.

In one embodiment, the integrated knowledge data of the integrated knowledge base 30 may include data corresponding to five triples about entity 'A1', accessed by the identifier 'CURI_A1'. For example, the integrated knowledge data may include a plurality of triple data, such as the triple (A1-P1-N1), meaning that the value of attribute 'P1' of entity 'A1' is 'N1'. When the integrated knowledge base 30 does not yet exist, the integrated entity conversion unit 146 may select some of the internal knowledge bases and convert each of the identifiers for accessing the internal knowledge data of the selected internal knowledge bases into an integrated identifier, thereby generating the basic integrated knowledge data needed to integrate the knowledge data of a plurality of internal knowledge bases and storing it in the integrated knowledge base 30. Hereinafter, the integrated knowledge data is assumed to have been generated by converting the first internal knowledge data, but this is merely an example and the present invention is not limited thereto.
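Seeding an empty integrated knowledge base from one internal knowledge base can be sketched as an identifier rewrite. This sketch assumes the hypothetical prefixes `URI_1_` and `CURI_`, chosen only to reproduce the 'URI_1_A1' to 'CURI_A1' example; objects are rewritten only when they name an entity that is itself converted.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def bootstrap_integrated(
    internal: List[Triple],
    source_prefix: str = "URI_1_",
    target_prefix: str = "CURI_",
) -> Tuple[Set[Triple], Dict[str, str]]:
    """Seed an empty integrated knowledge base from one internal knowledge base.

    Every first-type identifier in subject position (e.g. 'URI_1_A1') gets a
    new integrated identifier (e.g. 'CURI_A1'); objects that refer to a
    converted entity are rewritten too, so relationships stay connected.
    """
    mapping: Dict[str, str] = {}
    for s, _, _ in internal:
        if s not in mapping:
            if not s.startswith(source_prefix):
                raise ValueError(f"unexpected identifier {s!r}")
            mapping[s] = target_prefix + s[len(source_prefix):]
    integrated = {(mapping[s], p, mapping.get(o, o)) for s, p, o in internal}
    return integrated, mapping
```

The returned mapping is the record of which first-type identifier became which integrated identifier, which later integration steps can consult.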

FIG. 4A is a block diagram illustrating an entity similarity analysis unit according to an embodiment of the present invention, and FIG. 4B is a diagram for explaining an operation of the integrated entity candidate search unit. FIG. 4C is a block diagram illustrating an entity similarity calculation unit according to an embodiment of the present invention, and FIGS. 4D and 4E are views for explaining operations of the integrated entity selection unit.

Referring to FIG. 4A, the entity similarity analysis unit 142 may include an integrated entity candidate search unit 142a, an entity similarity calculation unit 142b, and an integrated entity selection unit 142c. The integrated entity candidate search unit 142a according to an embodiment of the present invention can search for integrated entity candidates of the integrated knowledge data by comparing a plurality of entities of the integrated knowledge data with a target entity of the internal knowledge data. Referring to FIG. 2C, the integrated entity candidate search unit 142a may select entity 'A2' of the second internal knowledge data stored in the second internal knowledge base 20 as the target entity. The integrated entity candidate search unit 142a may select one of the relationships or attributes of the target entity 'A2' and, based on the selected relationship or attribute, compare the target entity with the plurality of entities of the integrated knowledge data. In one embodiment, the integrated entity candidate search unit 142a may perform this comparison based on attribute 'P1' of the target entity 'A2'. Referring to FIG. 4B, the integrated entity candidate search unit 142a may search for integrated entity candidates in the integrated knowledge data stored in the integrated knowledge base 30. The integrated entity candidate search unit 142a may select, as integrated entity candidates, the entities of the integrated knowledge data whose value for attribute 'P1' is the same as the value 'N2' of attribute 'P1' of the target entity 'A2'. For example, when entities 'A1', 'B1', and 'C1' of the integrated knowledge data have values 'N1', 'Y1', and 'X1', respectively, for attribute 'P1', and each of 'N1', 'Y1', and 'X1' is the same value as 'N2', the integrated entity candidate search unit 142a can select entities 'A1', 'B1', and 'C1' as integrated entity candidates.
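The candidate search on a single attribute can be sketched as an exact-match filter. This minimal sketch assumes values are compared as plain strings with no normalization; a real system might normalize spelling or units first. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def value_of(triples: Iterable[Triple], entity: str, attribute: str) -> Optional[str]:
    """First value of `attribute` for `entity`, or None if the entity lacks it."""
    for s, p, o in triples:
        if s == entity and p == attribute:
            return o
    return None


def search_candidates(
    target: str, internal: List[Triple], integrated: List[Triple], attribute: str
) -> List[str]:
    """Integrated entities whose value for `attribute` equals the target's value."""
    target_value = value_of(internal, target, attribute)
    if target_value is None:
        return []
    return sorted({s for s, p, o in integrated if p == attribute and o == target_value})
```

With the target 'A2' and attribute 'P1', this returns every integrated entity whose 'P1' value equals 'N2', matching the candidate rule described above.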

The entity similarity calculation unit 142b may generate a similarity between the target entity 'A2' and each of the selected integrated entity candidates 'A1', 'B1', and 'C1'. According to one embodiment, the entity similarity calculation unit 142b may include a graph similarity analysis unit (GSA), an ontology similarity analysis unit (OTSA), and a syntax similarity analysis unit (CSA).

- The graph similarity analysis unit (GSA) builds a graph structure from the attributes and relationships of the integrated entity candidates and the target entity. Nodes represent individual entities, and arcs (connecting lines) represent the attribute relationships between nodes. The GSA then calculates the similarity of these graph structures.
- The ontology similarity analysis unit (OTSA) may calculate the similarity between the class containing an integrated entity candidate and the class containing the target entity. It bases this on the respective ontologies of the internal knowledge base and the integrated knowledge base.
- When the attributes of an integrated entity candidate and the target entity have text-format values, the syntax similarity analysis unit (CSA) may calculate the similarity between the corresponding texts.

The entity similarity calculation unit 142b may include at least one of the syntax similarity analysis unit (CSA), the ontology similarity analysis unit (OTSA), and the graph similarity analysis unit (GSA).
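
The three kinds of similarity can be illustrated with simple stand-ins. This is a sketch under explicit assumptions; the patent does not specify these formulas.

- Graph similarity is approximated by the Jaccard overlap of each node's outgoing (attribute, value) arcs.
- Ontology similarity is the Jaccard overlap of class ancestor sets. This assumes both classes already sit in one shared single-inheritance hierarchy (`parent`).
- Syntax similarity uses Python's standard `difflib.SequenceMatcher.ratio()`.

All function names are hypothetical.

```python
from difflib import SequenceMatcher

Triple = tuple[str, str, str]


def jaccard(a: set, b: set) -> float:
    # Two empty sets give no evidence of similarity
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def graph_similarity(x: str, y: str,
                     triples_x: list[Triple], triples_y: list[Triple]) -> float:
    # Compare the outgoing arcs (attribute/relation, value/object) of two nodes
    arcs_x = {(p, o) for s, p, o in triples_x if s == x}
    arcs_y = {(p, o) for s, p, o in triples_y if s == y}
    return jaccard(arcs_x, arcs_y)


def ancestors(cls: str, parent: dict[str, str]) -> set[str]:
    # Walk up a single-inheritance hierarchy, including the class itself;
    # `seen` also guards against accidental cycles
    seen = set()
    while cls is not None and cls not in seen:
        seen.add(cls)
        cls = parent.get(cls)
    return seen


def ontology_similarity(cls_x: str, cls_y: str, parent: dict[str, str]) -> float:
    return jaccard(ancestors(cls_x, parent), ancestors(cls_y, parent))


def syntax_similarity(text_x: str, text_y: str) -> float:
    # Case-insensitive character-level similarity ratio in [0, 1]
    return SequenceMatcher(None, text_x.lower(), text_y.lower()).ratio()
```

Jaccard overlap keeps every score in [0, 1]. A combined score therefore stays comparable across the three analyzers.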

Based on the similarities generated by the entity similarity calculation unit 142b, the integrated entity selection unit 142c may select an integrated entity from among the integrated entity candidates. It then generates integrated information indicating the target entity that corresponds to the selected integrated entity. In one embodiment, the integrated entity selection unit 142c may use only some of the available similarities. It chooses at least one of the similarities generated by the graph similarity analysis unit (GSA), the ontology similarity analysis unit (OTSA), and the syntax similarity analysis unit (CSA). It then selects the integrated entity based on the chosen similarity and generates the integrated information. Integrated information generated from only some of the similarities may later be changed based on the similarities that were not chosen.

Furthermore, the integrated entity selection unit 142c may calculate a similarity sum from the similarities generated by the graph similarity analysis unit (GSA), the ontology similarity analysis unit (OTSA), and the syntax similarity analysis unit (CSA). Each similarity receives a different weight in this sum. The weights may be set by various methods and to various values. The integrated entity selection unit 142c may then select an integrated entity from among the integrated entity candidates based on the calculated similarity sum. It also generates integrated information indicating the target entity that corresponds to the selected integrated entity.
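
A weighted similarity sum might look like the sketch below. Scores are keyed by the analyzer abbreviations used in the text. The optional `selected` argument mirrors the embodiment in which only some similarities are used. The function name and the weight values are illustrative assumptions, not values from the patent.

```python
from typing import Iterable, Optional


def similarity_sum(scores: dict[str, float], weights: dict[str, float],
                   selected: Optional[Iterable[str]] = None) -> float:
    """Weighted sum of per-analyzer similarities.

    `scores` maps an analyzer name ('GSA', 'OTSA', 'CSA') to its similarity;
    `selected` optionally restricts the sum to some analyzers only.
    """
    names = scores.keys() if selected is None else selected
    # Analyzers without an explicit weight contribute nothing
    return sum(weights.get(name, 0.0) * scores[name] for name in names)


weights = {"GSA": 0.5, "OTSA": 0.2, "CSA": 0.3}   # illustrative values only
scores_a1 = {"GSA": 0.8, "OTSA": 0.5, "CSA": 1.0}
print(round(similarity_sum(scores_a1, weights), 3))            # 0.8
print(round(similarity_sum(scores_a1, weights, ["GSA"]), 3))   # 0.4
```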

For example, the integrated entity selection unit 142c may use any of the following selection methods:

- Reference-value method: it selects, as integrated entities, all integrated entity candidates whose similarity is higher than a reference value.
- Maximum-value method: it selects the integrated entity candidate with the highest similarity sum.
- Decision tree method: it selects the integrated entity candidates whose similarities with the target entity are each equal to or higher than the reference value set for that similarity. For example, a candidate may be selected when its graph similarity to the target entity is at least a first reference value, its ontology similarity is at least a second reference value, and its syntax similarity is at least a third reference value.
- Machine learning method: it selects, as integrated entities, the candidates whose similarities with the target entity satisfy learned conditions.

However, the selection method of the integrated entity selection unit 142c is not limited to these examples, and the integrated entity may be selected in various other ways.
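
The first three selection methods can be sketched as small functions over precomputed similarities. The per-analyzer threshold chain is only in the spirit of the decision tree method; it is a fixed rule, not a learned tree. The machine learning method is omitted. All names and thresholds are hypothetical.

```python
def select_by_reference(totals: dict[str, float], ref: float) -> list[str]:
    # Reference-value method: every candidate whose similarity exceeds `ref`
    return sorted(c for c, s in totals.items() if s > ref)


def select_by_maximum(totals: dict[str, float]) -> list[str]:
    # Maximum-value method: the single candidate with the highest similarity sum
    return [max(totals, key=totals.get)] if totals else []


def select_by_thresholds(scores: dict[str, dict[str, float]],
                         refs: dict[str, float]) -> list[str]:
    # Threshold chain in the spirit of the decision tree method: each
    # per-analyzer similarity must be >= its own reference value
    return sorted(c for c, per in scores.items()
                  if all(per.get(m, 0.0) >= r for m, r in refs.items()))
```

The reference-value method can return several entities. That is the situation the curation module later resolves.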

In one embodiment, the integrated entity selection unit 142c may select the integrated entity candidate with the highest similarity as the integrated entity. For example, referring to FIG. 4D, the integrated entity selection unit 142c may choose among the candidates 'A1', 'B1', and 'C1'. It selects 'A1', which has the highest similarity to the target entity 'A2', as the integrated entity.

According to another embodiment of the present invention, the entity similarity calculation unit 142b may reuse earlier results when integrating another entity of the second internal knowledge data. In that case, it generates the similarity by referring to the integrated information already produced for the entity 'A2'. That is, the entity similarity calculation unit 142b refers to integrated information indicating that the entity 'A1' of the integrated knowledge data corresponds to the entity 'A2' of the second internal knowledge data.
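
One way to use earlier integrated information during a later comparison is sketched below. Before arcs are compared, any object that has already been integrated (for example 'A2' mapped to 'A1') is rewritten to its integrated entity. With this rewrite, a neighbour such as 'B2' knows 'A2' can match 'B1' knows 'A1'. This is an illustrative assumption about how the reference could work, and the function names are hypothetical.

```python
Triple = tuple[str, str, str]


def arcs(entity: str, triples: list[Triple],
         integrated_info: dict[str, str]) -> set[tuple[str, str]]:
    # Outgoing (relation, object) pairs; objects already integrated earlier
    # (e.g. 'A2' -> 'A1') are rewritten to their integrated entity
    return {(p, integrated_info.get(o, o)) for s, p, o in triples if s == entity}


def graph_similarity_with_info(x: str, y: str, triples_x: list[Triple],
                               triples_y: list[Triple],
                               integrated_info: dict[str, str]) -> float:
    ax = arcs(x, triples_x, integrated_info)  # internal-side entity
    ay = arcs(y, triples_y, {})               # already in integrated data
    union = ax | ay
    return len(ax & ay) / len(union) if union else 0.0
```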

Referring to FIG. 3A, the entity similarity analysis unit 142 may provide integrated information to the integrated information storage unit 144, which stores it. The integrated information includes the target entity of the internal knowledge data and the corresponding integrated entity. For example, it may indicate the target entity 'A2' of the second internal knowledge data and the corresponding integrated entity 'A1' of the integrated knowledge data, as described above. In one embodiment of the present invention, the knowledge base integration module 140 may first consult the integrated information stored in the integrated information storage unit 144 when building the integrated knowledge base. Some entities' identifiers are already recorded in the integrated information, such as that of the target entity 'A2' of the second internal knowledge data. For such an entity, the knowledge base integration module 140 may perform the integration operation using the integrated information without performing a similarity analysis. The integrated information may be preset externally and may be updated with new information by the entity similarity analysis unit 142.
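
The lookup-first behaviour can be sketched with an in-memory store. `IntegratedInfoStore` and `resolve` are hypothetical stand-ins for the integrated information storage unit 144 and the module's control flow. The `analyze` callback represents whatever similarity analysis would otherwise run.

```python
from typing import Callable, Optional


class IntegratedInfoStore:
    """Hypothetical stand-in for the integrated information storage unit 144:
    maps a target entity's identifier to its integrated entity."""

    def __init__(self, preset: Optional[dict[str, str]] = None):
        # Integrated information may be preset from outside
        self._map: dict[str, str] = dict(preset or {})

    def lookup(self, target_id: str) -> Optional[str]:
        return self._map.get(target_id)

    def record(self, target_id: str, integrated_id: str) -> None:
        self._map[target_id] = integrated_id


def resolve(target_id: str, store: IntegratedInfoStore,
            analyze: Callable[[str], Optional[str]]) -> Optional[str]:
    # Use stored integrated information first; skip similarity analysis if found
    known = store.lookup(target_id)
    if known is not None:
        return known
    chosen = analyze(target_id)
    if chosen is not None:
        store.record(target_id, chosen)  # a new result updates the store
    return chosen
```

Recording each new result means a second request for the same entity is answered from the store.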

Referring to FIG. 4E, the entity 'A2' was selected as the target entity among the entities of the second internal knowledge data stored in the second internal knowledge base 20. Its second-type identifier 'URI_2_A2' may be replaced with the integrated identifier 'CURI_A'. This integrates the second internal knowledge data with the integrated knowledge data of the integrated knowledge base 30. Accordingly, the data related to the entities 'A1' and 'A2', that is, their triple data, can be accessed through the integrated identifier 'CURI_A'. For convenience of explanation, the labels 'A1' and 'A2' distinguish the data integrated from the first internal knowledge data from the data integrated from the second internal knowledge data. In fact, 'A1' and 'A2' are the same entity and share the same concept under the integrated identifier 'CURI_A'.
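
The identifier conversion can be sketched as a rewrite of subject and object identifiers followed by a set union. The sketch assumes the same (subject, predicate, object) triple layout as above, and all function names are hypothetical. After the merge, every triple about the entity is reachable through the single integrated identifier.

```python
Triple = tuple[str, str, str]


def convert_identifiers(triples: list[Triple], id_map: dict[str, str]) -> list[Triple]:
    # Replace source-specific identifiers (e.g. 'URI_2_A2') with integrated
    # identifiers (e.g. 'CURI_A') in both subject and object positions
    return [(id_map.get(s, s), p, id_map.get(o, o)) for s, p, o in triples]


def integrate(integrated: set[Triple], internal: list[Triple],
              id_map: dict[str, str]) -> set[Triple]:
    # Merge the converted internal triples into the integrated data
    return integrated | set(convert_identifiers(internal, id_map))


def triples_of(identifier: str, data: set[Triple]) -> set[Triple]:
    # All triples whose subject is the given (integrated) identifier
    return {t for t in data if t[0] == identifier}
```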

Next, the integrated entity candidate search unit 142a may select the entity 'B2' of the second internal knowledge data stored in the second internal knowledge base 20 (FIG. 2C) as the target entity. The knowledge base integration module 140 may then perform the integration operation described above for the entity 'B2'.

FIG. 5A is a block diagram illustrating an example of an integrated knowledge base building system according to an embodiment of the present invention, and FIGS. 5B and 5C are views for explaining the operation of the curation module according to an embodiment of the present invention.

As shown in FIG. 5A, the integrated knowledge base construction system 100 may include the knowledge base integration module 140 and a curation module 160. As described above, the entity similarity analysis unit 142 may select an integrated entity from among the integrated entity candidates based on their similarity to the target entity. However, as shown in FIG. 5B, more than one integrated entity may be selected, here 'A1' and 'B1'. In this case, the entity similarity analysis unit 142 may activate the curation module 160. The curation module 160 may include a selected integrated entity modification unit 160a. This unit receives information on the selected integrated entities from the entity similarity analysis unit 142 and receives input data 500 from the outside. Based on the input data 500, it modifies that information so that at most one of the selected integrated entities remains selected. For example, as shown in FIG. 5C, the selected integrated entity modification unit 160a may modify the information so that only 'A1' of the selected entities 'A1' and 'B1' is selected as the integrated entity. The input data 500 may include selection criterion information (for example, a similarity reference value) for selecting at most one integrated entity among the selected integrated entities.
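
The curation step can be sketched as a function that reduces several selected integrated entities to at most one. It assumes the input data 500 carries a similarity reference value, as the text suggests. Here that value is modelled as the hypothetical parameter `min_similarity`. Among the entities that meet it, the one with the highest similarity is kept.

```python
def curate_selection(selected: list[str], totals: dict[str, float],
                     min_similarity: float) -> list[str]:
    """Reduce several selected integrated entities to at most one.

    `min_similarity` is a hypothetical stand-in for the selection-criterion
    information carried by input data 500.
    """
    if len(selected) <= 1:
        return list(selected)  # nothing to curate
    eligible = [c for c in selected if totals.get(c, 0.0) >= min_similarity]
    if not eligible:
        return []  # no entity meets the criterion: select none
    return [max(eligible, key=lambda c: totals.get(c, 0.0))]
```

Returning an empty list rather than forcing a choice matches the "one or less" wording. In that case, the target entity would receive a new integrated identifier.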

FIG. 6A is a block diagram illustrating a knowledge base integration module and a curation module according to an embodiment of the present invention, and FIG. 6B is a flowchart illustrating the operations of the curation module and the knowledge base integration module according to an embodiment of the present invention.

As shown in FIG. 6A, the integrated knowledge base construction system 100 may include the knowledge base integration module 140 and the curation module 160. The knowledge base integration module 140 may further include an integrated knowledge data update unit 148. The curation module 160 may further include an integrated knowledge data modification unit 160b, which may include a data extraction unit 160b-1, a data change unit 160b-2, and a change data detection unit 160b-3.

- The data extraction unit 160b-1 may extract the integrated knowledge data stored in the integrated knowledge base 200.
- The data change unit 160b-2 may change the integrated knowledge data based on input data 500 received from the outside. Referring to FIG. 6B, it may delete, as deletion knowledge data 51, the triple data (A1-R1-K1, A1-R2-K2) accessible through the integrated identifier 'CURI_A'. It may also add, as additional knowledge data 52, the triple data (A1-P2-K4) accessible through 'CURI_A' and the triple data (B1-R3-M) accessible through the integrated identifier 'CURI_B'.
- The change data detection unit 160b-3 may detect the changed knowledge data by comparing the integrated knowledge data before the change with the knowledge data changed by the data change unit 160b-2.
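
The change and detection steps reduce naturally to set operations on triples. In the sketch below, the FIG. 6B example is encoded with 'CURI_A' and 'CURI_B' as subjects. The function names are hypothetical, and set difference is one simple way to compare the before and after data.

```python
Triple = tuple[str, str, str]


def apply_curation(data: set[Triple], to_delete: set[Triple],
                   to_add: set[Triple]) -> set[Triple]:
    # Data change unit: remove the deletion triples, then add the new ones
    return (data - to_delete) | to_add


def detect_changes(before: set[Triple], after: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    # Change data detection unit: set difference in both directions
    added = after - before
    deleted = before - after
    return added, deleted


before = {("CURI_A", "R1", "K1"), ("CURI_A", "R2", "K2"), ("CURI_A", "P1", "N2")}
after = apply_curation(before,
                       {("CURI_A", "R1", "K1"), ("CURI_A", "R2", "K2")},  # data 51
                       {("CURI_A", "P2", "K4"), ("CURI_B", "R3", "M")})   # data 52
added, deleted = detect_changes(before, after)
```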

The integrated knowledge data update unit 148 may receive information on the changed knowledge data from the change data detection unit 160b-3. The integrated knowledge data update unit 148 may include an additional data information storage unit 148a and a deletion data information storage unit 148b. The additional data information storage unit 148a may store information on the additional knowledge data 52 of FIG. 6B, and the deletion data information storage unit 148b may store information on the deletion knowledge data 51. Based on the information on the changed knowledge data, the integrated knowledge data update unit 148 may update the integrated knowledge data by directly changing the data stored in the integrated knowledge base 200. It may also compare the additional data information or the deletion data information with the integrated-entity-related data of internal knowledge data integrated afterwards. Based on the comparison result, it decides whether to add that data. For example, the integrated knowledge data update unit 148 may refrain from adding integrated-entity-related data that matches data included in the additional knowledge data information or the deletion knowledge data information.

Accordingly, when knowledge data identical to the integrated-entity-related data already exists, that data is not added again. This avoids storing duplicates, so the storage capacity of the integrated knowledge base 200 is used efficiently. In addition, integrated-entity-related data that matches the deletion knowledge data information is not added. As a result, data removed during curation is not reintroduced, and only the knowledge data intended for integration is selectively integrated.
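
The filtering behaviour of the update unit can be sketched as a small class. `UpdateLog` is a hypothetical stand-in for the integrated knowledge data update unit 148. Its two sets play the roles of the additional data information (148a) and the deletion data information (148b). Incoming triples that match either set are skipped.

```python
Triple = tuple[str, str, str]


class UpdateLog:
    """Hypothetical stand-in for the integrated knowledge data update unit 148."""

    def __init__(self) -> None:
        self.added: set[Triple] = set()    # additional data information (148a)
        self.deleted: set[Triple] = set()  # deletion data information (148b)

    def record(self, added: set[Triple], deleted: set[Triple]) -> None:
        self.added |= added
        self.deleted |= deleted

    def filter_incoming(self, triples: list[Triple]) -> list[Triple]:
        # Skip triples already added during curation (duplicates) and triples
        # deleted during curation (they must not be reintroduced)
        return [t for t in triples if t not in self.added and t not in self.deleted]
```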

FIG. 7 is a flowchart illustrating an operation of building an integrated knowledge base according to an embodiment of the present invention.

Referring to FIG. 7, the integrated knowledge base is built in the following steps:

- S100: External data is received from a plurality of data servers. Each set of external data is normalized and refined according to a predetermined conversion rule, and a plurality of internal knowledge bases are built.
- S110: The entities of the integrated knowledge data are compared with a target entity of the internal knowledge data. Entities that have the same attribute (or relationship) and the same value (or object) as the target entity are selected as integrated entity candidates.
- S120: The similarity between each selected integrated entity candidate and the target entity is analyzed.
- S130: Based on the analyzed similarity, an integrated entity candidate whose similarity is higher than a reference value, or which has the highest similarity, is selected as the integrated entity.
- S140: It is determined whether at most one integrated entity has been selected.
- S150 (S140, YES): The data related to the target entity is integrated with the integrated knowledge data. This uses the integrated information including the selection result for the integrated entity, which builds the integrated knowledge base.
- S160 (S140, NO): If more than one integrated entity has been selected, the selection is modified based on input data received from the outside so that at most one entity is selected. The data related to the target entity is then integrated with the integrated knowledge data using the integrated information including the modified selection result (S150).
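
Steps S110 to S160 can be strung together for a single target entity, as in the compressed sketch below. It assumes precomputed similarities via a `similarity` callback and reference-value selection. A `curate` callback is assumed to return at most one entity. A hypothetical `new_id` is assigned when nothing is selected, corresponding to the new integrated identifier of claim 6. S100 (building the internal knowledge bases) is taken as already done.

```python
from typing import Callable

Triple = tuple[str, str, str]


def integrate_entity(target: str, attribute: str,
                     internal: list[Triple], integrated: set[Triple],
                     similarity: Callable[[str, str], float], ref: float,
                     curate: Callable[[list[str]], list[str]],
                     new_id: str) -> set[Triple]:
    """One pass of S110-S160 for a single target entity (hypothetical sketch)."""
    # S110: candidates share the target's value for the chosen attribute
    values = {o for s, p, o in internal if s == target and p == attribute}
    candidates = {s for s, p, o in integrated if p == attribute and o in values}
    # S120-S130: keep candidates whose similarity exceeds the reference value
    selected = sorted(c for c in candidates if similarity(target, c) > ref)
    # S140/S160: more than one selected -> curation (assumed to return <= 1)
    if len(selected) > 1:
        selected = curate(selected)
    # S150: attach the target's data to the integrated entity, or a new identifier
    integrated_id = selected[0] if selected else new_id
    related = {(integrated_id, p, o) for s, p, o in internal if s == target}
    return integrated | related
```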

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Accordingly, the true scope of the present invention should be determined by the technical idea of the appended claims.

Claims (10)

1. An integrated knowledge base construction system comprising: an individual knowledge base building module that converts external data received from a first data server to generate first internal knowledge data; a knowledge base integration module that integrates integrated knowledge data and the first internal knowledge data; and a curation module,
wherein the knowledge base integration module includes:
an entity similarity analysis unit that searches the integrated knowledge data by comparing a target entity of the first internal knowledge data with the entities of the integrated knowledge data to select integrated entity candidates of the integrated knowledge data, generates similarities between the target entity and the integrated entity candidates, and selects an integrated entity from among the integrated entity candidates based on the similarities; and
an integrated entity conversion unit that adds data related to the target entity, among the first internal knowledge data, to the integrated knowledge data based on integrated information including the selection result for the integrated entity,
wherein, when a plurality of integrated entities are selected,
the curation module selects one of the plurality of integrated entities based on input data received from the outside.
2. The system according to claim 1,
wherein the individual knowledge base building module converts external data received from a second data server to generate second internal knowledge data, and
the knowledge base integration module integrates the integrated knowledge data and the second internal knowledge data.
3. The system according to claim 1,
wherein the entity similarity analysis unit compares a predetermined attribute (or relationship) of the target entity of the first internal knowledge data and its corresponding value (or object) with the attributes (or relationships) of each entity of the integrated knowledge data and their corresponding values (or objects), and selects, as the integrated entity candidate, an entity of the integrated knowledge data that has the same attribute (or relationship) as the predetermined attribute (or relationship) of the target entity and the same corresponding value (or object).
4. The system according to claim 1,
wherein the entity similarity analysis unit generates the similarity between the target entity and each integrated entity candidate based on at least one of graph information, ontology information, and syntax information corresponding to the target entity and the integrated entity candidates,
and selects the integrated entity from among the integrated entity candidates based on the generated similarity.
5. The system according to claim 1,
wherein the integrated entity conversion unit converts the identifier of the target entity into the integrated identifier of the integrated entity.
6. The system according to claim 1,
wherein, when the entity similarity analysis unit fails to select an integrated entity candidate of the integrated knowledge data or fails to select the integrated entity from among the integrated entity candidates,
the integrated entity conversion unit converts the identifier of the target entity into a new integrated identifier.
7. (Deleted)
8. The system according to claim 1,
wherein the curation module modifies the integrated knowledge data by adding or deleting entity-related data of the integrated knowledge data based on input data received from the outside, and generates change data information on the modification.
9. The system according to claim 8,
wherein the knowledge base integration module further includes an integrated knowledge data update unit that stores the change data information received from the curation module and updates the integrated knowledge data based on the change data information.
10. The system according to claim 9,
wherein the integrated knowledge data update unit compares the change data information with data related to the selected integrated entity and determines, based on the comparison result, whether to add the integrated-entity-related data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160010071A KR101739540B1 (en) 2016-01-27 2016-01-27 System and method for building integration knowledge base based

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160010071A KR101739540B1 (en) 2016-01-27 2016-01-27 System and method for building integration knowledge base based

Publications (1)

Publication Number Publication Date
KR101739540B1 true KR101739540B1 (en) 2017-06-08

Family

ID=59221161

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160010071A KR101739540B1 (en) 2016-01-27 2016-01-27 System and method for building integration knowledge base based

Country Status (1)

Country Link
KR (1) KR101739540B1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190079805A (en) * 2017-12-28 2019-07-08 주식회사 솔트룩스 System and method for building integration knowledge base based a plurality of data sources
KR102098255B1 (en) * 2018-11-30 2020-04-07 주식회사 솔트룩스 System and method for consolidating knowledge based on knowledge embedding
KR102111734B1 (en) * 2018-11-29 2020-05-15 주식회사 솔트룩스 System and method for building integration knowledge base based
KR102121504B1 (en) * 2018-11-29 2020-06-10 주식회사 솔트룩스 System and method for building integration knowledge data base based a plurality of data sources
KR20210050206A (en) * 2019-10-28 2021-05-07 주식회사 한글과컴퓨터 Knowledge database management device for building a knowledge database using tables included in spreadsheet documents and enabling user access to the knowledge database, and operating method thereof
KR20210077251A (en) * 2019-12-17 2021-06-25 주식회사 한글과컴퓨터 Database building device that can build a knowledge database from a table-inserted image and operating method thereof

Citations (1)

Publication number Priority date Publication date Assignee Title
KR101467707B1 (en) * 2013-12-23 2014-12-02 포항공과대학교 산학협력단 Method for instance-matching in knowledge base and device therefor

Cited By (10)

Publication number Priority date Publication date Assignee Title
KR20190079805A (en) * 2017-12-28 2019-07-08 주식회사 솔트룩스 System and method for building integration knowledge base based a plurality of data sources
KR102006214B1 (en) 2017-12-28 2019-08-02 주식회사 솔트룩스 System and method for building integration knowledge base based a plurality of data sources
KR102111734B1 (en) * 2018-11-29 2020-05-15 주식회사 솔트룩스 System and method for building integration knowledge base based
WO2020111371A1 (en) * 2018-11-29 2020-06-04 주식회사 솔트룩스 Integrated knowledge base construction system and method
KR102121504B1 (en) * 2018-11-29 2020-06-10 주식회사 솔트룩스 System and method for building integration knowledge data base based a plurality of data sources
KR102098255B1 (en) * 2018-11-30 2020-04-07 주식회사 솔트룩스 System and method for consolidating knowledge based on knowledge embedding
KR20210050206A (en) * 2019-10-28 2021-05-07 주식회사 한글과컴퓨터 Knowledge database management device for building a knowledge database using tables included in spreadsheet documents and enabling user access to the knowledge database, and operating method thereof
KR102300467B1 (en) 2019-10-28 2021-09-09 주식회사 한글과컴퓨터 Knowledge database management device for building a knowledge database using tables included in spreadsheet documents and enabling user access to the knowledge database, and operating method thereof
KR20210077251A (en) * 2019-12-17 2021-06-25 주식회사 한글과컴퓨터 Database building device that can build a knowledge database from a table-inserted image and operating method thereof
KR102328034B1 (en) 2019-12-17 2021-11-17 주식회사 한글과컴퓨터 Database building device that can build a knowledge database from a table-inserted image and operating method thereof

Similar Documents

Publication Publication Date Title
KR101739540B1 (en) System and method for building integration knowledge base based
US11068439B2 (en) Unsupervised method for enriching RDF data sources from denormalized data
CN111782965B (en) Intention recommendation method, device, equipment and storage medium
US9495345B2 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US20200192727A1 (en) Intent-Based Organisation Of APIs
US11960513B2 (en) User-customized question-answering system based on knowledge graph
US10747958B2 (en) Dependency graph based natural language processing
US20170262868A1 (en) Methods and systems for analyzing customer care data
KR20220115046A (en) Method and appartuas for semantic retrieval, device and storage medium
US11281864B2 (en) Dependency graph based natural language processing
CN107679035B (en) Information intention detection method, device, equipment and storage medium
US11836120B2 (en) Machine learning techniques for schema mapping
US20170103125A1 (en) Apparatus and method of exploring and accessing relevant data from big data repository
Dyvak et al. Recognition of Relevance of Web Resource Content Based on Analysis of Semantic Components
CN114996549A (en) Intelligent tracking method and system based on active object information mining
Rahmani et al. Entity resolution in disjoint graphs: an application on genealogical data
US20170124090A1 (en) Method of discovering and exploring feature knowledge
KR20150112442A (en) System and method for generating knowledge
Kumar et al. Efficient structuring of data in big data
US20150154198A1 (en) Method for in-loop human validation of disambiguated features
US20230032208A1 (en) Augmenting data sets for machine learning models
Shafi et al. [WiP] Web Services Classification Using an Improved Text Mining Technique
US9910890B2 (en) Synthetic events to chain queries against structured data
US20230142351A1 (en) Methods and systems for searching and retrieving information
CN114648121A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
GRNT Written decision to grant