CN112214611B - Enterprise knowledge graph construction system and method - Google Patents

Publication number
CN112214611B
Authority
CN
China
Prior art keywords
data
module
rule
sub
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011030017.3A
Other languages
Chinese (zh)
Other versions
CN112214611A (en)
Inventor
王志刚
吴士泓
徐静
陈文旋
冯荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuanguang Software Co Ltd
Original Assignee
Yuanguang Software Co Ltd
Application filed by Yuanguang Software Co Ltd
Priority to CN202011030017.3A
Publication of CN112214611A
Application granted
Publication of CN112214611B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The application relates to a system and a method for constructing an enterprise knowledge graph, belongs to the technical field of knowledge graphs, and addresses the problems in the prior art that data of various structure types cannot be effectively integrated and associated and that mining the value of the data is difficult. The system comprises: a graph creation module for creating a knowledge graph card; a graph design module for defining the entities and relations in the knowledge graph; a data configuration module for configuring basic data sources through an import data sub-module, a data source configuration sub-module and a knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module for constructing a knowledge graph from the basic data sources based on the knowledge graph Schema; and a rule design module for implementing rule configuration and rule reasoning in the knowledge graph. The system thereby overcomes the difficulty of managing enterprise data.

Description

Enterprise knowledge graph construction system and method
Technical Field
The application relates to the technical field of knowledge graphs, in particular to a system and a method for constructing an enterprise knowledge graph.
Background
In the era of the digital economy, enterprises hold vast amounts of structured, semi-structured and unstructured data (hereinafter, unstructured data refers to unstructured document data), that is, heterogeneous data, and these data are often stored in different locations (multi-source). The fragmentation and weak relevance of multi-source heterogeneous data easily lead to information islands and to data that cannot be converted into knowledge, which in turn limits the depth of data value mining. Data can be used effectively only after it is understood and analyzed, and constructing a knowledge graph is an important way to extract data and refine it into useful knowledge.
The knowledge graph is one application area of artificial intelligence technology. It has strong capabilities for semantic processing and for organizing data into structures, and thus provides a foundation for intelligent information applications. By constructing a semantic network of entities and relations, the knowledge graph integrates, cross-correlates, analyzes and compares large-scale data and knowledge, mines the data in depth, supports intelligent understanding, representation, reasoning, retrieval and serving of knowledge, and provides users with self-service iterative analysis capability. However, conventional databases and analysis and mining tools are of little use for tasks such as integrating and associating unstructured and semi-structured data or extracting and representing knowledge.
At present, organization and storage of mass data of enterprises are often fragmented, data of various structures are stored in different databases and file systems due to limitations in data structures, database storage capacity and the like, and traditional database and analysis mining technical tools have weak processing capacity for unstructured and semi-structured data, so that data of various structure types cannot be effectively integrated and associated, and the value mining difficulty of the data is high.
Disclosure of Invention
In view of the above analysis, the embodiment of the application aims to provide a system and a method for constructing an enterprise knowledge graph, which are used for solving the problems that the existing database and analysis mining technical tool has weak processing capacity for unstructured and semi-structured data, so that data of various structure types cannot be effectively integrated and associated, and the value mining difficulty of the data is high.
In one aspect, an embodiment of the present application provides a system for constructing an enterprise knowledge graph, including: a graph creation module for creating a knowledge graph card; a graph design module for designing a knowledge graph Schema for the knowledge graph card so as to define the entities and relations in the knowledge graph; a data configuration module for configuring basic data sources through an import data sub-module, a data source configuration sub-module and a knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module for constructing a knowledge graph based on the knowledge graph Schema by using the basic data sources selected through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and a rule design module for implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
The beneficial effects of the technical scheme are as follows: the knowledge graph is a graph-based data structure, and consists of nodes and edges. In the knowledge graph, each node represents an "entity" defined in the data, and each edge is a "relationship" between the entities. The construction of knowledge graph mainly focuses on how to integrate structured, semi-structured and unstructured data, and realize the use of uniform semantic data structures. Thus, the capacity of organizing, managing and understanding the mass information of enterprises is better, and the difficult problem of enterprise data management can be overcome.
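The node-and-edge structure described above can be illustrated with a minimal sketch. The class name `KnowledgeGraph`, its methods, and the sample entities are hypothetical and chosen only for illustration; the application does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # node id -> attribute dict (each node is an "entity")
    nodes: dict = field(default_factory=dict)
    # head id -> list of (relation name, tail id) (each edge is a "relationship")
    edges: dict = field(default_factory=dict)

    def add_entity(self, entity_id, entity_type, **attributes):
        self.nodes[entity_id] = {"type": entity_type, **attributes}

    def add_relation(self, head_id, relation, tail_id):
        # Both endpoints must already exist as entities.
        if head_id not in self.nodes or tail_id not in self.nodes:
            raise KeyError("both entities must exist before they are linked")
        self.edges.setdefault(head_id, []).append((relation, tail_id))

    def neighbors(self, head_id, relation=None):
        # Tail entities reachable from head_id, optionally filtered by relation name.
        return [t for r, t in self.edges.get(head_id, [])
                if relation is None or r == relation]


# Hypothetical sample data.
kg = KnowledgeGraph()
kg.add_entity("c1", "Company", name="Acme Power")
kg.add_entity("s1", "Supplier", name="Delta Cables")
kg.add_relation("c1", "purchases_from", "s1")
print(kg.neighbors("c1", "purchases_from"))
```

Storing edges as an adjacency list keyed by the head entity keeps "which entities is this one related to" a direct lookup, which is the relation-centred view the text emphasizes.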
In a further improvement of the above system, the knowledge graph Schema can be designed in a normal view mode, a visual view mode or a template import mode, and the design can be switched among these three modes.
Based on further improvement of the system, the importing data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph.
Based on further improvement of the system, the data source configuration submodule is used for adding a relational database in a URL connection mode and mapping structured data in the relational database into the knowledge graph so as to configure the structured data for the knowledge graph.
In a further improvement of the above system, the structured data mapping comprises entity mapping, attribute mapping and relation mapping, wherein the entity mapping associates the entities defined by the graph design module one-to-one with data tables in the relational database; the attribute mapping maps the attributes of an entity to fields in its associated data table; and the relation mapping establishes a relation between a head entity and a tail entity.
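The three mapping steps can be sketched against an in-memory SQLite database. The tables, column names and the `ENTITY_MAP` / `RELATION_MAP` structures below are assumptions made for illustration, and the relation mapping is implemented here as a join between one field of the head table and one field of the tail table; the application does not prescribe a concrete configuration format.

```python
import sqlite3

# Hypothetical relational source with two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE contract (id INTEGER PRIMARY KEY, title TEXT, company_id INTEGER);
INSERT INTO company VALUES (1, 'Acme Power');
INSERT INTO contract VALUES (10, 'Cable supply', 1);
""")

# Entity mapping: entity type -> table. Attribute mapping: attribute -> column.
ENTITY_MAP = {
    "Company": {"table": "company", "key": "id", "attrs": {"name": "company_name"}},
    "Contract": {"table": "contract", "key": "id", "attrs": {"title": "title"}},
}
# Relation mapping: a head-table field is matched to a tail-table field,
# and the relation name labels the resulting edge.
RELATION_MAP = [
    {"name": "signed_by", "head": "Contract", "head_field": "company_id",
     "tail": "Company", "tail_field": "id"},
]


def load_entities(conn):
    entities = {}
    for etype, m in ENTITY_MAP.items():
        cols = [m["key"]] + list(m["attrs"].values())
        rows = conn.execute(f'SELECT {", ".join(cols)} FROM {m["table"]}')
        for row in rows:
            # row[0] is the key; the remaining columns follow the attrs order.
            entities[(etype, row[0])] = dict(zip(m["attrs"].keys(), row[1:]))
    return entities


def load_relations(conn):
    triples = []
    for r in RELATION_MAP:
        h, t = ENTITY_MAP[r["head"]], ENTITY_MAP[r["tail"]]
        sql = (f'SELECT h.{h["key"]}, t.{t["key"]} FROM {h["table"]} h '
               f'JOIN {t["table"]} t ON h.{r["head_field"]} = t.{r["tail_field"]}')
        for head_id, tail_id in conn.execute(sql):
            triples.append(((r["head"], head_id), r["name"], (r["tail"], tail_id)))
    return triples


entities = load_entities(conn)
triples = load_relations(conn)
print(entities)
print(triples)
```

Table and column names are interpolated into SQL here only because they come from trusted configuration in this sketch; a real deployment would need to validate them.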
Based on further improvement of the system, the knowledge extraction submodule is used for carrying out knowledge extraction on unstructured text data and comprises an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, wherein the ontology management submodule is used for defining entities extracted from the unstructured text data and relations thereof as ontologies according to business scene requirements; the corpus management sub-module is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked based on the ontology; the algorithm management sub-module is used for managing an entity-entity relation extraction algorithm in the unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and a model operation sub-module for constructing a model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.
Based on further improvement of the system, the enterprise knowledge graph construction system further comprises a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module, wherein the data cleaning module is used for configuring a regular expression and filtering attribute value types according to the regular expression so as to unify data formats; the normalization disambiguation module is used for determining the same entity according to the entity similarity so as to remove duplicate entities; the knowledge graph reconstruction module is used for manually adjusting the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
Based on a further improvement of the above system, the rule design module is configured to perform rule configuration and rule reasoning, and the rule configuration includes: filling rule basic description information; selecting a rule-related entity; forming a rule expression based on rule intention and decomposing the rule expression into a plurality of rule sub-expressions, wherein each rule sub-expression is an entity and relationship path in a knowledge graph; configuring one rule sub-expression in the plurality of rule sub-expressions, selecting a first entity of the one rule sub-expression, and displaying all basic attributes and relationship attributes of the first entity; selecting the basic attribute or the relation attribute to configure attribute data of the first entity; when the basic attribute is selected, the attribute data is data corresponding to the basic attribute, the configuration of the one rule sub-expression is completed, and the remaining rule sub-expressions are continuously configured in the same mode as the one rule sub-expression; when the relation attribute is selected, the entity selection box is automatically switched to a tail entity corresponding to the relation attribute, the attribute data is converted into attribute data corresponding to the tail entity, and the rest rule sub-expressions are continuously configured in the same mode as the rule sub-expressions; relationships between the rule sub-expressions, the rule sub-expression functions, and the conventional operators are constructed to complete the rule expression configuration.
Based on a further improvement of the system, the rule reasoning is used for selecting configured rules and applying the selected rules to the knowledge graph to display the rule reasoning and reasoning results through the visualized knowledge graph.
In another aspect, an embodiment of the application provides a method for constructing an enterprise knowledge graph, which comprises the following steps: creating a knowledge graph card; designing a knowledge graph Schema for the knowledge graph card to define the entities and relations in the knowledge graph; configuring basic data sources through an import data sub-module, a data source configuration sub-module and a knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to a graph construction module as one of the basic data sources; constructing a knowledge graph based on the knowledge graph Schema by using the basic data sources selected through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
Compared with the prior art, the application has at least one of the following beneficial effects:
1. The knowledge graph is a graph-based data structure consisting of nodes and edges. In the knowledge graph, each node represents an "entity" defined in the data, and each edge is a "relationship" between the entities. Knowledge graph construction focuses mainly on how to integrate structured, semi-structured and unstructured data under a uniform semantic data structure. This gives the enterprise a better capacity to organize, manage and understand its mass of information, and overcomes the difficulty of enterprise data management.
2. The knowledge graph connects the entities defined in the data with different structures on a large scale through the relation to form a data network, so as to realize multi-source heterogeneous data integration and deep cross correlation, further provide the capability of analyzing the problem from the relation angle and achieve the aim of fully mining the data value.
3. The knowledge graph technology can effectively break the data barrier and realize interconnection and intercommunication of multi-source heterogeneous data. And a knowledge graph is quickly constructed, so that the difficulty of integration of multi-source heterogeneous data of an enterprise is solved, and the application value of the enterprise data is further improved.
In the application, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the application, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a block diagram of a system for building an enterprise knowledge graph, in accordance with an embodiment of the application;
FIG. 2 is a specific block diagram of a system for constructing an enterprise knowledge graph in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of knowledge extraction in accordance with an embodiment of the application; and
FIG. 4 is a flowchart of a method for constructing an enterprise knowledge graph, according to an embodiment of the application.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
The application discloses a system for constructing an enterprise knowledge graph. Referring to FIG. 1, the system for constructing an enterprise knowledge graph includes: a graph creation module 102, configured to create a knowledge graph card; a graph design module 104, configured to design a knowledge graph Schema for the knowledge graph card, so as to define the entities and relations in the knowledge graph; a data configuration module 106, configured to configure the basic data sources through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module 108, configured to construct a knowledge graph based on the knowledge graph Schema by using the basic data sources selected through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and a rule design module 110, configured to implement rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
Compared with the prior art, in the system for constructing the enterprise knowledge graph, the knowledge graph is a graph-based data structure and consists of nodes and edges. In the knowledge graph, each node represents an "entity" defined in the data, and each edge is a "relationship" between the entities. The construction of knowledge graph mainly focuses on how to integrate structured, semi-structured and unstructured data, and realize the use of uniform semantic data structures. Thus, the capacity of organizing, managing and understanding the mass information of enterprises is better, and the difficult problem of enterprise data management can be overcome.
Hereinafter, a construction system of the enterprise knowledge graph will be described in detail with reference to fig. 1 to 3.
Referring to FIG. 1, the system for constructing an enterprise knowledge graph includes: a graph creation module 102, a graph design module 104, a data configuration module 106, a graph construction module 108, and a rule design module 110. Referring to FIG. 2, the system for constructing the enterprise knowledge graph further includes: a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module.
The graph creation module 102 is used to create a knowledge graph card. Each knowledge graph card is a knowledge graph designed for a specific application scenario. The system supports a user in creating a plurality of knowledge graph cards.
The graph design module 104 is configured to design a knowledge graph Schema for the knowledge graph card, so as to define the entities and relations in the knowledge graph. The knowledge graph Schema can be designed in a normal view mode, a visual view mode or a template import mode, and the design can be switched among these three modes.
The data configuration module 106 is configured to configure the basic data sources through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources. The data configuration module 106 corresponds to the "My data" module in FIG. 2. Specifically, the import data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph. The data source configuration sub-module is used for adding a relational database through a URL connection and mapping the structured data in the relational database into the knowledge graph, so as to configure structured data for the knowledge graph. The structured data mapping comprises entity mapping, attribute mapping and relation mapping, wherein the entity mapping associates the entities defined by the graph design module one-to-one with data tables in the relational database; the attribute mapping maps the attributes of an entity to fields in its associated data table; and the relation mapping establishes a relation between a head entity and a tail entity. Hereinafter, the knowledge extraction sub-module will be described in detail with reference to FIG. 3.
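A Schema that defines entity types and relation types can be sketched as follows. The `Schema` and `RelationType` classes and the sample types are hypothetical; the sketch only shows how a Schema could constrain which head and tail entity types a relation may connect.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    name: str
    head: str  # entity type allowed as head
    tail: str  # entity type allowed as tail


@dataclass
class Schema:
    entity_types: dict = field(default_factory=dict)    # type name -> set of attribute names
    relation_types: dict = field(default_factory=dict)  # relation name -> RelationType

    def define_entity(self, name, attributes):
        self.entity_types[name] = set(attributes)

    def define_relation(self, name, head, tail):
        # A relation may only connect entity types that are already defined.
        for type_name in (head, tail):
            if type_name not in self.entity_types:
                raise ValueError(f"unknown entity type: {type_name}")
        self.relation_types[name] = RelationType(name, head, tail)

    def allows(self, head_type, relation, tail_type):
        rt = self.relation_types.get(relation)
        return rt is not None and (rt.head, rt.tail) == (head_type, tail_type)


# Hypothetical Schema for a contract-management scenario.
schema = Schema()
schema.define_entity("Company", ["name"])
schema.define_entity("Contract", ["title", "amount"])
schema.define_relation("signed_by", "Contract", "Company")
print(schema.allows("Contract", "signed_by", "Company"))
print(schema.allows("Company", "signed_by", "Contract"))
```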
Referring to fig. 3, the knowledge extraction submodule is configured to perform knowledge extraction on unstructured text data, and includes an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, where the ontology management submodule is configured to define entities extracted from unstructured text data and relationships thereof as ontologies according to service scene requirements; the corpus management sub-module is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked on the basis of the ontology; the algorithm management sub-module is used for managing an entity and entity relation extraction algorithm in unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and a model operation sub-module for constructing a model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.
The graph construction module 108 is configured to construct a knowledge graph based on the knowledge graph Schema by using the basic data sources selected through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module.
The data cleaning module is used for configuring the regular expression and filtering attribute value types according to the regular expression so as to unify data formats; the normalization disambiguation module is used for determining the same entity according to the entity similarity so as to remove duplicate entities; and the knowledge graph reconstruction module is used for manually optimizing the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
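The cleaning and disambiguation steps can be sketched with the Python standard library. The date pattern, the use of `difflib.SequenceMatcher` as the similarity measure and the 0.85 threshold are all assumptions for illustration; the application does not name a specific similarity algorithm or threshold.

```python
import re
from difflib import SequenceMatcher

# Hypothetical cleaning rule: accept dates written with '.', '/' or '-' separators.
DATE_RULE = re.compile(r"^(\d{4})[./-](\d{1,2})[./-](\d{1,2})$")


def clean_date(value):
    """Return the value in YYYY-MM-DD form, or None if it fails the rule."""
    m = DATE_RULE.match(value.strip())
    if m is None:
        return None
    year, month, day = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for "entity similarity".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(names, threshold=0.85):
    """Keep the first name of each group of names judged to be the same entity."""
    kept = []
    for name in names:
        if not any(similarity(name, k) >= threshold for k in kept):
            kept.append(name)
    return kept


print(clean_date("2020/9/27"))
print(deduplicate(["Yuanguang Software Co Ltd",
                   "Yuanguang Software Co., Ltd",
                   "State Grid"]))
```

Because automatic similarity can merge entities that are in fact distinct, the manual adjustment provided by the knowledge graph reconstruction module remains the final check in this flow.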
The rule design module 110 is configured to implement rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph. Specifically, the rule configuration comprises: filling in the basic description information of the rule; selecting the entities related to the rule; forming a rule expression based on the intent of the rule and decomposing the rule expression into a plurality of rule sub-expressions, wherein each rule sub-expression is a path of entities and relations in the knowledge graph (that is, in the reconstructed knowledge graph, also called the optimized knowledge graph); configuring one of the rule sub-expressions by selecting its first entity, whereupon all basic attributes and relation attributes of the first entity are displayed; and selecting a basic attribute or a relation attribute to configure the attribute data of the first entity. When a basic attribute is selected, the attribute data is the data corresponding to that basic attribute, the configuration of this rule sub-expression is complete, and the remaining rule sub-expressions are configured in the same way. When a relation attribute is selected, the entity selection box automatically switches to the tail entity corresponding to the relation attribute, the attribute data switches to the attribute data of the tail entity, and the remaining rule sub-expressions are configured in the same way. Finally, the relationships among the rule sub-expressions, rule sub-expression functions and conventional operators are constructed to complete the configuration of the rule expression.
Rule reasoning is used to select configured rules and apply the selected rules to the optimized knowledge-graph to demonstrate rule reasoning and reasoning results through the visualized knowledge-graph.
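One way to read the rule configuration is that each rule sub-expression walks zero or more relation attributes from a starting entity and ends at a basic attribute, and the sub-expressions are then combined with operators. The following is a minimal sketch under that reading; the toy graph, the `RULE` dictionary layout and the function names are hypothetical and not taken from the application.

```python
import operator

# Toy graph: entity id -> type, basic attributes, and relation attributes ("rel").
GRAPH = {
    "contract:10": {"type": "Contract", "amount": 1_200_000,
                    "rel": {"signed_by": ["company:1"]}},
    "contract:11": {"type": "Contract", "amount": 50_000,
                    "rel": {"signed_by": ["company:1"]}},
    "company:1": {"type": "Company", "credit_rating": "C", "rel": {}},
}

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def eval_path(entity_id, path):
    """Follow the relation attributes in path, then read the final basic attribute."""
    *relations, attribute = path
    frontier = [entity_id]
    for rel in relations:
        # Selecting a relation attribute switches to the corresponding tail entities.
        frontier = [t for e in frontier for t in GRAPH[e]["rel"].get(rel, [])]
    return [GRAPH[e][attribute] for e in frontier if attribute in GRAPH[e]]


def eval_sub_expression(entity_id, sub):
    path, op, expected = sub
    return any(OPS[op](value, expected) for value in eval_path(entity_id, path))


def apply_rule(rule):
    """Return the ids of entities of the rule's type that satisfy the rule."""
    hits = []
    for entity_id, node in GRAPH.items():
        if node["type"] != rule["entity"]:
            continue
        results = [eval_sub_expression(entity_id, s) for s in rule["sub_expressions"]]
        matched = all(results) if rule["combine"] == "and" else any(results)
        if matched:
            hits.append(entity_id)
    return hits


# Hypothetical rule: large contracts signed by a low-rated company.
RULE = {
    "name": "large contract with low-rated counterparty",
    "entity": "Contract",
    "combine": "and",
    "sub_expressions": [
        (["amount"], ">", 1_000_000),                 # basic attribute
        (["signed_by", "credit_rating"], "==", "C"),  # relation attribute, then basic attribute
    ],
}
print(apply_rule(RULE))
```

The returned entity ids are what a visualization layer could highlight to present the reasoning result on the graph.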
The application also discloses a method for constructing an enterprise knowledge graph. Referring to FIG. 4, the method for constructing the enterprise knowledge graph includes: step S402, creating a knowledge graph card; step S404, designing a knowledge graph Schema for the knowledge graph card to define the entities and relations in the knowledge graph; step S406, configuring basic data sources through an import data sub-module, a data source configuration sub-module and a knowledge extraction sub-module, respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; step S408, constructing a knowledge graph based on the knowledge graph Schema by using the basic data sources selected through the import data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and step S410, implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
Driven by the rapid growth in the volume of unstructured, semi-structured and structured data available to enterprises, integrating multi-source heterogeneous data and constructing business knowledge graphs have become important means of enterprise informatization. Knowledge graph technology can effectively break down data barriers and interconnect multi-source heterogeneous data, and building knowledge graph applications increases the application value of the data. The method and system for constructing an enterprise knowledge graph described here construct the knowledge graph semi-automatically; they enable a knowledge graph to be built rapidly from 0 to 1, solve the difficulty of integrating an enterprise's multi-source heterogeneous data, and further improve the application value of enterprise data.
Hereinafter, a construction system of an enterprise knowledge graph will be described in detail by way of a specific example with reference to fig. 2. Specifically, the implementation flow of the system for constructing the enterprise knowledge graph is as follows:
(1) The user logs in.
(2) The user clicks "create graph" to create a knowledge graph card. The graph creation module manages the multiple business-specific knowledge graphs created in the system. Each knowledge graph card is a knowledge graph designed for a specific application scenario. The system supports a user in creating a plurality of knowledge graph cards.
(3) The user clicks the knowledge graph card to enter the graph design module and designs the knowledge graph Schema, that is, defines the entities and relations in the knowledge graph. Schema definition supports three modes: normal view, visual view and template import.
(4) After the Schema design is completed, a "My data" function module is entered. The module mainly provides basic data for the knowledge graph, and comprises three sub-modules of 'import data', 'data source configuration' and 'knowledge extraction'. (a) The 'import data' submodule provides functions of example import template downloading, semi-structured excel file example data batch import operation and the like. (b) The data source configuration submodule comprises two functions of adding a data source and mapping data, and mainly maps structured data in a relational database into a map. The sub-module of adding data source adds common relational database source such as MySQL, oracle and the like in a URL connection mode for use by the function of data mapping. The "data mapping" comprises three steps: entity mapping, attribute mapping, and relationship mapping. The entity mapping associates entities defined by the map design with tables in the relational database one by one. "Attribute mapping" maps attributes of an entity to fields in a table with which it is associated. The relation mapping is to establish the relation between the head entity and the tail entity, and express the relation between one field in the table corresponding to the head entity and one field in the table corresponding to the tail entity, and the field relation name is the relation name of the head entity and the tail entity. (c) The knowledge extraction submodule carries out knowledge extraction on unstructured text data, and the knowledge extraction submodule comprises functions of ontology management, corpus management, algorithm management, model training, model operation and the like. The ontology management is constructed according to the Schema defined by the Schema design module, and the category of the ontology defined by the ontology management does not exceed the category defined by the Schema. The "corpus management" includes original corpus management and corpus labeling. 
Corpus labeling marks entities and entity relations in unstructured text on the basis of the ontology, that is, the unstructured text is structured manually. "Algorithm management" manages the algorithms for extracting entities and entity relations from unstructured data. The system provides the pre-trained model Bert (Bidirectional Encoder Representations from Transformers); the comprehensive extraction model BiLSTM+CRF+Capsule, which combines three algorithms (BiLSTM: Bi-directional Long Short-Term Memory, a neural network combining a forward LSTM and a backward LSTM; CRF: conditional random field; Capsule: capsule neural network); the comprehensive extraction model BiLSTM+CRF+CNN (CNN: Convolutional Neural Network); the comprehensive extraction model BiLSTM+CRF+GCN (GCN: Graph Convolutional Network); the comprehensive extraction model BiLSTM+CRF+Bert; the comprehensive extraction model BiLSTM+CRF+RNN (RNN: Recurrent Neural Network); the comprehensive extraction model BiLSTM+CRF+Bitrans-form; and the like, each comprehensive model combining three algorithms, for use in different extraction scenarios. "Model training" trains and tunes a model based on the labeled corpus and the seven algorithms above, and stores the trained model for use by "model operation". "Model operation" uses the trained model to extract knowledge from unlabeled corpus and saves the extraction result for later use. The three sub-modules structure data of different structures, and the processed data serve as the basic data for constructing the graph.
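The three "data mapping" steps of the data source configuration sub-module described above (entity mapping, attribute mapping, relation mapping) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses an in-memory SQLite database from Python's standard library in place of MySQL or Oracle, and the table names, field names and the `entity_map` / `relation_map` structures are all hypothetical.

```python
import sqlite3

# In-memory stand-in for an enterprise relational database (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_company (id INTEGER, corp_name TEXT, rep_id INTEGER);
    CREATE TABLE t_person  (id INTEGER, full_name TEXT);
    INSERT INTO t_company VALUES (1, 'Acme Ltd', 10);
    INSERT INTO t_person  VALUES (10, 'Li Lei');
""")

# Entity mapping: Schema entity -> table. Attribute mapping: attribute -> field.
entity_map = {
    "Company": {"table": "t_company", "key": "id", "attrs": {"name": "corp_name"}},
    "Person": {"table": "t_person", "key": "id", "attrs": {"name": "full_name"}},
}
# Relation mapping: a field of the head table matched to a field of the tail table.
relation_map = [
    {"name": "legal_representative", "head": "Company", "head_field": "rep_id",
     "tail": "Person", "tail_field": "id"},
]


def load_nodes(conn, entity_map):
    """Return {(entity, key): {attribute: value}} for every mapped row."""
    nodes = {}
    for entity, m in entity_map.items():
        fields = ", ".join([m["key"], *m["attrs"].values()])
        for row in conn.execute(f"SELECT {fields} FROM {m['table']}"):
            nodes[(entity, row[0])] = dict(zip(m["attrs"], row[1:]))
    return nodes


def load_edges(conn, entity_map, relation_map):
    """Return (head_node, relation_name, tail_node) triples via a field join."""
    edges = []
    for r in relation_map:
        h, t = entity_map[r["head"]], entity_map[r["tail"]]
        sql = (f"SELECT h.{h['key']}, t.{t['key']} FROM {h['table']} h "
               f"JOIN {t['table']} t ON h.{r['head_field']} = t.{r['tail_field']}")
        for head_key, tail_key in conn.execute(sql):
            edges.append(((r["head"], head_key), r["name"], (r["tail"], tail_key)))
    return edges


nodes = load_nodes(conn, entity_map)
edges = load_edges(conn, entity_map, relation_map)
```

Table and field names are interpolated into the SQL text here only because they come from the trusted mapping configuration of the sketch; a real system would need to validate them.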
Referring to fig. 3, the knowledge extraction flow includes: a. Ontology management: the ontology management module provides functions for adding, deleting, modifying and querying ontologies. Ontology design is carried out on the basis of the "graph design" module by defining, according to the requirements of the business scenario, the entities to be extracted from unstructured text data and their relations (that is, adding a new ontology). The entities and relations of the ontology design correspond to a subset of the Schema of the "graph design" module. b. Corpus management: the corpus management module manages the uploaded unstructured text corpus and provides basic operations such as adding, deleting, modifying and querying, as well as a text labeling function. c. Algorithm management: the algorithm management module manages the algorithms that correspond to different unstructured text extraction requirements and provides functions such as adding, deleting and querying. d. Model training: the model training module selects an algorithm and training samples according to the task requirements and then performs model training. e. Model operation: a model operation is built from a trained model and new original samples to be processed, and entities and their relations are extracted from the new original samples.
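Two points of this flow lend themselves to a short sketch: checking that the ontology stays within the Schema, and turning the output of a trained sequence-labeling model into entities. The Python below is illustrative only. It assumes that the model emits per-token BIO tags, a common convention for BiLSTM+CRF taggers that the patent itself does not specify, and the function names and sample sentence are hypothetical. No model is trained here; the tags are written by hand.

```python
def check_ontology(ontology_types, schema_types):
    """The ontology may only use entity types already defined in the Schema."""
    extra = set(ontology_types) - set(schema_types)
    if extra:
        raise ValueError(f"ontology types not in Schema: {sorted(extra)}")


def decode_bio(tokens, tags):
    """Turn per-token BIO tags into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


check_ontology({"Company", "Person"}, {"Company", "Person", "Contract"})

# Tags as a trained sequence-labeling model might emit them (hand-written here).
tokens = ["Li", "Lei", "founded", "Acme", "Ltd", "."]
tags = ["B-Person", "I-Person", "O", "B-Company", "I-Company", "O"]
entities = decode_bio(tokens, tags)
```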
(5) After the knowledge graph Schema has been designed and the basic data prepared, enter the "graph construction" functional module to construct the knowledge graph. Construction takes two steps: configuring the basic data provided by the "My data" functional module, and calling the knowledge graph Schema. After the knowledge graph has been constructed, "entity mapping" and "attribute mapping" can be used to check whether the data mapping is entirely correct.
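The construction step can be pictured as loading the configured basic data into a graph while checking each record against the Schema. The following Python sketch makes that concrete under assumptions of its own: the dictionary layout of `schema`, the record formats and the `build_graph` function are hypothetical and are not described in the patent.

```python
schema = {
    "entities": {"Company": {"name"}, "Person": {"name"}},
    "relations": {"legal_representative": ("Company", "Person")},
}

# Basic data as it might arrive from "My data" (hypothetical records).
node_records = [
    ("c1", "Company", {"name": "Acme Ltd"}),
    ("p1", "Person", {"name": "Li Lei"}),
    ("x1", "Invoice", {"no": "42"}),          # entity type not in the Schema
]
edge_records = [
    ("c1", "legal_representative", "p1"),
    ("p1", "legal_representative", "c1"),      # head/tail types reversed
]


def build_graph(schema, node_records, edge_records):
    """Keep only records that conform to the Schema; collect the rest as errors."""
    nodes, edges, errors = {}, [], []
    for node_id, etype, attrs in node_records:
        allowed = schema["entities"].get(etype)
        if allowed is None or not set(attrs) <= allowed:
            errors.append(("node", node_id))
            continue
        nodes[node_id] = {"type": etype, **attrs}
    for head, rel, tail in edge_records:
        expected = schema["relations"].get(rel)
        actual = (nodes.get(head, {}).get("type"), nodes.get(tail, {}).get("type"))
        if expected != actual:
            errors.append(("edge", (head, rel, tail)))
            continue
        edges.append((head, rel, tail))
    return nodes, edges, errors


nodes, edges, errors = build_graph(schema, node_records, edge_records)
```

Collecting non-conforming records in `errors` rather than raising an exception mirrors the idea of checking afterwards whether the data mapping is entirely correct.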
(6) After the knowledge graph has been constructed, it can be tuned manually through operations such as "entity mapping", "attribute mapping", "data cleaning" and "normalization disambiguation". "Data cleaning" is carried out by configuring regular expressions; "normalization disambiguation" is carried out by means of an entity similarity algorithm provided by the system. The knowledge graph is then reconstructed to obtain a knowledge graph that meets the quality requirement.
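A minimal sketch of these two tuning operations follows. The patent does not say which regular expressions or which similarity algorithm are used, so everything specific here is an assumption: the cleaning rule for a hypothetical `registered_capital` attribute, the use of `difflib.SequenceMatcher` from Python's standard library as the similarity measure, and the 0.85 threshold.

```python
import re
from difflib import SequenceMatcher

# Hypothetical cleaning rule: keep only digits and the decimal point in an amount.
CLEAN_RULES = {"registered_capital": (re.compile(r"[^\d.]"), "")}


def clean(attr, value):
    """Apply the regular expression configured for this attribute, if any."""
    rule = CLEAN_RULES.get(attr)
    if rule is None:
        return value
    pattern, replacement = rule
    return pattern.sub(replacement, value)


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the entity similarity algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_duplicates(names, threshold=0.85):
    """Greedily map each name to the first earlier name it is similar enough to."""
    canonical, mapping = [], {}
    for name in names:
        match = next((c for c in canonical if similarity(name, c) >= threshold), None)
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


cleaned = clean("registered_capital", "RMB 1,000,000.00")
mapping = merge_duplicates(["Acme Ltd", "ACME Ltd.", "Globex Corp"])
```

The greedy merge depends on input order and on the threshold; both would need tuning on real enterprise data before the graph is reconstructed.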
(7) After the knowledge graph construction is completed, enter the "rule design" functional module, which comprises two functions, "rule management" and "rule reasoning". "Rule management" includes functions such as "rule classification" and "rule addition". "Rule classification" implements operations such as adding and deleting rules. "Rule addition" configures inference rules in the knowledge graph, and its flow is as follows: (I) Fill in the basic description information of the rule. (II) Select the entities related to the rule. (III) Enter rule configuration management, form a rule expression according to the intent of the rule, and decompose it into several rule sub-expressions (each rule sub-expression is a path of entities and relations in the graph). (IV) Configure one rule sub-expression: select the first entity of the sub-expression, whereupon all basic attributes and relation attributes of that entity are displayed. (V) Select a basic attribute or a relation attribute of the entity. If a basic attribute is selected, the configuration of the rule sub-expression is complete; jump to step (VII). (VI) If a relation attribute is selected, the entity selection box automatically switches to the tail entity corresponding to the relation attribute and the attribute data changes to the attribute data of the tail entity; jump to step (V). (VII) Repeat step (IV) until all rule sub-expressions are configured, then construct the relations among the rule sub-expressions, the rule sub-expression functions (which describe the relations between rule sub-expressions) and the conventional operators (addition, subtraction, multiplication, division, greater than, less than, equal to, not equal to, union, intersection and the like) to complete the rule expression configuration.
"Rule reasoning" applies the configured rules to the knowledge graph and shows the reasoning and its results in the visual graph display. This module makes it possible to configure rules in the knowledge graph quickly and provides a rule reasoning function.
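The rule configuration described in step (7) can be read as follows: each rule sub-expression is a path that starts at an entity, follows zero or more relation attributes to tail entities, and ends at a basic attribute; sub-expressions are then combined with conventional operators. The Python sketch below illustrates that reading only. The toy graph, the `eval_path` and `eval_rule` functions, and the restriction to single-valued relations and to one binary operator per rule are simplifying assumptions, not features stated in the patent.

```python
import operator

# Toy graph: node attributes plus outgoing relations (all names hypothetical).
nodes = {
    "c1": {"type": "Company", "registered_capital": 500},
    "c2": {"type": "Company", "registered_capital": 2000},
}
relations = {("c1", "parent_company"): "c2"}


def eval_path(start, path):
    """Follow relation attributes hop by hop, then read the final basic attribute."""
    *hops, basic_attr = path
    current = start
    for rel in hops:
        current = relations.get((current, rel))
        if current is None:
            return None          # the path does not exist for this entity
    return nodes[current].get(basic_attr)


OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq,
             "!=": operator.ne, "+": operator.add, "-": operator.sub}


def eval_rule(start, left_path, op, right_path):
    """Combine two rule sub-expressions with a conventional operator."""
    left, right = eval_path(start, left_path), eval_path(start, right_path)
    if left is None or right is None:
        return None
    return OPERATORS[op](left, right)


# Rule intent: "the parent company's registered capital exceeds the company's own".
fires = eval_rule("c1", ["parent_company", "registered_capital"], ">",
                  ["registered_capital"])
```

Returning `None` when a path cannot be followed keeps "the rule does not apply to this entity" distinct from "the rule applies and is false".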
(8) After the rule design is completed, the "graph publishing" functional module can be entered to publish the constructed knowledge graph and make it available for third-party services to call.
The scheme provides a semi-automatic knowledge graph construction method and system. The system mainly comprises six functional modules: graph creation, graph design, My data, graph construction, rule design and graph publishing. The "graph creation" functional module manages the multiple service-specific knowledge graphs created in the system. The "graph design" functional module performs Schema design for each knowledge graph, that is, defines the "entities" and "relations" in the knowledge graph. The "My data" functional module configures the basic data sources for the knowledge graph. The "graph construction" functional module selects basic data sources to construct the graph on the basis of the Schema and supports operations such as "data cleaning" and "normalization disambiguation" for manual tuning of the knowledge graph. The "rule design" functional module allows rules to be configured quickly in the knowledge graph and supports rule reasoning. The "graph publishing" functional module publishes graphs and lets users view publishing records. With this system, a knowledge graph can be built quickly from scratch.
Compared with the prior art, the application has at least one of the following beneficial effects:
1. The knowledge graph is a graph-based data structure consisting of nodes and edges. In the knowledge graph, each node represents an "entity" defined in the data, and each edge represents a "relation" between entities. Knowledge graph construction focuses mainly on how to integrate structured, semi-structured and unstructured data into a unified semantic data structure. This gives an enterprise a better capacity to organize, manage and understand its massive information and helps overcome the difficulties of enterprise data management.
2. The knowledge graph connects, on a large scale, the entities defined in data of different structures through relations to form a data network, thereby achieving integration and deep cross-correlation of multi-source heterogeneous data, providing the ability to analyze problems from the perspective of relations, and fully mining the value of the data.
3. Knowledge graph technology can effectively break down data barriers and interconnect multi-source heterogeneous data. Constructing a knowledge graph quickly resolves the difficulty of integrating an enterprise's multi-source heterogeneous data and thereby increases the application value of enterprise data.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be implemented by a computer program that instructs the associated hardware, and that the program may be stored on a computer-readable storage medium. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.

Claims (7)

1. The system for constructing the enterprise knowledge graph is characterized by comprising the following components:
the map creating module is used for creating a knowledge map card;
the map design module is used for designing a knowledge map Schema aiming at the knowledge map card so as to define entities and relations in the knowledge map;
the data configuration module is used for configuring basic data sources through the data importing sub-module, the data source configuration sub-module and the knowledge extraction sub-module respectively, wherein the knowledge extraction sub-module carries out structuring processing on unstructured data and provides the structured data as one of the basic data sources to the map construction module;
the map construction module is used for constructing a knowledge map based on the knowledge map Schema by utilizing the basic data source selected by the importing data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and
a rule design module for realizing rule configuration and rule reasoning in the knowledge graph to display the rule reasoning and reasoning result through the visual knowledge graph,
the importing data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph;
the data source configuration submodule is used for adding a relational database in a URL connection mode and mapping structured data in the relational database into the knowledge graph so as to configure the structured data for the knowledge graph;
the knowledge extraction submodule is used for carrying out knowledge extraction on unstructured text data and comprises an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, wherein the ontology management submodule is used for defining entities extracted from the unstructured text data and relations thereof as ontologies according to business scene requirements; the corpus management submodule is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked based on the ontology; the algorithm management sub-module is used for managing an entity-entity relation extraction algorithm in the unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and the model operation submodule is used for constructing model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.
2. The system for building an enterprise knowledge graph according to claim 1, wherein the knowledge graph Schema includes a normal view mode, a visual view mode, and a template import mode to switch among the normal view mode, the visual view mode, and the template import mode.
3. The system for building an enterprise knowledge graph as claimed in claim 1, wherein the mapping of the structured data comprises: entity mapping, attribute mapping, and relationship mapping, wherein,
the entity mapping associates the entities defined by the map design module with the data tables in the relational database one by one;
the attribute mapping maps the attributes of the entity with the fields in the associated data table; and
the relation mapping is to establish a relation between a head entity and a tail entity.
4. The system for constructing an enterprise knowledge graph according to claim 1, further comprising a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module, wherein,
the data cleaning module is used for configuring a regular expression and filtering attribute value types according to the regular expression so as to unify data formats;
the normalization disambiguation module is used for determining the same entity according to the entity similarity so as to remove duplicate entities;
the knowledge graph reconstruction module is used for manually adjusting the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
5. The system for building an enterprise knowledge graph as claimed in claim 4, wherein the rule design module is configured to perform rule configuration and rule reasoning, and the rule configuration includes:
setting rule basic description information;
selecting a rule-related entity;
forming a rule expression based on rule intention and decomposing the rule expression into a plurality of rule sub-expressions, wherein each rule sub-expression is an entity and relationship path in a knowledge graph;
configuring one rule sub-expression in the plurality of rule sub-expressions, selecting a first entity of the one rule sub-expression, and displaying all basic attributes and relationship attributes of the first entity;
selecting the basic attribute or the relation attribute to configure attribute data of the first entity;
when the basic attribute is selected, the attribute data is data corresponding to the basic attribute, the configuration of the one rule sub-expression is completed, and the remaining rule sub-expressions are continuously configured in the same mode as the one rule sub-expression;
when the relation attribute is selected, the entity selection box is automatically switched to a tail entity corresponding to the relation attribute, the attribute data is converted into attribute data corresponding to the tail entity, and the rest rule sub-expressions are continuously configured in the same mode as the rule sub-expressions;
the relationships among the rule sub-expressions, the rule sub-expression functions, and the conventional operators are constructed to complete the rule expression configuration.
6. The system for constructing an enterprise knowledge graph according to claim 1, wherein the rule reasoning is used to select a rule for configuration and apply the selected rule to the knowledge graph to show the rule reasoning and reasoning result through a visual knowledge graph.
7. The method for constructing the enterprise knowledge graph is characterized by comprising the following steps of:
creating a knowledge graph card;
designing a knowledge graph Schema aiming at the knowledge graph card to define entities and relations in the knowledge graph;
the method comprises the steps of respectively configuring basic data sources through an importing data sub-module, a data source configuration sub-module and a knowledge extraction sub-module, wherein the knowledge extraction sub-module carries out structuring processing on unstructured data and provides the structured data as one of the basic data sources to a map construction module;
constructing a knowledge graph based on the knowledge graph Schema by utilizing the basic data source selected by the importing data sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and
rule configuration and rule reasoning are realized in the knowledge graph to display the rule reasoning and reasoning result through the visual knowledge graph,
the importing data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph;
the data source configuration submodule is used for adding a relational database in a URL connection mode and mapping structured data in the relational database into the knowledge graph so as to configure the structured data for the knowledge graph;
the knowledge extraction submodule is used for carrying out knowledge extraction on unstructured text data and comprises an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, wherein the ontology management submodule is used for defining entities extracted from the unstructured text data and relations thereof as ontologies according to business scene requirements; the corpus management submodule is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked based on the ontology; the algorithm management sub-module is used for managing an entity-entity relation extraction algorithm in the unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and the model operation submodule is used for constructing model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011030017.3A CN112214611B (en) 2020-09-24 2020-09-24 Enterprise knowledge graph construction system and method

Publications (2)

Publication Number Publication Date
CN112214611A CN112214611A (en) 2021-01-12
CN112214611B true CN112214611B (en) 2023-10-31

Family

ID=74051971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011030017.3A Active CN112214611B (en) 2020-09-24 2020-09-24 Enterprise knowledge graph construction system and method

Country Status (1)

Country Link
CN (1) CN112214611B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559773A (en) * 2021-02-24 2021-03-26 北京通付盾人工智能技术有限公司 Knowledge graph system building method and device
CN112926855A (en) * 2021-02-24 2021-06-08 北京通付盾人工智能技术有限公司 Marketing activity risk control system and method based on knowledge graph
CN113190689B (en) * 2021-05-25 2023-04-18 广东电网有限责任公司广州供电局 Construction method, device, equipment and medium of electric power safety knowledge graph
CN113407688B (en) * 2021-06-15 2022-09-16 西安理工大学 Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN113468340B (en) * 2021-06-28 2024-05-07 北京众标智能科技有限公司 Construction system and construction method of industrial knowledge graph
CN113537355A (en) * 2021-07-19 2021-10-22 金鹏电子信息机器有限公司 Multi-element heterogeneous data semantic fusion method and system for security monitoring
CN113590846B (en) * 2021-09-24 2021-12-17 天津汇智星源信息技术有限公司 Legal knowledge map construction method and related equipment
CN114138930B (en) * 2021-10-23 2024-02-02 西安电子科技大学 Intent characterization system and method based on knowledge graph
CN114417018B (en) * 2022-03-28 2022-07-15 金现代信息产业股份有限公司 Full-process visual configuration system and method for knowledge graph
CN115952301A (en) * 2023-03-16 2023-04-11 浪潮软件科技有限公司 Construction method and system of knowledge graph management platform
CN116108146B (en) * 2023-04-13 2023-06-27 天津数域智通科技有限公司 Information extraction method based on knowledge graph construction
CN117592561B (en) * 2024-01-18 2024-04-19 国网江苏省电力工程咨询有限公司 Enterprise digital operation multidimensional data analysis method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 The construction method and device of knowledge mapping
CN109471949A (en) * 2018-11-09 2019-03-15 袁琦 A kind of semi-automatic construction method of pet knowledge mapping
CN110516077A (en) * 2019-08-20 2019-11-29 北京中亦安图科技股份有限公司 Knowledge mapping construction method and device towards enterprise's market conditions
CN110674311A (en) * 2019-09-05 2020-01-10 国家电网有限公司 Knowledge graph-based power asset heterogeneous data fusion method
CN110689385A (en) * 2019-10-16 2020-01-14 国网山东省电力公司信息通信公司 Power customer service user portrait construction method based on knowledge graph
CN110781249A (en) * 2019-10-16 2020-02-11 华电国际电力股份有限公司技术服务分公司 Knowledge graph-based multi-source data fusion method and device for thermal power plant
CN110825882A (en) * 2019-10-09 2020-02-21 西安交通大学 Knowledge graph-based information system management method
CN110929042A (en) * 2019-11-26 2020-03-27 昆明能讯科技有限责任公司 Knowledge graph construction and query method based on power enterprise
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150095303A1 (en) * 2013-09-27 2015-04-02 Futurewei Technologies, Inc. Knowledge Graph Generator Enabled by Diagonal Search
US10915577B2 (en) * 2018-03-22 2021-02-09 Adobe Inc. Constructing enterprise-specific knowledge graphs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
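Research on the construction of an enterprise knowledge service model based on knowledge graphs; Zhang Su; Xu Hui; Information Science (08); full text *
A survey of large-scale enterprise-level knowledge graph practice; Wang Haofen; Ding Jun; Hu Fanghuai; Wang Xin; Computer Engineering (07); full text *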

Also Published As

Publication number Publication date
CN112214611A (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant