CN112214611B - Enterprise knowledge graph construction system and method
- Publication number
- CN112214611B (application CN202011030017.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The application relates to a system and a method for constructing an enterprise knowledge graph, belongs to the technical field of knowledge graphs, and addresses the prior-art problem that data of different structure types cannot be effectively integrated and associated, which makes the value of the data difficult to mine. The system comprises: a graph creation module for creating a knowledge graph card; a graph design module for defining the entities and relations in the knowledge graph; a data configuration module for configuring basic data sources through a data import sub-module, a data source configuration sub-module and a knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module for constructing a knowledge graph from the basic data sources based on the knowledge graph Schema; and a rule design module for implementing rule configuration and rule reasoning in the knowledge graph. The system thereby eases the difficulty of managing enterprise data.
Description
Technical Field
The application relates to the technical field of knowledge graphs, in particular to a system and a method for constructing an enterprise knowledge graph.
Background
In the era of the digital economy, enterprises hold vast amounts of structured, semi-structured and unstructured data (the last hereinafter referred to as unstructured document data), that is, heterogeneous data, and these data are often stored in different locations (multi-source). The fragmentation and weak interrelation of multi-source heterogeneous data readily lead to information silos and to data that cannot be converted into knowledge, which in turn limits the depth of data value mining. Data can be used effectively only once it has been understood and analyzed, and constructing a knowledge graph is an important way to extract data and refine it into usable knowledge.
The knowledge graph is one application field of artificial intelligence technology. It has strong capabilities for semantic processing and for organizing data in structured form, and thus provides a foundation for intelligent information applications. By constructing a semantic network of entities and relations, a knowledge graph integrates, cross-correlates, analyzes and compares large-scale data and knowledge, mines the data in depth, supports intelligent understanding, representation, reasoning, retrieval and serving of knowledge, and gives users a self-service, iterative analysis capability. Conventional databases and analysis and mining tools, however, offer little support for integrating and associating unstructured and semi-structured data or for extracting and representing knowledge from them.
At present, the mass data of an enterprise is often organized and stored in a fragmented way. Because of limitations in data structure, database storage capacity and the like, data of different structures are stored in different databases and file systems, and conventional databases and analysis and mining tools handle unstructured and semi-structured data poorly. As a result, data of different structure types cannot be effectively integrated and associated, and the value of the data is difficult to mine.
Disclosure of Invention
In view of the above analysis, the embodiments of the application aim to provide a system and a method for constructing an enterprise knowledge graph, so as to solve the problem that existing databases and analysis and mining tools handle unstructured and semi-structured data poorly, with the result that data of different structure types cannot be effectively integrated and associated and the value of the data is difficult to mine.
In one aspect, an embodiment of the present application provides a system for constructing an enterprise knowledge graph, comprising: a graph creation module for creating a knowledge graph card; a graph design module for designing a knowledge graph Schema for the knowledge graph card so as to define the entities and relations in the knowledge graph; a data configuration module for configuring basic data sources through a data import sub-module, a data source configuration sub-module and a knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module for constructing a knowledge graph based on the knowledge graph Schema from the basic data sources selected through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and a rule design module for implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through a visualized knowledge graph.
The beneficial effects of the above technical solution are as follows. A knowledge graph is a graph-based data structure consisting of nodes and edges: each node represents an "entity" defined in the data, and each edge represents a "relation" between entities. Constructing a knowledge graph is mainly a matter of integrating structured, semi-structured and unstructured data into a uniform semantic data structure. This improves an enterprise's ability to organize, manage and understand its mass of information, and helps overcome the difficulty of managing enterprise data.
In a further improvement of the above system, the knowledge graph Schema supports a normal view mode, a visual view mode and a template import mode, and can be switched among these three modes.
In a further improvement of the above system, the data import sub-module is used for importing Excel files in batches so as to configure semi-structured data for the knowledge graph.
In a further improvement of the above system, the data source configuration sub-module is used for adding a relational database by URL connection and mapping the structured data in the relational database into the knowledge graph, so as to configure structured data for the knowledge graph.
In a further improvement of the above system, the mapping of structured data comprises entity mapping, attribute mapping and relation mapping, wherein the entity mapping associates the entities defined by the graph design module one-to-one with data tables in the relational database; the attribute mapping maps the attributes of an entity to the fields of its associated data table; and the relation mapping establishes a relation between a head entity and a tail entity.
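The three mapping steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the table names, column names, the `works_for` relation and the configuration dictionaries are hypothetical, and an in-memory SQLite database stands in for a relational source that the system would add by URL connection.

```python
import sqlite3

# Hypothetical mapping configuration (illustrative names, not from the patent).
ENTITY_MAP = {"Company": "company", "Person": "person"}   # entity type -> table
ATTR_MAP = {                                               # attribute -> column
    "Company": {"name": "company_name"},
    "Person": {"name": "full_name"},
}
# relation name -> (head entity, head column, tail entity, tail column)
REL_MAP = {"works_for": ("Person", "company_id", "Company", "id")}


def map_to_graph(conn):
    """Apply entity, attribute and relation mapping; return (nodes, edges)."""
    nodes, edges = {}, []
    # Entity mapping + attribute mapping: one node per row of the mapped table.
    for entity, table in ENTITY_MAP.items():
        cols = ATTR_MAP[entity]
        select = ", ".join(["id"] + list(cols.values()))
        for row in conn.execute(f"SELECT {select} FROM {table}"):
            nodes[(entity, row[0])] = dict(zip(cols.keys(), row[1:]))
    # Relation mapping: join a field of the head table to a field of the tail table.
    for rel, (head, head_col, tail, tail_col) in REL_MAP.items():
        sql = (
            f"SELECT h.id, t.id FROM {ENTITY_MAP[head]} h "
            f"JOIN {ENTITY_MAP[tail]} t ON h.{head_col} = t.{tail_col}"
        )
        for head_id, tail_id in conn.execute(sql):
            edges.append(((head, head_id), rel, (tail, tail_id)))
    return nodes, edges


def demo():
    """Build a tiny stand-in database and map it into nodes and edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE company (id INTEGER PRIMARY KEY, company_name TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT, company_id INTEGER);
        INSERT INTO company VALUES (1, 'Acme');
        INSERT INTO person VALUES (10, 'Li Wei', 1);
        """
    )
    return map_to_graph(conn)
```

In this sketch the relation name is the key of `REL_MAP`, reflecting the idea that the name of the field relation is also the name of the relation between the head and tail entities.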
In a further improvement of the above system, the knowledge extraction sub-module performs knowledge extraction on unstructured text data and comprises an ontology management sub-module, a corpus management sub-module, an algorithm management sub-module, a model training sub-module and a model operation sub-module, wherein the ontology management sub-module defines, as ontologies, the entities to be extracted from the unstructured text data and the relations between them, according to the requirements of the business scenario; the corpus management sub-module manages the uploaded unstructured sample corpus, in which entities and entity relations are annotated on the basis of the ontology; the algorithm management sub-module manages the algorithms for extracting entities and entity relations from the unstructured text data; the model training sub-module selects an algorithm and training samples according to the task requirements and then trains a model; and the model operation sub-module builds a model operation from a trained model and new raw samples to be processed, and extracts entities and their relations from the new raw samples.
In a further improvement of the above system, the system for constructing an enterprise knowledge graph further comprises a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module, wherein the data cleaning module is used for configuring regular expressions and filtering attribute value types according to the regular expressions so as to unify data formats; the normalization disambiguation module is used for identifying identical entities according to entity similarity so as to remove duplicate entities; and the knowledge graph reconstruction module is used for manually adjusting the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
In a further improvement of the above system, the rule design module is used for rule configuration and rule reasoning, and the rule configuration comprises: filling in basic descriptive information for the rule; selecting the entities involved in the rule; forming a rule expression from the intent of the rule and decomposing it into a plurality of rule sub-expressions, each of which is a path of entities and relations in the knowledge graph; configuring one of the rule sub-expressions by selecting its first entity, whereupon all basic attributes and relation attributes of the first entity are displayed; selecting a basic attribute or a relation attribute to configure the attribute data of the first entity; when a basic attribute is selected, the attribute data is the data corresponding to that basic attribute, the configuration of this rule sub-expression is complete, and the remaining rule sub-expressions are configured in the same way; when a relation attribute is selected, the entity selection box automatically switches to the tail entity of that relation attribute, the attribute data changes to the attribute data of the tail entity, and the remaining rule sub-expressions are configured in the same way; and constructing the relationships among the rule sub-expressions, rule sub-expression functions and conventional operators to complete the configuration of the rule expression.
In a further improvement of the above system, the rule reasoning selects configured rules and applies the selected rules to the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
In another aspect, an embodiment of the application provides a method for constructing an enterprise knowledge graph, comprising: creating a knowledge graph card; designing a knowledge graph Schema for the knowledge graph card so as to define the entities and relations in the knowledge graph; configuring basic data sources through a data import sub-module, a data source configuration sub-module and a knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to a graph construction module as one of the basic data sources; constructing a knowledge graph based on the knowledge graph Schema from the basic data sources selected through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through a visualized knowledge graph.
Compared with the prior art, the application has at least one of the following beneficial effects:
1. A knowledge graph is a graph-based data structure consisting of nodes and edges: each node represents an "entity" defined in the data, and each edge represents a "relation" between entities. Constructing a knowledge graph is mainly a matter of integrating structured, semi-structured and unstructured data into a uniform semantic data structure. This improves an enterprise's ability to organize, manage and understand its mass of information, and helps overcome the difficulty of managing enterprise data.
2. The knowledge graph connects, on a large scale and through relations, the entities defined in data of different structures to form a data network. This achieves integration and deep cross-correlation of multi-source heterogeneous data, makes it possible to analyze problems from the perspective of relations, and allows the value of the data to be mined fully.
3. Knowledge graph technology can effectively break down data barriers and interconnect multi-source heterogeneous data. Because a knowledge graph can be constructed quickly, the difficulty of integrating an enterprise's multi-source heterogeneous data is resolved and the application value of enterprise data is further improved.
In the application, the technical schemes can be mutually combined to realize more preferable combination schemes. Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the application, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a block diagram of a system for building an enterprise knowledge graph, in accordance with an embodiment of the application;
FIG. 2 is a specific block diagram of a system for constructing an enterprise knowledge graph in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of knowledge extraction in accordance with an embodiment of the application; and
FIG. 4 is a flowchart of a method for constructing an enterprise knowledge graph, according to an embodiment of the application.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
The application discloses a system for constructing an enterprise knowledge graph. Referring to FIG. 1, the system includes: a graph creation module 102 for creating a knowledge graph card; a graph design module 104 for designing a knowledge graph Schema for the knowledge graph card so as to define the entities and relations in the knowledge graph; a data configuration module 106 for configuring basic data sources through a data import sub-module, a data source configuration sub-module and a knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; a graph construction module 108 for constructing a knowledge graph based on the knowledge graph Schema from the basic data sources selected through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and a rule design module 110 for implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through a visualized knowledge graph.
Compared with the prior art, the system for constructing an enterprise knowledge graph relies on the knowledge graph as a graph-based data structure consisting of nodes and edges: each node represents an "entity" defined in the data, and each edge represents a "relation" between entities. Constructing a knowledge graph is mainly a matter of integrating structured, semi-structured and unstructured data into a uniform semantic data structure. This improves an enterprise's ability to organize, manage and understand its mass of information, and helps overcome the difficulty of managing enterprise data.
Hereinafter, a construction system of the enterprise knowledge graph will be described in detail with reference to fig. 1 to 3.
Referring to FIG. 1, the system for constructing an enterprise knowledge graph includes a graph creation module 102, a graph design module 104, a data configuration module 106, a graph construction module 108 and a rule design module 110. Referring to FIG. 2, the system further includes a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module.
The graph creation module 102 is used to create a knowledge graph card. Each knowledge graph card is a knowledge graph designed for a specific application scenario. The system allows a user to create multiple knowledge graph cards.
The graph design module 104 is used to design a knowledge graph Schema for the knowledge graph card, so as to define the entities and relations in the knowledge graph. The knowledge graph Schema supports a normal view mode, a visual view mode and a template import mode, and can be switched among these three modes.
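As an illustration of what a Schema that "defines entities and relations" might hold, the following Python sketch models entity types and relation types as plain data classes. The class names, fields and the validation rule are assumptions made for this example; the patent does not specify the internal representation of the Schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityType:
    """A kind of node, with the names of its basic attributes (hypothetical)."""
    name: str
    attributes: tuple = ()


@dataclass(frozen=True)
class RelationType:
    """A kind of edge from a head entity type to a tail entity type (hypothetical)."""
    name: str
    head: str
    tail: str


@dataclass
class Schema:
    entities: dict = field(default_factory=dict)
    relations: dict = field(default_factory=dict)

    def add_entity(self, name, *attributes):
        self.entities[name] = EntityType(name, tuple(attributes))

    def add_relation(self, name, head, tail):
        # Assumed rule: a relation may only connect entity types already defined.
        if head not in self.entities or tail not in self.entities:
            raise ValueError(f"relation {name!r} references an undefined entity type")
        self.relations[name] = RelationType(name, head, tail)
```

A Schema built this way could back any of the three editing modes, since each mode would only be a different front end for adding the same entity and relation definitions.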
The data configuration module 106 is used to configure basic data sources through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources. The data configuration module 106 corresponds to the "My data" module in FIG. 2. Specifically, the data import sub-module is used for importing Excel files in batches so as to configure semi-structured data for the knowledge graph. The data source configuration sub-module is used for adding a relational database by URL connection and mapping the structured data in the relational database into the knowledge graph, so as to configure structured data for the knowledge graph. The mapping of structured data comprises entity mapping, attribute mapping and relation mapping: the entity mapping associates the entities defined by the graph design module one-to-one with data tables in the relational database; the attribute mapping maps the attributes of an entity to the fields of its associated data table; and the relation mapping establishes a relation between a head entity and a tail entity. The knowledge extraction sub-module is described in detail below with reference to FIG. 3.
Referring to FIG. 3, the knowledge extraction sub-module performs knowledge extraction on unstructured text data and includes an ontology management sub-module, a corpus management sub-module, an algorithm management sub-module, a model training sub-module and a model operation sub-module. The ontology management sub-module defines, as ontologies, the entities to be extracted from the unstructured text data and the relations between them, according to the requirements of the business scenario. The corpus management sub-module manages the uploaded unstructured sample corpus, in which entities and entity relations are annotated on the basis of the ontology. The algorithm management sub-module manages the algorithms for extracting entities and entity relations from unstructured text data. The model training sub-module selects an algorithm and training samples according to the task requirements and then trains a model. The model operation sub-module builds a model operation from a trained model and new raw samples to be processed, and extracts entities and their relations from the new raw samples.
The graph construction module 108 is used to construct a knowledge graph based on the knowledge graph Schema from the basic data sources selected through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module.
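One way to picture Schema-based construction from several basic data sources is the Python sketch below. It assumes that each source has already been reduced to node and edge records; the record layout, the `SCHEMA` dictionary and the rule of rejecting records that do not fit the Schema are illustrative assumptions rather than details taken from the patent.

```python
# Hypothetical Schema: allowed attributes per entity type, and the
# (head type, tail type) pair allowed for each relation.
SCHEMA = {
    "entities": {"Company": {"name", "industry"}, "Person": {"name"}},
    "relations": {"works_for": ("Person", "Company")},
}


def build_graph(schema, sources):
    """Merge records from all sources into one graph, keeping only what fits the Schema."""
    nodes, edges, rejected = {}, [], []
    for source_name, records in sources.items():
        for rec in records:
            if rec["kind"] == "node":
                allowed = schema["entities"].get(rec["type"])
                if allowed is None or not set(rec["attrs"]) <= allowed:
                    rejected.append((source_name, rec))
                    continue
                # Records describing the same entity are merged into one node.
                nodes.setdefault((rec["type"], rec["id"]), {}).update(rec["attrs"])
            else:
                expected = schema["relations"].get(rec["rel"])
                actual = (rec["head"][0], rec["tail"][0])
                if expected != actual:
                    rejected.append((source_name, rec))
                    continue
                edge = (rec["head"], rec["rel"], rec["tail"])
                if edge not in edges:
                    edges.append(edge)
    return nodes, edges, rejected


SOURCES = {
    "import_data": [
        {"kind": "node", "type": "Company", "id": "c1", "attrs": {"name": "Acme"}},
    ],
    "data_source_configuration": [
        {"kind": "node", "type": "Person", "id": "p1", "attrs": {"name": "Li Wei"}},
        {"kind": "edge", "rel": "works_for",
         "head": ("Person", "p1"), "tail": ("Company", "c1")},
    ],
    "knowledge_extraction": [
        {"kind": "node", "type": "Company", "id": "c1", "attrs": {"industry": "Robotics"}},
        {"kind": "node", "type": "Vehicle", "id": "v1", "attrs": {}},
    ],
}
```

Calling `build_graph(SCHEMA, SOURCES)` merges the two `Company` records into a single node and rejects the `Vehicle` record, because that entity type is not defined in the Schema.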
The data cleaning module is used for configuring regular expressions and filtering attribute value types according to the regular expressions so as to unify data formats. The normalization disambiguation module is used for identifying identical entities according to entity similarity so as to remove duplicate entities. The knowledge graph reconstruction module is used for manually optimizing the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
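The two operations can be sketched in Python as follows. The date pattern, the target format, the string-similarity measure (`difflib.SequenceMatcher` from the standard library) and the 0.85 threshold are all assumptions chosen for illustration; the patent does not state which regular expressions or which similarity measure the modules use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical cleaning rule: accept dates written with '.', '/' or '-' separators.
DATE_RULE = re.compile(r"^(\d{4})[./-](\d{1,2})[./-](\d{1,2})$")


def clean_date(value):
    """Return the date as YYYY-MM-DD, or None if the value does not match the rule."""
    match = DATE_RULE.match(value.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(names, threshold=0.85):
    """Keep the first of each group of names whose similarity reaches the threshold."""
    kept = []
    for name in names:
        if not any(similarity(name, other) >= threshold for other in kept):
            kept.append(name)
    return kept
```

In practice the merge decision would probably also compare attributes and relations rather than names alone, and the resulting graph would then be reviewed manually, as the reconstruction module describes.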
The rule design module 110 is used to implement rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph. Specifically, the rule configuration comprises: filling in basic descriptive information for the rule; selecting the entities involved in the rule; forming a rule expression from the intent of the rule and decomposing it into a plurality of rule sub-expressions, each of which is a path of entities and relations in the knowledge graph (here the reconstructed knowledge graph, also called the optimized knowledge graph); configuring one of the rule sub-expressions by selecting its first entity, whereupon all basic attributes and relation attributes of the first entity are displayed; selecting a basic attribute or a relation attribute to configure the attribute data of the first entity; when a basic attribute is selected, the attribute data is the data corresponding to that basic attribute, the configuration of this rule sub-expression is complete, and the remaining rule sub-expressions are configured in the same way; when a relation attribute is selected, the entity selection box automatically switches to the tail entity of that relation attribute, the attribute data changes to the attribute data of the tail entity, and the remaining rule sub-expressions are configured in the same way; and constructing the relationships among the rule sub-expressions, rule sub-expression functions and conventional operators to complete the configuration of the rule expression.
Rule reasoning selects configured rules and applies the selected rules to the optimized knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
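The idea of a rule sub-expression as a path of entities and relations can be made concrete with a small Python sketch. Everything in it is hypothetical: the toy graph, the `legal_rep_of` relation, the attribute names, and the simplification that sub-expressions are combined only with a logical AND, whereas the text above also allows sub-expression functions and other operators.

```python
import operator

# Toy graph (hypothetical data): nodes carry a type and basic attributes.
GRAPH = {
    "nodes": {
        "p1": {"type": "Person", "name": "Li Wei", "age": 45},
        "p2": {"type": "Person", "name": "Zhang Min", "age": 30},
        "c1": {"type": "Company", "name": "Acme", "registered_capital": 500},
        "c2": {"type": "Company", "name": "Globex", "registered_capital": 50},
    },
    "edges": [("p1", "legal_rep_of", "c1"), ("p2", "legal_rep_of", "c2")],
}

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def resolve(graph, node_id, path):
    """Follow the relation names in path[:-1], then read the basic attribute path[-1]."""
    frontier = [node_id]
    for rel in path[:-1]:
        # Selecting a relation attribute moves on to the tail entities.
        frontier = [t for (h, r, t) in graph["edges"] if r == rel and h in frontier]
    attr = path[-1]
    return [graph["nodes"][n][attr] for n in frontier if attr in graph["nodes"][n]]


def apply_rule(graph, entity_type, sub_expressions):
    """Return the ids of entities for which every sub-expression holds (AND only)."""
    hits = []
    for node_id, attrs in graph["nodes"].items():
        if attrs["type"] != entity_type:
            continue
        satisfied = all(
            any(OPS[op](value, constant) for value in resolve(graph, node_id, path))
            for path, op, constant in sub_expressions
        )
        if satisfied:
            hits.append(node_id)
    return hits


# Rule: persons older than 40 who are legal representative of a company
# whose registered capital exceeds 100.
RULE = [
    (("age",), ">", 40),
    (("legal_rep_of", "registered_capital"), ">", 100),
]
```

The first sub-expression stops at a basic attribute of the first entity, while the second passes through a relation attribute to the tail entity before reading a basic attribute, mirroring the two configuration branches described above. The ids returned by `apply_rule(GRAPH, "Person", RULE)` are what a visualized graph could highlight as the reasoning result.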
The application also discloses a method for constructing an enterprise knowledge graph. Referring to FIG. 4, the method includes: step S402, creating a knowledge graph card; step S404, designing a knowledge graph Schema for the knowledge graph card so as to define the entities and relations in the knowledge graph; step S406, configuring basic data sources through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data to the graph construction module as one of the basic data sources; step S408, constructing a knowledge graph based on the knowledge graph Schema from the basic data sources selected through the data import sub-module, the data source configuration sub-module and the knowledge extraction sub-module; and step S410, implementing rule configuration and rule reasoning in the knowledge graph, so that the rule reasoning and its results are displayed through the visualized knowledge graph.
As the volume of unstructured, semi-structured and structured data available to enterprises grows rapidly, integrating multi-source heterogeneous data and constructing business knowledge graphs have become important means of enterprise informatization. Knowledge graph technology can effectively break down data barriers and interconnect multi-source heterogeneous data, and building knowledge graph applications increases the application value of the data. The method and system for constructing an enterprise knowledge graph described here construct the knowledge graph semi-automatically. They allow a knowledge graph to be built quickly from scratch (from 0 to 1), resolve the difficulty of integrating an enterprise's multi-source heterogeneous data, and further improve the application value of enterprise data.
Hereinafter, a construction system of an enterprise knowledge graph will be described in detail by way of a specific example with reference to fig. 2. Specifically, the implementation flow of the system for constructing the enterprise knowledge graph is as follows:
(1) The user logs in.
(2) The user clicks "create graph" to create a knowledge graph card. The function of the graph creation module is to manage the multiple business-specific knowledge graphs created in the system. Each knowledge graph card is a knowledge graph designed for a specific application scenario. The system allows a user to create multiple knowledge graph cards.
(3) The user clicks a knowledge graph card to enter the graph design functional module and designs the knowledge graph Schema, that is, defines the entities and relations in the knowledge graph. Schema definition supports three modes: normal view, visual view and template import.
(4) After the Schema design is completed, enter the "My data" functional module. This module supplies the basic data for the knowledge graph and comprises three sub-modules: "import data", "data source configuration", and "knowledge extraction".
(a) The "import data" sub-module provides functions such as downloading an example import template and batch-importing instance data from semi-structured Excel files.
(b) The "data source configuration" sub-module comprises two functions, "add data source" and "data mapping", and mainly maps structured data in a relational database into the graph. "Add data source" registers a common relational database such as MySQL or Oracle through a URL connection so that it can be used by "data mapping". "Data mapping" comprises three steps: entity mapping, attribute mapping, and relation mapping. "Entity mapping" associates each entity defined in the graph design with a table in the relational database, one to one. "Attribute mapping" maps the attributes of an entity to the fields of its associated table. "Relation mapping" establishes the relation between a head entity and a tail entity: it is expressed as a relation between one field in the table corresponding to the head entity and one field in the table corresponding to the tail entity, and the name of this field relation is the name of the relation between the head entity and the tail entity.
(c) The "knowledge extraction" sub-module extracts knowledge from unstructured text data and comprises the functions of ontology management, corpus management, algorithm management, model training, and model operation. "Ontology management" builds the ontology according to the Schema defined in the graph design module; the categories of the ontology must not exceed those defined by the Schema. "Corpus management" comprises raw corpus management and corpus labeling. Corpus labeling marks entities and entity relations in unstructured text based on the ontology, so that the unstructured text is structured manually. "Algorithm management" manages the algorithms for extracting entities and entity relations from unstructured data. The system provides the pre-trained model Bert (Bidirectional Encoder Representations from Transformers) and the combined extraction models BiLSTM+CRF+Capsule, BiLSTM+CRF+CNN, BiLSTM+CRF+GCN, BiLSTM+CRF+Bert, BiLSTM+CRF+RNN, and BiLSTM+CRF+Bitrans-form, each of which combines three algorithms, for use in different extraction scenarios. Here BiLSTM is a bidirectional Long Short-Term Memory network formed by combining a forward LSTM and a backward LSTM, CRF is a conditional random field, Capsule is a capsule neural network, CNN is a convolutional neural network, GCN is a graph convolutional network, and RNN is a recurrent neural network. "Model training" trains and tunes a model based on the labeled corpus and the seven algorithms above, and stores the trained model for use by "model operation". "Model operation" uses a trained model to extract knowledge from unlabeled corpus and saves the extraction results for later use.
The three sub-modules structure data of different forms, and the processed data serves as the basic data for constructing the graph.
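The three-step data mapping in (b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses an in-memory SQLite database in place of MySQL or Oracle, and the table names, column names, and the layout of `entity_map` and `relation_map` are all hypothetical. Table and column names are interpolated into SQL directly, which is acceptable only because they come from trusted configuration in this sketch.

```python
import sqlite3

# Stand-in for a relational data source added by URL (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, full_name TEXT, company_id INTEGER);
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO employee VALUES (10, 'Li Wei', 1), (11, 'Zhang Min', 1);
""")

# Entity mapping: entity type -> table. Attribute mapping: attribute -> field.
entity_map = {
    "Company": {"table": "company", "key": "id", "attrs": {"name": "company_name"}},
    "Person": {"table": "employee", "key": "id", "attrs": {"name": "full_name"}},
}
# Relation mapping: one field of the head table is matched to one field of the tail table.
relation_map = {
    "employs": {"head": "Company", "head_field": "id",
                "tail": "Person", "tail_field": "company_id"},
}


def load_nodes(conn, entity_map):
    """Return {(entity_type, key): {attribute: value}} for every mapped table row."""
    nodes = {}
    for etype, m in entity_map.items():
        cols = [m["key"]] + list(m["attrs"].values())
        for row in conn.execute(f"SELECT {', '.join(cols)} FROM {m['table']}"):
            nodes[(etype, row[0])] = dict(zip(m["attrs"].keys(), row[1:]))
    return nodes


def load_edges(conn, entity_map, relation_map):
    """Return (head_node, relation_name, tail_node) triples by joining the mapped fields."""
    edges = []
    for rname, r in relation_map.items():
        h, t = entity_map[r["head"]], entity_map[r["tail"]]
        sql = (f"SELECT h.{h['key']}, t.{t['key']} FROM {h['table']} h "
               f"JOIN {t['table']} t ON h.{r['head_field']} = t.{r['tail_field']} "
               f"ORDER BY t.{t['key']}")
        for head_id, tail_id in conn.execute(sql):
            edges.append(((r["head"], head_id), rname, (r["tail"], tail_id)))
    return edges


nodes = load_nodes(conn, entity_map)
edges = load_edges(conn, entity_map, relation_map)
```

The relation name (`employs`) doubles as the name of the field relation, matching the description that the field relation name is the name of the relation between head and tail entity.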
Referring to Fig. 3, the knowledge extraction flow includes:
a. Ontology management: the ontology management module provides functions for adding, deleting, modifying, and querying ontologies. On the basis of the "graph design" module, and according to the requirements of the business scenario, the ontology is designed by defining the entities and relations to be extracted from unstructured text data (that is, by adding a new ontology). The entities and relations of the ontology design correspond to a subset of the Schema of the "graph design" module.
b. Corpus management: the corpus management module manages the uploaded unstructured text corpus, including basic operations (adding, deleting, modifying, and querying) and text labeling.
c. Algorithm management: the algorithm management module manages the algorithms suited to different unstructured text extraction requirements, including adding, deleting, and querying.
d. Model training: the model training module selects an algorithm and training samples according to task requirements and then trains the model.
e. Model operation: a model operation applies a trained model to new raw samples to be processed and extracts entities and their relations from those samples.
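To make the last step of this flow concrete, the sketch below shows one common way the output of a sequence-labeling extractor (such as a BiLSTM+CRF tagger) is turned into entities: decoding B-/I-/O tags into typed spans and keeping only types that belong to the ontology. The BIO tagging scheme, the function names, and the sample tokens are assumptions for illustration; the patent does not state which tagging scheme its models use, and no model is trained here.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, text) spans from B-/I-/O tags, in order of appearance."""
    spans, current_type, current_tokens = [], None, []

    def flush():
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span, closes it.
            flush()
    flush()
    return spans


def keep_ontology_types(spans, ontology_types):
    """Drop spans whose type is not defined in the ontology."""
    return [span for span in spans if span[0] in ontology_types]


# Hypothetical tagger output for one sentence.
tokens = ["Acme", "Corp", "hired", "Li", "Wei", "in", "Beijing"]
tags = ["B-Company", "I-Company", "O", "B-Person", "I-Person", "O", "B-City"]
entities = keep_ontology_types(decode_bio(tokens, tags), {"Company", "Person"})
```

Filtering by ontology type mirrors the constraint that the ontology is a subset of the Schema: anything the model emits outside that subset is not carried into the graph.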
(5) After the knowledge graph Schema has been designed and the basic data prepared, enter the "graph construction" functional module to construct the knowledge graph. Construction takes two steps: configuring the basic data provided by the "My data" functional module, and invoking the knowledge graph Schema. After construction is completed, entity mapping and attribute mapping can be used to check whether the data mapping is complete and correct.
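One way to read these two steps is that triples from the selected data sources are merged and each one is checked against the Schema. The sketch below follows that reading; the triple layout, the `schema_relations` dictionary, and the idea of collecting rejected triples for a later mapping check are assumptions made for illustration, not details given in the patent.

```python
def build_graph(schema_relations, sources):
    """Merge triples from several data sources, keeping only Schema-conformant ones.

    schema_relations: {relation_name: (head_type, tail_type)}
    sources: list of triple lists; each triple is
             ((head_type, head_id), relation_name, (tail_type, tail_id)).
    """
    nodes, edges, rejected = set(), set(), []
    for triples in sources:
        for head, relation, tail in triples:
            # A triple conforms when its relation exists in the Schema and
            # connects the head and tail entity types the Schema prescribes.
            if schema_relations.get(relation) != (head[0], tail[0]):
                rejected.append((head, relation, tail))
                continue
            nodes.update((head, tail))
            edges.add((head, relation, tail))
    return nodes, edges, rejected


# Hypothetical Schema and data from two of the "My data" sub-modules.
schema_relations = {"employs": ("Company", "Person")}
from_database = [(("Company", 1), "employs", ("Person", 10))]
from_extraction = [
    (("Company", 1), "employs", ("Person", 10)),   # duplicate of the database triple
    (("Person", 10), "employs", ("Company", 1)),   # wrong direction, rejected
]
graph_nodes, graph_edges, rejected = build_graph(
    schema_relations, [from_database, from_extraction])
```

Using sets for nodes and edges means the same fact arriving from two sources is stored once, and the `rejected` list gives a starting point for checking whether the mapping is correct.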
(6) After the knowledge graph is constructed, it can be tuned manually through operations such as entity mapping, attribute mapping, data cleaning, and normalization disambiguation. "Data cleaning" is performed by configuring regular expressions; "normalization disambiguation" is achieved through a provided entity similarity algorithm. The knowledge graph is then reconstructed to obtain a knowledge graph that meets the quality requirements.
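The two tuning operations can be sketched as below. The patent does not name its similarity algorithm, so this example substitutes `difflib.SequenceMatcher` from the Python standard library purely as a stand-in; the date-normalization rule, the 0.85 threshold, and the sample names are likewise hypothetical.

```python
import re
from difflib import SequenceMatcher

# A configured cleaning rule: (pattern, replacement). This one unifies the
# separators in dates such as 2020/9/24 or 2020.9.24.
DATE_RULE = (re.compile(r"^(\d{4})[/.](\d{1,2})[/.](\d{1,2})$"), r"\1-\2-\3")


def clean_value(value, rules):
    """Apply each configured regular-expression rule to an attribute value."""
    value = value.strip()
    for pattern, replacement in rules:
        value = pattern.sub(replacement, value)
    return value


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_duplicates(names, threshold=0.85):
    """Map every entity name to the first earlier name it is similar enough to."""
    canonical, mapping = [], {}
    for name in names:
        match = next((c for c in canonical if similarity(name, c) >= threshold), None)
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


cleaned = clean_value(" 2020/9/24 ", [DATE_RULE])
merged = merge_duplicates(["Acme Corp", "ACME Corp.", "Globex"])
```

A greedy first-match merge like this depends on input order and on the chosen threshold, which is one reason the description frames this stage as manual tuning followed by reconstruction.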
(7) After the knowledge graph construction is completed, enter the "rule design" functional module, which comprises two functions: "rule management" and "rule reasoning". "Rule management" includes "rule classification", "rule addition", and similar functions. "Rule classification" supports operations such as adding and deleting rules. "Rule addition" configures inference rules in the knowledge graph, with the following flow:
(I) Fill in the basic description of the rule.
(II) Select the entities related to the rule.
(III) Enter rule configuration management, form a rule expression according to the intent of the rule, and decompose it into several rule sub-expressions (each rule sub-expression is a path of entities and relations in the graph).
(IV) Configure a rule sub-expression: select the first entity of the sub-expression, whereupon all basic attributes and relation attributes of that entity are displayed.
(V) Select a basic attribute or a relation attribute of the entity. If a basic attribute is selected, configuration of this rule sub-expression is complete; go to step (VII).
(VI) If a relation attribute is selected, the entity selection box automatically switches to the tail entity corresponding to that relation attribute, and the attribute data changes to the attribute data of the tail entity; return to step (V).
(VII) Repeat from step (IV) until all rule sub-expressions are configured, then construct the relations among the rule sub-expressions, the rule sub-expression functions (which describe relations between rule sub-expressions), and the conventional operators (addition, subtraction, multiplication, division, greater than, less than, equal to, not equal to, union, intersection, and the like) to complete the rule expression configuration.
"Rule reasoning" applies a configured rule to the knowledge graph and displays the reasoning and its results in the visualized graph. This module allows rules to be configured quickly in the knowledge graph and provides the rule reasoning function.
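The notion of a rule sub-expression as a path can be sketched as follows: a sub-expression starts at an entity, follows zero or more relation attributes, and ends at a basic attribute, and a rule compares two sub-expressions with a conventional operator. The graph contents, the attribute and relation names, and the "any pair satisfies the comparison" semantics are assumptions chosen for illustration; the patent does not define how multiple path results are combined.

```python
import operator

# Hypothetical graph: node attributes and outgoing relations.
nodes = {
    "c1": {"type": "Company", "registered_capital": 500},
    "p1": {"type": "Person", "age": 45},
    "c2": {"type": "Company", "registered_capital": 2000},
}
edges = {
    ("c1", "legal_representative"): ["p1"],
    ("p1", "invests_in"): ["c2"],
}


def evaluate_path(start, relations, attribute):
    """Follow the relation attributes from start, then read one basic attribute."""
    frontier = [start]
    for relation in relations:
        # Each relation attribute switches the current entities to their tail entities.
        frontier = [tail for node in frontier for tail in edges.get((node, relation), [])]
    return [nodes[n][attribute] for n in frontier if attribute in nodes[n]]


OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq, "!=": operator.ne}


def evaluate_rule(start, left, op, right):
    """True if any pair of values from the two sub-expressions satisfies the operator."""
    left_values = evaluate_path(start, *left)
    right_values = evaluate_path(start, *right)
    return any(OPERATORS[op](l, r) for l in left_values for r in right_values)


# Rule: "a company invested in by this company's legal representative has
# more registered capital than this company itself".
left = (["legal_representative", "invests_in"], "registered_capital")
right = ([], "registered_capital")
fires = evaluate_rule("c1", left, ">", right)
```

Each hop in `relations` corresponds to selecting a relation attribute in step (VI), and the final `attribute` corresponds to selecting a basic attribute in step (V).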
(8) After the rule design is completed, enter the "graph publishing" functional module to publish the constructed knowledge graph and make it available for third-party services to call.
The scheme provides a semi-automatic knowledge graph construction method and system. The system mainly comprises six functional modules: graph creation, graph design, My data, graph construction, rule design, and graph publishing. The "graph creation" functional module manages the multiple business-specific knowledge graphs created in the system. The "graph design" functional module performs Schema design for each knowledge graph, that is, defines the "entities" and "relations" in it. The "My data" functional module configures basic data sources for the knowledge graph. The "graph construction" functional module selects basic data sources to construct the graph based on the Schema, and supports operations such as "data cleaning" and "normalization disambiguation" for manual tuning of the knowledge graph. The "rule design" functional module allows rules to be configured quickly in the knowledge graph and supports rule reasoning. The "graph publishing" functional module publishes the graph and provides viewing of publishing records. With this system, a knowledge graph can be built quickly from scratch.
Compared with the prior art, the application has at least one of the following beneficial effects:
1. A knowledge graph is a graph-based data structure consisting of nodes and edges. Each node represents an "entity" defined in the data, and each edge represents a "relation" between entities. Knowledge graph construction focuses on integrating structured, semi-structured, and unstructured data under a unified semantic data structure. This improves an enterprise's ability to organize, manage, and understand massive amounts of information and helps overcome the difficulties of enterprise data management.
2. The knowledge graph connects, on a large scale and through relations, the entities defined in data of different structures to form a data network. This achieves multi-source heterogeneous data integration and deep cross-correlation, provides the ability to analyze problems from the perspective of relations, and allows the value of the data to be fully mined.
3. Knowledge graph technology can effectively break down data barriers and interconnect multi-source heterogeneous data. Constructing a knowledge graph quickly resolves the difficulty of integrating an enterprise's multi-source heterogeneous data and further increases the application value of enterprise data.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by a computer program instructing the associated hardware, where the program may be stored on a computer-readable storage medium. The computer-readable storage medium is, for example, a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.
Claims (7)
1. A system for constructing an enterprise knowledge graph, characterized by comprising:
a graph creation module for creating a knowledge graph card;
a graph design module for designing a knowledge graph Schema for the knowledge graph card so as to define entities and relations in the knowledge graph;
a data configuration module for configuring basic data sources through an importing data sub-module, a data source configuration sub-module, and a knowledge extraction sub-module respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data as one of the basic data sources to the graph construction module;
a graph construction module for constructing the knowledge graph based on the knowledge graph Schema using the basic data sources selected from the importing data sub-module, the data source configuration sub-module, and the knowledge extraction sub-module; and
a rule design module for implementing rule configuration and rule reasoning in the knowledge graph so as to display the rule reasoning and its results through the visualized knowledge graph, wherein
the importing data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph;
the data source configuration submodule is used for adding a relational database in a URL connection mode and mapping structured data in the relational database into the knowledge graph so as to configure the structured data for the knowledge graph;
the knowledge extraction submodule is used for carrying out knowledge extraction on unstructured text data and comprises an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, wherein the ontology management submodule is used for defining entities extracted from the unstructured text data and relations thereof as ontologies according to business scene requirements; the corpus management submodule is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked based on the ontology; the algorithm management sub-module is used for managing an entity-entity relation extraction algorithm in the unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and the model operation submodule is used for constructing model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.
2. The system for constructing an enterprise knowledge graph according to claim 1, wherein the knowledge graph Schema design includes a normal view mode, a visual view mode, and a template import mode, and switching among these modes is supported.
3. The system for constructing an enterprise knowledge graph according to claim 1, wherein mapping the structured data comprises entity mapping, attribute mapping, and relation mapping, wherein
the entity mapping associates the entities defined by the graph design module with the data tables in the relational database one to one;
the attribute mapping maps the attributes of an entity to the fields in its associated data table; and
the relation mapping establishes a relation between a head entity and a tail entity.
4. The system for constructing an enterprise knowledge graph according to claim 1, further comprising a data cleaning module, a normalization disambiguation module and a knowledge graph reconstruction module, wherein,
the data cleaning module is used for configuring a regular expression and filtering attribute value types according to the regular expression so as to unify data formats;
the normalization disambiguation module is used for determining the same entity according to the entity similarity so as to remove duplicate entities;
the knowledge graph reconstruction module is used for manually adjusting the knowledge graph through data cleaning and normalization disambiguation so as to reconstruct the knowledge graph.
5. The system for building an enterprise knowledge graph as claimed in claim 4, wherein the rule design module is configured to perform rule configuration and rule reasoning, and the rule configuration includes:
setting rule basic description information;
selecting a rule-related entity;
forming a rule expression based on rule intention and decomposing the rule expression into a plurality of rule sub-expressions, wherein each rule sub-expression is an entity and relationship path in a knowledge graph;
configuring one rule sub-expression in the plurality of rule sub-expressions, selecting a first entity of the one rule sub-expression, and displaying all basic attributes and relationship attributes of the first entity;
selecting the basic attribute or the relation attribute to configure attribute data of the first entity;
when the basic attribute is selected, the attribute data is data corresponding to the basic attribute, the configuration of the one rule sub-expression is completed, and the remaining rule sub-expressions are continuously configured in the same mode as the one rule sub-expression;
when the relation attribute is selected, the entity selection box is automatically switched to a tail entity corresponding to the relation attribute, the attribute data is converted into attribute data corresponding to the tail entity, and the rest rule sub-expressions are continuously configured in the same mode as the rule sub-expressions;
the relationships among the rule sub-expressions, the rule sub-expression functions, and the conventional operators are constructed to complete the rule expression configuration.
6. The system for constructing an enterprise knowledge graph according to claim 1, wherein the rule reasoning is used to select a rule for configuration and apply the selected rule to the knowledge graph to show the rule reasoning and reasoning result through a visual knowledge graph.
7. A method for constructing an enterprise knowledge graph, characterized by comprising the following steps:
creating a knowledge graph card;
designing a knowledge graph Schema for the knowledge graph card to define entities and relations in the knowledge graph;
configuring basic data sources through an importing data sub-module, a data source configuration sub-module, and a knowledge extraction sub-module respectively, wherein the knowledge extraction sub-module structures unstructured data and provides the structured data as one of the basic data sources to a graph construction module;
constructing the knowledge graph based on the knowledge graph Schema using the basic data sources selected from the importing data sub-module, the data source configuration sub-module, and the knowledge extraction sub-module; and
implementing rule configuration and rule reasoning in the knowledge graph so as to display the rule reasoning and its results through the visualized knowledge graph, wherein
the importing data sub-module is used for importing Excel files in batches to configure semi-structured data for the knowledge graph;
the data source configuration submodule is used for adding a relational database in a URL connection mode and mapping structured data in the relational database into the knowledge graph so as to configure the structured data for the knowledge graph;
the knowledge extraction submodule is used for carrying out knowledge extraction on unstructured text data and comprises an ontology management submodule, a corpus management submodule, an algorithm management submodule, a model training submodule and a model operation submodule, wherein the ontology management submodule is used for defining entities extracted from the unstructured text data and relations thereof as ontologies according to business scene requirements; the corpus management submodule is used for managing the uploaded unstructured sample corpus, wherein the entity and entity relationship in the unstructured sample corpus are marked based on the ontology; the algorithm management sub-module is used for managing an entity-entity relation extraction algorithm in the unstructured text data; the model training sub-module is used for selecting an algorithm and a training sample according to task requirements and then performing model training; and the model operation submodule is used for constructing model operation by using the trained model and a new original sample to be processed, and extracting the entity and the relation thereof from the new original sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011030017.3A CN112214611B (en) | 2020-09-24 | 2020-09-24 | Enterprise knowledge graph construction system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112214611A (en) | 2021-01-12 |
CN112214611B (en) | 2023-10-31 |
Family
ID=74051971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011030017.3A Active CN112214611B (en) | 2020-09-24 | 2020-09-24 | Enterprise knowledge graph construction system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214611B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112559773A (en) * | 2021-02-24 | 2021-03-26 | 北京通付盾人工智能技术有限公司 | Knowledge graph system building method and device |
CN112926855A (en) * | 2021-02-24 | 2021-06-08 | 北京通付盾人工智能技术有限公司 | Marketing activity risk control system and method based on knowledge graph |
CN113190689B (en) * | 2021-05-25 | 2023-04-18 | 广东电网有限责任公司广州供电局 | Construction method, device, equipment and medium of electric power safety knowledge graph |
CN113407688B (en) * | 2021-06-15 | 2022-09-16 | 西安理工大学 | Method for establishing knowledge graph-based survey standard intelligent question-answering system |
CN113468340B (en) * | 2021-06-28 | 2024-05-07 | 北京众标智能科技有限公司 | Construction system and construction method of industrial knowledge graph |
CN113537355A (en) * | 2021-07-19 | 2021-10-22 | 金鹏电子信息机器有限公司 | Multi-element heterogeneous data semantic fusion method and system for security monitoring |
CN113590846B (en) * | 2021-09-24 | 2021-12-17 | 天津汇智星源信息技术有限公司 | Legal knowledge map construction method and related equipment |
CN114138930B (en) * | 2021-10-23 | 2024-02-02 | 西安电子科技大学 | Intent characterization system and method based on knowledge graph |
CN114417018B (en) * | 2022-03-28 | 2022-07-15 | 金现代信息产业股份有限公司 | Full-process visual configuration system and method for knowledge graph |
CN115952301A (en) * | 2023-03-16 | 2023-04-11 | 浪潮软件科技有限公司 | Construction method and system of knowledge graph management platform |
CN116108146B (en) * | 2023-04-13 | 2023-06-27 | 天津数域智通科技有限公司 | Information extraction method based on knowledge graph construction |
CN117592561B (en) * | 2024-01-18 | 2024-04-19 | 国网江苏省电力工程咨询有限公司 | Enterprise digital operation multidimensional data analysis method and system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446341A (en) * | 2018-10-23 | 2019-03-08 | 国家电网公司 | The construction method and device of knowledge mapping |
CN109471949A (en) * | 2018-11-09 | 2019-03-15 | 袁琦 | A kind of semi-automatic construction method of pet knowledge mapping |
CN110516077A (en) * | 2019-08-20 | 2019-11-29 | 北京中亦安图科技股份有限公司 | Knowledge mapping construction method and device towards enterprise's market conditions |
CN110674311A (en) * | 2019-09-05 | 2020-01-10 | 国家电网有限公司 | Knowledge graph-based power asset heterogeneous data fusion method |
CN110689385A (en) * | 2019-10-16 | 2020-01-14 | 国网山东省电力公司信息通信公司 | Power customer service user portrait construction method based on knowledge graph |
CN110781249A (en) * | 2019-10-16 | 2020-02-11 | 华电国际电力股份有限公司技术服务分公司 | Knowledge graph-based multi-source data fusion method and device for thermal power plant |
CN110825882A (en) * | 2019-10-09 | 2020-02-21 | 西安交通大学 | Knowledge graph-based information system management method |
CN110929042A (en) * | 2019-11-26 | 2020-03-27 | 昆明能讯科技有限责任公司 | Knowledge graph construction and query method based on power enterprise |
CN111444351A (en) * | 2020-03-24 | 2020-07-24 | 清华苏州环境创新研究院 | Method and device for constructing knowledge graph in industrial process field |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150095303A1 (en) * | 2013-09-27 | 2015-04-02 | Futurewei Technologies, Inc. | Knowledge Graph Generator Enabled by Diagonal Search |
US10915577B2 (en) * | 2018-03-22 | 2021-02-09 | Adobe Inc. | Constructing enterprise-specific knowledge graphs |
Non-Patent Citations (2)
Title |
---|
基于知识图谱的企业知识服务模型构建研究;张肃;许慧;;情报科学(08);全文 * |
大规模企业级知识图谱实践综述;王昊奋;丁军;胡芳槐;王鑫;;计算机工程(07);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112214611A (en) | 2021-01-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||