CN116204660A

CN116204660A - Multi-source heterogeneous data driven domain knowledge graph construction system method

Info

Publication number: CN116204660A
Application number: CN202310314038.5A
Authority: CN
Inventors: 陈佳; 张任宇
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2023-03-28
Filing date: 2023-03-28
Publication date: 2023-06-02
Anticipated expiration: 2043-03-28
Also published as: CN116204660B

Abstract

The invention relates to the technical field of software data interaction, and in particular to a method for constructing a domain knowledge graph driven by multi-source heterogeneous data. The method encapsulates the functional code on the back end by building a Web service and exposes it through a front-end interface, so that users can construct and manage the knowledge graph. The system is then refined: a multi-category, highly correlated visual interface and a log function at the concept-ontology level improve the user experience, and periodic crawling, automatic cleaning and storage of the multi-source heterogeneous data, together with dynamic updating of the concept ontology through the Web service, realize periodic automatic updating of the domain graph. After expert review, multiple graph versions can be released and the knowledge graph can be rolled back to a historical version. The invention constructs the domain knowledge graph from multi-source heterogeneous data such as concept systems, academic papers, technologies, products, patents and standards, standardizes entity types, entity attributes and the relationships among entities through Schema design, and improves the usability of the knowledge graph system.

Description

Multi-source heterogeneous data driven domain knowledge graph construction system method

Technical Field

The invention relates to the technical field of software data interaction, in particular to a multi-source heterogeneous data driven domain knowledge graph construction system method.

Background

The knowledge graph is a recent approach to knowledge management. It organizes concepts into a graph topology and expresses the knowledge model as a graph, which greatly improves data visualization and helps spread knowledge across the industries of the national economy. The domain knowledge graph is a research focus of many institutions: it is built on data from a specific industry, emphasizes depth of knowledge, and is applied within that domain, and it has therefore become a main topic of current knowledge graph research.

Some results have already been achieved in domain knowledge graph construction, such as an ontology-based method that builds the knowledge graph top-down, a method that builds a domain knowledge graph from semi-structured encyclopedia data, and a more highly automated construction method that uses knowledge extraction task models.

Each of these methods selects a specific construction method according to the data source of the graph in order to build the domain knowledge graph.

However, as the knowledge graph concept spreads, a new requirement has emerged: integrating multi-source data, heterogeneous data and dynamic updating into the construction method so that knowledge graphs can be built accurately and efficiently. In addition, the threshold for building a knowledge graph should be lowered so that more experts and scholars with domain knowledge, and not only knowledge graph technicians, can conveniently take part in construction. This too must be considered in the development of knowledge graphs.

Several feasible research and implementation methods currently address the practical needs of a domain knowledge graph construction system. However, such a system should let users easily build a reasonably complete knowledge graph that can be updated dynamically, and the existing methods have several limitations:

Firstly, a domain knowledge graph requires a Schema to be designed in advance, defining which entity types the graph contains and the relationships among them, and standardizing the format of data added to the graph. Some existing solutions lack such consistency constraints.

Secondly, a graph construction system must be compatible with data of different degrees of formatting, so that more knowledge can be extracted quickly and conveniently from data sources to build the knowledge graph. Solutions that improve the graph by adding data manually do not achieve this well.

Thirdly, graph construction should be convenient and quick, so that even users unfamiliar with the low-level operations of graph construction can build a graph through interfaces encapsulated at the upper layer. Most prior knowledge graph construction methods do not address this.

Therefore, to solve the above problems, the present application proposes a multi-source heterogeneous data driven domain knowledge graph construction system method. It builds a dynamic knowledge graph driven by multi-source heterogeneous data covering concept systems, academic papers, technologies, products, patents and standards, through data acquisition, knowledge extraction, knowledge fusion and storage, and knowledge updating, with encapsulation and dynamic updating provided by a Web service.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a multi-source heterogeneous data driven domain knowledge graph construction system method. It builds a dynamic knowledge graph driven by multi-source heterogeneous data covering concept systems, academic papers, technologies, products, patents and standards, through data acquisition, knowledge extraction, knowledge fusion and storage, and knowledge updating, and is encapsulated and dynamically updated through a Web service.

In order to achieve the above purpose, the present invention provides a method for constructing a domain knowledge graph driven by multi-source heterogeneous data, comprising the following steps:

S1, data acquisition, knowledge extraction, knowledge fusion and storage, and knowledge updating;

S2, encapsulating the functional code of S1 on the back end by building a Web service, and providing it to the user's front-end interface to realize construction and management of the knowledge graph;

S3, refining the system: optimizing the user experience through a visual interface and a log function, and ensuring the accuracy and continuity of knowledge graph construction through dynamic graph updating and graph version release;

The domain knowledge graph construction system comprises:

S101, a graph Schema definition subsystem;

S102, a data acquisition subsystem;

S103, a knowledge extraction subsystem;

S104, a knowledge storage and management subsystem;

S105, a knowledge dynamic updating subsystem;

S106, a graph visualization subsystem;

S107, a user management subsystem;

S108, a graph log management subsystem;

S109, a graph version release subsystem;

S101 is as follows:

By default the system presets a Schema with six entity types: concept system, academic paper, technology, product, patent and standard. Users adjust the constraints on knowledge by defining and dynamically adjusting the Schema; knowledge is allowed into the knowledge graph only after it satisfies the entity attributes and relationships predefined by the Schema;

S102 is as follows:

In addition to receiving data of various degrees of formatting entered manually by the user, the system dynamically crawls new data according to the data sources and update intervals defined by the user. Corresponding to the entity types and relationships in the Schema framework, the system designs a data receiving interface for each entity type and constrains the data according to the attributes of the entities and relationships in the framework;

S103 is as follows:

The subsystem receives the raw data from the data acquisition subsystem and extracts knowledge in different ways according to the degree of formatting of the data. After entities and relationships have been extracted, entity attributes and relationships are constrained by the node types, attribute templates and relationships between node types designed in the graph Schema. The user is allowed to correct the extraction results to ensure the accuracy of the knowledge, and the handling of each entity type is specified separately;

S104 is as follows:

Knowledge passed from the knowledge extraction subsystem is added to the knowledge graph through functional components built from interfaces encapsulated by the system. Database operations are likewise encapsulated into functional components, so that high-privilege users can conveniently add, delete and modify knowledge in the knowledge graph to ensure its accuracy. To keep the knowledge in the graph consistent, modifications of nodes and relationships must comply with the graph Schema, and new relationship types between nodes may not be added;

Unlike papers, patents and other entity types, concept entities in the knowledge graph have parent-child relationships. Therefore, when a concept entity is deleted or moved, the whole subtree is not moved or deleted directly. Instead, all child nodes of the concept entity are displayed at the front end, and the user can choose whether they move together with the parent node. If the user chooses not to move them, the operation moves only the selected node; the system marks all concept nodes below the selected node as pending and adds a notification reminding the user to reprocess the concepts to be moved or deleted. In addition, when a node is deleted the system does not remove it from the database completely but hides it by setting the attribute setup=no, and when a node is modified the system records the related data and the attribute changes before and after the modification;

S105 is as follows:

The subsystem has two functions. First, the data acquisition subsystem receives the user's update instruction and incrementally acquires semi-structured and unstructured data for the user; after knowledge extraction and user confirmation, the knowledge is stored in the knowledge graph to realize knowledge updating. Second, based on the academic paper data stored in the knowledge graph as defined by the Schema, the correlation between keywords and the terminal concepts in the graph is calculated; if the correlation and the frequency exceed their thresholds, the terminal concept entities are updated. The user is allowed to intervene in the update result to ensure the accuracy of the knowledge update;

S106 is as follows:

Information is conveyed by graphical means. Besides the overall graphical display of the knowledge graph, content is displayed along a time axis so that the latest knowledge in the graph is easy to see. Entities are also displayed separately for each entity type defined by the Schema, with popular and recent entities of each type highlighted, and links between content of different types and the most related content within the same type are provided, so that users with different needs can conveniently browse the content;

S107 is as follows:

Multi-level users are managed, including expert users in each field, multi-level paying users, ordinary visitors, knowledge graph administrators of each field, and system administrators;

S108 is as follows:

Instead of relying on the underlying database's log records of individual nodes and relationships, the back-end program records each single operation that touches multiple nodes and relationships in the database. The ids of the nodes and relationships modified by each operation, together with other related information, are recorded in detail, so that a single modification of the graph can be inspected and an operation can be withdrawn as a whole, which is convenient for users;

S109 is as follows:

Multi-version release: a version must be reviewed under expert guidance before release, and the domain knowledge graph at a given point in time is released. The system stores a backup and an information record for each version, which can be used to display the construction history of the graph and to roll back manually to a historical version.
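The soft deletion described for S104 and the operation-level log with whole-operation withdrawal described for S108 can be sketched together in a few lines. This is only a minimal illustration under stated assumptions: the `GraphStore` class, its method names and the log entry layout are hypothetical, and an in-memory dictionary stands in for the graph database. Only the `setup=no` attribute comes from the description.

```python
import itertools


class GraphStore:
    """In-memory stand-in for the graph database (hypothetical, illustration only)."""

    def __init__(self):
        self.nodes = {}   # node id -> property dict
        self.log = []     # one entry per user-level operation
        self._ids = itertools.count(1)

    def add_node(self, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = dict(props)
        return node_id

    def soft_delete(self, node_ids, user):
        """Hide several nodes in ONE logged operation instead of removing them."""
        before = {i: dict(self.nodes[i]) for i in node_ids}
        for i in node_ids:
            self.nodes[i]["setup"] = "no"   # attribute taken from the description
        self.log.append({"user": user, "action": "delete", "before": before})
        return len(self.log) - 1            # index of the new log entry

    def visible(self):
        """Ids of nodes that are not hidden."""
        return sorted(i for i, p in self.nodes.items() if p.get("setup") != "no")

    def undo(self, entry_index):
        """Withdraw a whole logged operation by restoring the recorded state."""
        for i, props in self.log[entry_index]["before"].items():
            self.nodes[i] = dict(props)


store = GraphStore()
a = store.add_node(name="network security")
b = store.add_node(name="intrusion detection")
op = store.soft_delete([a, b], user="admin")
hidden_view = store.visible()     # both nodes are hidden but still stored
store.undo(op)
restored_view = store.visible()   # both nodes are visible again
```

Recording the state before the change per operation, rather than per node, is what makes a one-step withdrawal of a multi-node operation possible in this sketch.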

S102 specifically comprises the following steps:

S201, the ontology of domain concepts is constructed as a tree hierarchy, and domain concept data are imported from a concept text file in a preset format. The domain concepts in the file can be taken from an electronic monograph or document or entered by a domain expert, and they are structured data. The text file is indented by hierarchy: concepts at the same level have the same indentation, and the deeper the indentation, the lower the level of the concept. The top-level concept is usually the broad direction of the discipline, such as the field of cyberspace security;

S202, in the graph Schema preset by the system, the paper entity has the attributes: authors, paper title, abstract, keywords, publication time, journal and conference. Paper data can be imported in two ways: first, the user downloads a predefined json file template provided by the system and uses it to import formatted papers; second, the system periodically crawls paper data containing the attributes defined in the Schema, de-duplicates it against the existing data and stores it. The system can perform the knowledge update of S105 on the newly added paper data. The attributes of paper entities are closely related to other entities: through keywords, a paper can be associated with the domain concepts it covers and with related technology, product and patent entities, and most paper entities belong to the lowest-level concept fields;

S203, the standard entity has the attributes: standard classification, standard type, standard name, introduction, release time, and so on. Because its structure is fairly uniform and similar to that of the paper entity, it uses the same data access methods as the paper entity, namely structured template input and semi-structured page extraction;

S204, the patent entity has the attributes: patent type, inventor, standard name, application number and application date. Because its structure is fairly uniform and similar to that of the paper entity, it uses the same data access methods as the paper entity, namely structured template input and semi-structured page extraction;

S205, the engineering technology entity has no fixed template, so its attributes are specified as: name, technical classification, technical content and technical legal status. Besides uploading structured data with the downloaded attribute template, technical information can be extracted from published text: the user uploads the text as the data input for the engineering technology entity, and the text is passed to the knowledge extraction module;

S206, the product entity is handled in the same way as engineering technology: besides an attribute template json file with the product name, product release time, product content, product application field and the company releasing the product, text uploaded by the user can also be received.
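The indentation-based concept file of S201 maps naturally onto parent-child pairs. The following is a minimal sketch, assuming one concept per line and one tab per level; the function name `parse_concept_outline` and the sample concepts below the top level are hypothetical.

```python
def parse_concept_outline(text, indent="\t"):
    """Return (parent, child) pairs from an indentation-based concept outline."""
    pairs = []
    stack = []  # stack[d] holds the most recent concept seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        stripped = line.lstrip(indent)
        depth = (len(line) - len(stripped)) // len(indent)
        name = stripped.strip()
        del stack[depth:]                 # drop concepts at this depth or deeper
        if stack:
            pairs.append((stack[-1], name))
        stack.append(name)
    return pairs


sample = (
    "cyberspace security\n"
    "\tnetwork security\n"
    "\t\tintrusion detection\n"
    "\tcryptography\n"
)
pairs = parse_concept_outline(sample)
```

Each pair could then be turned into a parent-child relationship between two concept nodes when the outline is stored.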

S103 specifically comprises the following steps:

S301, for academic paper data: if the data is structured, entities are created and relationships established directly; if it is semi-structured, it is first normalized into the standard structure according to the attributes of the paper entity in the Schema. Related keywords are then extracted from the abstract, title and paper keywords, and <paper, includes, keyword> triples are established. Because a keyword in a paper is also content under some concept in the domain concept system, <concept, includes, keyword> triples are established as well. After matching, the relationship between the academic paper and the domain concept is established through the keywords; in essence, the knowledge in papers is used to enrich the concept ontology with keywords;

S302, patent and standard data are processed similarly to academic papers: if the input is semi-structured, it is first parsed into standard structured data; keywords are then extracted from the abstract of the patent, or from the introduction of the standard that contains its main content, and the relationship with the domain concept system is established through the keywords;

S303, for engineering technology and product data, text input is accepted, so entities and relationships must be extracted from the input text with an entity and relationship extraction model, and entity attributes are constrained by the Schema. Relationship establishment considers the chain of technologies used by a product: the product and technology entities form a <product, uses, technology> triple; if a paper serves as the theoretical basis, the product and paper entities form a <product, technical source, paper> triple; if a patent is involved in the development of the product, the product and patent entities form a <product, uses, patent> triple; and because a product must meet the corresponding standards to ensure applicability, the product and standard form a <product, conforms to, standard> triple. For details within a technology, keywords are matched and <engineering technology, includes, keyword> triples are established. Accordingly, once a product is related to a technology, the product is connected to the concept system and the paper data as a whole through the keywords included in the technology.
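The keyword bridge of S301, in which a paper and a concept become related because both include the same keyword, can be sketched as follows. The relation name "includes", the function name and the sample concept are assumptions made for illustration, and keywords are matched here by case-insensitive equality, which is simpler than whatever matching the system actually applies.

```python
def link_papers_to_concepts(paper_keywords, concept_keywords):
    """Build keyword triples and derive paper-concept links from shared keywords."""
    triples = []
    for paper, keywords in paper_keywords.items():
        triples.extend((paper, "includes", kw) for kw in keywords)
    for concept, keywords in concept_keywords.items():
        triples.extend((concept, "includes", kw) for kw in keywords)

    links = set()
    for paper, p_kws in paper_keywords.items():
        p_set = {kw.lower() for kw in p_kws}
        for concept, c_kws in concept_keywords.items():
            if p_set & {kw.lower() for kw in c_kws}:
                links.add((paper, concept))   # related through a shared keyword
    return triples, sorted(links)


papers = {
    "Cyber Security Situational Awareness among Parents": ["Cyber security awareness"],
}
concepts = {
    "security awareness": ["cyber security awareness"],
    "cryptography": ["block cipher"],
}
triples, links = link_papers_to_concepts(papers, concepts)
```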

S105 specifically is as follows:

The first knowledge updating method compares structured text newly imported by the user against the existing data and adds the incremental data to the database. The second method targets paper, patent and product data, whose entity information can be acquired from semi-structured encyclopedia-style databases. Taking the data source and crawling interval specified by the user as parameters, a timed task on the server incrementally acquires paper, patent and product entities, and the knowledge extraction subsystem extracts triples and stores them in a knowledge update list. Taking the paper ontology as an example, the automatic update mechanism computes the similarity between the keywords of newly added papers and the keywords of existing terminal concepts in order to discover new keywords related to the terminal concepts of the ontology. New keywords whose frequency and similarity exceed the thresholds set by the system are stored in the entity table, and the user is reminded at login to review the update of the concept ontology.
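The frequency and similarity check of the second update method can be sketched as below. The description does not say how similarity is computed, so this sketch substitutes the character-level ratio from Python's `difflib.SequenceMatcher` purely as a stand-in; the function name, the threshold values and the sample data are likewise assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher


def propose_concept_updates(new_paper_keywords, leaf_concept_keywords,
                            min_similarity=0.6, min_frequency=2):
    """Return {concept: [candidate keywords]} to be reviewed by a user."""
    # How often each keyword occurs across the newly added papers.
    freq = Counter(kw.lower() for kws in new_paper_keywords for kw in kws)
    proposals = {}
    for concept, known in leaf_concept_keywords.items():
        known_lower = {k.lower() for k in known}
        for kw, count in freq.items():
            if kw in known_lower or count < min_frequency:
                continue   # already known, or too rare
            best = max(
                (SequenceMatcher(None, kw, k).ratio() for k in known_lower),
                default=0.0,
            )
            if best >= min_similarity:
                proposals.setdefault(concept, []).append(kw)
    return {concept: sorted(kws) for concept, kws in proposals.items()}


new_papers = [
    ["intrusion detection system", "deep learning"],
    ["intrusion detection system"],
    ["blockchain"],
]
leaf_concepts = {"intrusion detection": ["intrusion detection", "anomaly detection"]}
proposals = propose_concept_updates(new_papers, leaf_concepts)
```

The function only proposes candidates; consistent with the description, acceptance is left to the user who reviews the update list.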

S106 specifically comprises the following steps:

The first is a node-link diagram, the same as the overall knowledge graph, in which the names of the nodes and the relationships between them can be seen directly;

The second is a tree diagram. The system preferentially displays the first three levels of concepts; the user views lower-level concepts by clicking or zooming, and by clicking a concept views its details and the other types of entities linked to it, so that knowledge related to a concept can be obtained quickly;

For other entity types, such as patent and product entities, which are relatively independent within their type, the visualization draws entities at different sizes according to their popularity and how recently they were released. When the user clicks a patent or product, the system displays the node's details and, in addition, the concepts connected to the node with their upper and lower concepts and the other entities under the same concept, making it easier for the user to acquire knowledge and information.

Compared with the prior art, the invention has the following beneficial effects:

The domain knowledge graph is constructed from multi-source heterogeneous data covering concept systems, academic papers, technologies, products, patents and standards, and the entity types, entity attributes and relationships among entities are standardized through Schema design.

Structured, semi-structured and unstructured data uploaded by the user undergo knowledge extraction in different ways, completing the extraction of entities and relationships and the normalization of entity attributes. Multidimensional data such as time and frequency, as defined by the Schema, are stored in the knowledge graph, which improves the system's compatibility with data of different degrees of formatting.

The system receives the user's graph update settings, incrementally acquires more data, and performs knowledge updating in combination with the multidimensional information of entities already in the graph. In addition, operations on the knowledge graph are encapsulated: the back-end service receives POST requests from the user's front-end interface and manages the knowledge graph, which improves the usability of the knowledge graph system.

Drawings

FIG. 1 is a schematic diagram of the graph Schema system model of the present invention.

FIG. 2 is a schematic diagram of the semi-structured paper web page data acquisition of the present invention.

FIG. 3 is a schematic flow chart of the method of the present invention.

FIG. 4 is a schematic diagram of the system architecture of the present invention.

Detailed Description

The invention will now be further described with reference to the accompanying drawings.

Referring to FIGS. 1-4, the invention provides a multi-source heterogeneous data driven domain knowledge graph construction system method, implemented with Python + Neo4j + HTML/JS/CSS. Through the neo4j package, Python sends Cypher statements as strings over a RESTful API to the Neo4j database, which is the knowledge storage medium, in order to add, delete and modify entities and relationships in the graph database; this technique is common to many other knowledge graph projects. In addition, the invention uses the lightweight Python Web framework Flask to encapsulate knowledge graph construction on the back end, HTML/JS/CSS for front-end development, and json-format data for front-end/back-end data interaction. The user thus enters a string or uploads a file at the front end, and an Ajax POST request calls the corresponding back-end API to construct the knowledge graph. The following describes the data interaction method between the front end and the back end, how the front end visually displays the knowledge graph, and the specific database operations with which the back end realizes each knowledge graph construction step.

1. Interaction method of front-end data and back-end data

In front-end development, besides keeping the interface simple through HTML tags and CSS styles, each component that receives user input is given a unique id. When the user clicks a button and triggers a js function, the input in the component is read with the document.getElementById() method and held in a variable inside the function. The function then calls the Ajax POST interaction framework and uploads the input to the url corresponding to a sub-function in the back-end Flask framework. The back end returns a result that distinguishes the various success and failure cases of the operation, and the result is assigned to the res variable in the callback function(res).

$.ajax({
    url: "/api/uploadnewconcept",
    data: data,
    type: "POST",
    dataType: "json",
    success: function (res) {
        // The callback function processes the content of the return value res
    }
});

In this way, data interaction between the front end and the back end is realized, so that a user of the front-end interface need not care about how the back-end code executes and can check the result of each operation through the front-end display.
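The back-end side of this exchange can be sketched as a plain function that turns the posted form data into a JSON-serializable result. This is a hedged illustration, not the system's code: the form fields `name` and `parent`, the function name and the response layout are all assumptions. In a Flask application a function of this kind would typically be called from the view registered for the url, with the posted form passed in and the returned dictionary sent back as json.

```python
import json


def handle_upload_new_concept(form, existing_concepts):
    """Validate one upload and return a JSON-serializable result for the front end."""
    name = (form.get("name") or "").strip()
    parent = (form.get("parent") or "").strip()
    if not name:
        return {"status": "error", "message": "concept name is required"}
    if name in existing_concepts:
        return {"status": "error", "message": "concept already exists"}
    if parent and parent not in existing_concepts:
        return {"status": "error", "message": "parent concept not found"}
    existing_concepts[name] = parent or None   # concept name -> parent name
    return {"status": "ok", "concept": name, "parent": parent or None}


concepts = {"cyberspace security": None}
form = {"name": "network security", "parent": "cyberspace security"}
ok = handle_upload_new_concept(form, concepts)
duplicate = handle_upload_new_concept(form, concepts)
body = json.dumps(ok)   # what the callback would receive as res after parsing
```

Returning a distinct status and message for each failure case lets the callback decide what to show without knowing anything about the back-end logic.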

2. Back-end implementation of operations on the Neo4j knowledge graph storage medium

When the user performs an operation at the front end that modifies the knowledge graph, the Python Flask back end receives the knowledge to be modified, converts the uploaded entity and relationship data into Cypher statement strings native to Neo4j, and establishes a connection with the Neo4j database through the .session() method of the Neo4j driver, thereby operating on nodes and relationships in the graph database. Cypher, the language used by the Neo4j database, is a declarative graph query language that supports Chinese characters and allows expressive, efficient querying and updating of the graph.

2.1 creation of entity nodes

In a Cypher statement, a node is described by the pattern (variable name:entity type name {attribute name: attribute value}), and nodes are added with the CREATE keyword. When a node is added, the database automatically assigns it an integer ID as a unique identifier, with no need for the user to specify one. Taking the creation of an author entity with a name and an organization as attributes as an example, the Cypher statement is shown below.

CREATE (n:author {name: 'jianjin shape', organization: 'College of Science, PLA Information Engineering University, Zhengzhou'}) RETURN n;

2.2 queries of entity nodes of a certain class

Using the node pattern described above, a search can be performed with a MATCH statement that specifies an entity type name. Taking a query for all concept entities as an example, the Cypher statement is shown below.

MATCH (n:concept) RETURN n;

2.3 Attribute modification updates for entity nodes

Updating a node attribute takes two steps: first the node is queried in the database and bound to a variable, then the attribute is assigned to it. The query uses the MATCH keyword and the attribute is assigned with the SET keyword. Taking the addition of the attribute 'source address' to a paper entity as an example, the Cypher statement is shown below.

MATCH (n:paper {name: 'Cyber Security Situational Awareness among Parents'}) SET n.src = "https://ieeexplore.ieee.org/document/8626830" RETURN n;

2.4 creation of relationships between entities

Cypher describes a relationship much like an entity, with the pattern [variable name:relationship name {attribute name: attribute value}]. Because a relationship exists between two entity nodes, the two nodes to be related must first be located with the node pattern described above. A unidirectional relationship c from node a to node b is then written as (a)-[c]->(b). Taking the "keyword of paper" relationship between a paper and a keyword as an example, the Cypher statement is shown below.

MATCH (a:paper {name: 'Cyber Security Situational Awareness among Parents'}) MATCH (b:keyword {name: 'Cyber security awareness'}) CREATE (a)-[r:`keyword of paper`]->(b) RETURN r;
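Since the back end assembles such statements as strings from user input, one hedged way to do so is to pass property values as query parameters and to backtick-quote labels and relationship names, which cannot be parameterized in Cypher. The helper names below are hypothetical; the sketch only builds the statement text and the parameter dictionary and does not connect to a database.

```python
def quote_name(name):
    """Backtick-quote a label or relationship type so that spaces are allowed."""
    return "`" + name.replace("`", "``") + "`"


def create_node_query(label, props):
    """Cypher text and parameters for creating one node from a property map."""
    return f"CREATE (n:{quote_name(label)} $props) RETURN n", {"props": props}


def create_relation_query(src_label, src_name, rel_type, dst_label, dst_name):
    """Cypher text and parameters for relating two nodes located by name."""
    query = (
        f"MATCH (a:{quote_name(src_label)} {{name: $src}}) "
        f"MATCH (b:{quote_name(dst_label)} {{name: $dst}}) "
        f"CREATE (a)-[r:{quote_name(rel_type)}]->(b) RETURN r"
    )
    return query, {"src": src_name, "dst": dst_name}


node_query, node_params = create_node_query("keyword", {"name": "Cyber security awareness"})
rel_query, rel_params = create_relation_query(
    "paper", "Cyber Security Situational Awareness among Parents",
    "keyword of paper",
    "keyword", "Cyber security awareness",
)
```

With the official neo4j Python driver, a pair like this would typically be executed as `session.run(query, params)`; keeping values out of the statement text avoids quoting problems with names that contain apostrophes.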

3. Back-end knowledge extraction method for multi-source heterogeneous data

3.1 Definition of the graph Schema

First, the system initializes a Schema containing node types such as concept, technology and patent together with the attributes of each node type, stored in the Schema.json file in the system's main directory. At system start-up, the following instruction reads the file that holds the graph Schema:

GraphSchema = eval(fileObj.read())

When the back end receives a Schema modification request from the user's front end, for example to add a node type or to modify the attributes of a node type, the back-end program modifies this file. To make the file easy for the program to read, the Schema file is in json format, and a formatted storage strategy defines the following fields for each entity type to describe the Schema used by the graph.

'label_name' is the entity type name;

'label_attribute' lists the names of the attributes that the entity type should have;

'relationships' lists the relationships that may exist between the entity type and other entity types; its value is a list of dictionaries, each holding the name of another entity type and the relationship name.
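A Schema with these three fields, and the constraint checks it enables, can be sketched as follows. The sample entries, the dictionary keys `target` and `name` inside `relationships`, the top-level list layout and the function names are assumptions for illustration. The sketch parses the text with `json.loads` rather than `eval`, since a json parser only reads data and does not execute code.

```python
import json

# Hypothetical Schema content; the real Schema.json is not reproduced in the text.
SCHEMA_TEXT = """
[
  {"label_name": "paper",
   "label_attribute": ["name", "authors", "abstract", "keywords"],
   "relationships": [{"target": "keyword", "name": "keyword of paper"}]},
  {"label_name": "keyword",
   "label_attribute": ["name"],
   "relationships": []}
]
"""


def load_schema(text):
    """Parse the Schema and index it by entity type name."""
    return {entry["label_name"]: entry for entry in json.loads(text)}


def check_node(schema, label, props):
    """Return a list of problems; an empty list means the node fits the Schema."""
    if label not in schema:
        return [f"unknown entity type: {label}"]
    allowed = set(schema[label]["label_attribute"])
    return [f"unexpected attribute: {key}" for key in sorted(props) if key not in allowed]


def relation_allowed(schema, src_label, rel_name, dst_label):
    """True if the Schema defines rel_name from src_label to dst_label."""
    rels = schema.get(src_label, {}).get("relationships", [])
    return any(r["target"] == dst_label and r["name"] == rel_name for r in rels)


schema = load_schema(SCHEMA_TEXT)
problems = check_node(schema, "paper", {"name": "A", "doi": "x"})
```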

3.2 Knowledge extraction from structured data

In the front-end interface the system prompts the user to format the data at hand as instructed. To make this convenient, the system accepts formatted data in two ways: for concept name files in the concept hierarchy, it accepts txt files in which tab characters represent the parent-child relationships between concepts; for other entity types, which have more attributes, it accepts json files whose fields follow the attributes that the entity type should have according to the Schema file currently loaded by the system. After receiving the file uploaded by the user, the system completes knowledge extraction and storage through the following steps.

Filedata = GetFileData(filepath)  // extract the raw data from the file uploaded by the user
Node, Relation = GetNodeAndRelation(Filedata, GraphSchema)  // the data is formatted, so the nodes and relationships in the file are obtained according to the Schema currently defined by the system
CreateNode(Node)  // create nodes in the graph via session.run(Cypher CREATE statement)
CreateRelation(Relation)  // create relationships in the graph via session.run(Cypher CREATE statement)
SetAttributes(Node)  // modify node attributes in the graph via session.run(Cypher SET statement)
SetLog(Node, Relation, logpage)  // log this operation

3.3 extraction of knowledge from semi-structured data

The general semi-structured knowledge source analyzes the page according to the website page data source by using a wrapper to obtain useful data information, the information required by the user in the page is generally and rapidly locked through an xpath path in webpage analysis, and as the tag arrangement of the general page of the regular website has higher consistency, the system reminds the user to copy the function through the xpath path of the browser, receives the xpath path of the content required by the user, is applied to knowledge extraction of different sub-webpages under the same website which is uploaded by the user later, and finally realizes knowledge extraction. The main flow is as follows:

rooturl = GetRootUrl() // get the root url uploaded by the user at the front end

Xpath = GetXpath() // get the xpath path that the user copied from the browser

urls = GetUrls(leaf) // get the sub-urls of the content the user needs and assemble the final urls to crawl

Html = CrawltoHtml(urls) // crawl the page of each url in a loop and convert it to Html format for convenient processing

Node, Relation = GetContentbyXpath(Html) // obtain Node and Relation by parsing the page with the xpath path

Setlog(Node, Relation, logfile) // log this operation
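The idea of reusing one xpath path across sibling pages can be sketched as follows. This is an illustration under several assumptions: the pages are a hypothetical in-memory dictionary instead of crawled content, the markup is well-formed, and the standard-library `xml.etree.ElementTree` is used, which supports only a subset of XPath (relative paths such as `.//h1[@class='name']`, not the absolute paths a browser typically copies). The function names and the `made_by` relation are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for crawled sub-pages: url -> well-formed markup.
PAGES = {
    "https://example.org/product/1":
        "<html><body><h1 class='name'>Turbofan engine</h1>"
        "<p class='maker'>Acme Aero</p></body></html>",
    "https://example.org/product/2":
        "<html><body><h1 class='name'>Winglet</h1>"
        "<p class='maker'>Acme Aero</p></body></html>",
}


def extract_by_xpath(markup, xpath):
    """Return the stripped text of every element matched by xpath."""
    root = ET.fromstring(markup)
    return ["".join(el.itertext()).strip() for el in root.findall(xpath)]


def extract_relations(pages, name_xpath, maker_xpath, rel="made_by"):
    """Apply the same two xpath paths to every page and pair the results."""
    triples = []
    for markup in pages.values():
        names = extract_by_xpath(markup, name_xpath)
        makers = extract_by_xpath(markup, maker_xpath)
        for name, maker in zip(names, makers):
            triples.append((name, rel, maker))
    return triples


triples = extract_relations(PAGES, ".//h1[@class='name']", ".//p[@class='maker']")
```

Because both pages share the same tag layout, the two paths collected once from the user yield one triple per page.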

3.4 Knowledge extraction from unstructured data

Knowledge extraction from unstructured data relies on neural network models, mainly a named entity recognition model and an entity relation extraction model. After the back end of the system receives the user's unstructured text data, knowledge is extracted as follows.

knowledges = SearchInfoinGraph(inputstring) // look up the entity names that appear in the input text sequence in the graph to retrieve existing knowledge

content = inputlayer(inputstring, knowledges) // fuse the input text with the knowledge from the graph

vector = Embedding(content) // encode the entire content

lastlayer = xlnet(vector) // learn from content and position to obtain the last hidden-state layer

Node = CRF(lastlayer) // derive the optimal entity label sequence using a probabilistic graphical model

Relation = softmaxFC(lastlayer) // compute the relationship using a fully connected layer with softmax

Setlog(Node, Relation, logfile) // log this operation
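The CRF step selects the label sequence with the highest total score, which is commonly done with Viterbi decoding. The sketch below shows only that decoding step in plain Python. The emission and transition scores are made-up numbers; in the described system they would come from the hidden states of the encoder and from learned CRF parameters, which are not reproduced here.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at token t
    transitions[i][j] : score of moving from label i to label j
    """
    n = len(labels)
    score = list(emissions[0])  # best score of any path ending in each label
    backptr = []                # backptr[t-1][j]: best previous label for j at t
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptr.append(ptr)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptr):  # walk the back-pointers from the last token
        best = ptr[best]
        path.append(best)
    path.reverse()
    return [labels[i] for i in path]


LABELS = ["O", "B-ENT", "I-ENT"]
# Illustrative scores only: moving from O directly to I-ENT is heavily penalised.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-ENT
    [0.0, -1.0, 0.5],   # from I-ENT
]
EMISSIONS = [
    [0.1, 2.0, 0.0],  # token 1
    [0.5, 0.0, 1.5],  # token 2
    [2.0, 0.0, 0.3],  # token 3
]

tags = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)  # ['B-ENT', 'I-ENT', 'O']
```

The transition scores are what distinguish this from taking the best label per token independently: an invalid sequence such as O followed by I-ENT is suppressed even if its per-token scores are high.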

Back-end knowledge graph updating method

4.1 User-specified incremental graph updates

At the front end, the user sets the data source website and the interval for incremental data acquisition. The server receives the update-crawling instruction set by the user and invokes the Scrapy crawling framework with the following parameter:

dont_filter = False

This ensures that the urls of crawled data are deduplicated, which realizes incremental crawling.
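The effect of this parameter can be illustrated with a small stand-alone filter. This is a conceptual sketch, not Scrapy's own duplicate filter: the class name and the use of a SHA-1 fingerprint of the raw url are assumptions, and for crawling that is incremental across scheduled runs the set of seen fingerprints would additionally have to be persisted between runs, which is not shown.

```python
import hashlib


class SeenUrlFilter:
    """Remember url fingerprints and reject urls that were already seen."""

    def __init__(self):
        self._fingerprints = set()

    def should_crawl(self, url, dont_filter=False):
        if dont_filter:
            return True  # filtering disabled: always crawl, nothing recorded
        fingerprint = hashlib.sha1(url.strip().encode("utf-8")).hexdigest()
        if fingerprint in self._fingerprints:
            return False  # already crawled, skip it
        self._fingerprints.add(fingerprint)
        return True


url_filter = SeenUrlFilter()
first = url_filter.should_crawl("https://example.org/news/1")   # True
second = url_filter.should_crawl("https://example.org/news/1")  # False
```

With `dont_filter` left at `False`, only urls that have not been seen before reach the downloader, so each scheduled run fetches only new pages.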

Knowledge can then be extracted from the crawled semi-structured and unstructured data according to the methods of 3.3 and 3.4.

4.2 Graph updating based on node correlation computation within the graph

This mainly concerns updating the ontology of the knowledge graph, which the back end performs periodically. A timed task is issued and executed through APScheduler; it computes the correlation between the concepts near the leaf end of the ontology and the other keywords in the graph, and finally obtains the probability that each of those keywords could serve as a new ontology concept.

nodes = GetAllNodes() // obtain the nodes in the last two levels of the domain ontology

knowledges = GetKnowledgeofNode(nodes) // obtain the related knowledge connected to these nodes in the graph

vector = Vectorize(knowledges) // vectorize the nodes and their connected knowledge

Finally, the similarity between each keyword and each node vector is calculated. If the similarity is high, the keyword may be used as a new branch at the leaf end of the ontology to refine the graph ontology, and it is returned to the front end for confirmation.
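The similarity step can be sketched with cosine similarity, one common choice for comparing vectors; the text does not state which measure the system uses. The vectors, the threshold of 0.8 and the function names below are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propose_branches(keyword_vectors, node_vectors, threshold=0.8):
    """For each keyword, find its most similar leaf-end node and keep the
    pair as a candidate new branch if the similarity reaches the threshold."""
    proposals = []
    for keyword, kvec in keyword_vectors.items():
        best_node, best_sim = max(
            ((node, cosine(kvec, nvec)) for node, nvec in node_vectors.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            proposals.append((keyword, best_node, best_sim))
    return proposals


node_vectors = {"engine": [1.0, 0.0], "wing": [0.0, 1.0]}
keyword_vectors = {"turbofan": [0.9, 0.1], "runway": [0.5, 0.5]}
candidates = propose_branches(keyword_vectors, node_vectors)
```

In this toy data, "turbofan" is proposed as a child of "engine", while "runway" is equally distant from both nodes and falls below the threshold. The candidates are only proposals; as described above, they are returned to the front end for confirmation.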

Front-end knowledge graph visualization method

If the user wants to view a node-link graph or a tree graph, the data must be returned to the front-end visualization step in the corresponding json format. For example, in a tree graph the child field of a parent node must contain its child nodes, whereas in a node-link graph each edge must indicate its related nodes by index.
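The two return formats can be sketched as follows. The field names `name`, `children`, `nodes`, `links`, `source`, `target` and `relation` are assumptions chosen to match common d3js conventions; the text does not specify the exact field names used by the system.

```python
import json


def to_tree(name, children_of):
    """Nest each concept's children inside its 'children' field."""
    return {
        "name": name,
        "children": [to_tree(child, children_of) for child in children_of.get(name, [])],
    }


def to_node_link(relations):
    """Build a node list plus edges that refer to nodes by list index."""
    names, index, links = [], {}, []
    for source, relation, target in relations:
        for node in (source, target):
            if node not in index:  # assign the next free index on first sight
                index[node] = len(names)
                names.append(node)
        links.append({"source": index[source], "target": index[target], "relation": relation})
    return {"nodes": [{"name": n} for n in names], "links": links}


tree_json = json.dumps(to_tree("Aircraft", {"Aircraft": ["Engine", "Wing"], "Engine": ["Turbine"]}))
graph_json = json.dumps(
    to_node_link([("Engine", "part_of", "Aircraft"), ("Turbine", "part_of", "Engine")])
)
```

The tree form carries the structure in its nesting, while the node-link form keeps nodes flat and expresses structure only through the index pairs on each edge.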

The front end receives the json data and uses the d3js force-directed graph. An svg canvas is added first, then elements are added for the required points and lines, and the coordinates of the points and the lines are bound together so that nodes and edges stay connected in the diagram. The main implementation is as follows:

svgarea = GetElementandAddSVG(elementid) // locate the element by its id and add the svg canvas to it

simulation = d3.forceSimulation() // force-directed simulation of d3js

link = createlinkinsvg(linkdata) // elements drawn first are placed at the bottom, so the edges are drawn first

linktext = createlinktext(linkdata) // draw the edge text, i.e. the relationship name

node = createnodeinsvg(nodedata) // draw the nodes

nodetext = createnodetext(nodedata) // draw the node names

Createnodeclick(nodedata) // set the functions triggered when a node is clicked, such as viewing node-related information or showing node attributes

Log recording and knowledge graph version release method developed by the system specifically for the Neo4j database

Two folders, log and release, are created in the main directory of the system. Under each of the two directories, the string obtained by applying SHA256 to a domain name is used as the name of a new sub-folder; the collision resistance of SHA256 makes it convenient to store the version releases and log information of different domains separately.
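The folder naming can be sketched with the Python standard library. The function name and the example root path are hypothetical, and the UTF-8 encoding of the domain name is an assumption; the sketch only computes the paths and does not create any directories.

```python
import hashlib
from pathlib import Path


def domain_dirs(root, domain_name):
    """Return (log_dir, release_dir) for a domain, named by its SHA256 digest."""
    digest = hashlib.sha256(domain_name.encode("utf-8")).hexdigest()
    return Path(root) / "log" / digest, Path(root) / "release" / digest


log_dir, release_dir = domain_dirs("/srv/kg", "aviation")
```

Besides separating domains, the hexadecimal digest contains only the characters 0-9 and a-f, so it is a safe folder name even when the domain name itself contains characters that a file system would reject.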

5.1 Version release

When the user triggers a release at the front end, the back-end program of the system performs the following operations to implement the release:

WriteBasicInfo(releasefile) // write the date and version number information to the file

nodesinfo = GetAllNodesofDomain(domainname) // obtain all nodes in the domain using session.run(Cypher MATCH statement)

WriteNodeToFile(nodesinfo, releasefile) // write all node information (node name, id, attributes) to the file

relationinfo = GetAllRelationofDomain(domainname) // obtain all relationships in the domain using session.run(Cypher MATCH statement)

WriteRelationToFile(relationinfo, releasefile) // write all relationship information (relationship name, attributes) to the file
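A release snapshot of this kind can be sketched as a single JSON document. The JSON layout, the field names and the function names are assumptions made for illustration; the text does not specify the file format of a release. The nodes and relationships are passed in as plain dictionaries instead of being queried from Neo4j.

```python
import json
from datetime import date


def build_release(version, nodes, relations, released_on=None):
    """Serialise basic info, all nodes and all relationships of a domain."""
    snapshot = {
        "version": version,
        "date": (released_on or date.today()).isoformat(),
        "nodes": nodes,
        "relations": relations,
    }
    return json.dumps(snapshot, ensure_ascii=False, indent=2)


def load_release(text):
    """Read a snapshot back, e.g. when returning to a historical version."""
    return json.loads(text)


release_text = build_release(
    "v1.0",
    [{"id": 1, "name": "Engine", "attributes": {"type": "Concept"}}],
    [{"name": "part_of", "start": 1, "end": 2, "attributes": {}}],
    released_on=date(2023, 3, 28),
)
```

Because each release holds the complete node and relationship information of the domain, loading an older snapshot is enough to rebuild the graph as it was at that version.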

5.2 Logging

Whenever the user modifies the knowledge graph, the system keeps a record that mainly contains the following: timestamp, type of operation (add/delete/modify/search), operator, and the content of the nodes and relationships operated on. The back end of the system records the log as follows:

// called after the knowledge storage step, when the nodes and relationships have already been imported successfully

def Setlog(Node, Relation, logfile):

    WriteBasicInfo(logfile) // write header information such as the date and the modifier to the file

    WriteNodeToFile(nodeinfo, logfile) // write node names and attributes to the file

    WriteRelationToFile(relationinfo, logfile) // write relationships and attributes to the file

The modification log of the knowledge graph is stored in files, and the system also displays at the front end the operations that have been performed on the knowledge graph, so that the user can review the modification history of the graph and perform further operations such as undoing a modification. To undo an operation, the system uses the data stored in the log to call a function whose effect is the opposite of the operation function used during knowledge storage.

DeleteNode(node) // inverse of the add-node operation CreateNode()

DeleteRelation(relation) // inverse of the add-relationship operation CreateRelation()

SetAttributes(Node, oldvalue) // still uses the original attribute-setting function, but on undo reassigns the original attribute value to the Node attribute
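The mapping from logged operations to their inverses can be sketched as follows. The log entry layout (`op`, `node`, `relation`, `old_values`) is hypothetical, and the sketch assumes that the log stores the attribute values from before each modification. It only produces a plan of inverse operations and does not touch a database.

```python
def build_undo(entry):
    """Return the inverse operation for one log entry."""
    op = entry["op"]
    if op == "create_node":
        return {"op": "delete_node", "node": entry["node"]}
    if op == "create_relation":
        return {"op": "delete_relation", "relation": entry["relation"]}
    if op == "set_attributes":
        # Same setter as before, but called with the values saved in the log.
        return {"op": "set_attributes", "node": entry["node"], "values": entry["old_values"]}
    raise ValueError(f"no inverse defined for operation {op!r}")


def undo_plan(log_entries):
    """Invert the entries in reverse order, newest first."""
    return [build_undo(entry) for entry in reversed(log_entries)]


log = [
    {"op": "create_node", "node": "Turbine"},
    {"op": "create_relation", "relation": ["Turbine", "part_of", "Engine"]},
    {"op": "set_attributes", "node": "Turbine", "old_values": {"stage": None}},
]
plan = undo_plan(log)
```

Processing the log newest-first means that a relationship is removed before the node it is attached to, which matches the order in which they were created.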

The above is only a preferred embodiment of the present invention and is intended only to help in understanding the method and core idea of the present application. The protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention fall within its protection scope. It should be noted that modifications and adaptations that occur to one skilled in the art without departing from the principles of the present invention are also intended to be within the scope of the present invention.

On the whole, the invention solves the problems in the prior art of a lack of consistency constraints, unsatisfactory compatibility between different kinds of data, and incomplete adaptability of existing solutions. It realizes knowledge graph construction through data acquisition, knowledge extraction, knowledge fusion, knowledge storage, and knowledge update, together with Web service encapsulation and dynamic updating.

Claims

1. A method for constructing a domain knowledge graph driven by multi-source heterogeneous data, characterized by comprising the following steps:

S2, performing back-end encapsulation of the functional code in S1 by building a Web service, and providing it to the user's front-end interface to realize construction and management of the knowledge graph;

the domain knowledge graph construction system comprises:

S101, a graph Schema definition subsystem;

S102, a data acquisition subsystem;

S103, a knowledge extraction subsystem;

S104, a knowledge storage and management subsystem;

S105, a knowledge dynamic updating subsystem;

S106, a graph visualization subsystem;

S107, a user management subsystem;

S108, a graph log management subsystem;

S109, a graph version release subsystem;

the step S101 is as follows:

the step S102 is as follows:

the step S103 is as follows:

the step S104 is as follows:

the step S105 is as follows:

the step S106 is as follows:

the step S107 is as follows:

the step S108 is as follows:

the step S109 is as follows:

2. The multi-source heterogeneous data driven domain knowledge graph construction system method according to claim 1, wherein the step S102 is specifically:

3. The multi-source heterogeneous data driven domain knowledge graph construction system method according to claim 1, wherein the step S103 is specifically:

4. The multi-source heterogeneous data driven domain knowledge graph construction system method according to claim 1, wherein the step S105 is specifically:

5. The multi-source heterogeneous data driven domain knowledge graph construction system method according to claim 1, wherein the step S106 is specifically: