CN115329099A - Knowledge graph construction and query optimization method and system based on SmartKG - Google Patents

Publication number
CN115329099A
Authority
CN
China
Prior art keywords
module
smartkg
compatible
source code
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038401.7A
Other languages
Chinese (zh)
Inventor
于家晟
�田�浩
路国隋
李存冰
胡焕钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Abstract

The invention discloses a knowledge graph construction and query optimization method and system based on SmartKG, belonging to the field of knowledge graph construction. The method comprises the following specific steps: S1, compiling the source code; S2, adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import; S3, embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI; S4, optimizing and upgrading the data structures and some of the methods. The invention provides an API compatible with SPARQL and Cypher, namely a conversion function that extracts the nodes and relationships created in Cypher and SPARQL, so that technicians can conveniently import previously written SPARQL or Cypher statements to create and manage a knowledge graph. Because the back-end functions for adding, deleting, modifying and querying nodes are embedded into the front-end UI, a user can modify node information in real time in the UI, which saves the time otherwise spent modifying nodes in Excel and re-uploading the file. Some data structures and methods in the source code are optimized to improve query efficiency.

Description

Knowledge graph construction and query optimization method and system based on SmartKG
Technical Field
The invention discloses a knowledge graph construction and query optimization method and system based on SmartKG, and relates to the technical field of knowledge graph construction.
Background
In recent years, the growth and maturation of knowledge graph technologies have led more and more enterprises and research institutions to build knowledge graphs for commercial or research purposes, such as optimizing internal personnel structures or supporting search and recommendation algorithms, and to explore patterns on the basis of those graphs. SmartKG, an open-source lightweight knowledge graph construction tool developed by Microsoft, is popular because it is simple to deploy and convenient to use. However, as user demands grow, the shortcomings of the tool are gradually being revealed. For example, the tool accepts only Excel input, which adds workload to subsequent analysis or modeling. In the template Excel file, nodes and edges are split across two worksheets, so an analyst who wants to retrieve the relationships between nodes together with the attributes of those nodes must consider which data structure to use, whether additional storage needs to be created, and how to store attributes whose number differs from node to node. In addition, although both SmartKG and Neo4j can visualize the relationships between entities, a user who wants to modify a node or an entity relationship can only edit the Excel content, upload it again and display it again, which wastes time. Moreover, in the big data era, it is unrealistic to rely on manual entry for a huge number of nodes and edges and the relationships among them.
Therefore, the invention provides a knowledge graph construction and query optimization method and system based on SmartKG to solve the above problems.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a knowledge graph construction and query optimization method and system based on SmartKG. The adopted technical scheme is as follows: a knowledge graph construction and query optimization method based on SmartKG comprises the following specific steps:
S1, compiling the source code;
S2, adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
S3, embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
S4, optimizing and upgrading the data structures and some of the methods.
The specific steps of S1 are as follows:
S11, downloading the zip file;
S12, after the download is complete, entering the src folder and opening SmartKG.sln;
S13, after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
S14, developing customized functions and modifying the original functions.
The specific steps of S2 are as follows:
S21, adding support for uploading files in the .txt, .json and .xml formats;
S22, determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it.
The front-end UI in S3 supports real-time modification of node information by the user in the UI.
A knowledge graph construction and query optimization system based on SmartKG specifically comprises a source code processing module, a text compatibility module, a node processing module and an upgrade optimization module:
the source code processing module: compiling the source code;
the text compatibility module: adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
the node processing module: embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
the upgrade optimization module: optimizing and upgrading the data structures and some of the methods.
The source code processing module specifically comprises a file downloading module, a start-up processing module, a code compiling module and a function editing module:
the file downloading module: downloading the zip file;
the start-up processing module: after the download is complete, entering the src folder and opening SmartKG.sln;
the code compiling module: after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
the function editing module: developing customized functions and modifying the original functions.
The text compatibility module specifically comprises a format expansion module and a file matching module:
the format expansion module: adding support for uploading files in the .txt, .json and .xml formats;
the file matching module: determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it.
The front-end UI in the node processing module supports real-time modification of node information by the user in the UI.
The beneficial effects of the invention are as follows: an API compatible with SPARQL and Cypher is provided, namely a conversion function that extracts the nodes and relationships created in Cypher and SPARQL, so that technicians can conveniently import previously written SPARQL or Cypher statements to create and manage a knowledge graph; the back-end functions for adding, deleting, modifying and querying nodes are embedded into the front-end UI, so that a user can modify node information in real time in the UI, which saves the time otherwise spent modifying nodes in Excel and re-uploading the file; and some data structures and methods in the source code are optimized to improve query efficiency.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the implementation of step S2 in an embodiment of the method of the present invention; FIG. 2 is a partial screenshot of the program response interface for S3 in an embodiment of the method; FIG. 3 is a flow chart of the delay test in an embodiment of the method.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand and practice it; the embodiments are not intended to limit the invention.
Some of the technical terms used in the invention are explained first:
Knowledge graph: a family of graphs that show the development process and structural relationships of knowledge; it uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelations among pieces of knowledge;
Triple: SPO (Subject, Predicate, Object); the subject and object are generally persons or things, and the predicate is the action or association that the subject applies to the object; for example: <Geoffrey Hinton> <is a> <researcher>;
RDF: Resource Description Framework; a way of expressing the logical form of triples, typically serialized in XML;
Property graph: a multi-relational graph comprising vertices (Vertex) and edges (Edge); each entity in the property graph model is defined as a vertex; for example: Vertex {"name": "zhangsan"}, Edge {"relationship": "friend", "head": "zhangsan", "tail": "Lisi"};
Neo4j: a high-performance NoSQL graph database; it is a high-performance graph engine that stores structured data in a network (mathematically, a graph) rather than in tables;
SmartKG: an open-source lightweight knowledge graph construction tool developed by Li Ye, chief algorithm engineer at Microsoft; it accepts Excel files containing the description of a knowledge graph (vertices and edges) as input and converts them into an in-memory graph store; the project implements APIs for searching, filtering and retrieving nodes from the in-memory knowledge graph; the project also provides a dialogue management framework with which users can build a chatbot based on the knowledge graph (https://github.com/microsoft/SmartKG);
SPARQL: short for SPARQL Protocol and RDF Query Language, a query language and data access protocol developed for RDF; since RDF data is represented as triples, almost all SPARQL statements contain triples;
Cypher: a declarative graph query language that allows expressive and efficient querying of a graph store without writing traversal code for the graph structure;
MongoDB: a database based on distributed file storage;
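The triple and property graph terms defined above can be made concrete with a short sketch. The following Python fragment is only an illustration under stated assumptions: the class names `Vertex`, `Edge` and `PropertyGraph`, the helper `from_triple` and the `age` attribute are hypothetical and are not taken from SmartKG, which is written in C#. It shows how keeping an adjacency index next to the vertex table lets a caller fetch a relationship and the attributes of the related node in one step, even when different vertices carry different numbers of attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A node of the property graph; attributes may differ per vertex."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship from `head` to `tail`."""
    relationship: str
    head: str
    tail: str


class PropertyGraph:
    """Hypothetical in-memory store; not SmartKG's actual classes."""

    def __init__(self):
        self.vertices = {}   # vertex name -> Vertex
        self.out_edges = {}  # vertex name -> list of outgoing Edge objects

    def add_vertex(self, vertex):
        self.vertices[vertex.name] = vertex
        self.out_edges.setdefault(vertex.name, [])

    def add_edge(self, edge):
        # Create placeholder vertices so every edge endpoint is resolvable.
        for name in (edge.head, edge.tail):
            if name not in self.vertices:
                self.add_vertex(Vertex(name))
        self.out_edges[edge.head].append(edge)

    def neighbours(self, name):
        """Return (relationship, tail vertex) pairs, attributes included."""
        return [(e.relationship, self.vertices[e.tail])
                for e in self.out_edges.get(name, [])]


def from_triple(subject, predicate, obj):
    """Map one SPO triple onto an Edge of the property graph."""
    return Edge(relationship=predicate, head=subject, tail=obj)


graph = PropertyGraph()
graph.add_vertex(Vertex("zhangsan", {"age": 30}))  # "age" is made-up sample data
graph.add_edge(Edge("friend", "zhangsan", "Lisi"))
graph.add_edge(from_triple("Geoffrey Hinton", "is a", "researcher"))
```

Calling `graph.neighbours("zhangsan")` returns the relationship label together with the full `Vertex` object of the neighbour, so no second worksheet or side table has to be consulted.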
Example one:
a knowledge graph construction and query optimization method based on SmartKG comprises the following specific steps:
S1, compiling the source code;
S2, adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
S3, embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
S4, optimizing and upgrading the data structures and some of the methods;
at a practical level, the invention provides a knowledge graph construction and query optimization scheme based on SmartKG that lets technicians import previously written Cypher or SPARQL into SmartKG to build, manage and maintain a knowledge graph, saves time when modifying node relationships, and optimizes the data structures to improve query efficiency;
further, the specific steps of S1 are as follows:
S11, downloading the zip file;
S12, after the download is complete, entering the src folder and opening SmartKG.sln;
S13, after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
S14, developing customized functions and modifying the original functions;
at present, a user can clone the repository from https://github.com/microsoft/SmartKG.git or git@github.com:microsoft/SmartKG.git, or can download the zip file directly;
after the download is complete, the user enters the src folder and opens SmartKG.sln; once Visual Studio has opened, the user clicks Build Solution in the Build menu to compile the source code; customized functions can then be developed and the original functions modified;
further, the specific steps of S2 are as follows:
S21, adding support for uploading files in the .txt, .json and .xml formats;
S22, determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it;
to this end, a check is added to the data processing module: if a file ends with the .txt, .json or .xml suffix, keyword matching is performed on it first;
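The suffix check in S22 can be sketched as a small dispatch function. This is a minimal Python illustration, not SmartKG code: the function name `choose_import_path`, the route labels it returns, and the assumption that the legacy Excel template uses the `.xlsx` or `.xls` suffix are all hypothetical.

```python
from pathlib import Path

# Suffixes routed to the keyword-matching step (taken from step S21).
TEXT_SUFFIXES = {".txt", ".json", ".xml"}
# Assumed suffixes of the legacy Excel template; the text only says "excel".
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def choose_import_path(filename):
    """Return which import route an uploaded file should take."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return "keyword_matching"
    if suffix in EXCEL_SUFFIXES:
        return "legacy_excel"
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Comparing the lower-cased suffix keeps uploads such as `GRAPH.TXT` on the same route as `graph.txt`, and rejecting unknown suffixes early avoids handing arbitrary files to the parser.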
in the keyword-matching module, the symbols that frequently appear in Cypher, SPARQL, XML and JSON are first stored in separate lists (JSON content can also be extracted directly with libraries such as swift.Json, System.Text.Json or Next Json); the string is then split at the key characters that are found, and the content before and after each key character is taken as the content of a node or an edge; the extracted content is then returned, processed in the same way as data in the existing version, and finally displayed; the flow is shown in FIG. 1;
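The keyword-matching idea can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions, not the patented implementation: the key-symbol lists, the function names and the JSON layout with an `edges` array are hypothetical, and the Cypher branch handles only bare patterns of the form `(a)-[:rel]->(b)` without labels or property maps.

```python
import json

# Key symbols kept in separate lists per language, as the description suggests.
CYPHER_KEYS = ["-[:", "]->"]
SPARQL_KEYS = ["<", ">"]


def parse_cypher_edge(statement):
    """Split a bare Cypher pattern such as (a)-[:rel]->(b) at its key symbols."""
    left, sep, right = statement.partition(CYPHER_KEYS[0])
    if not sep:
        return None
    relationship, sep, tail_part = right.partition(CYPHER_KEYS[1])
    if not sep:
        return None
    head = left.rpartition("(")[2].rstrip(") ").strip()
    tail = tail_part.partition("(")[2].partition(")")[0].strip()
    return (head, relationship.strip(), tail)


def parse_sparql_triple(line):
    """Collect the <...> terms of one triple line; needs exactly three."""
    terms = [chunk.partition(">")[0].strip()
             for chunk in line.split("<")[1:] if ">" in chunk]
    return tuple(terms) if len(terms) == 3 else None


def parse_json_edges(text):
    """Read edges from a hypothetical {"edges": [{head, relationship, tail}]} layout."""
    data = json.loads(text)
    return [(e["head"], e["relationship"], e["tail"]) for e in data["edges"]]


def extract_edges(text):
    """Return (head, relationship, tail) tuples found in Cypher or SPARQL text."""
    edges = []
    for line in text.splitlines():
        if any(key in line for key in CYPHER_KEYS):
            edge = parse_cypher_edge(line)
        elif all(key in line for key in SPARQL_KEYS):
            edge = parse_sparql_triple(line)
        else:
            edge = None
        if edge:
            edges.append(edge)
    return edges
```

For example, `extract_edges("CREATE (zhangsan)-[:friend]->(Lisi)")` yields the single tuple `("zhangsan", "friend", "Lisi")`, which can then be passed to the existing data processing path. A production importer would more likely rely on a real Cypher or SPARQL parser, since plain string splitting breaks on nested or quoted symbols.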
furthermore, the front-end UI in S3 supports real-time modification of node information by the user in the UI; the back-end interface of the program already provides a number of functions for adding, deleting, modifying and querying nodes and configurations, and these functions can be consolidated and exposed in the front-end interface so that the user can carry out node-related operations directly at the front end.
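The kind of back-end node operations that a front end could call can be sketched as follows. This is a hypothetical Python illustration only: the class `NodeStore` and its method names are assumptions and do not correspond to SmartKG's actual back-end interface.

```python
class NodeStore:
    """Hypothetical in-memory node store with add, get, update and delete."""

    def __init__(self):
        self._nodes = {}  # node id -> dict of properties

    def add(self, node_id, **properties):
        if node_id in self._nodes:
            raise KeyError(f"node already exists: {node_id}")
        self._nodes[node_id] = dict(properties)

    def get(self, node_id):
        # Return a copy so callers cannot mutate the store by accident.
        node = self._nodes.get(node_id)
        return None if node is None else dict(node)

    def update(self, node_id, **changes):
        # Raises KeyError when the node does not exist.
        self._nodes[node_id].update(changes)

    def delete(self, node_id):
        # Report whether a node was actually removed.
        return self._nodes.pop(node_id, None) is not None
```

Because each operation changes the in-memory store directly, a UI handler that calls `update` can show the new node information immediately, without the edit, re-upload and re-display cycle that the Excel workflow requires.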
In the data processing of the current version, the storage layer uses the Dictionary data structure and stores data in key-value form. Although this guarantees query accuracy, when the test data set contains large and complex relationships, a noticeable delay occurs between clicking the display button and seeing the result (as shown in FIG. 2, after the click the program is still responding but the result is not displayed immediately, even though the test data set contains only 170 edges, 78 nodes and their various attributes).
Although this delay is acceptable, in order to support larger-scale data and high concurrency later, the intended optimization is to replace some of the lock-protected Dictionary instances with ConcurrentDictionary; although this slows write operations while the graph is being created, it can noticeably speed up read operations, which benefits query operations;
the test flow is shown in FIG. 3;
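A delay measurement of the kind mentioned above can be sketched generically. The Python fragment below is not a reproduction of FIG. 3; it is a minimal sketch that assumes a synthetic graph sized like the test data set described in the text (78 nodes, 170 edges), with the helper name `measure_latency` and the lookup being timed chosen purely for illustration.

```python
import time


def measure_latency(query, repeats=100):
    """Return the mean wall-clock seconds per call of `query()`."""
    start = time.perf_counter()
    for _ in range(repeats):
        query()
    return (time.perf_counter() - start) / repeats


# Synthetic stand-in sized like the described test set: 78 nodes, 170 edges.
nodes = [f"n{i}" for i in range(78)]
edges = [(nodes[i % 78], nodes[(i * 7 + 1) % 78]) for i in range(170)]

# Adjacency index: head node -> list of tail nodes.
index = {}
for head, tail in edges:
    index.setdefault(head, []).append(tail)

mean_seconds = measure_latency(lambda: index.get("n0", []))
```

Running the same measurement before and after a data structure change gives a like-for-like comparison; absolute numbers depend on the machine, so only the relative difference is meaningful.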
in addition, the pattern of calling ContainsKey and then indexing the value by key occurs frequently in the code. A preliminary idea is to replace it with the TryGetValue(key, out value) form, which performs only one lookup and is therefore faster; the subsequent code structure would need to be adjusted, but it remains a worthwhile attempt at optimizing query speed for large-scale knowledge graphs. Similarly, new entities and relationships are continuously added while a large-scale knowledge graph is being built, so if no capacity is specified through the Dictionary(capacity) constructor, the number of nodes may frequently exceed the current capacity; the Resize method is then called automatically to rebuild the buckets and entries, and frequent calls incur a large overhead, so it is preferable to estimate a maximum size in advance;
because the current version of SmartKG uses an in-memory graph store, using iterators in place of some of the for loops when traversing dictionaries during search is also being considered, with the aim of reducing garbage collection (GC) and overhead.
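The difference between the two lookup patterns discussed above can be illustrated with a Python analogue. SmartKG itself is written in C#, so this is not its code: `key in store` followed by `store[key]` stands in for ContainsKey plus the indexer, `store.get(key)` stands in for TryGetValue, and the `CountingDict` class is a hypothetical helper that merely counts how many lookups each pattern performs.

```python
class CountingDict(dict):
    """dict subclass that counts key lookups, for illustration only."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __contains__(self, key):
        self.lookups += 1
        return super().__contains__(key)

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.lookups += 1
        return super().get(key, default)


def lookup_twice(store, key):
    # Analogue of ContainsKey followed by the indexer: two lookups on a hit.
    if key in store:
        return store[key]
    return None


def lookup_once(store, key):
    # Analogue of TryGetValue: one lookup that also covers the miss case.
    return store.get(key)


store = CountingDict({"zhangsan": {"friend": "Lisi"}})
lookup_twice(store, "zhangsan")
twice_count = store.lookups

store.lookups = 0
lookup_once(store, "zhangsan")
once_count = store.lookups
```

On a hit, the first pattern touches the table twice and the second only once, which is the saving the text describes. One difference from TryGetValue is that `get` cannot distinguish a missing key from a stored `None` unless a sentinel default is passed.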
Example two:
a knowledge graph construction and query optimization system based on SmartKG specifically comprises a source code processing module, a text compatibility module, a node processing module and an upgrade optimization module:
the source code processing module: compiling the source code;
the text compatibility module: adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
the node processing module: embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
the upgrade optimization module: optimizing and upgrading the data structures and some of the methods;
further, the source code processing module specifically comprises a file downloading module, a start-up processing module, a code compiling module and a function editing module:
the file downloading module: downloading the zip file;
the start-up processing module: after the download is complete, entering the src folder and opening SmartKG.sln;
the code compiling module: after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
the function editing module: developing customized functions and modifying the original functions;
further, the text compatibility module specifically comprises a format expansion module and a file matching module:
the format expansion module: adding support for uploading files in the .txt, .json and .xml formats;
the file matching module: determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it;
still further, the front-end UI in the node processing module supports real-time modification of node information by the user in the UI.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A knowledge graph construction and query optimization method based on SmartKG, characterized by comprising the following specific steps:
S1, compiling the source code;
S2, adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
S3, embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
S4, optimizing and upgrading the data structures and some of the methods.
2. The method of claim 1, wherein the specific steps of S1 are as follows:
S11, downloading the zip file;
S12, after the download is complete, entering the src folder and opening SmartKG.sln;
S13, after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
S14, developing customized functions and modifying the original functions.
3. The method of claim 1, wherein the specific steps of S2 are as follows:
S21, adding support for uploading files in the .txt, .json and .xml formats;
S22, determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it.
4. The method of claim 1, wherein the front-end UI in S3 supports real-time modification of node information by the user in the UI.
5. A knowledge graph construction and query optimization system based on SmartKG, characterized by specifically comprising a source code processing module, a text compatibility module, a node processing module and an upgrade optimization module:
the source code processing module: compiling the source code;
the text compatibility module: adding a method for importing SPARQL and Cypher text while remaining compatible with the legacy Excel import;
the node processing module: embedding the back-end functions for adding, deleting, modifying and querying nodes into the front-end UI;
the upgrade optimization module: optimizing and upgrading the data structures and some of the methods.
6. The system of claim 5, wherein the source code processing module specifically comprises a file downloading module, a start-up processing module, a code compiling module and a function editing module:
the file downloading module: downloading the zip file;
the start-up processing module: after the download is complete, entering the src folder and opening SmartKG.sln;
the code compiling module: after Visual Studio opens, compiling the source code with Build Solution in the Build menu;
the function editing module: developing customized functions and modifying the original functions.
7. The system of claim 5, wherein the text compatibility module specifically comprises a format expansion module and a file matching module:
the format expansion module: adding support for uploading files in the .txt, .json and .xml formats;
the file matching module: determining whether a file ends with the .txt, .json or .xml suffix and, if so, performing keyword matching on it.
8. The system of claim 5, wherein the front-end UI in the node processing module supports real-time modification of node information by the user in the UI.
CN202211038401.7A 2022-08-29 2022-08-29 Knowledge graph construction and query optimization method and system based on SmartKG Pending CN115329099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038401.7A CN115329099A (en) 2022-08-29 2022-08-29 Knowledge graph construction and query optimization method and system based on SmartKG

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211038401.7A CN115329099A (en) 2022-08-29 2022-08-29 Knowledge graph construction and query optimization method and system based on SmartKG

Publications (1)

Publication Number Publication Date
CN115329099A true CN115329099A (en) 2022-11-11

Family

ID=83928848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038401.7A Pending CN115329099A (en) 2022-08-29 2022-08-29 Knowledge graph construction and query optimization method and system based on SmartKG

Country Status (1)

Country Link
CN (1) CN115329099A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination