CN112182238B - Knowledge graph construction system and method based on graph database - Google Patents


Info

Publication number
CN112182238B
Authority
CN
China
Prior art keywords
data
graph
knowledge
fields
database
Prior art date
Legal status
Active
Application number
CN202010999621.0A
Other languages
Chinese (zh)
Other versions
CN112182238A (en)
Inventor
路智钦
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010999621.0A
Publication of CN112182238A
Application granted
Publication of CN112182238B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph construction system and method based on a graph database. The method comprises the following steps: identifying the types of graph data in a graph database, and loading the different types of graph data into CPU memory in batches; identifying the graph data loaded into CPU memory one by one, marking the result attribute association fields, obtaining the fields of a knowledge graph, and extracting information fields; establishing a data dimension inner layer model from the information fields according to specified proportions, and establishing a data dimension outer layer model based on the inner layer model; persisting the data of the inner layer model and the outer layer model respectively to obtain a knowledge graph database; and establishing a knowledge query Web interface, parsing query commands, and querying the knowledge graph database to retrieve the result data. The invention enables one-click queries with millisecond-level responses, minimizes the difficulty of mining knowledge from graph data, and greatly improves the efficiency of knowledge graph construction.

Description

Knowledge graph construction system and method based on graph database
Technical Field
The invention relates to the technical field of knowledge graph construction, in particular to a knowledge graph construction system and a knowledge graph construction method based on a graph database.
Background
A knowledge graph is a modern theory that achieves multidisciplinary fusion by combining the theories and methods of disciplines such as applied mathematics, graphics, information visualization and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and that uses visualized graphs to vividly display the core structure, development history, frontier areas and overall knowledge architecture of a discipline. It presents complex knowledge domains through data mining, information processing, knowledge measurement and graph drawing, reveals the dynamic development patterns of a knowledge domain, and provides practical and valuable references for subject research. In scenarios involving knowledge graph construction, complex analysis is required to extract useful knowledge information, so the process is long and highly inefficient, and knowledge mining results are often inaccurate because not all factors are taken into account. Therefore, a graph-database-based knowledge graph construction system and method is needed to at least partially solve these problems of the prior art.
Disclosure of Invention
This summary introduces, in simplified form, a selection of concepts that are described in further detail in the Detailed Description. It is not intended to identify key or essential features of the claimed subject matter, nor to be used as an aid in determining the scope of the claimed subject matter.
To at least partially solve the above problems, the present invention provides a knowledge-graph construction method based on a graph database, comprising:
identifying the type of graph data in a graph database, and loading different types of graph data into a CPU memory in batches;
identifying the graph data loaded into the CPU memory one by one, marking result attribute associated fields, obtaining fields of a knowledge graph, and extracting information fields;
establishing a data dimension inner layer model from the information fields according to specified proportions, and establishing a data dimension outer layer model based on the data dimension inner layer model;
persisting the data of the data dimension inner layer model and the data dimension outer layer model respectively to obtain a knowledge graph database;
and establishing a knowledge query Web interface, parsing the query command, and querying the knowledge graph database to retrieve the result data.
Further, identifying the graph data types in the graph database and loading the different types of graph data into the CPU memory in batches includes:
loading a configuration file of graph data, and identifying the type of the graph data in a graph database;
monitoring the configuration file in real time, and dynamically changing the loading strategy of the graph data along with the modification of the configuration file;
and calling APIs corresponding to the graph data of different types, and loading the graph data into the CPU memory according to the loading strategy.
Further, monitoring the configuration file in real time and dynamically changing the loading strategy of the graph data as the configuration file is modified includes:
dynamically monitoring the configuration file, and changing the loading strategy of the graph data whenever the configuration file is modified;
and reloading lost graph data from the data source file by using append monitoring.
Further, calling the APIs corresponding to the different types of graph data and loading the graph data into the CPU memory according to the loading strategy includes:
importing the graph data into a Kafka topic using Flume;
and calling the APIs corresponding to the different types of graph data, and loading the graph data from the Kafka topic into the CPU memory according to the loading strategy.
Further, identifying the graph data loaded into the CPU memory one by one, marking the result attribute association fields, obtaining the fields of a knowledge graph, and extracting information fields includes:
determining whether the specified attribute association fields exist in the graph data;
parsing, segmenting, extracting and filtering either all attribute association fields or only the specified attribute association fields to obtain the result attribute association fields;
identifying the graph data one by one and marking the result attribute association fields to obtain fields of a knowledge graph;
extracting information fields from fields of the knowledge-graph.
Further, establishing a data dimension inner layer model from the information fields according to specified proportions and establishing a data dimension outer layer model based on the inner layer model includes:
establishing a plurality of data vectors, according to specified proportions, from the information fields extracted from the fields of the knowledge graph, with each uniquely corresponding information field as a central starting point and the non-uniquely corresponding information fields as end points, the full data set formed by these data vectors serving as the data dimension inner layer model;
and establishing a plurality of full data vectors with the full data set of the inner layer model as the central starting point and the statistical results of different strategies applied to that full data set as end points, the data set formed by these full data vectors serving as the data dimension outer layer model.
Further, persisting the data of the data dimension inner layer model and the data dimension outer layer model respectively to obtain a knowledge graph database includes:
persisting the data of the data dimension inner layer model and the data dimension outer layer model to obtain final persisted data;
determining whether the configuration file of the loaded graph data specifies a database type for the final persisted data;
if no database type is specified, storing the final persisted data in a Hive partitioned table to obtain the knowledge graph database; if a database type is specified, calling the corresponding API (application programming interface) to store the final persisted data in the specified database to obtain the knowledge graph database.
Further, in the knowledge graph database, separate partitions and tables are established for the data dimension inner layer model and the data dimension outer layer model, and index fields are created for the partitions and tables using Solr or ES (Elasticsearch).
Further, establishing the knowledge query Web interface, parsing the query command, and querying the knowledge graph database to retrieve the result data includes:
building a Web interface that provides users with a data query interface;
parsing and optimizing the SQL query provided by the user, querying the knowledge graph database to extract the query result, and retrieving the result data.
A system for knowledge-graph construction based on graph databases, comprising:
the graph data connection module is used for identifying the graph data types in the graph database and loading the graph data of different types into a CPU memory in batches;
the information marking module is used for identifying the graph data loaded into the CPU memory one by one, marking result attribute associated fields, acquiring fields of a knowledge graph and extracting information fields;
the data cube construction module is used for establishing a data dimension inner layer model from the information fields according to specified proportions and establishing a data dimension outer layer model based on the inner layer model;
the data cube persistence module is used for persisting the data of the inner layer model and the outer layer model respectively to obtain a knowledge graph database;
and the knowledge graph query module is used for establishing a knowledge query Web interface, parsing query commands, and querying the knowledge graph database to retrieve the result data.
Compared with the prior art, the invention has at least the following beneficial effects:
In the graph-database-based knowledge graph construction system and method, the types of graph data in the graph database are identified and the graph data are loaded into CPU memory in batches, which improves loading efficiency. The graph data are marked one by one to obtain the fields of the knowledge graph, the required information fields are extracted from those fields, and a data dimension inner layer model and a data dimension outer layer model are established from the information fields. This improves the accuracy of knowledge mining results, effectively shortens the process of analyzing complex graph data and extracting useful knowledge information, and improves efficiency. Once persisted, the data in the models can be stored long-term, preventing data loss. When mining knowledge from the graph data, the user enters a query command through the knowledge graph's query Web interface; the command is parsed and run against the knowledge graph database, and the queried information is returned to the interface. This enables one-click queries with millisecond-level responses, minimizes the difficulty of mining knowledge from graph data, and greatly improves the efficiency of knowledge graph construction.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a graph-database-based knowledge-graph construction method according to the present invention.
FIG. 2 is a schematic flow diagram of the graph-database-based knowledge graph construction system and method according to the present invention.
Detailed Description
The present invention is further described in detail below with reference to the drawings and examples so that those skilled in the art can practice the invention with reference to the description.
As shown in FIG. 1, the invention provides a knowledge graph construction method based on a graph database, which comprises the following steps:
S1, identifying the graph data types in a graph database, and loading the different types of graph data into CPU memory in batches;
S2, identifying the graph data loaded into the CPU memory one by one, marking the result attribute association fields, obtaining the fields of a knowledge graph, and extracting information fields;
S3, establishing a data dimension inner layer model from the information fields according to specified proportions, and establishing a data dimension outer layer model based on the inner layer model;
S4, persisting the data of the inner layer model and the outer layer model respectively to obtain a knowledge graph database;
and S5, establishing a knowledge query Web interface, parsing the query command, and querying the knowledge graph database to retrieve the result data.
The working principle of this technical solution is as follows: first, according to the graph data types specified by the user, the graph data in the graph database are loaded into CPU memory in batches using the connector corresponding to each type; the graph data loaded into CPU memory are then identified one by one, the result attribute association fields are marked, the fields of the knowledge graph are obtained, and information fields are extracted. Next, a data dimension inner layer model is established from the information fields according to specified proportions, and, on the basis of the inner layer model, a data dimension outer layer model covering all possible result data is established according to rules specified by the user. Finally, the data of the inner layer model and the outer layer model are each persisted to a database, which may be either structured or unstructured, yielding the knowledge graph database. The user supplies the path of the configuration file, and a knowledge query Web interface is established to provide the user with a query interface; a query command entered by the user in the client's query interface is parsed and run against the knowledge graph database, and the statistical charts and detailed statistical tables constructed according to the statistical strategy are displayed on a visual interface.
The beneficial effects of this technical solution are as follows: identifying the graph data types in the graph database and loading the graph data into CPU memory in batches improves loading efficiency; marking the graph data one by one to obtain the fields of the knowledge graph, extracting the required information fields, and building the inner layer and outer layer models from them improves the accuracy of knowledge mining results, effectively shortens the process of analyzing complex graph data and extracting useful knowledge information, and improves efficiency; and persisting the model data allows it to be stored long-term, preventing data loss. When mining knowledge from the graph data, the user enters a query command through the knowledge graph's query Web interface; the command is parsed and run against the knowledge graph database, and the queried information is returned to the interface. This enables one-click queries with millisecond-level responses, minimizes the difficulty of mining knowledge from graph data, and greatly improves the efficiency of knowledge graph construction.
In one embodiment, identifying the types of graph data in the graph database and loading the different types of graph data into the CPU memory in batches includes:
S101, loading a configuration file for the graph data, and identifying the types of graph data in the graph database;
S102, monitoring the configuration file in real time, and dynamically changing the loading strategy of the graph data as the configuration file is modified;
S103, calling the APIs corresponding to the different types of graph data, and loading the graph data into the CPU memory according to the loading strategy.
The working principle of this technical solution is as follows: first, the graph data configuration file provided by the user is loaded, and the graph data types in the graph database are identified; at the same time, the configuration file is monitored in real time, and whenever it is modified, the loading strategy of the graph data changes accordingly; then, for each type of graph data in the database, the corresponding API (Application Programming Interface) is called, and that type of graph data is loaded into CPU memory according to the loading strategy.
The beneficial effects of this technical solution are as follows: the loaded configuration file is monitored in real time, and once it is modified, the loading strategy of the graph data changes dynamically, so the APIs corresponding to different types of graph data can be adjusted and called at any time. Data can therefore be processed quickly and effectively and loaded into CPU memory according to the corresponding loading strategy, which speeds up graph data loading and further improves efficiency.
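For illustration only, the type identification and batched loading of S101 and S103 might be sketched in Python as follows. This is not the patented implementation: the JSON configuration keys `types` and `batch_size`, the registry `LOADERS`, and the loaders `load_vertices` and `load_edges` are hypothetical stand-ins for the per-type graph database APIs, which the description does not name.

```python
import json
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


# Hypothetical per-type loaders. A real system would call the graph
# database client's API for each data type instead of reading a list.
def load_vertices(source: List[Record]) -> Iterator[Record]:
    for row in source:
        yield {"kind": "vertex", **row}


def load_edges(source: List[Record]) -> Iterator[Record]:
    for row in source:
        yield {"kind": "edge", **row}


# Registry mapping each graph data type to the loader ("API") that handles it.
LOADERS: Dict[str, Callable[[List[Record]], Iterator[Record]]] = {
    "vertex": load_vertices,
    "edge": load_edges,
}


def load_in_batches(config_text: str,
                    sources: Dict[str, List[Record]]) -> Iterator[List[Record]]:
    """Identify the configured graph data types and yield records in batches."""
    config = json.loads(config_text)
    batch_size = int(config.get("batch_size", 1000))
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for data_type in config["types"]:
        loader = LOADERS.get(data_type)
        if loader is None:
            raise ValueError(f"no loader registered for graph data type {data_type!r}")
        batch: List[Record] = []
        for record in loader(sources.get(data_type, [])):
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # flush the last, possibly partial, batch of this type
            yield batch


# Example: three vertices and one edge, loaded two records at a time.
config = '{"types": ["vertex", "edge"], "batch_size": 2}'
sources = {"vertex": [{"id": 1}, {"id": 2}, {"id": 3}], "edge": [{"src": 1, "dst": 2}]}
batches = list(load_in_batches(config, sources))
```

In this sketch batches are formed per data type, so a batch never mixes vertices and edges; the description does not state whether the actual system batches this way.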
In one embodiment, monitoring the configuration file in real time and dynamically changing the loading strategy of the graph data as the configuration file is modified includes:
S1021, dynamically monitoring the configuration file, and changing the loading strategy of the graph data whenever the configuration file is modified;
S1022, reloading lost graph data from the data source file by using append monitoring.
The working principle of this technical solution is as follows: the configuration file is monitored dynamically, and the loading strategy of the graph data changes as soon as the configuration file is modified; at the same time, append monitoring checks in real time whether any graph data have been lost, and if so, the lost graph data are reloaded from the data source file.
The beneficial effects of this technical solution are as follows: dynamically monitoring the configuration file improves data loading efficiency, saves loading time and is convenient for users, while append monitoring prevents the adverse effects of data loss and ensures the integrity of the final knowledge graph.
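A minimal Python sketch of S1021 and S1022, under stated assumptions: the loading strategy is assumed to be stored as JSON, changes are detected by polling the file's modification time, and lost data are detected by comparing record IDs in the data source with those in memory. The class `ConfigWatcher` and the function `reload_missing` are hypothetical; the description does not specify how either monitoring mechanism is implemented.

```python
import json
import os
from typing import Callable, Dict, Optional, Set


class ConfigWatcher:
    """Re-reads a JSON loading-strategy file whenever its modification time changes.

    Polling st_mtime_ns is an assumption made for this sketch; inotify or a
    library such as watchdog could be used instead.
    """

    def __init__(self, path: str, on_change: Callable[[dict], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last_mtime: Optional[int] = None

    def poll(self) -> bool:
        """Call on_change with the new strategy and return True if the file changed."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._last_mtime:
            return False
        self._last_mtime = mtime
        with open(self.path, encoding="utf-8") as f:
            self.on_change(json.load(f))
        return True


def reload_missing(source: Dict[str, dict], loaded: Dict[str, dict]) -> Set[str]:
    """Copy back records present in the data source but missing from memory.

    Comparing record IDs is a hypothetical reading of "append monitoring";
    the description does not say how lost data are detected.
    """
    missing = set(source) - set(loaded)
    for record_id in missing:
        loaded[record_id] = source[record_id]
    return missing
```

In practice `poll()` would be called periodically, for example from a timer thread, and the callback would swap in the new loading strategy.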
In one embodiment, calling the APIs corresponding to the different types of graph data and loading the graph data into the CPU memory according to the loading strategy includes:
S1031, importing the graph data into a Kafka topic by using Flume;
S1032, calling the APIs corresponding to the different types of graph data, and loading the graph data from the Kafka topic into the CPU memory according to the loading strategy.
The working principle of this technical solution is as follows: first, the graph data are imported into a topic of Kafka (a distributed publish-subscribe messaging system) by using Flume (a log collection system); then the APIs corresponding to the different types of graph data are called, and the graph data are loaded from the Kafka topic into CPU memory according to the loading strategy.
The beneficial effects of this technical solution are as follows: Flume is highly available and reliable, keeps data consistent between sending and receiving, and reduces the probability of errors during data import. Kafka, a high-throughput distributed publish-subscribe messaging system that can handle all of a website's user activity stream data and support millions of messages per second, ensures the stability of the data, guarantees the storage capacity and stability of the graph data, and improves data processing efficiency.
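The consuming side of S1032 could look roughly like the following Python sketch, which assumes Flume has already written each graph record to the topic as a UTF-8 JSON message. So that the sketch runs without a broker, the consumer is abstracted as any iterable of objects with a `.value` attribute; kafka-python's `KafkaConsumer` yields `ConsumerRecord` objects of that shape. The function name `batches_from_topic`, the topic name and the broker address are assumptions, and the Flume agent configuration is not shown.

```python
import json
from typing import Any, Iterable, Iterator, List


def batches_from_topic(messages: Iterable[Any], batch_size: int) -> Iterator[List[dict]]:
    """Decode JSON graph records from Kafka messages and group them into batches.

    `messages` is any iterable of objects whose `.value` holds UTF-8 encoded
    JSON bytes, which is the shape of the ConsumerRecord objects yielded by
    kafka-python's KafkaConsumer.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for message in messages:
        batch.append(json.loads(message.value.decode("utf-8")))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch once the iterator stops
        yield batch


# Against a live broker (topic name and address are assumptions):
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("graph-data", bootstrap_servers="localhost:9092",
#                            consumer_timeout_ms=5000)
#   for batch in batches_from_topic(consumer, 1000):
#       ...  # hand each batch to the in-memory graph store
```

`consumer_timeout_ms` matters here: without it, iterating a `KafkaConsumer` blocks indefinitely, so the final partial batch would never be flushed.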
In one embodiment, identifying the graph data loaded into the CPU memory one by one, marking the result attribute association fields, obtaining the fields of a knowledge graph, and extracting information fields includes:
S201, determining whether the specified attribute association fields exist in the graph data;
S202, parsing, segmenting, extracting and filtering either all attribute association fields or only the specified attribute association fields to obtain the result attribute association fields;
S203, identifying the graph data one by one and marking the result attribute association fields to obtain the fields of a knowledge graph;
and S204, extracting information fields from the fields of the knowledge graph.
The working principle of this technical solution is as follows: first, it is determined whether the graph data contain the attribute association fields specified by the user. If they do not, all attribute association fields are parsed, segmented, extracted and filtered to obtain the result attribute association fields used to build the fields of the knowledge graph; if they do, only the specified attribute association fields are parsed, segmented, extracted and filtered to obtain those result attribute association fields. Next, the graph data are identified one by one and the result attribute association fields are marked to obtain the fields of the knowledge graph. Finally, information fields are extracted from the fields of the knowledge graph.
The beneficial effects of this technical solution are as follows: if the attribute association fields specified by the user exist in the graph database, not all attribute association fields need to be processed, which reduces data processing time and makes obtaining the result attribute association fields more efficient; the graph data are then marked one by one to obtain the fields of the knowledge graph, which prevents the loss of accuracy that incomplete marking of the graph data would cause.
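The branching in S201 and S202 can be sketched in Python as shown below, with each graph record modeled as a dictionary of attribute fields. The segmentation delimiters, the stop-word filter, and the functions `result_fields` and `mark_all` are assumptions made for illustration; the description does not define how parsing, segmentation or filtering are performed.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed delimiters for segmenting attribute values.
_SEGMENT = re.compile(r"[,;|\s]+")


def result_fields(record: Dict[str, object],
                  specified: Optional[Iterable[str]] = None,
                  stopwords: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Return the result attribute association fields of one graph record.

    If any user-specified field is present, only those fields are processed;
    otherwise every field of the record is processed.
    """
    wanted = [name for name in (specified or ()) if name in record]
    names = wanted if wanted else list(record)
    stop = {w.lower() for w in stopwords}
    result: Dict[str, List[str]] = {}
    for name in names:
        text = str(record[name]).strip()                     # parse to text
        tokens = [t for t in _SEGMENT.split(text) if t]      # segment
        kept = [t for t in tokens if t.lower() not in stop]  # filter
        if kept:                                             # extract non-empty fields
            result[name] = kept
    return result


def mark_all(records: Iterable[Dict[str, object]], **options) -> List[Dict[str, List[str]]]:
    """Mark the graph records one by one, as in S203."""
    return [result_fields(r, **options) for r in records]
```

For example, `result_fields({"name": "Alice Smith", "tags": "ai; graph"}, ["tags"])` processes only `tags`, while passing a field name that does not occur falls back to processing every field.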
In one embodiment, establishing a data dimension inner layer model from the information fields according to specified proportions and establishing a data dimension outer layer model based on the inner layer model includes:
S301, establishing a plurality of data vectors, according to specified proportions, from the information fields extracted from the fields of the knowledge graph, with each uniquely corresponding information field as a central starting point and the non-uniquely corresponding information fields as end points, the full data set formed by these data vectors serving as the data dimension inner layer model;
S302, establishing a plurality of full data vectors with the full data set of the inner layer model as the central starting point and the statistical results of different strategies applied to that full data set as end points, the data set formed by these full data vectors serving as the data dimension outer layer model.
The working principle of this technical solution is as follows: first, using the information fields extracted from the fields of the knowledge graph, a plurality of data vectors are established, according to the proportions the user specifies for the non-uniquely corresponding information fields, with each uniquely corresponding information field as a central starting point and the non-uniquely corresponding information fields as end points; if the user specifies no proportions, they are distributed evenly across the non-uniquely corresponding information fields. The full data set formed by these data vectors serves as the data dimension inner layer model, in which each uniquely corresponding information field is associated with the information of the other uniquely corresponding information fields within a two-layer relationship of it. Then, in combination with the statistical strategy specified by the user, a plurality of full data vectors are established with the full data set of the inner layer model as the central starting point and the statistical results of the different strategies applied to that full data set as end points, and the data set formed by these full data vectors serves as the data dimension outer layer model; if the user specifies no statistical strategy, a comprehensive statistical strategy is generated automatically according to the types of the information fields and used to build the outer layer model.
The beneficial effects of this technical solution are as follows: the accuracy of knowledge mining results can be ensured. Because each uniquely corresponding information field is associated with the information of multiple other uniquely corresponding information fields within a two-layer relationship, that associated information can be located quickly for any uniquely corresponding information field, so users can easily find the social relationships they care about, and the efficiency of knowledge graph construction is improved.
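A minimal Python sketch of S301 and S302, offered as one possible reading rather than the patented implementation. Assumptions: each record has one uniquely corresponding field (`unique_field`, e.g. a person ID); a "proportion" is modeled as a weight on each vector, defaulting to an even split; and the automatically generated statistical strategy is taken to be the mean for numeric fields and the distinct-value count otherwise. `DataVector`, `inner_model`, `default_strategies` and `outer_model` are hypothetical names, and the two-layer association between unique fields is not modeled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DataVector:
    start: str           # central starting point
    field: str           # information field (or statistic) the end point comes from
    end: object          # end point value
    weight: float = 1.0  # proportion assigned to this field


def inner_model(records: List[dict], unique_field: str,
                proportions: Optional[Dict[str, float]] = None) -> List[DataVector]:
    """One vector per (uniquely corresponding value, non-unique field) pair."""
    vectors: List[DataVector] = []
    for rec in records:
        others = [f for f in rec if f != unique_field]
        if not others:
            continue
        # Without user-specified proportions, share the weight evenly.
        weights = proportions or {f: 1 / len(others) for f in others}
        for f in others:
            vectors.append(DataVector(str(rec[unique_field]), f, rec[f], weights.get(f, 0.0)))
    return vectors


Strategy = Callable[[List[DataVector]], object]


def default_strategies(vectors: List[DataVector]) -> Dict[str, Strategy]:
    """Hypothetical automatic strategy: mean for numeric fields, distinct count otherwise."""
    strategies: Dict[str, Strategy] = {}
    for field in sorted({v.field for v in vectors}):
        values = [v.end for v in vectors if v.field == field]
        numeric = all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in values)
        if numeric:
            strategies[f"{field}.mean"] = lambda vs, f=field: mean(v.end for v in vs if v.field == f)
        else:
            strategies[f"{field}.distinct"] = lambda vs, f=field: len({v.end for v in vs if v.field == f})
    return strategies


def outer_model(vectors: List[DataVector],
                strategies: Optional[Dict[str, Strategy]] = None) -> List[DataVector]:
    """One full-data vector per statistic, all starting from the full data set."""
    chosen = strategies or default_strategies(vectors)
    return [DataVector("full_data_set", name, fn(vectors)) for name, fn in chosen.items()]
```

A user-specified strategy can be passed to `outer_model` as a mapping from a statistic name to any function of the inner-layer vectors.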
In one embodiment, persisting the data of the data dimension inner layer model and the data dimension outer layer model respectively to obtain a knowledge graph database includes:
S401, persisting the data of the inner layer model and the outer layer model to obtain final persisted data;
S402, determining whether the loaded configuration file of the graph data specifies a database type for the final persisted data;
S403, if no database type is specified, storing the final persisted data in a Hive partitioned table to obtain the knowledge graph database; if a database type is specified, calling the corresponding API (application programming interface) to store the final persisted data in the specified database to obtain the knowledge graph database.
The working principle of this technical solution is as follows: first, the data of the inner layer model and the outer layer model are persisted to obtain final persisted data; then it is determined whether the loaded configuration file of the user's graph data specifies a database type for the final persisted data. If the user has not specified a database type, the final persisted data are stored in a partitioned table of Hive (a data warehouse system) to obtain the knowledge graph database; if the user has specified one, the corresponding API (Application Programming Interface) is called to store the final persisted data in the user-specified database, yielding the knowledge graph database.
The beneficial effects of this technical solution are as follows: persisting the model data to the corresponding database allows the data to be stored long-term and prevents data loss, preserves the integrity and accuracy of the knowledge graph to a certain extent, and lets users query related information better and more efficiently.
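The routing decision of S402 and S403 might be sketched in Python as below. The configuration key `database_type`, the table name `kg_model`, the partition column `layer`, and the writer registry are all hypothetical. The Hive branch only builds HiveQL text (a table partitioned by model layer and an `INSERT ... PARTITION ... VALUES` statement); submitting it through a Hive client such as Beeline is not shown, and a production system would avoid building SQL by string concatenation.

```python
from typing import Callable, Dict, List

Writer = Callable[[List[dict], str], List[str]]


def _hive_string(value: object) -> str:
    """Quote a value as a HiveQL string literal (backslash-escaped)."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def hive_statements(rows: List[dict], layer: str, table: str = "kg_model") -> List[str]:
    """HiveQL for a table partitioned by model layer ('inner' or 'outer')."""
    if not rows:
        return []
    columns = list(rows[0])
    ddl = (f"CREATE TABLE IF NOT EXISTS {table} ("
           + ", ".join(f"`{c}` STRING" for c in columns)
           + ") PARTITIONED BY (layer STRING)")
    values = ", ".join(
        "(" + ", ".join(_hive_string(row.get(c, "")) for c in columns) + ")" for row in rows)
    insert = f"INSERT INTO TABLE {table} PARTITION (layer={_hive_string(layer)}) VALUES {values}"
    return [ddl, insert]


def persist(rows: List[dict], layer: str, config: Dict[str, str],
            writers: Dict[str, Writer]) -> List[str]:
    """Route persisted model data to the configured database, else to Hive."""
    db_type = config.get("database_type")  # hypothetical configuration key
    if db_type is None:
        return hive_statements(rows, layer)
    writer = writers.get(db_type)
    if writer is None:
        raise ValueError(f"no writer registered for database type {db_type!r}")
    return writer(rows, layer)
```

Partitioning by layer keeps the inner and outer model data in separate partitions of one table; separate tables per layer would work equally well under this sketch.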
In one embodiment, in the knowledge map database, different partitions and tables are respectively established for the data dimension inner layer model and the data dimension outer layer model; index fields are created for partitions and tables using solr or es.
The working principle of the technical scheme is as follows: in the knowledge graph database, different partitions and tables are respectively established for the data dimension inner layer model and the data dimension outer layer model, and then index fields are created for the partitions and the tables by using solr (enterprise-level search application server) or es (referred to as elastic search server).
The beneficial effects of the above technical scheme are as follows: because separate partitions and tables are established for the two models and index fields are created for them, a user's query can be matched quickly against the index fields, so query commands are answered quickly and the query results are accurate and complete. In addition, Solr and ES support near-real-time search and are stable, reliable, fast, and easy to install and use.
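A sketch of how the separate tables and index fields might be declared, assuming one Hive table per model layer partitioned by a hypothetical `dt` (load date) column, and an Elasticsearch index whose mapping mirrors the table columns. The table names, columns, and type mapping are illustrative assumptions; the functions only build the HiveQL statement and the index-creation body and do not contact any server.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, Hive type)

# Hypothetical schemas: one table per model layer.
MODEL_TABLES: Dict[str, List[Column]] = {
    "kg_inner_model": [("start_field", "STRING"), ("end_field", "STRING"), ("ratio", "DOUBLE")],
    "kg_outer_model": [("strategy", "STRING"), ("result", "STRING")],
}

# Assumed mapping from Hive column types to Elasticsearch field types.
HIVE_TO_ES = {"STRING": "keyword", "DOUBLE": "double", "BIGINT": "long"}


def hive_ddl(table: str, columns: List[Column]) -> str:
    """Build a HiveQL CREATE TABLE statement with a date partition."""
    cols = ", ".join(f"`{name}` {htype}" for name, htype in columns)
    return (f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
            "PARTITIONED BY (`dt` STRING) STORED AS ORC")


def es_index_body(columns: List[Column]) -> Dict[str, object]:
    """Build an Elasticsearch index-creation body that indexes every column."""
    props = {name: {"type": HIVE_TO_ES[htype]} for name, htype in columns}
    return {"mappings": {"properties": props}}


if __name__ == "__main__":
    for table, columns in MODEL_TABLES.items():
        print(hive_ddl(table, columns))
        print(es_index_body(columns))
```

Mapping string columns as `keyword` indexes them for exact matching, which suits field values used as filter or association keys; a free-text field would use `text` instead.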
In one embodiment, establishing the knowledge query Web interface, parsing the query command, and retrieving result data from the knowledge graph database includes:
s501, constructing a Web interface that provides the user with a data query interface;
s502, parsing and optimizing the SQL query submitted by the user, querying the knowledge graph database to extract the query result, and retrieving the result data.
The working principle of the technical scheme is as follows: a knowledge query Web interface is constructed to provide the user with a data query interface. The user enters a query command in the client's query interface; the SQL (Structured Query Language) query submitted by the user is parsed and optimized and then executed against the knowledge graph database to extract the query result. The result data is retrieved, and a statistical chart built according to a statistical strategy, together with the detailed statistical table, is displayed in a visual interface.
The beneficial effects of the above technical scheme are as follows: SQL is flexible and offers powerful data query capabilities, which makes it easy to parse the query commands users enter, quickly retrieve the corresponding result data from the knowledge graph database, and efficiently display the statistical charts and detailed statistical tables built according to the statistical strategy in a visual interface.
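A minimal sketch of the s502 parse-and-optimize step, assuming the user's SQL is restricted to a single read-only `SELECT` and that "optimizing" means, at minimum, capping the result size with a `LIMIT`. Python's built-in `sqlite3` stands in for the knowledge graph database; the row cap and the checks are illustrative assumptions, and the naive semicolon check would also reject a semicolon inside a string literal. A production system would use a real SQL parser and the target engine's own optimizer.

```python
import re
import sqlite3
from typing import List, Tuple

DEFAULT_LIMIT = 1000  # hypothetical cap on rows returned to the Web interface


def parse_and_optimize(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Accept a single SELECT statement and append a LIMIT if it has none."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("only a single statement is accepted")
    if not re.match(r"select\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT queries are accepted")
    if not re.search(r"\blimit\s+\d+\s*$", stmt, re.IGNORECASE):
        stmt = f"{stmt} LIMIT {limit}"
    return stmt


def run_query(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Run a user query read-only and return all result rows."""
    conn.execute("PRAGMA query_only = ON")  # SQLite then rejects writes on this connection
    return conn.execute(parse_and_optimize(sql)).fetchall()
```

Rejecting anything but a single `SELECT` keeps the Web interface read-only even if the database account could write; the `PRAGMA query_only` call adds a second, engine-level guard for the SQLite stand-in.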
As shown in FIG. 2, the present invention provides a system for constructing a knowledge graph based on a graph database, comprising:
the graph data connection module is used for identifying the types of graph data in the graph database and loading the different types of graph data into the CPU memory in batches;
the information marking module is used for identifying the graph data loaded into the CPU memory one by one, marking the result attribute associated fields, obtaining the fields of the knowledge graph, and extracting the information fields;
the data magic cube construction module is used for establishing a data dimension inner layer model for the information fields in proportion and establishing a data dimension outer layer model based on the data dimension inner layer model;
the data magic cube persistence module is used for persisting the data of the data dimension inner layer model and the data dimension outer layer model respectively, to obtain a knowledge graph database;
and the knowledge graph query module is used for establishing a knowledge query Web interface, parsing query commands, and retrieving result data from the knowledge graph database.
The working principle of the technical scheme is as follows: the graph data connection module uses the connector corresponding to each graph data type specified by the user to load the different types of graph data in the graph database into the CPU memory in batches for subsequent processing. The information marking module identifies the graph data loaded into the CPU memory one by one, marks the result attribute associated fields to obtain the fields of the knowledge graph, and extracts the information fields used to construct the data magic cube. The data magic cube construction module then establishes a data dimension inner layer model for the information fields in proportion and, on the basis of the inner layer model, establishes a data dimension outer layer model covering all possible result data according to rules specified by the user. The data magic cube persistence module persists the data of the inner layer model and the outer layer model respectively into a database, which may be either unstructured or structured, to obtain the knowledge graph database. Finally, the user supplies the path of the configuration file, and the knowledge graph query module establishes a knowledge graph query Web interface that provides the user with a query interface; the command the user enters there is parsed and executed against the knowledge graph database, and a statistical chart built according to the statistical strategy, together with the detailed statistical table, is displayed on the interface.
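The batch loading and field marking described above can be sketched as follows, assuming text-based graph data in which each line is one `src, relation, dst` triple. The connector registry, the `csv`/`tsv` type names, and the field names are hypothetical, and "marking" is reduced to keeping the non-empty values of the result attribute associated fields (the specified fields if the user gave any, otherwise all fields); the module described in the patent also parses, segments, and filters the field content.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, str]
FIELDS = ("src", "rel", "dst")  # hypothetical triple layout

# Hypothetical connectors: one parser per graph data type.
CONNECTORS: Dict[str, Callable[[str], Record]] = {
    "csv": lambda line: dict(zip(FIELDS, line.split(","))),
    "tsv": lambda line: dict(zip(FIELDS, line.split("\t"))),
}


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records so data enters memory batch by batch."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def mark(record: Record, specified: Optional[List[str]]) -> Record:
    """Keep the result attribute associated fields (specified ones, else all), minus empty values."""
    fields = specified if specified else list(record)
    return {f: record[f].strip() for f in fields if record.get(f, "").strip()}


def load_and_mark(lines: Iterable[str], data_type: str, batch_size: int,
                  specified: Optional[List[str]] = None) -> List[List[Record]]:
    """Parse lines with the connector for `data_type`, batch them, and mark each record."""
    parse = CONNECTORS[data_type]
    return [[mark(r, specified) for r in batch]
            for batch in batched(map(parse, lines), batch_size)]
```

Because `batched` consumes a lazy iterator, only one batch of parsed records is materialized at a time before marking; the outer list is built here only to make the result easy to inspect.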
The beneficial effects of the above technical scheme are as follows: identifying the types of graph data in the graph database and loading the graph data into CPU memory in batches improves loading efficiency. Marking the graph data one by one yields the fields of the knowledge graph, from which the required information fields are extracted; building the data dimension inner layer and outer layer models from these information fields improves the accuracy of the knowledge mining results, markedly shortens the process of analyzing complex graph data and extracting useful knowledge, and improves efficiency. Once persisted, the data in the models can be stored long-term and is protected against loss. When mining knowledge from the graph data, the user enters a query command in the knowledge graph query Web interface; the command is parsed and executed against the knowledge graph database, and the requested information is returned to the interface. This enables one-click queries with millisecond-level responses, minimizes the difficulty of mining knowledge from graph data, and greatly improves the efficiency of constructing the knowledge graph.
While embodiments of the invention have been described above, the invention is not limited to the details shown and described herein; it is to be accorded the widest scope consistent with the principles and novel features disclosed herein, and modifications readily apparent to those skilled in the art may be made without departing from the general concept defined by the appended claims and their equivalents.

Claims (9)

1. A method for constructing a knowledge graph based on a graph database, the method comprising:
identifying the type of graph data in a graph database, and loading different types of graph data into a CPU memory in batches;
identifying the graph data loaded into the CPU memory one by one, marking result attribute associated fields, obtaining fields of a knowledge graph, and extracting information fields;
establishing a data dimension inner layer model for the information fields in proportion, and establishing a data dimension outer layer model based on the data dimension inner layer model;
persisting the data in the data dimension inner layer model and the data dimension outer layer model respectively to obtain a knowledge graph database;
establishing a knowledge query Web interface, parsing a query command, and retrieving result data from the knowledge graph database;
wherein establishing the data dimension inner layer model for the information fields in proportion and establishing the data dimension outer layer model based on the data dimension inner layer model comprises:
establishing a plurality of data vectors in proportion from the information fields extracted from the fields of the knowledge graph, with a uniquely corresponding information field as the central starting point and a plurality of non-uniquely corresponding information fields as the end points, and taking the full data set formed by these data vectors as the data dimension inner layer model;
and establishing a plurality of full data vectors with the full data set in the data dimension inner layer model as the central starting point and the statistical results of different strategies applied to the full data set as the end points, and taking the data set formed by these full data vectors as the data dimension outer layer model.
2. The method of claim 1, wherein identifying the type of graph data in a graph database and loading different types of graph data in batches into a CPU memory comprises:
loading a configuration file of graph data, and identifying the type of the graph data in a graph database;
monitoring the configuration file in real time, and dynamically changing the loading strategy of the graph data along with the modification of the configuration file;
and calling APIs corresponding to the different types of the graph data, and loading the graph data into the CPU memory according to the loading strategy.
3. A method for constructing a knowledge graph based on a graph database according to claim 2, wherein the configuration file is monitored in real time, and the loading strategy of the graph data is dynamically changed along with the modification of the configuration file, comprising:
dynamically monitoring the configuration file, and dynamically changing the loading strategy of the graph data when the configuration file is modified;
and reloading any lost graph data from the data source file by means of append monitoring.
4. The method according to claim 2, wherein said calling APIs corresponding to different types of said graph data and loading said graph data into said CPU memory according to said loading policy comprises:
importing the graph data into a Kafka topic using Flume;
and calling the APIs corresponding to the different types of graph data, and loading the graph data from the Kafka topic into the CPU memory according to the loading strategy.
5. The method according to claim 1, wherein identifying and marking result attribute associated fields of the graph data loaded into the CPU memory one by one, obtaining fields of a knowledge graph, and extracting information fields comprises:
determining whether specified attribute associated fields exist for the graph data;
parsing, segmenting, extracting, and filtering either all of the attribute associated fields or the specified attribute associated fields to obtain result attribute associated fields;
identifying the graph data one by one and marking the result attribute association fields to obtain fields of a knowledge graph;
extracting information fields from fields of the knowledge-graph.
6. The method according to claim 1, wherein obtaining a knowledge graph database by persisting data in a model according to the data dimension inner model and the data dimension outer model comprises:
data in the data dimension inner layer model and the data dimension outer layer model are persisted to obtain final persisted data;
determining whether the configuration file of the loaded graph data specifies a database type for the final persisted data;
writing final persisted data for which no database type is specified into a Hive partition table to obtain a knowledge graph database; and, for final persisted data for which a database type is specified, calling the corresponding API (application programming interface) and writing the final persisted data into the specified database to obtain a knowledge graph database.
7. The method for constructing a knowledge graph based on a graph database according to claim 6, characterized in that, in the knowledge graph database, separate partitions and tables are established for the data dimension inner layer model and the data dimension outer layer model respectively, and index fields are created for the partitions and tables using Solr or ES.
8. The method for constructing a knowledge graph based on a graph database according to claim 1, wherein the steps of establishing a knowledge query Web interface, analyzing a query command, returning to a knowledge graph database and retrieving result data comprise:
constructing a Web interface to provide the user with a data query interface;
parsing and optimizing the SQL query submitted by the user, querying the knowledge graph database to extract the query result, and retrieving the result data.
9. A system for knowledge graph construction based on graph databases, the system comprising:
the graph data connection module is used for identifying the graph data types in the graph database and loading the graph data of different types into a CPU memory in batches;
the information marking module is used for identifying the graph data loaded into the CPU memory one by one, marking the result attribute associated fields, obtaining the fields of the knowledge graph, and extracting the information fields;
the data magic cube construction module is used for establishing a data dimension inner layer model for the information fields in proportion and establishing a data dimension outer layer model based on the data dimension inner layer model;
the data magic cube persistence module is used for persisting the data of the data dimension inner layer model and the data dimension outer layer model respectively, to obtain a knowledge graph database;
the knowledge graph query module is used for establishing a knowledge query Web interface, parsing a query command, and retrieving result data from the knowledge graph database;
wherein the data magic cube construction module is configured for:
establishing a plurality of data vectors in proportion from the information fields extracted from the fields of the knowledge graph, with a uniquely corresponding information field as the central starting point and a plurality of non-uniquely corresponding information fields as the end points, and taking the full data set formed by these data vectors as the data dimension inner layer model;
and establishing a plurality of full data vectors with the full data set in the data dimension inner layer model as the central starting point and the statistical results of different strategies applied to the full data set as the end points, and taking the data set formed by these full data vectors as the data dimension outer layer model.
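The two-layer "data magic cube" of claims 1 and 9 can be illustrated with a small Python sketch. It assumes that the uniquely corresponding information field is an entity key, that "in proportion" means each vector is weighted by the share of that end point among all end points of its start point, and that the outer-layer strategies are simple aggregate functions. These interpretations, the field names, and the strategy names are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Vector = Tuple[str, str, float]             # (start point, end point, proportion)
Strategy = Callable[[List[Vector]], object]


def build_inner_model(records: List[Dict[str, str]], key_field: str,
                      value_field: str) -> List[Vector]:
    """Inner layer: unique key -> each non-unique value, weighted by its share."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        counts[r[key_field]][r[value_field]] += 1
    vectors: List[Vector] = []
    for start, ends in counts.items():
        total = sum(ends.values())
        for end, n in sorted(ends.items()):
            vectors.append((start, end, n / total))
    return vectors


def build_outer_model(inner: List[Vector],
                      strategies: Dict[str, Strategy]) -> List[Tuple[str, str, object]]:
    """Outer layer: the full inner data set is the start point; each strategy result is an end point."""
    return [("FULL_DATA_SET", name, fn(inner)) for name, fn in strategies.items()]


# Hypothetical statistical strategies applied to the full inner data set.
STRATEGIES: Dict[str, Strategy] = {
    "vector_count": len,
    "distinct_starts": lambda v: len({s for s, _, _ in v}),
    "top_end": lambda v: Counter(e for _, e, _ in v).most_common(1)[0][0],
}
```

For example, three `person`/`city` rows for `p1` (two with city `X`, one with `Y`) yield the inner vectors `("p1", "X", 2/3)` and `("p1", "Y", 1/3)`; the outer layer then records aggregate results such as the total number of inner vectors.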
CN202010999621.0A 2020-09-22 2020-09-22 Knowledge graph construction system and method based on graph database Active CN112182238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010999621.0A CN112182238B (en) 2020-09-22 2020-09-22 Knowledge graph construction system and method based on graph database

Publications (2)

Publication Number Publication Date
CN112182238A CN112182238A (en) 2021-01-05
CN112182238B true CN112182238B (en) 2022-12-27

Family

ID=73956124

Country Status (1)

Country Link
CN (1) CN112182238B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312410B (en) * 2021-06-10 2023-11-21 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113407578A (en) * 2021-07-12 2021-09-17 上海数慧系统技术有限公司 Data processing method and device
CN113722549B (en) * 2021-09-03 2022-06-21 优维科技(深圳)有限公司 Data state fusion storage system and method based on graph
CN113918733B (en) * 2021-12-16 2022-03-04 中科雨辰科技有限公司 Data processing system for acquiring target knowledge graph
CN116069982A (en) * 2023-02-15 2023-05-05 北京欧拉认知智能科技有限公司 Graph-based master data management method, system, computing device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345647A (en) * 2018-01-18 2018-07-31 北京邮电大学 Domain knowledge map construction system and method based on Web
CN108549731A (en) * 2018-07-11 2018-09-18 中国电子科技集团公司第二十八研究所 A kind of knowledge mapping construction method based on ontology model
CN109670089A (en) * 2018-12-29 2019-04-23 颖投信息科技(上海)有限公司 Knowledge mapping system and its figure server

Also Published As

Publication number Publication date
CN112182238A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112182238B (en) Knowledge graph construction system and method based on graph database
US11269834B2 (en) Detecting quasi-identifiers in datasets
Davis Jr et al. Inferring the location of twitter messages based on user relationships
CN102171689B (en) Method and system for providing search results
US20080109419A1 (en) Computer apparatus, computer program and method, for calculating importance of electronic document on computer network, based on comments on electronic document included in another electronic document associated with former electronic document
JP6932360B2 (en) Object search method, device and server
TW201333730A (en) Web page search method and apparatus
US11573961B2 (en) Delta graph traversing system
CN103455335A (en) Multilevel classification Web implementation method
CN105608113A (en) Method and apparatus for judging POI data in text
CN114385620A (en) Data processing method, device, equipment and readable storage medium
CN112560461A (en) News clue generation method and device, electronic equipment and storage medium
CN115438087A (en) Data query method and device based on cache library, storage medium and equipment
CN113312539B (en) Method, device, equipment and medium for providing search service
CN107679097B (en) Distributed data processing method, system and storage medium
CN114218211A (en) Data processing system, method, computer device and readable storage medium
CN112836124A (en) Image data acquisition method and device, electronic equipment and storage medium
CN111125332B (en) Method, device, equipment and storage medium for calculating TF-IDF value of word
JP7213890B2 (en) Accelerated large-scale similarity computation
CN112650791B (en) Method, device, computer equipment and storage medium for processing field
CN114385821A (en) Resource retrieval method and device, storage medium and electronic equipment
Wang et al. Geo-ontology design and its logic reasoning
CN111324800A (en) Business item display method and device and computer readable storage medium
CN111143582A (en) Multimedia resource recommendation method and device for updating associative words in real time through double indexes
US9189528B1 (en) Searching and tagging media storage with a knowledge database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant