LU503512B1 - Operating method for construction of knowledge graph based on naming rule and caching mechanism - Google Patents
- Publication number
- LU503512B1
- Authority
- LU
- Luxembourg
Classifications
- G06F16/288—Entity relationship models
- G06F16/9024—Graphs; Linked lists
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/23—Updating
- G06F16/24552—Database cache management
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F2216/03—Data mining
Abstract
The present disclosure provides an operating method of knowledge graph construction based on a naming rule and a caching mechanism, including the steps of: data collection; design of a knowledge graph schema; formulation of a caching strategy; formulation of naming rules; development of a graph data management module; extraction of entity and relation data; data caching, updating and proofreading; graph generation and secondary proofreading; and backup and management of cached data. In the present disclosure, the caching mechanism improves the efficiency of knowledge graph construction and generation when mass data is involved, reduces the difficulty of data proofreading and of log generation and management during construction, and enables rollback of knowledge graph data. Rational naming rules and functional modules reduce the difficulty of knowledge graph data management and enable automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data. This lowers the difficulty of graph construction and management, increases construction speed, and meets the requirements of life-cycle management of graph construction and utilization.
Description
OPERATING METHOD FOR CONSTRUCTION OF KNOWLEDGE GRAPH BASED ON NAMING RULE AND CACHING MECHANISM
The present disclosure relates to an operating method of knowledge graph construction based on a naming rule and a caching mechanism. It belongs to the technical fields of knowledge graph construction and of data storage, management and utilization, and particularly relates to knowledge graph construction, updating and rollback, and to knowledge graph data management, based on a naming rule and a caching mechanism.
With the continuous development of computer technology, information and communication technology and Internet technology, electronic data has grown explosively, which has promoted the development of a series of fields and related technologies such as big data and artificial intelligence; the technology and ability to mine and analyze valid information from mass data are therefore becoming increasingly important. Technologies for mining and analyzing big data, represented by machine learning and deep learning, have made numerous achievements. However, the mining and analysis of mass data still suffers from high resource consumption, poor interpretability of some analytical processes, and other problems caused by a high proportion of repetitive work and frequent processing of mass data. In addressing these problems, theories and technologies related to knowledge graphs are presented with new opportunities for development.
By means of knowledge graphs, mass data and knowledge in different fields can be represented through data mining and analysis, information processing, data fusion, knowledge extraction and representation, knowledge fusion and inference, and graph drawing. The dynamic development patterns of a knowledge field can thus be revealed in a simpler and more intuitive way, and higher-level knowledge-based data analysis and mining can be supported, providing practical and valuable references, data and technical support for discipline research.
Knowledge graph construction is a process of continuous iteration and improvement. As human experience and data volume increase, knowledge graphs grow ever larger, the complexity of the entity and relation network multiplies, and data updating, verification and problem detection become correspondingly more difficult. Moreover, owing to the optimization strategies in the technical solutions and hardware conditions of most existing knowledge graph databases, frequent updates involving small amounts of data in a large-scale knowledge graph are slower than infrequent bulk updates. In addition, many knowledge graph databases lack the process logging and rollback functions of traditional relational databases; once a mistake is made, it is difficult to trace the problem and the affected data, which makes knowledge graph data updating and management very difficult.
To ensure the availability, timeliness, accuracy and stability of computer data, caching is used in many scenarios, such as computer storage and web browsers. Drawing on caching, and on the intermediate-tier data designed for the mining, analysis and computation of large-scale data, a transition tier can be provided between the analysis and processing of mass data, human experience, and knowledge graph construction and management, by means of rational and normative naming rules, data caching strategies and data backup strategies. In this way, the degree of automation of knowledge graph construction and the level of detail of data proofreading can be improved; knowledge graphs become easier to construct and utilize; rapid splitting, fusion and backup of data in the knowledge graph database are supported; and requirements such as data rollback and problem tracing during knowledge graph construction and management are satisfied. As a result, the whole process of knowledge graph construction and utilization can be managed effectively, enabling better research on and application of knowledge-graph-related technologies.
To solve the above-mentioned technical problems, the present disclosure provides an operating method of knowledge graph construction based on a naming rule and a caching mechanism. By means of a more informative knowledge graph schema, rational naming conventions and detailed data caching strategies, the method develops a graph data management module and a log management module that integrate multiple functions, and provides a caching tier between the knowledge graph database and the graph construction data, thereby realizing rapid construction, whole-process management, data proofreading, problem tracing, rollback and other functions for knowledge graphs.
The present disclosure is realized as follows.
The present disclosure provides an operating method of knowledge graph construction based on a naming rule and a caching mechanism, including the following steps:
① data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers;
② establishment of a knowledge graph schema: establishing a graph schema used for guiding the mining and storage of knowledge graph entity data and associated data;
③ determination of a caching strategy: determining the storage location, storage mode and backup strategy of cached data and the range of data that needs to be cached, and then constructing naming rules for cached folders and cached files;
④ development of a graph data management module: completing the development and testing of a module for automatic reading, comparison and recognition of cached files, a module for importing, updating, deleting and rolling back graph data, and a cached file management module;
⑤ extraction of entity and relation data: extracting, from the collected data and according to the graph schema, the entity attribute data and associated relation attribute data necessary for the knowledge graph;
⑥ data caching, updating and proofreading: storing the extracted entity and relation data as required by the caching strategy to obtain cached data, using the graph data management module for preliminary proofreading and updating of the data when the accumulated cached data satisfies certain conditions, and processing problematic data;
⑦ graph generation and secondary proofreading: using the graph data management module to automatically import the data for which preliminary proofreading and updating have been completed into a knowledge graph database, performing secondary proofreading of the updated knowledge graph, determining a data adjustment strategy according to the proofreading result, and then generating the graph; and
⑧ backup and management of cached data: completing the backup and management of cached data according to the cached data backup strategy.
Step ① includes: (1.1) acquiring, through interfaces and crawlers, the conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, forming multimodal data; and (1.2) performing preliminary data cleansing and processing on the acquired multimodal data, and classifying and storing the data according to data format.
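As an illustration of (1.2), the following minimal Python sketch groups collected files by modality. It assumes the format is recognized from the file extension; the extension-to-modality table is hypothetical and not part of the disclosure, and a real pipeline might inspect file contents instead.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-modality table; the disclosure does not define one.
MODALITY_BY_EXTENSION = {
    ".csv": "numerical", ".xlsx": "numerical",
    ".txt": "text", ".json": "text", ".html": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
    ".wav": "voice", ".mp3": "voice",
}


def classify_by_format(paths):
    """Group collected file names by data modality so each class can be stored separately."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        modality = MODALITY_BY_EXTENSION.get(path.suffix.lower(), "unclassified")
        groups[modality].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    print(classify_by_format(["prices.csv", "photo.PNG", "call.wav", "notes.bin"]))
```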
Step ② includes: (2.1) defining the basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming conventions for data fields, data sources, and data forms; (2.2) defining the concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; (2.3) defining the knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and (2.4) defining knowledge graph remarks, comprising: other tools used, data, and existing problems.
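The uniqueness and attribute constraints of (2.2) can be made machine-checkable. The sketch below is a hypothetical schema entry for one entity type, assuming records are plain dictionaries; the class name, field names and the choice of "required attributes" as the attribute constraint are illustrative assumptions, not the disclosure's own schema format.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySchema:
    """Hypothetical schema entry for one entity type, cf. Step (2.2)."""
    label: str
    unique_key: str                              # uniqueness constraint
    required: set = field(default_factory=set)   # attribute constraint: mandatory fields

    def validate(self, records):
        """Return (index, problem, detail) tuples for records violating the schema."""
        problems, seen = [], set()
        for i, record in enumerate(records):
            missing = (self.required | {self.unique_key}) - set(record)
            if missing:
                problems.append((i, "missing", sorted(missing)))
                continue
            key = record[self.unique_key]
            if key in seen:
                problems.append((i, "duplicate", key))
            seen.add(key)
        return problems
```

For example, `EntitySchema("Person", "person_id", {"name"}).validate(records)` flags both a second record reusing an existing `person_id` and a record lacking the key entirely.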
Step ③ includes: (3.1) determining the range, storage location and storage mode of cached data; (3.2) determining a backup strategy for the cached data, comprising: a naming rule for backup data, backup location, backup amount, and a strategy for managing and utilizing backup data; (3.3) defining naming rules for entity and associated relation data storage folders; and (3.4) defining naming rules for the corresponding cached data of the entity and associated relation data.
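The decisions of (3.1) to (3.3) can be captured as one configuration object. The following sketch is an assumption-laden illustration: the default paths, the validation sets and the folder layout `<root>/<entity|relation>/<type name>` are hypothetical choices, not values given in the disclosure.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

CACHE_RANGES = {"full", "partial"}
STORAGE_MODES = {"structured", "semi-structured", "unstructured"}


@dataclass(frozen=True)
class CachingStrategy:
    """Hypothetical record of the decisions taken in Steps (3.1)-(3.3)."""
    cache_range: str = "partial"            # full or partial data caching
    root: str = "/data/kg_cache"            # local file system, local server or cloud mount
    storage_mode: str = "structured"
    backup_root: str = "/backup/kg_cache"   # backup location
    backup_copies: int = 2                  # backup amount

    def __post_init__(self):
        if self.cache_range not in CACHE_RANGES:
            raise ValueError(f"unknown cache range: {self.cache_range!r}")
        if self.storage_mode not in STORAGE_MODES:
            raise ValueError(f"unknown storage mode: {self.storage_mode!r}")
        if self.backup_copies < 1:
            raise ValueError("at least one backup copy is required")

    def folder_for(self, kind, type_name):
        """Folder naming rule (3.3), assumed here as <root>/<entity|relation>/<type name>."""
        if kind not in {"entity", "relation"}:
            raise ValueError(f"kind must be 'entity' or 'relation', got {kind!r}")
        return str(PurePosixPath(self.root, kind, type_name))
```

Validating in `__post_init__` means an inconsistent strategy is rejected when it is defined rather than when data is first cached.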
In Step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, and the cached data may be stored in a file system under a single path or in file systems under multiple paths; and the storage mode comprises structured, unstructured and semi-structured data storage. In Step (3.2), the data can be backed up locally or on a server. In Step (3.4), the storage name of the cached data contains: entity or relation keywords or codes; the name or code of the data uniqueness field; the name or code of a new entity or relation type; the data update time or its code; the data processing mode or its code; and other data-related descriptions or codes. The order of information in the name is not restricted, the items are separated and recognized by specific delimiter characters, and the name must meet the file naming requirements of the operating system.
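A minimal sketch of the naming rule of (3.4), assuming a double underscore as the delimiter and a fixed item order (the disclosure allows any order; fixing it keeps the parser short). The item set is a simplified, hypothetical subset: kind keyword, type name, uniqueness field, update time and processing mode.

```python
from datetime import datetime

SEP = "__"                      # hypothetical delimiter between name items
FORBIDDEN = set('<>:"/\\|?*')   # characters rejected by common file systems
FIELDS = ("kind", "type_name", "unique_field", "updated", "mode")
STAMP = "%Y%m%dT%H%M%S"


def make_cache_name(kind, type_name, unique_field, updated, mode, ext="json"):
    """Build a cached-file name from its items, joined by SEP."""
    items = [kind, type_name, unique_field, updated.strftime(STAMP), mode]
    for item in items:
        if not item or SEP in item or FORBIDDEN & set(item):
            raise ValueError(f"item not usable in a file name: {item!r}")
    return SEP.join(items) + "." + ext


def parse_cache_name(name):
    """Recover the items of a cached-file name built by make_cache_name."""
    stem, dot, ext = name.rpartition(".")
    values = stem.split(SEP)
    if not dot or len(values) != len(FIELDS):
        raise ValueError(f"name does not follow the naming rule: {name!r}")
    info = dict(zip(FIELDS, values))
    info["updated"] = datetime.strptime(info["updated"], STAMP)
    info["ext"] = ext
    return info
```

With these assumptions, `make_cache_name("ENT", "Person", "person_id", datetime(2023, 1, 15, 10, 30), "insert")` yields `ENT__Person__person_id__20230115T103000__insert.json`; single underscores inside an item, as in `person_id`, survive because only the double underscore is treated as a separator.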
Step ④ includes: (4.1) completing the development and testing of a module for automatic reading, comparison and recognition of cached data files, and of a module for importing, updating, deleting and rolling back graph data; (4.2) completing the development and testing of a module for cached data file management and for log system updating and management; and (4.3) completing tests and optimization of the stability, availability, timeliness and accuracy of each of the above modules.
In Step (4.1), repetitively or similarly named data is found by reading and recognizing the key information in the names of cached data and comparing it with the log content; data repeatability and validity are judged from the similarity of the data fields and data contents in the cached data. The graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data, and it can be called and operated automatically, semi-automatically or manually. Graph data rollback revokes all of the latest updating operations on the knowledge graph data and supports both manual and automatic rollback, the difference being whether the parameters are input manually or automatically; accurate rollback of data is achieved by judging the names and contents of cached files. In Step (4.2), cached data file management comprises the creation, copying, deletion and renaming of data files, and the log system records the modified content, modified object and modification time of data files.
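The rollback behaviour described for Step (4.1), revoking all of the latest updating operations, can be sketched with a per-batch undo log. The in-memory `GraphStore` below is a hypothetical stand-in for a real graph database: each cached file is applied as one batch, previous values are remembered, and a rollback restores them in reverse order.

```python
from datetime import datetime


class GraphStore:
    """In-memory stand-in for a graph database with per-batch rollback (hypothetical)."""

    def __init__(self):
        self.nodes = {}    # unique key -> attribute dict
        self._undo = []    # one list per applied batch: (key, previous value or None)
        self.log = []      # (time, cached file name, key) for every modification

    def apply_batch(self, cache_name, records, key="id"):
        """Create or update nodes from one cached file, remembering previous values."""
        undo = []
        for record in records:
            k = record[key]
            undo.append((k, self.nodes.get(k)))
            self.nodes[k] = dict(record)
            self.log.append((datetime.now(), cache_name, k))
        self._undo.append(undo)

    def rollback_latest(self):
        """Revoke all operations of the most recent batch, newest first."""
        if not self._undo:
            return False
        for k, previous in reversed(self._undo.pop()):
            if previous is None:
                del self.nodes[k]
            else:
                self.nodes[k] = previous
        return True
```

Undoing in reverse order matters when one batch touches the same key twice: the intermediate value is restored first and then removed, leaving the store as it was before the batch. The modification log is deliberately kept after a rollback so that problems remain traceable.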
Step ⑤ includes: (5.1) extracting, from the collected data, the entity attribute data defined by the fused graph schema, and caching and naming the data according to the data caching strategy; and (5.2) extracting, from the collected data, the associated relation attribute data defined by the fused graph schema, and caching and naming the data according to the data caching strategy.
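One way to cache an extracted batch of entity records is sketched below, assuming JSON files and a "last record wins" policy for duplicate keys; both the file format and the deduplication policy are assumptions, since the disclosure leaves them open.

```python
import json
from pathlib import Path


def cache_entities(records, unique_key, cache_dir, file_name):
    """Deduplicate extracted records on their unique key and write them as one JSON cache file."""
    deduplicated = {}
    for record in records:
        deduplicated[record[unique_key]] = record   # a later record replaces an earlier one
    path = Path(cache_dir) / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(list(deduplicated.values()), ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return path, len(deduplicated)
```

The returned record count can feed the real-time accounting of cached data amount and volume described in Step ⑥.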
Step ⑥ includes: (6.1) according to the caching strategy, caching and accumulating the entity attribute data and associated relation attribute data extracted in Step ⑤, and recording the amount and volume of cached data in real time; and (6.2) proofreading and updating the cached data when the accumulated cached data satisfies certain constraints, and recording and processing any data problems found. The constraints comprise constraints on the amount of cached data, the size of cached data, the processing time and the volume of processed data, as well as manually set conditions. Proofreading of cached data is realized by means of the data caching strategies, the naming rules and the graph data management module; it covers the similarity and correctness of file names and the repetitiveness and correctness of data content, and can be performed automatically or manually. The data problems include repetitive data, repetitive naming, misnaming, erroneous data, missing data and data anomalies.
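The accumulation constraints of (6.2) amount to a set of thresholds checked as files are cached. The sketch below assumes hypothetical default thresholds and covers the amount, size, processing-time and manual constraints; the volume-of-processed-data constraint is omitted for brevity.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccumulationTrigger:
    """Hypothetical thresholds deciding when accumulated cached data is proofread (Step (6.2))."""
    max_files: int = 100                  # constraint on amount of cached data
    max_bytes: int = 50 * 1024 * 1024     # constraint on size of cached data
    max_seconds: float = 3600.0           # constraint on processing time
    files: int = 0
    total_bytes: int = 0
    manual: bool = False                  # manually set condition
    started: float = field(default_factory=time.monotonic)

    def add(self, size_bytes):
        """Record one newly cached file and its size."""
        self.files += 1
        self.total_bytes += size_bytes

    def should_proofread(self, now=None):
        """True as soon as any one of the constraints is met."""
        now = time.monotonic() if now is None else now
        return (self.manual
                or self.files >= self.max_files
                or self.total_bytes >= self.max_bytes
                or now - self.started >= self.max_seconds)

    def reset(self):
        """Start a new accumulation window after proofreading."""
        self.files, self.total_bytes, self.manual = 0, 0, False
        self.started = time.monotonic()
```

`time.monotonic()` is used rather than wall-clock time so that the processing-time constraint is unaffected by system clock changes.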
In Step ⑦, the graph data management module performs automatic, bulk generation and updating of knowledge graph data by recognizing the contents of cached file names. The secondary proofreading and adjustment in Step ⑦ judges, by manual inspection or automatic script, the rationality, validity and correctness of the data in the knowledge graph after updating, and determines on that basis whether to proceed to Step ⑧, whether data adjustment is needed, and whether data rollback is needed. In Step ⑧, cached data backup and management: a secondary backup is made of all or part of the cached data according to the data and hardware conditions, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are recorded as remarks.
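Bulk generation driven by file names can be sketched as grouping cached files into import batches. This assumes hypothetical names of the form `KIND__Type__uniqueField__YYYYmmddTHHMMSS__mode.ext` and an assumed policy of importing entity batches before relation batches, oldest file first within each batch.

```python
from collections import defaultdict

ORDER = {"ENT": 0, "REL": 1}   # hypothetical: import entities before relations


def plan_bulk_import(names, sep="__"):
    """Group cached file names into import batches keyed by (kind, type, mode).

    Assumes names of the form KIND__Type__uniqueField__YYYYmmddTHHMMSS__mode.ext.
    """
    batches = defaultdict(list)
    for name in names:
        kind, type_name, _unique, stamp, tail = name.split(sep)
        mode = tail.rsplit(".", 1)[0]
        batches[(kind, type_name, mode)].append((stamp, name))
    plan = []
    for key in sorted(batches, key=lambda k: (ORDER.get(k[0], len(ORDER)), k[1], k[2])):
        plan.append((key, [name for _, name in sorted(batches[key])]))
    return plan
```

Importing entities first means that, when relation batches run, their endpoint nodes already exist in the graph.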
The beneficial effects of the present disclosure are as follows: the caching mechanism improves the speed and accuracy of knowledge graph construction and generation when mass data is involved, reduces the difficulty of data proofreading and of log generation and management during construction, enables rollback of knowledge graph data, and facilitates data backup, encryption and transmission; rational naming rules and functional modules reduce the difficulty of knowledge graph data management and enable automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data. This lowers the difficulty of graph construction and management, increases construction speed, meets the requirements of life-cycle management of graph construction and utilization, and provides effective data and technical support for subsequent application and research of knowledge graphs.
Fig. 1 is a structure diagram of the present disclosure.
The technical solution of the present disclosure is further described below, but the scope of protection is not limited thereto.
As shown in Fig. 1, the operating method of knowledge graph construction based on a naming rule and a caching mechanism includes the following steps.
① Data collection: acquiring the data necessary for constructing a knowledge graph through interfaces and crawlers, specifically comprising: (1.1) acquiring, through interfaces and crawlers, the data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data; and (1.2) performing preliminary data cleansing and processing on the acquired multimodal data, and classifying and storing it rationally according to data format.
② Design of the knowledge graph schema: designing a graph schema used for guiding the mining and storage of knowledge graph entity data and associated data, specifically comprising: (2.1) defining the basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming conventions for data fields, data sources, and data forms; (2.2) defining the concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; (2.3) defining the knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and (2.4) defining knowledge graph remarks, comprising: other tools used, data, and existing problems.
Further, remarks should be written for any information that affects the construction, utilization, management and expansion of knowledge graphs, so as to ensure smooth work and handover.
③ Making a caching strategy: determining the storage location, storage mode and backup strategy of cached data, and determining the range of data that needs to be cached, specifically comprising: (3.1) determining the range, storage location and storage mode of cached data;
Preferably, the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, and the cached data may be stored in a file system under a single path or in file systems under multiple paths; and the storage mode comprises structured, unstructured and semi-structured data storage.
(3.2) determining a backup strategy for the cached data, comprising: a naming rule for backup data, backup location, backup amount, and a strategy for managing and utilizing backup data;
Preferably, the data can be backed up locally or on a server.
④ Making cached folder and cached file naming rules, specifically comprising: (4.1) defining naming rules for entity and associated relation data storage folders; and (4.2) defining naming rules for the corresponding cached data of the entity and associated relation data.
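Once naming rules are fixed, file names can be proofread mechanically for two of the data problems listed in Step ⑦, repetitive naming and misnaming. The sketch assumes a hypothetical rule `KIND__Type__uniqueField__YYYYmmddTHHMMSS__mode.json` expressed as a regular expression; the allowed kinds and modes are illustrative.

```python
import re
from collections import Counter

# Hypothetical pattern for the naming rule KIND__Type__uniqueField__YYYYmmddTHHMMSS__mode.json
NAME_PATTERN = re.compile(
    r"(ENT|REL)__[A-Za-z0-9]+__\w+__\d{8}T\d{6}__(insert|update|delete)\.json"
)


def proofread_names(names):
    """Sort cached file names into correctly named, repetitively named and misnamed."""
    counts = Counter(names)
    report = {"ok": [], "repetitive naming": [], "misnaming": []}
    for name, count in counts.items():
        if NAME_PATTERN.fullmatch(name) is None:
            report["misnaming"].append(name)
        elif count > 1:
            report["repetitive naming"].append(name)
        else:
            report["ok"].append(name)
    return report
```

`fullmatch` is used so that a valid-looking prefix followed by extra characters is still reported as misnamed.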
Preferably, the storage name of the cached data contains: entity or relation keywords or codes; the name or code of the data uniqueness field; the name or code of a new entity or relation type; the data update time or its code; the data processing mode or its code; and other data-related descriptions or codes. The order of information in the name is not restricted, the items are separated and recognized by specific delimiter characters, and the name must meet the file naming requirements of the operating system.
⑤ Development of a graph data management module: completing the development and testing of a module for automatic reading, comparison and recognition of cached files, a module for importing, updating, deleting and rolling back graph data, and a cached file management module, specifically comprising: (5.1) completing a module for automatic reading, comparison and recognition of cached data files, and a module for importing, updating, deleting and rolling back graph data;
Preferably, repetitively or similarly named data is found by reading and recognizing the key information in the names of cached data and comparing it with the log content; data repeatability and validity are judged from the similarity of the data fields and data contents in the cached data. The graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data; it can be called and operated automatically, semi-automatically or manually, and its input parameters are designed to be concise and clear. Graph data rollback revokes all of the latest updating operations on the knowledge graph data and supports both manual and automatic rollback, the difference being whether the parameters are input manually or automatically; accurate rollback of data is achieved by judging the names and contents of cached files.
(5.2) completing a module for cached data file management and for log system updating and management; and
Preferably, the cached data file management comprises the creation, copying, deletion and renaming of data files, and the log system records the modified content, modified object and modification time of data files.
(5.3) completing tests and optimization of the stability, availability, timeliness and accuracy of each of the above modules.
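The file-management and logging module of (5.2) can be sketched as a thin wrapper over the standard library. The class below is hypothetical; each operation appends a log entry with the modification time, operation, modified object and a short description of the modified content, mirroring what the log system is required to record.

```python
import shutil
from datetime import datetime
from pathlib import Path


class CacheFileManager:
    """Hypothetical cached-file manager that logs every operation it performs."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.log = []   # (modification time, operation, object, content)

    def _record(self, operation, obj, content=""):
        self.log.append((datetime.now().isoformat(timespec="seconds"), operation, obj, content))

    def create(self, name, text):
        (self.root / name).write_text(text, encoding="utf-8")
        self._record("create", name, f"{len(text)} characters written")

    def copy(self, name, new_name):
        shutil.copy2(self.root / name, self.root / new_name)
        self._record("copy", name, f"copied to {new_name}")

    def rename(self, name, new_name):
        (self.root / name).rename(self.root / new_name)
        self._record("rename", name, f"renamed to {new_name}")

    def delete(self, name):
        (self.root / name).unlink()
        self._record("delete", name)
```

Routing every file operation through one object is what makes the log complete; files changed outside the manager would not be traceable.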
⑥ Extraction of entity and relation data: extracting, from the collected data and according to the graph schema, the entity attribute data and associated relation attribute data necessary for the knowledge graph, specifically comprising:
(6.1) extracting, from the collected data, the entity attribute data defined by the fused graph schema, and caching and naming the data according to the data caching strategy, thereby ensuring the validity and uniqueness of the data; and
(6.2) extracting, from the collected data, the associated relation attribute data defined by the fused graph schema, and caching and naming the data according to the data caching strategy, thereby ensuring the validity and uniqueness of the data and its correspondence with the entity attribute data.
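The correspondence between relation data and entity attribute data required in (6.2) can be checked before caching. This minimal sketch assumes that relation records carry `source` and `target` keys referring to entity unique keys; those field names are hypothetical.

```python
def check_relation_endpoints(relations, entity_keys, source="source", target="target"):
    """Split relation records into those whose endpoints exist among the cached entities and orphans."""
    known = set(entity_keys)
    valid, orphaned = [], []
    for relation in relations:
        ok = relation.get(source) in known and relation.get(target) in known
        (valid if ok else orphaned).append(relation)
    return valid, orphaned
```

Orphaned relations correspond to the "missing data" problem that proofreading in Step ⑦ is meant to record and process.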
⑦ Data caching, updating and proofreading: storing the extracted entity and relation data as required by the caching strategy to obtain cached data, using the graph data management module for preliminary proofreading and updating when the accumulated cached data satisfies certain conditions, and processing problematic data, specifically comprising:
(7.1) according to the caching strategy, caching and accumulating the entity and relation data extracted in Step ⑥, and recording the amount and volume of cached data in real time; and
(7.2) proofreading and updating the cached data when the accumulated cached data satisfies certain constraints, and recording and processing any data problems found;
further, the constraints on cached data include constraints on the amount of cached data, the size of cached data, the processing time and the volume of processed data, as well as manually set conditions; proofreading of cached data is realized by means of the data caching strategies, the naming rules and the graph data management module, covers the similarity and correctness of file names and the repetitiveness and correctness of data content, and can be performed automatically or manually; and the data problems include repetitive data, repetitive naming, misnaming, erroneous data, missing data and data anomalies.
⑧ Graph generation and secondary proofreading: using the graph data management module to automatically import the data for which preliminary proofreading and updating have been completed into a knowledge graph database, performing secondary proofreading of the updated knowledge graph, and determining a data adjustment strategy according to the proofreading result, specifically comprising: (8.1) using the graph data management module to import single pieces or batches of cached data into the knowledge graph database automatically or semi-automatically, with automatic generation and updating of the graph; and
Further, the graph data management module can generate and update knowledge graph data automatically and in bulk by recognizing the name contents of cached files, which speeds up graph generation.
(8.2) performing secondary proofreading of the knowledge graph data manually or by automatic script, the proofreading covering the validity and correctness of data.
Further, the rationality, validity and correctness of data in the knowledge graph after data updating are judged manually or by automatic script, and the judgment determines whether to proceed to the subsequent step, whether data adjustment is needed, and whether data rollback is needed.
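One possible shape for the judgment that follows the secondary proofreading in (8.2) is sketched below. It assumes that proofreading yields counts of invalid and incorrect records, and that the decision to proceed, adjust or roll back is made by comparing the error ratio against two thresholds. `ProofreadingReport`, `decide` and the threshold values are illustrative and are not part of the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"    # continue to the subsequent step
    ADJUST = "adjust"      # data adjustment is needed
    ROLLBACK = "rollback"  # data rollback is needed


@dataclass
class ProofreadingReport:
    checked: int    # graph records checked after the update
    invalid: int    # records failing validity checks (e.g. missing fields)
    incorrect: int  # records whose content contradicts the source data


def decide(report: ProofreadingReport,
           adjust_ratio: float = 0.0,
           rollback_ratio: float = 0.05) -> Decision:
    """Map a secondary-proofreading report to a next action (hypothetical thresholds)."""
    if report.checked == 0:
        # Nothing could be verified: send the update back for manual review.
        return Decision.ADJUST
    error_ratio = (report.invalid + report.incorrect) / report.checked
    if error_ratio > rollback_ratio:
        return Decision.ROLLBACK
    if error_ratio > adjust_ratio:
        return Decision.ADJUST
    return Decision.PROCEED
```

With the default thresholds, any error triggers an adjustment. More than 5% erroneous records triggers a rollback of the latest update.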
There is further a step ⑨, namely cached data backup and management: a secondary backup is performed for all or part of the cached data depending on the data and the available hardware, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are annotated, specifically comprising:
(9.1) backing up the cached data that satisfies the backup requirements according to the backup strategy; and
(9.2) managing the backup data of the cached data.
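The backup described in (9.1) and (9.2) can be sketched as follows. The sketch assumes a local file system and builds the folder name from a `backup` prefix, a timestamp and the operator name, joined by `__`. It also writes a `remarks.json` file holding the backup time, person and content. All of these conventions, and the function name `backup_cached_files`, are hypothetical.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Union

PathLike = Union[str, Path]


def backup_cached_files(files: Iterable[PathLike], backup_root: PathLike,
                        person: str, content: str, sep: str = "__",
                        when: Optional[datetime] = None) -> Path:
    """Copy selected cached files into a uniformly named backup folder.

    The folder naming pattern and the remarks.json layout are illustrative
    assumptions, not a format defined by the disclosure.
    """
    files = [Path(f) for f in files]
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%dT%H%M%S")
    folder = Path(backup_root) / f"backup{sep}{stamp}{sep}{person}"
    # Refuse to overwrite an existing backup with the same name.
    folder.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, folder / f.name)  # copy2 also preserves timestamps
    remarks = {
        "backup_time": when.isoformat(),
        "backup_person": person,
        "backup_content": content,
        "files": sorted(f.name for f in files),
    }
    (folder / "remarks.json").write_text(
        json.dumps(remarks, ensure_ascii=False, indent=2), encoding="utf-8")
    return folder
```

Keeping the remarks next to the copied files lets a later management step list, restore or prune backups without consulting a separate database.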
The present disclosure is a graph construction, updating and rollback method based on a naming rule and a caching mechanism, including steps of: acquiring data necessary for constructing a knowledge graph through interfaces and crawlers; designing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data; determining a cached data storage location, a data storage mode and a data backup strategy, and determining a data range that needs to be cached; making cached folder and cached file naming rules; completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module; extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning; storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data; using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for updated knowledge graph, and determining a data adjustment strategy according to data proofreading result; completing backup and management of cached data according to the cached data backup strategy.
As stated above, the present disclosure is carried out as follows:
1) acquiring, through interfaces and crawlers, data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data, and forming multimodal data;
2) performing preliminary data cleansing and data processing for the acquired multimodal data, and classifying and storing the data according to data formats;
3) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming convention of data fields, data sources, and data form; defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and defining knowledge graph remarks, comprising: other tools used, data and existing problems;
4) determining a range, a storage location and a storage mode of cached data;
5) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
6) defining naming rules for entity and associated relation data storage folders and cached data;
7) completing development of an automatic reading, comparison and recognition module for cached data files and of a graph data importing, updating, deleting and rollback module; and completing development of a cached data file management and log system updating and management module;
8) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules;
9) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data;
10) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data and correspondence of data and entity attribute data;
11) according to the caching strategy, caching and accumulating the extracted entity and relation data, and recording the amount and volume of cached data in real time; and proofreading and updating the cached data when the accumulated cached data satisfies certain constraints, and recording and processing existing data problems;
12) using the graph data management module to automatically or semi-automatically import single items or batches of cached data into a knowledge graph database and to automatically generate and update the graph; and performing secondary proofreading of the knowledge graph data by manual or automatic script operation, the proofreading covering the validity and correctness of data; and
13) backing up the cached data that satisfies the backup requirements according to the backup strategy, and managing the backup data of the cached data.
To sum up, the present disclosure is a knowledge graph construction and management system, wherein all changes in the process of knowledge graph construction and management can be recorded and saved, and addition, updating and rollback of knowledge graph data can be realized via cached data and history records; the system is applicable to life-cycle management of knowledge graphs, the establishment of high-quality knowledge graphs, and effective backup and management of automatic knowledge graph construction systems and knowledge graph data at both large and small scales.
Claims (10)
1. An operating method of knowledge graph construction based on a naming rule and a caching mechanism, comprising steps of:
① data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers;
② establishment of knowledge graph schema: establishing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data;
③ determination of a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, determining a data range that needs to be cached, and then constructing naming rules for cached folders and cached files;
④ development of a graph data management module: completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module;
⑤ extraction of entity and relation data: extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning;
⑥ data caching, updating and proofreading: storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data;
⑦ graph generation and secondary proofreading: using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for the updated knowledge graph, determining a data adjustment strategy according to the data proofreading result, and then generating a graph; and
⑧ backup and management of cached data: completing backup and management of cached data according to the cached data backup strategy.
2. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ① comprises:
(1.1) acquiring, through interfaces and crawlers, conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, and forming multimodal data; and
(1.2) performing preliminary data cleansing and data processing for acquired multimodal data, and classifying and storing data according to data formats.
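The classification part of (1.2) can be illustrated by grouping acquired files by extension into the five modalities listed in (1.1). The extension table below is an assumption chosen for illustration, and data cleansing itself is not shown.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical extension-to-modality table; a real deployment would define its own.
MODALITY_BY_EXT = {
    ".csv": "numerical", ".xlsx": "numerical",
    ".txt": "text", ".json": "text", ".html": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
    ".wav": "voice", ".mp3": "voice",
}


def classify_by_format(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group acquired file names by modality, keyed on the file extension."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in map(Path, paths):
        # Unknown extensions go to "other" for manual inspection.
        groups[MODALITY_BY_EXT.get(p.suffix.lower(), "other")].append(p.name)
    return dict(groups)
```

Each resulting group can then be stored in its own folder, named according to the naming rules.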
3. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ② comprises:
(2.1) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming convention of data fields, data sources, and data form;
(2.2) defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data;
(2.3) defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and
(2.4) defining knowledge graph remarks, comprising: other tools used, data and existing problems.
4. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ③ comprises:
(3.1) determining a range, a storage location and a storage mode of cached data;
(3.2) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
(3.3) defining naming rules for entity and associated relation data storage folders; and
(3.4) defining naming rules of corresponding cached data of the entity and associated relation data.
5. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 4, wherein: in the Step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, the cached data being stored in a file system under a single path or in file systems under multiple paths; and the storage mode comprises structured data storage, unstructured data storage and semi-structured data storage; in the Step (3.2), the data can be backed up locally or on a server; and in the Step (3.4), the storage name of the cached data contains entity or relation keywords or codes, the name or code of the data uniqueness field, the name or code of a new entity or relation type, the data update time or its code, the data processing mode or its code, and other data-related descriptions or codes, wherein the order of information in the name is not limited, the pieces of information are separated and recognized via specific characters between them, and the name is ensured to meet the naming requirements of system files.
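The cached-data naming rule of Step (3.4) can be sketched as self-describing `key=value` fields joined by a separator. This leaves the order of information unconstrained while still letting a program recognize every field (entity or relation keyword, uniqueness field, type, update time, processing mode). The separators `__` and `=`, the field keys and the helper names are hypothetical choices. The forbidden-character set is the one Windows applies to file names, which also covers the POSIX restriction on `/`.

```python
import re
from typing import Dict

FIELD_SEP = "__"  # hypothetical separator between fields
KV_SEP = "="      # hypothetical separator between a key and its value
# Characters Windows forbids in file names (also excludes the POSIX "/").
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def _check(token: str) -> None:
    # Reject tokens that would make the name ambiguous or illegal as a file name.
    if (FIELD_SEP in token or KV_SEP in token or FORBIDDEN.search(token)
            or token.startswith("_") or token.endswith("_")):
        raise ValueError(f"illegal characters in name token {token!r}")


def build_cache_name(fields: Dict[str, str], ext: str = ".json") -> str:
    """Join key=value fields into a file name that satisfies the naming rule."""
    parts = []
    for key, value in fields.items():
        _check(key)
        _check(str(value))
        parts.append(f"{key}{KV_SEP}{value}")
    return FIELD_SEP.join(parts) + ext


def parse_cache_name(name: str, ext: str = ".json") -> Dict[str, str]:
    """Recover the fields from a cached file name; field order does not matter."""
    stem = name[: -len(ext)] if ext and name.endswith(ext) else name
    return dict(part.split(KV_SEP, 1) for part in stem.split(FIELD_SEP))
```

For example, `build_cache_name({"kind": "entity", "type": "Person", "uid": "id42"})` yields `kind=entity__type=Person__uid=id42.json`. Any permutation of those fields parses back to the same dictionary.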
6. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ④ comprises:
(4.1) completing development and test of a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module;
(4.2) completing development and test of a cached data file management and log system updating and management module; and
(4.3) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules.
7. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 6, wherein: in the Step (4.1), repetitively or similarly named data is found by reading and recognizing key information in the names of cached data and comparing the key information with log content; data repeatability and validity are judged from the similarity of data fields and data contents in the cached data; the graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data, and the module can be called and operated automatically, semi-automatically or manually; the graph data rollback revokes all of the latest updating operations on the knowledge graph data and supports both manual and automatic rollback, which differ only in whether the parameters are input manually or automatically, accurate rollback of data being realized by judging the name and content of the cached file; and in the Step (4.2), the cached data file management comprises management of creating, copying, deleting and renaming data files, and the log system needs to record the modified content, modification object and modification time of data files.
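The rollback behaviour of Step (4.1) revokes all of the latest updating operations with the help of a log that records the modified content, object and time. It can be sketched with an in-memory store standing in for the graph database. `GraphStore`, the batch structure and the log fields are assumptions; a real deployment would issue the equivalent operations against its graph database.

```python
import copy
from datetime import datetime, timezone
from typing import Dict, List, Optional


class GraphStore:
    """In-memory stand-in for a graph database: node id -> property dict."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.log: List[List[dict]] = []  # one list of log entries per update batch

    def apply_batch(self, changes: Dict[str, Optional[dict]], cache_file: str) -> None:
        """Apply {node_id: props or None} as one update; None deletes the node."""
        batch = []
        for node_id, new in changes.items():
            batch.append({
                "object": node_id,                                    # modification object
                "before": copy.deepcopy(self.nodes.get(node_id)),     # None if node was absent
                "after": copy.deepcopy(new),                          # modified content
                "time": datetime.now(timezone.utc).isoformat(),       # modification time
                "cache_file": cache_file,                             # source cached file name
            })
            if new is None:
                self.nodes.pop(node_id, None)
            else:
                self.nodes[node_id] = copy.deepcopy(new)
        self.log.append(batch)

    def rollback_latest(self) -> Optional[str]:
        """Revoke every operation of the most recent batch, newest first."""
        if not self.log:
            return None
        batch = self.log.pop()
        for entry in reversed(batch):
            if entry["before"] is None:
                self.nodes.pop(entry["object"], None)  # undo a creation
            else:
                self.nodes[entry["object"]] = entry["before"]  # undo update/delete
        return batch[0]["cache_file"] if batch else None
```

Returning the cached file name of the revoked batch gives the operator a way to locate and correct the source data before re-importing it.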
8. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ⑤ comprises:
(5.1) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy; and
(5.2) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy.
9. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ⑥ comprises:
(6.1) according to the caching strategy, caching and accumulating the entity attribute data and associated relation attribute data extracted in Step ⑤, and recording the amount and volume of cached data in real time; and
(6.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems; the constraints including constraint on the amount of cached data, constraint on size of cached data, constraint on processing time, constraint on volume of processed data, and artificial condition constraints; the proofreading of cached data being realized via data caching strategies, naming rules and the graph data management module; contents of the proofreading including proofreading of file name similarity and correctness and proofreading of data content repetitiveness and correctness, and the proofreading can be performed automatically and manually; and the data problems including repetitive data, repetitive naming, misnaming, erroneous data, data missing, and data anomaly.
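Proofreading of file name repetitiveness and similarity in (6.2) could, for example, compare every pair of cached file names with Python's standard `difflib.SequenceMatcher`. The pairwise approach, the function name `find_name_problems` and the 0.9 threshold are illustrative assumptions. For large caches, a blocking or indexing step would be needed to avoid the quadratic number of comparisons.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Sequence, Tuple


def find_name_problems(names: Sequence[str], threshold: float = 0.9
                       ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, float]]]:
    """Flag exact duplicates and near-duplicate cached file names.

    threshold is a hypothetical cut-off on SequenceMatcher.ratio().
    """
    duplicates: List[Tuple[str, str]] = []
    similar: List[Tuple[str, str, float]] = []
    for a, b in combinations(names, 2):
        if a == b:
            duplicates.append((a, b))  # repeated naming
        else:
            ratio = SequenceMatcher(None, a, b).ratio()
            if ratio >= threshold:
                similar.append((a, b, round(ratio, 3)))  # possible misnaming
    return duplicates, similar
```

Flagged pairs would then be resolved automatically or passed to manual proofreading, consistent with the claim's allowance for both modes.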
10. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein: in the Step ⑦, the graph data management module performs automatic and bulk knowledge graph data generation and updating by recognizing the name contents of cached files; in the Step ⑦, the secondary data proofreading and adjustment judges the rationality, validity and correctness of data in the knowledge graph after data updating by means of manual or automatic script, and determines, according to the judgment, whether to proceed to Step ⑧, whether data adjustment is needed, and whether data rollback is needed; and in the Step ⑧, cached data backup and management: a secondary backup is performed for all or part of the cached data according to the data and hardware, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are remarked.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110764250.2A CN113434610A (en) | 2021-07-06 | 2021-07-06 | Operation method of knowledge graph structure based on naming rule and cache mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
LU503512B1 (en) | 2023-06-19 |
Family
ID=77759307
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
LU503512A LU503512B1 (en) | 2021-07-06 | 2021-12-31 | Operating method for construction of knowledge graph based on naming rule and caching mechanism |
Country Status (3)
Country | Link |
---|---|
CN (2) | CN113434610A (en) |
LU (1) | LU503512B1 (en) |
WO (1) | WO2023279684A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113434610A (en) * | 2021-07-06 | 2021-09-24 | 中电科大数据研究院有限公司 | Operation method of knowledge graph structure based on naming rule and cache mechanism |
CN115309789B (en) * | 2022-10-11 | 2023-01-03 | 浩鲸云计算科技股份有限公司 | Method for generating associated data graph in real time based on intelligent dynamic business object |
CN116028648B (en) * | 2023-02-15 | 2023-06-09 | 熙牛医疗科技(浙江)有限公司 | Medical text structured information extraction method universal for fine-grained scenes |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10740557B1 (en) * | 2017-02-14 | 2020-08-11 | Casepoint LLC | Technology platform for data discovery |
CN109255031B (en) * | 2018-09-20 | 2022-02-11 | 苏州友教习亦教育科技有限公司 | Data processing method based on knowledge graph |
CN110990585B (en) * | 2019-11-29 | 2024-01-30 | 上海勘察设计研究院(集团)股份有限公司 | Multi-source data and time sequence processing method and device for building industry knowledge graph |
CN111428048A (en) * | 2020-03-20 | 2020-07-17 | 厦门渊亭信息科技有限公司 | Cross-domain knowledge graph construction method and device based on artificial intelligence |
CN113434610A (en) * | 2021-07-06 | 2021-09-24 | 中电科大数据研究院有限公司 | Operation method of knowledge graph structure based on naming rule and cache mechanism |
-
2021
- 2021-07-06 CN CN202110764250.2A patent/CN113434610A/en not_active Withdrawn
- 2021-11-18 CN CN202111369404.4A patent/CN113918663A/en active Pending
- 2021-12-31 LU LU503512A patent/LU503512B1/en active IP Right Grant
- 2021-12-31 WO PCT/CN2021/143464 patent/WO2023279684A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN113918663A (en) | 2022-01-11 |
CN113434610A (en) | 2021-09-24 |
WO2023279684A1 (en) | 2023-01-12 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
LU503512B1 (en) | Operating method for construction of knowledge graph based on naming rule and caching mechanism | |
US20230122210A1 (en) | Resource dependency system and graphical user interface | |
CN111967761B (en) | Knowledge graph-based monitoring and early warning method and device and electronic equipment | |
EP4195112A1 (en) | Systems and methods for enriching modeling tools and infrastructure with semantics | |
US11269822B2 (en) | Generation of automated data migration model | |
US20230195728A1 (en) | Column lineage and metadata propagation | |
CN112612902A (en) | Knowledge graph construction method and device for power grid main device | |
CN111708773A (en) | Multi-source scientific and creative resource data fusion method | |
CN110990585B (en) | Multi-source data and time sequence processing method and device for building industry knowledge graph | |
CN111078780A (en) | AI optimization data management method | |
US10353877B2 (en) | Construction and application of data cleaning templates | |
Khattak et al. | Ontology Evolution and Challenges. | |
CN112966162A (en) | Scientific and technological resource integration method and device based on data warehouse and middleware | |
CN113190687A (en) | Knowledge graph determining method and device, computer equipment and storage medium | |
CN115328894A (en) | Data processing method based on data blood margin | |
CN116991931A (en) | Metadata management method and system | |
US12001423B2 (en) | Method and electronic device for obtaining hierarchical data structure and processing log entries | |
CN113254725A (en) | Data management and retrieval enhancement method for graph database | |
CN116578612A (en) | Lithium battery finished product detection data asset construction method | |
CN110569061A (en) | Automatic construction system of software engineering knowledge base based on big data | |
JP2017010376A (en) | Mart-less verification support system and mart-less verification support method | |
CN114692595B (en) | Repeated conflict scheme detection method based on text matching | |
US11250010B2 (en) | Data access generation providing enhanced search models | |
CN117573934A (en) | Intelligent data interaction method and device for optical fiber transmission knowledge management system | |
CN117668229A (en) | Meta model automatic acquisition and classification management method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FG | Patent granted |
Effective date: 20230619 |