LU503512B1 - Operating method for construction of knowledge graph based on naming rule and caching mechanism - Google Patents

Operating method for construction of knowledge graph based on naming rule and caching mechanism

Info

Publication number
LU503512B1
LU503512B1 LU503512A LU503512A LU503512B1 LU 503512 B1 LU503512 B1 LU 503512B1 LU 503512 A LU503512 A LU 503512A LU 503512 A LU503512 A LU 503512A LU 503512 B1 LU503512 B1 LU 503512B1
Authority
LU
Luxembourg
Prior art keywords
data
graph
cached
knowledge graph
caching
Prior art date
Application number
LU503512A
Other languages
French (fr)
Inventor
Bing Chang
Longjun Zhao
Zhihai Chu
Xiang Li
Xueqiang Ren
Zhongwen Yin
Original Assignee
Cetc Bigdata Res Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Cetc Bigdata Res Institute Co Ltd
Application granted
Publication of LU503512B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism, including the steps of: data collection; design of a knowledge graph schema; formulation of a caching strategy; formulation of a naming rule; development of a graph data management module; extraction of entity and relation data; data caching, updating and proofreading; graph generation and secondary proofreading; and backup and management of cached data. In the present disclosure, the caching mechanism improves the efficiency of knowledge graph construction and generation on massive data, reduces the difficulty of data proofreading and log generation management during knowledge graph construction, and enables rollback of knowledge graph data. Rational naming rules and functional modules reduce the difficulty of knowledge graph data management and enable automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data, thereby reducing the difficulty of graph construction and management, increasing graph construction speed, and meeting the requirements of life-cycle management of graph construction and utilization.

Description

Specification
OPERATING METHOD FOR CONSTRUCTION OF KNOWLEDGE GRAPH BASED ON NAMING RULE AND CACHING MECHANISM
Technical Field
The present disclosure relates to an operating method of knowledge graph construction based on a naming rule and a caching mechanism, which belongs to the technical field of knowledge graph construction, as well as data storage, management and utilization, and particularly relates to knowledge graph construction, updating and rollback and knowledge graph data management that are based on a naming rule and a caching mechanism.
Background Art
With the continuous development of computer technology, information and communication technology and Internet technology, electronic data has grown explosively. This has promoted the development of a series of fields and related technologies, such as big data and artificial intelligence, so the technology and ability to mine and analyze valid information from massive data are becoming more and more important. Technologies for mining and analyzing big data, represented by machine learning and deep learning, have made numerous achievements. However, problems still exist in the mining and analysis of massive data, such as large resource consumption and poor interpretability of some analytical processes, caused by a high proportion of repetitive work and frequent processing of massive data. To solve the aforementioned problems, theories and technologies related to knowledge graphs need new opportunities and development.
By virtue of a knowledge graph, massive data and knowledge in different fields can be represented via data mining and analysis, information processing, data fusion, knowledge extraction and representation, knowledge fusion and inference, and graph drawing, so that the dynamic development patterns of a knowledge field can be revealed in a simpler and more intuitive way, and higher-level knowledge-based data analysis and mining can be supported, thereby providing practical and valuable reference, data and technical support for discipline research.
Knowledge graph construction is a process of continuous iteration and improvement. As human experience and data volume increase, the scale of a knowledge graph grows larger and larger, and the complexity of the entity and relation network increases many times over. Correspondingly, data updating and verification, and problem detection, become more and more difficult. Moreover, because of the optimization strategies of the technical solutions and hardware conditions of most existing knowledge graph databases, updating a large-scale knowledge graph with small amounts of data at high frequency is slower than updating it with bulk data at low frequency. Besides, many knowledge graph databases do not have the process logging and rollback functions of traditional relational databases. As a result, once mistakes are made, it is difficult to trace problems and data, which causes great difficulty in knowledge graph data updating and management.
In order to ensure the availability, timeliness, accuracy and stability of computer data, caching is used in many scenarios such as computer storage and web browsers. Drawing on caching, as well as on the middle-tier data designed for the mining, analysis and calculation of large-scale data, a transition tier is provided between the analysis and processing of massive data and human experience on the one hand, and knowledge graph construction and management on the other, by means of rational and normative naming rules, data caching strategies and data backup strategies. In this way, the degree of automation of knowledge graph construction and the level of detail of data proofreading can be improved, knowledge graphs can be constructed and utilized more easily, rapid splitting, fusion and backup of data in the knowledge graph database can be supported, and requirements such as data rollback and problem tracing during knowledge graph construction and management can be satisfied. As a result, the whole process of knowledge graph construction and utilization can be effectively managed, enabling better research on and application of technologies related to knowledge graphs.
Summary of the Invention
To solve the above-mentioned technical problem, the present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism. By formulating a more informative knowledge graph schema, rational naming conventions and detailed data caching strategies, the method develops a graph data management module and a log management module which integrate multiple functions, and provides a caching tier between the knowledge graph database and the graph construction data, thereby realizing rapid construction, whole-process management, data proofreading, problem tracing and rollback of knowledge graphs.
The present disclosure is realized as follows.
The present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism, including steps of:
① data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers;
② establishment of knowledge graph schema: establishing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data;
③ determination of a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, determining a data range that needs to be cached, and then constructing naming rules for cached folders and cached files;
④ development of a graph data management module: completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module;
⑤ extraction of entity and relation data: extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning;
⑥ data caching, updating and proofreading: storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data;
⑦ graph generation and secondary proofreading: using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for the updated knowledge graph, determining a data adjustment strategy according to the data proofreading result, and then generating a graph; and
⑧ backup and management of cached data: completing backup and management of cached data according to the cached data backup strategy.
Step ① includes:
(1.1) acquiring, through interfaces and crawlers, conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, and forming multimodal data; and
(1.2) performing preliminary data cleansing and data processing for acquired multimodal data, and classifying and storing data according to data formats.
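The classification by data format in Step (1.2) can be illustrated with a minimal Python sketch. The extension-to-modality table and the function name `classify_by_format` are assumptions made for illustration; the disclosure names only the modalities and does not prescribe how formats are recognized.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to modality. The disclosure only
# lists the modalities (numerical, text, image, video, voice).
MODALITY_BY_EXTENSION = {
    ".csv": "numerical",
    ".txt": "text",
    ".png": "image",
    ".jpg": "image",
    ".mp4": "video",
    ".wav": "voice",
}


def classify_by_format(file_names):
    """Group collected file names by modality, inferred from the extension."""
    groups = defaultdict(list)
    for name in file_names:
        extension = PurePath(name).suffix.lower()
        groups[MODALITY_BY_EXTENSION.get(extension, "unknown")].append(name)
    return dict(groups)
```

Files with unrecognized extensions are kept under an "unknown" group rather than dropped, so they can be reviewed manually.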
Step ② includes:
(2.1) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming convention of data fields, data sources, and data form;
(2.2) defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data;
(2.3) defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and
(2.4) defining knowledge graph remarks, comprising: other tools used, data and existing problems.
Step ③ includes:
(3.1) determining a range, a storage location and a storage mode of cached data;
(3.2) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
(3.3) defining naming rules for entity and associated relation data storage folders; and
(3.4) defining naming rules of corresponding cached data of the entity and associated relation data.
In Step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, the cached data being stored in a file system of the same path or file systems of multiple paths; and the storage mode comprises structured data storage, unstructured data storage and semi-structured data storage; in Step (3.2), the data can be backed up locally or on a server; and in Step (3.4), the storage name of the cached data contains entity or relation keywords or code, data uniqueness field name or code, name or code of new entity or relation type, data update time or code, data processing mode or code, and other data-related introduction or code, wherein the order of information in the name is not limited, the pieces of information are separated and recognized via specific characters placed between them, and the name is ensured to meet the naming requirements of system files.
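A naming rule of the kind described for Step (3.4) can be sketched in Python as follows. The separator `__`, the field order, the timestamp format and the helper names `build_cache_name` and `parse_cache_name` are all assumptions chosen for illustration; the disclosure leaves the order of information and the separating characters open.

```python
import re
from datetime import datetime

SEPARATOR = "__"  # assumed delimiter between the pieces of information
FIELDS = ("kind", "unique_field", "type_name", "updated", "mode")
TIME_FORMAT = "%Y%m%dT%H%M%S"  # assumed encoding of the data update time


def build_cache_name(kind, unique_field, type_name, updated, mode, ext="csv"):
    """Compose a cached file name from its key information."""
    parts = (kind, unique_field, type_name, updated.strftime(TIME_FORMAT), mode)
    for part in parts:
        # Restricting components to letters, digits and hyphens keeps the name
        # valid on common file systems and the separator unambiguous.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"invalid name component: {part!r}")
    return SEPARATOR.join(parts) + "." + ext


def parse_cache_name(file_name):
    """Recover the key information from a cached file name."""
    stem, _, _ext = file_name.rpartition(".")
    values = stem.split(SEPARATOR)
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected name layout: {file_name!r}")
    info = dict(zip(FIELDS, values))
    info["updated"] = datetime.strptime(info["updated"], TIME_FORMAT)
    return info
```

Because every component is validated before joining, a name that was built by `build_cache_name` can always be parsed back into the same fields, which is what lets a management module act on file names alone.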
Step ④ includes:
(4.1) completing development and test of a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module;
(4.2) completing development and test of a cached data file management and log system updating and management module; and
(4.3) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules.
In Step (4.1), repetitively or similarly named data is found by reading and recognizing key information in the name of cached data and comparing the key information with log content; data repetition and validity are judged from the similarity of data fields and data contents in the cached data; the graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data, and the module can be called and operated automatically, semi-automatically or manually; the graph data rollback revokes all of the latest updating operations on the knowledge graph data, and supports manual and automatic rollback, which differ only in whether the parameters are input manually or automatically, accurate rollback of data being realized by checking the name and content of the cached file; and in Step (4.2), the cached data file management comprises management of creating, copying, deleting and renaming of data files, and the log system needs to record modified content, modification object and modification time of data files.
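The two comparisons described for Step (4.1), one on file names and one on fields and contents, can be sketched as below. This is only an illustration under stated assumptions: the disclosure does not specify a similarity measure, so the sketch substitutes the standard library's `difflib.SequenceMatcher` ratio for names and a Jaccard overlap of rows for contents, and the threshold value is arbitrary.

```python
from difflib import SequenceMatcher


def find_suspect_names(new_name, logged_names, threshold=0.9):
    """Return logged file names that are identical or very similar to new_name."""
    return [
        logged
        for logged in logged_names
        if SequenceMatcher(None, new_name, logged).ratio() >= threshold
    ]


def content_overlap(rows_a, rows_b):
    """Jaccard overlap between two cached tables.

    Each table is a list of dicts mapping field name to value. Tables with
    different field sets are treated as unrelated and score 0.0.
    """
    if not rows_a or not rows_b:
        return 0.0
    if set(rows_a[0]) != set(rows_b[0]):
        return 0.0
    set_a = {tuple(sorted(row.items())) for row in rows_a}
    set_b = {tuple(sorted(row.items())) for row in rows_b}
    return len(set_a & set_b) / len(set_a | set_b)
```

A file flagged by `find_suspect_names` would then be checked with `content_overlap` against the flagged candidates; a high overlap suggests repetitive data, while a low overlap under a near-identical name suggests misnaming.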
Step ⑤ includes:
(5.1) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy; and
(5.2) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy.
Step ⑥ includes:
(6.1) according to the caching strategy, caching and accumulating the entity attribute data and associated relation attribute data extracted in Step ⑤, and recording amount and volume of cached data in real time; and
(6.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems;
wherein: the constraints comprise constraint on the amount of cached data, constraint on size of cached data, constraint on processing time, constraint on volume of processed data, and artificial condition constraints; the proofreading of cached data is realized via data caching strategies, naming rules and the graph data management module; contents of the proofreading include proofreading of file name similarity and correctness and proofreading of data content repetitiveness and correctness, and the proofreading can be performed automatically or manually; and the data problems include repetitive data, repetitive naming, misnaming, erroneous data, missing data, and data anomalies.
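The accumulation constraints that trigger proofreading and updating can be pictured as a small policy check. The sketch below is an assumption-laden illustration: the class names, the default thresholds and the choice to flush when any single constraint is met are not taken from the disclosure, which only lists the kinds of constraint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlushPolicy:
    """Hypothetical thresholds for the constraints listed above."""
    max_files: int = 100                   # constraint on amount of cached data
    max_bytes: int = 50 * 1024 * 1024      # constraint on size of cached data
    max_age_seconds: float = 3600.0        # constraint on processing time


@dataclass
class CacheStats:
    """Running record of the amount and volume of cached data."""
    file_count: int = 0
    total_bytes: int = 0
    oldest_timestamp: Optional[float] = None

    def record(self, size_bytes, now):
        """Register one newly cached file of size_bytes at time now (seconds)."""
        self.file_count += 1
        self.total_bytes += size_bytes
        if self.oldest_timestamp is None:
            self.oldest_timestamp = now


def should_flush(stats, policy, now, manual=False):
    """Return True when any constraint is met or an operator asks for a flush."""
    if manual:  # artificial (manually imposed) condition
        return True
    if stats.file_count == 0:
        return False
    age = now - stats.oldest_timestamp
    return (
        stats.file_count >= policy.max_files
        or stats.total_bytes >= policy.max_bytes
        or age >= policy.max_age_seconds
    )
```

Batching updates this way matches the motivation given in the background: a few bulk imports are preferred over many small, frequent writes to the graph database.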
In Step ⑦, the graph data management module performs automatic and bulk knowledge graph data generation and updating by recognizing name contents of cached files; the secondary data proofreading and adjustment in Step ⑦ judges the rationality, validity and correctness of data in the knowledge graph after data updating by means of a manual or automatic script, and determines, according to the judgment, whether to proceed to Step ⑧, whether data adjustment is needed, and whether data rollback is needed; and in Step ⑧, cached data backup and management: secondary backup is performed for all or part of the cached data according to data and hardware conditions, backup folders and backup files are uniformly named according to the naming rules, and backup time, backup person and backup content are noted.
The beneficial effects of the present disclosure are as follows: the caching mechanism can be used for improving the speed and accuracy of knowledge graph construction and generation on massive data, for reducing the difficulty of data proofreading and log generation management during knowledge graph construction, for realizing the knowledge graph data rollback function, and for facilitating data backup, encryption and transmission; rational naming rules and functional modules can be used for reducing the difficulty of knowledge graph data management, and for realizing automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data, thereby reducing graph construction and management difficulty, increasing graph construction speed, meeting requirements of life-cycle management of graph construction and utilization, and providing effective data and technical support for subsequent application of and research on knowledge graphs.
Brief Description of Drawing
Fig. 1 is a structure diagram of the present disclosure.
Embodiments
The technical solution of the present disclosure will be further described below, but the scope of protection is not limited thereto.
As shown in Fig. 1, an operating method for knowledge graph construction based on a naming rule and a caching mechanism includes the following steps.
① Data collection: acquiring data necessary for constructing a knowledge graph through interfaces and crawlers, specifically comprising:
(1.1) acquiring, through interfaces and crawlers, data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data; and
(1.2) performing preliminary data cleansing and data processing for acquired multimodal data, and performing rational classification and storage according to data formats.
② Design of knowledge graph schema: designing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data, specifically comprising:
(2.1) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming convention of data fields, data sources, and data form;
(2.2) defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data;
(2.3) defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and
(2.4) defining knowledge graph remarks, comprising: other tools used, data and existing problems.
Further, remarks need to be written for any information that affects the construction, utilization, management and expansion of knowledge graphs, so as to ensure smooth work and handover.
③ Formulating a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, and determining a data range that needs to be cached, specifically comprising:
(3.1) determining a range, a storage location and a storage mode of cached data;
Preferably, the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, the cached data being stored in a file system of the same path or file systems of multiple paths; and the storage mode comprises structured data storage, unstructured data storage and semi-structured data storage.
(3.2) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
Preferably, the data can be backed up locally or on a server.
④ Formulating cached folder and cached file naming rules, specifically comprising:
(4.1) defining naming rules for entity and associated relation data storage folders; and
(4.2) defining naming rules of corresponding cached data of the entity and associated relation data.
Preferably, the storage name of the cached data contains entity or relation keywords or code, data uniqueness field name or code, name or code of new entity or relation type, data update time or code, data processing mode or code, and other data-related introduction or code, wherein the order of information in the name is not limited, the pieces of information are separated and recognized via specific characters placed between them, and the name is ensured to meet the naming requirements of system files.
⑤ Development of a graph data management module: completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module, specifically comprising:
(5.1) completing a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module;
Preferably, repetitively or similarly named data is found by reading and recognizing key information in the name of cached data and comparing the key information with log content; data repetition and validity are judged from the similarity of data fields and data contents in the cached data; the graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data, the module can be called and operated automatically, semi-automatically or manually, and input parameters of the module are designed to be concise and clear; the graph data rollback revokes all of the latest updating operations on the knowledge graph data, and supports manual and automatic rollback, which differ only in whether the parameters are input manually or automatically, accurate rollback of data being realized by checking the name and content of the cached file.
(5.2) completing a cached data file management and log system updating and management module; and
Preferably, the cached data file management comprises management of creating, copying, deleting and renaming of data files, and the log system needs to record modified content, modification object and modification time of data files.
(5.3) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules.
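How logging and rollback could interact in the modules of step ⑤ can be sketched with a toy store. Everything here is hypothetical: `LoggedGraphStore` is an in-memory stand-in for a knowledge graph database, and the log layout (object, content, time) simply mirrors the three items the log system is required to record. Rollback reverts the most recent imported batch, which is one reading of "the latest updating operations".

```python
from datetime import datetime, timezone


class LoggedGraphStore:
    """Toy in-memory stand-in for a knowledge graph database (illustrative only)."""

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.log = []    # one entry per imported cached file

    def import_batch(self, cache_file_name, records):
        """Upsert records (node id -> attributes) and log what was changed."""
        # Remember the prior state of every touched node; None means "did not exist".
        previous = {node_id: self.nodes.get(node_id) for node_id in records}
        self.nodes.update({node_id: dict(attrs) for node_id, attrs in records.items()})
        self.log.append({
            "object": cache_file_name,           # modification object
            "content": previous,                 # modified content (prior values)
            "time": datetime.now(timezone.utc),  # modification time
        })

    def rollback_latest(self):
        """Revoke the most recent batch; return its cached file name, or None."""
        if not self.log:
            return None
        entry = self.log.pop()
        for node_id, old_attrs in entry["content"].items():
            if old_attrs is None:
                self.nodes.pop(node_id, None)    # node was created by the batch
            else:
                self.nodes[node_id] = old_attrs  # node was overwritten by the batch
        return entry["object"]
```

Keying each log entry by the cached file name is what ties rollback to the cache: the file that produced an update identifies exactly which changes to undo.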
⑥ Extraction of entity and relation data: extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning,
specifically comprising:
(6.1) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data; and
(6.2) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data and correspondence of data and entity attribute data.
⑦ Data caching, updating and proofreading: storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data,
specifically comprising:
(7.1) according to the caching strategy, caching and accumulating the entity and relation data extracted in Step ⑥, and recording amount and volume of cached data in real time; and
(7.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems;
further, the constraints of cached data include constraint on the amount of cached data, constraint on size of cached data, constraint on processing time, constraint on volume of processed data, and artificial condition constraints; the proofreading of cached data is realized via data caching strategies, naming rules and the graph data management module, contents of the proofreading include proofreading of file name similarity and correctness and proofreading of data content repetitiveness and correctness, and the proofreading can be performed automatically or manually; and the data problems include repetitive data, repetitive naming, misnaming, erroneous data, missing data, and data anomalies.
⑧ Graph generation and secondary proofreading: using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for the updated knowledge graph, and determining a data adjustment strategy according to the data proofreading result, specifically comprising:
(8.1) using the graph data management module to automatically or semi-automatically realize importing, as well as automatic generation and updating, of a single piece of or bulk cached data into a knowledge graph database; and
Further, the graph data management module can perform automatic and bulk knowledge graph data generation and updating by recognizing name contents of cached files, thereby enhancing speed of graph generation.
(8.2) performing a secondary proofreading of the knowledge graph data by means of manual or automatic script, the contents of proofreading including validity and correctness of data.
Further, rationality, validity and correctness of data in the knowledge graph after data updating are judged by means of manual or automatic script, and it is determined according to the judgment whether to proceed to the subsequent step, whether data adjustment is needed, and whether data rollback is needed.
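The three-way outcome of secondary proofreading (proceed, adjust, or roll back) can be illustrated with a short automatic script. The validity rule used here (required attributes must be present and non-empty) and the `rollback_ratio` threshold are assumptions for the sake of the example; the disclosure does not state how the judgment is made.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ADJUST = "adjust"
    ROLLBACK = "rollback"


def find_invalid(nodes, required_fields):
    """Return sorted ids of nodes with a missing or empty required attribute."""
    return sorted(
        node_id
        for node_id, attrs in nodes.items()
        if any(attrs.get(field) in (None, "") for field in required_fields)
    )


def decide(nodes, required_fields, rollback_ratio=0.2):
    """Choose the follow-up action after an update.

    rollback_ratio is a hypothetical threshold: at or above it the whole
    update is considered unreliable and is rolled back instead of patched.
    """
    invalid = find_invalid(nodes, required_fields)
    if not invalid:
        return Decision.PROCEED, invalid
    if len(invalid) / len(nodes) >= rollback_ratio:
        return Decision.ROLLBACK, invalid
    return Decision.ADJUST, invalid
```

Returning the list of offending ids alongside the decision gives a manual reviewer or an adjustment script a concrete starting point.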
There is further a step ⑨, i.e. cached data backup and management: secondary backup is performed for all or part of the cached data according to data and hardware conditions, backup folders and backup files are uniformly named according to the naming rules, and backup time, backup person and backup content are noted, specifically comprising:
(9.1) performing backup for data in the cached data satisfying backup requirements according to the backup strategy;
(9.2) managing backup data of the cached data.
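By way of example only, a backup plan that names the backup folder uniformly and notes the backup time, person and content might look like the Python sketch below. The folder-name layout, the `needs_backup` predicate and the manifest keys are hypothetical choices for this sketch; the disclosure leaves the concrete backup strategy open. The function only plans the backup and performs no file I/O.

```python
from datetime import datetime


def backup_folder_name(content, person, when):
    """Build a uniform backup folder name from content, person and time."""
    return f"backup__{content}__{person}__{when:%Y%m%dT%H%M%S}"


def plan_backup(cached_files, person, content, when, needs_backup=lambda name: True):
    """Select the cached files that satisfy the backup requirement and
    return a manifest describing where each one would be copied."""
    folder = backup_folder_name(content, person, when)
    selected = sorted(name for name in cached_files if needs_backup(name))
    return {
        "folder": folder,
        "backup_time": when.isoformat(),
        "backup_person": person,
        "backup_content": content,
        "files": {name: f"{folder}/{name}" for name in selected},
    }
```

Keeping the plan separate from the copy step lets the same manifest serve as the record of what was backed up, by whom, and when.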
The present disclosure is a graph construction, updating and rollback method based on a naming rule and a caching mechanism, including steps of: acquiring data necessary for constructing a knowledge graph through interfaces and crawlers; designing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data; determining a cached data storage location, a data storage mode and a data backup strategy, and determining a data range that needs to be cached; making cached folder and cached file naming rules; completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module; extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning; storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data; using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for updated knowledge graph, and determining a data adjustment strategy according to data proofreading result; completing backup and management of cached data according to the cached data backup strategy.
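For illustration, one possible cached file naming rule can be sketched in Python as follows. The separator `__`, the field order, the timestamp format and the function names are assumptions for this sketch; the disclosure only requires that the name carry items such as the entity or relation keyword, the uniqueness field, the type, the update time and the processing mode, separated by specific characters and valid as a system file name.

```python
import re
from datetime import datetime

SEP = "__"  # hypothetical separator between the pieces of information
FIELDS = ("keyword", "unique_field", "type_name", "updated", "mode")
TIME_FORMAT = "%Y%m%dT%H%M%S"
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')  # characters many file systems reject


def _clean(part):
    """Make one field safe for a file name and free of the separator."""
    return _UNSAFE.sub("-", part).replace(SEP, "-").strip("_")


def build_cache_name(keyword, unique_field, type_name, updated, mode, ext="csv"):
    """Join the naming fields into a single cached file name."""
    parts = [keyword, unique_field, type_name, updated.strftime(TIME_FORMAT), mode]
    return SEP.join(_clean(p) for p in parts) + "." + ext


def parse_cache_name(filename):
    """Recover the naming fields from a name produced by build_cache_name."""
    stem, _, _ext = filename.rpartition(".")
    values = stem.split(SEP)
    if len(values) != len(FIELDS):
        raise ValueError(f"name does not follow the rule: {filename!r}")
    record = dict(zip(FIELDS, values))
    record["updated"] = datetime.strptime(record["updated"], TIME_FORMAT)
    return record
```

Because every field can be read back from the name alone, a management module can decide how to import or roll back a file without opening it.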
Example
As stated above, the present disclosure is carried out as follows:
1) acquiring, through interfaces and crawlers, data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data, and forming multimodal data;
2) performing preliminary data cleansing and data processing for the acquired multimodal data, and performing rational classification and storage according to data formats;
3) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidential agreement, data interpretation, explanation of specialized vocabulary, naming convention of data field, data sources, and data form; defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment, and defining knowledge graph remarks, comprising: other tools used, data and existing problems;
4) determining a range, a storage location and a storage mode of cached data;
5) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
6) defining naming rules for entity and associated relation data storage folders and cached data;
7) completing a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module; and completing a cached data file management and log system updating and management module;
8) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules;
9) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data;
10) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data and correspondence of data and entity attribute data;
11) according to the caching strategy, caching and accumulating the extracted entity and relation data, and recording amount and volume of cached data in real time; and proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems;
12) using the graph data management module to automatically or semi-automatically realize importing, as well as automatic generation and updating of a single or bulk cached data into a knowledge graph database; and performing a secondary proofreading of the knowledge graph data in the form of manual or automatic script operation, the contents of proofreading including validity and correctness of data; and
13) performing backup for data in the cached data satisfying backup requirements according to the backup strategy, and managing backup data of the cached data.
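As a non-limiting sketch of step 11), the accumulation of cached data can be tracked and compared against constraints as in the Python code below. The class names, the default limits and the use of plain numeric timestamps are assumptions made for this sketch; the disclosure names the kinds of constraint (amount, size, processing time, manual condition) but not their values.

```python
from dataclasses import dataclass


@dataclass
class CacheLimits:
    max_files: int = 100             # constraint on the amount of cached data
    max_bytes: int = 50_000_000      # constraint on the size of cached data
    max_age_seconds: float = 3600.0  # constraint on time since the last proofreading


class CacheAccumulator:
    """Records amount and volume of cached data as files arrive."""

    def __init__(self, limits, started_at=0.0):
        self.limits = limits
        self.started_at = started_at
        self.files = 0
        self.bytes = 0

    def record(self, size_bytes):
        """Register one newly cached file of the given size."""
        self.files += 1
        self.bytes += size_bytes

    def should_proofread(self, now, manual=False):
        """True when any constraint is met or an operator requests it."""
        return (
            manual
            or self.files >= self.limits.max_files
            or self.bytes >= self.limits.max_bytes
            or now - self.started_at >= self.limits.max_age_seconds
        )

    def reset(self, now):
        """Start a new accumulation window after proofreading."""
        self.started_at = now
        self.files = 0
        self.bytes = 0
```

Any single satisfied constraint triggers proofreading in this sketch; a stricter policy could require several at once.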
To sum up, the present disclosure is a knowledge graph construction and management system, wherein: all changes in the process of knowledge graph construction and management can be recorded and saved, and addition, updating and rollback of knowledge graph data can be realized via cached data and history records; the system is applicable to life-cycle management of knowledge graphs, establishment of high-quality knowledge graphs, and effective backup and management of automatic knowledge graph construction systems and knowledge graph data at large and small scales.
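For illustration only, rollback via a history record can be sketched in Python with an in-memory stand-in for the knowledge graph database. The class `GraphStore` and its methods are hypothetical and do not correspond to any particular graph database API; the sketch only shows how logging the previous state per cached file makes the latest update revocable.

```python
class GraphStore:
    """In-memory stand-in for a knowledge graph database (nodes only)."""

    def __init__(self):
        self.nodes = {}
        self.history = []  # one entry per applied cached file

    def apply_cache_file(self, cache_name, records):
        """Upsert records (id -> attributes) and log the values they replace."""
        previous = {rid: self.nodes.get(rid) for rid in records}
        for rid, attrs in records.items():
            self.nodes[rid] = dict(attrs)
        self.history.append((cache_name, previous))

    def rollback_latest(self):
        """Revoke every change made by the most recently applied cached file.

        Returns the name of the file that was rolled back, or None when
        there is nothing to revoke.
        """
        if not self.history:
            return None
        cache_name, previous = self.history.pop()
        for rid, old in previous.items():
            if old is None:
                self.nodes.pop(rid, None)  # node was created by the update
            else:
                self.nodes[rid] = old      # node existed: restore old values
        return cache_name
```

Keying the history by cached file name ties each rollback to one identifiable file, which matches the role the disclosure gives to cached file names.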

Claims (10)

Claims LU503512
1. An operating method of knowledge graph construction based on a naming rule and a caching mechanism, comprising steps of:
① data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers;
② establishment of knowledge graph schema: establishing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data;
③ determination of a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, determining a data range that needs to be cached, and then constructing naming rules for cached folders and cached files;
④ development of a graph data management module: completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module;
⑤ extraction of entity and relation data: extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning;
⑥ data caching, updating and proofreading: storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data;
⑦ graph generation and secondary proofreading: using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, and performing secondary data proofreading for updated knowledge graph, determining a data adjustment strategy according to data proofreading result, and then generating a graph; and
⑧ backup and management of cached data: completing backup and management of cached data according to the cached data backup strategy.
2. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ① comprises:
(1.1) acquiring, through interfaces and crawlers, conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, and forming multimodal data; and
(1.2) performing preliminary data cleansing and data processing for acquired multimodal data, and classifying and storing data according to data formats.
3. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ② comprises:
(2.1) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidential agreement, data interpretation, explanation of specialized vocabulary, naming convention of data field, data sources, and data form;
(2.2) defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data;
(2.3) defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and
(2.4) defining knowledge graph remarks, comprising: other tools used, data and existing problems.
4. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ③ comprises:
(3.1) determining a range, a storage location and a storage mode of cached data;
(3.2) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;
(3.3) defining naming rules for entity and associated relation data storage folders; and
(3.4) defining naming rules of corresponding cached data of the entity and associated relation data.
5. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 4, wherein: in the Step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, the cached data being stored in a file system of the same path or file systems of multiple paths; and the storage mode comprises structured data storage, unstructured data storage and semi-structured data storage; in the Step (3.2), the data can be backed up locally or on a server; and in the Step (3.4), the storage name of the cached data contains entity or relation keywords or code, data uniqueness field name or code, name or code of new entity or relation type, data update time or code, data processing mode or code, and other data-related introduction or code, wherein the order of information in the name is not limited, information is spaced and recognized via specific characters therebetween, and it is ensured that naming thereof meets the naming requirements of system files.
6. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ④ comprises:
(4.1) completing development and test of a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module;
(4.2) completing development and test of a cached data file management and log system updating and management module; and
(4.3) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules.
7. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 6, wherein: in the Step (4.1), repetitively or similarly named data is found by reading and recognizing key information in the name of cached data and comparing the key information with log content; judgement of data repeatability and validity is performed by judging similarity of data fields and data contents in the cached data; the graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one piece of or more graph data, as well as automatic recognition and processing of repetitive data, the module can be automatically, semi-automatically and manually called and operated; the graph data rollback is to revoke all of the latest updating operations of the knowledge graph data, and supports manual and automatic rollback, between which the difference lies in whether the parameters are input manually or automatically, the accurate rollback of data being realized via judgement of name and content of cached file; and in the Step (4.2), the cached data file management comprises management of creating, copying, deleting and renaming of data files, and the log system needs to record modified content, modification object and modification time of data files.
8. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ⑤ comprises:
(5.1) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy; and
(5.2) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy.
9. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step ⑥ comprises:
(6.1) according to the caching strategy, caching and accumulating the entity attribute data and associated relation attribute data extracted in Step ⑤, and recording amount and volume of cached data in real time; and
(6.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems; the constraints including constraint on the amount of cached data, constraint on size of cached data, constraint on processing time, constraint on volume of processed data, and artificial condition constraints; the proofreading of cached data being realized via data caching strategies, naming rules and the graph data management module; contents of the proofreading including proofreading of file name similarity and correctness and proofreading of data content repetitiveness and correctness, and the proofreading can be performed automatically and manually; and the data problems including repetitive data, repetitive naming, misnaming, erroneous data, data missing, and data anomaly.
10. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein: in the Step ⑦, the graph data management module performs automatic and bulk knowledge graph data generation and updating by recognizing name contents of cached files; in the Step ⑦, a secondary data proofreading and adjustment is to judge rationality, validity and correctness of data in the knowledge graph after data updating by means of manual or automatic script, and to determine, according to the judgment, whether to proceed to Step ⑧, whether data adjustment is needed, and whether data rollback is needed; and in the Step ⑧, cached data backup and management: secondary backup is performed for all or part of the cached data according to data and hardware, backup folders and backup files are uniformly named according to the naming rules, and backup time, backup person and backup content are remarked.
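As an illustrative sketch of the name comparison described in claim 7, repetitively or similarly named cached files can be found by comparing a new file name with the names already recorded in the log. The function names and the `threshold` value are assumptions for this sketch, and `difflib.SequenceMatcher` from the Python standard library is swapped in as one possible similarity measure; the claims do not name a specific measure.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity of two file names, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_name_conflicts(new_name, logged_names, threshold=0.9):
    """Split logged names into exact repeats and near matches of new_name.

    threshold is a hypothetical cut-off; names at or above it are reported
    as similar so that they can be reviewed automatically or manually.
    """
    duplicates = [n for n in logged_names if n == new_name]
    similar = [
        n for n in logged_names
        if n != new_name and name_similarity(n, new_name) >= threshold
    ]
    return {"duplicates": duplicates, "similar": similar}
```

A name-level check of this kind is only a first pass; the claims also call for comparing data fields and data contents before a file is treated as repetitive.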
LU503512A 2021-07-06 2021-12-31 Operating method for construction of knowledge graph based on naming rule and caching mechanism LU503512B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110764250.2A CN113434610A (en) 2021-07-06 2021-07-06 Operation method of knowledge graph structure based on naming rule and cache mechanism

Publications (1)

Publication Number Publication Date
LU503512B1 (en) 2023-06-19

Family

ID=77759307

Family Applications (1)

Application Number Title Priority Date Filing Date
LU503512A LU503512B1 (en) 2021-07-06 2021-12-31 Operating method for construction of knowledge graph based on naming rule and caching mechanism

Country Status (3)

Country Link
CN (2) CN113434610A (en)
LU (1) LU503512B1 (en)
WO (1) WO2023279684A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434610A (en) * 2021-07-06 2021-09-24 中电科大数据研究院有限公司 Operation method of knowledge graph structure based on naming rule and cache mechanism
CN115309789B (en) * 2022-10-11 2023-01-03 浩鲸云计算科技股份有限公司 Method for generating associated data graph in real time based on intelligent dynamic business object
CN116028648B (en) * 2023-02-15 2023-06-09 熙牛医疗科技(浙江)有限公司 Medical text structured information extraction method universal for fine-grained scenes

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10740557B1 (en) * 2017-02-14 2020-08-11 Casepoint LLC Technology platform for data discovery
CN109255031B (en) * 2018-09-20 2022-02-11 苏州友教习亦教育科技有限公司 Data processing method based on knowledge graph
CN110990585B (en) * 2019-11-29 2024-01-30 上海勘察设计研究院(集团)股份有限公司 Multi-source data and time sequence processing method and device for building industry knowledge graph
CN111428048A (en) * 2020-03-20 2020-07-17 厦门渊亭信息科技有限公司 Cross-domain knowledge graph construction method and device based on artificial intelligence
CN113434610A (en) * 2021-07-06 2021-09-24 中电科大数据研究院有限公司 Operation method of knowledge graph structure based on naming rule and cache mechanism

Also Published As

Publication number Publication date
CN113918663A (en) 2022-01-11
CN113434610A (en) 2021-09-24
WO2023279684A1 (en) 2023-01-12

Similar Documents

Publication Publication Date Title
LU503512B1 (en) Operating method for construction of knowledge graph based on naming rule and caching mechanism
US20230122210A1 (en) Resource dependency system and graphical user interface
CN111967761B (en) Knowledge graph-based monitoring and early warning method and device and electronic equipment
EP4195112A1 (en) Systems and methods for enriching modeling tools and infrastructure with semantics
US11269822B2 (en) Generation of automated data migration model
US20230195728A1 (en) Column lineage and metadata propagation
CN112612902A (en) Knowledge graph construction method and device for power grid main device
CN111708773A (en) Multi-source scientific and creative resource data fusion method
CN110990585B (en) Multi-source data and time sequence processing method and device for building industry knowledge graph
CN111078780A (en) AI optimization data management method
US10353877B2 (en) Construction and application of data cleaning templates
Khattak et al. Ontology Evolution and Challenges.
CN112966162A (en) Scientific and technological resource integration method and device based on data warehouse and middleware
CN113190687A (en) Knowledge graph determining method and device, computer equipment and storage medium
CN115328894A (en) Data processing method based on data blood margin
CN116991931A (en) Metadata management method and system
US12001423B2 (en) Method and electronic device for obtaining hierarchical data structure and processing log entries
CN113254725A (en) Data management and retrieval enhancement method for graph database
CN116578612A (en) Lithium battery finished product detection data asset construction method
CN110569061A (en) Automatic construction system of software engineering knowledge base based on big data
JP2017010376A (en) Mart-less verification support system and mart-less verification support method
CN114692595B (en) Repeated conflict scheme detection method based on text matching
US11250010B2 (en) Data access generation providing enhanced search models
CN117573934A (en) Intelligent data interaction method and device for optical fiber transmission knowledge management system
CN117668229A (en) Meta model automatic acquisition and classification management method, device and storage medium

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20230619