LU503512B1

LU503512B1 - Operating method for construction of knowledge graph based on naming rule and caching mechanism

Info

Publication number: LU503512B1
Application number: LU503512A
Authority: LU
Inventors: Bing Chang; Longjun Zhao; Zhihai Chu; Xiang Li; Xueqiang Ren; Zhongwen Yin
Original assignee: Cetc Bigdata Res Institute Co Ltd
Priority date: 2021-07-06
Filing date: 2021-12-31
Publication date: 2023-06-19
Also published as: CN113918663A; CN113434610A; WO2023279684A1

Abstract

The present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism, including the steps of: data collection, design of the knowledge graph schema, formulation of a caching strategy, formulation of a naming rule, development of a graph data management module, extraction of entity and relation data, data caching, updating and proofreading, graph generation and secondary proofreading, and backup and management of cached data. In the present disclosure, the caching mechanism improves the efficiency of knowledge graph construction and generation when mass data is involved, reduces the difficulty of data proofreading and of log generation and management during knowledge graph construction, and enables rollback of knowledge graph data; the rational naming rules and functional modules reduce the difficulty of knowledge graph data management and enable automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data, thereby reducing the difficulty of graph construction and management, increasing graph construction speed, and meeting the requirements of life-cycle management of graph construction and utilization.

Description

Specification

OPERATING METHOD FOR CONSTRUCTION OF KNOWLEDGE GRAPH BASED ON NAMING RULE AND CACHING MECHANISM

Technical Field

The present disclosure relates to an operating method for knowledge graph construction based on a naming rule and a caching mechanism. It belongs to the technical fields of knowledge graph construction and of data storage, management and utilization, and particularly relates to knowledge graph construction, updating and rollback, and to knowledge graph data management, based on a naming rule and a caching mechanism.

Background Art

With the continuous development of computer, information communication and Internet technologies, electronic data has grown explosively, which has driven the development of a series of fields and related technologies such as big data and artificial intelligence; the ability to mine and analyze valid information from mass data is therefore becoming increasingly important. Big data mining and analysis technologies, represented by machine learning and deep learning, have achieved numerous results. However, the mining and analysis of mass data still suffers from large resource consumption, poor interpretability of some analytical processes, and other problems caused by a high proportion of repetitive work and frequent processing of mass data. Solving these problems calls for new development in knowledge graph theories and technologies.

By means of data mining and analysis, information processing, data fusion, knowledge extraction and representation, knowledge fusion and inference, and graph drawing, a knowledge graph can represent mass data and knowledge from different fields, so that the dynamic development patterns of a knowledge field can be revealed in a simpler and more intuitive way and higher-level, knowledge-based data analysis and mining can be supported, thereby providing practical and valuable reference, data and technical support for discipline research.

Knowledge graph construction is a process of continuous iteration and improvement. As human experience and data volume grow, knowledge graphs become larger and the complexity of their entity and relation networks multiplies; correspondingly, data updating, verification and problem detection become increasingly difficult. Moreover, because of the optimization strategies and hardware conditions of most existing knowledge graph databases, frequent small updates to a large-scale knowledge graph are processed more slowly than infrequent bulk updates. In addition, many knowledge graph databases lack the process logging and rollback functions of traditional relational databases, so once a mistake is made it is difficult to trace the problem and the affected data, which makes knowledge graph data updating and management very difficult.

To ensure the availability, timeliness, accuracy and stability of computer data, caching is used in many scenarios, such as computer storage and web browsers. Drawing on caching, and on the intermediate-tier data designed for mining, analyzing and computing over large-scale data, a transition tier can be provided between the analysis and processing of mass data together with human experience on the one hand, and knowledge graph construction and management on the other, by means of rational and normative naming rules, data caching strategies and data backup strategies. This raises the degree of automation of knowledge graph construction and the level of detail of data proofreading, makes knowledge graphs easier to construct and utilize, supports rapid splitting, fusion and backup of data in the knowledge graph database, and satisfies data rollback, problem tracing and other requirements arising during knowledge graph construction and management. As a result, the whole process of knowledge graph construction and utilization can be effectively managed, enabling better research on and application of knowledge graph technologies.

Summary of the Invention

To solve the above technical problems, the present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism. By defining a more informative knowledge graph schema, rational naming conventions and detailed data caching strategies, the method develops a graph data management module and a log management module that integrate multiple functions, and provides a caching tier between the knowledge graph database and the graph construction data, thereby realizing rapid construction, whole-process management, data proofreading, problem tracing, rollback and other functions for knowledge graphs.

The present disclosure is realized as follows.

The present disclosure provides an operating method for knowledge graph construction based on a naming rule and a caching mechanism, including the steps of: (1) data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers; (2) establishment of a knowledge graph schema: establishing a graph schema used for guiding the mining and storage of knowledge graph entity data and associated data; (3) determination of a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, determining the data range that needs to be cached, and then constructing naming rules for cached folders and cached files; (4) development of a graph data management module: completing development and testing of a module for automatic reading, comparison and recognition of cached files, a module for importing, updating, deleting and rolling back graph data, and a cached file management module; (5) extraction of entity and relation data: extracting, from the collected data, the entity attribute data and associated relation attribute data required by the knowledge graph according to the graph schema plan; (6) data caching, updating and proofreading: storing the extracted entity and relation data according to the requirements of the caching strategy to obtain cached data, using the graph data management module for preliminary proofreading and updating of the data when the accumulated cached data satisfies certain conditions, and processing problematic data; (7) graph generation and secondary proofreading: using the graph data management module to automatically import the data for which preliminary proofreading and updating have been completed into a knowledge graph database, performing secondary data proofreading on the updated knowledge graph, determining a data adjustment strategy according to the proofreading result, and then generating a graph; and (8) backup and management of cached data: completing backup and management of the cached data according to the cached data backup strategy.

Step (1) includes: (1.1) acquiring, through interfaces and crawlers, the conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, thereby forming multimodal data; and (1.2) performing preliminary data cleansing and processing on the acquired multimodal data, and classifying and storing the data according to data format.
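Step (1.2) leaves the classification rule open. As a minimal sketch, assuming that the modality of a collected file can be judged from its extension (the extension table below is hypothetical and not taken from the method), the grouping could look like this in Python:

```python
from pathlib import Path

# Hypothetical extension-to-modality table; the method names the modalities
# (numerical, text, image, video, voice) but does not fix any file extensions.
FORMAT_BY_EXTENSION = {
    ".csv": "numerical", ".xlsx": "numerical",
    ".txt": "text", ".html": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
    ".wav": "voice", ".mp3": "voice",
}


def classify_by_format(paths):
    """Group collected file paths by modality, judged from the file extension."""
    groups = {}
    for p in paths:
        modality = FORMAT_BY_EXTENSION.get(Path(p).suffix.lower(), "other")
        groups.setdefault(modality, []).append(p)
    return groups
```

Unknown extensions fall into an `other` group so that nothing collected is silently dropped before manual review.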

Step (2) includes: (2.1) defining the basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming conventions for data fields, data sources, and data form; (2.2) defining the concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; (2.3) defining the knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and the graph and data development environment; and (2.4) defining knowledge graph remarks, comprising: other tools used, data, and existing problems.
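To make the uniqueness and attribute constraints of step (2.2) concrete, the following sketch checks extracted records against one schema entry. The `EntitySchema` class, its field names and the problem messages are illustrative assumptions; the method itself does not prescribe a data structure or a graph database.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySchema:
    """Hypothetical schema entry: one uniqueness constraint plus required attributes."""
    label: str
    unique_field: str
    required: set = field(default_factory=set)

    def validate(self, records):
        """Return (index, problem) pairs for records that break the schema."""
        problems = []
        seen = set()
        for i, rec in enumerate(records):
            missing = (self.required | {self.unique_field}) - set(rec)
            if missing:
                problems.append((i, "missing " + ",".join(sorted(missing))))
                continue
            key = rec[self.unique_field]
            if key in seen:
                problems.append((i, f"duplicate {self.unique_field}={key}"))
            seen.add(key)
        return problems
```

Returning a list of problems, rather than raising on the first one, matches the idea of recording data problems for later processing.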

Step (3) includes: (3.1) determining the range, storage location and storage mode of cached data; (3.2) determining a backup strategy for the cached data, comprising: a naming rule for backup data, a backup location, a backup amount, and a strategy for managing and utilizing backup data; (3.3) defining naming rules for the storage folders of entity and associated relation data; and (3.4) defining naming rules for the cached data corresponding to the entity and associated relation data.

In step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, and the cached data may be stored in a file system under a single path or in file systems under multiple paths; and the storage mode comprises structured, unstructured and semi-structured data storage. In step (3.2), the data can be backed up locally or on a server. In step (3.4), the storage name of the cached data contains: an entity or relation keyword or code; the name or code of the data uniqueness field; the name or code of the new entity or relation type; the data update time or code; the data processing mode or code; and other data-related descriptions or codes. The order of the information in the name is not limited; the pieces of information are separated, and recognized, by specific delimiter characters; and the name must satisfy the file naming requirements of the operating system.
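One way to satisfy the step (3.4) naming rule is to tag each piece of information, so that the order of the pieces does not matter and each can be recognized from its tag. The sketch below assumes a hypothetical `tag-value` encoding joined by a double underscore and rejects characters that Windows forbids in file names; the tags (`kind`, `type`, `uid`, `time`, `op`) are illustrative, not part of the method.

```python
import re

FIELD_SEP = "__"  # hypothetical delimiter between pieces of information
KV_SEP = "-"      # hypothetical delimiter between a tag and its value
# Windows-forbidden characters, plus anything that would confuse FIELD_SEP:
# a doubled underscore, or an underscore at either end of a tag or value.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]|__|^_|_$')


def build_cache_name(info, ext=".json"):
    """Encode tagged information into a file name; tag order is free."""
    parts = []
    for tag, value in info.items():
        value = str(value)
        if KV_SEP in tag or _FORBIDDEN.search(tag) or _FORBIDDEN.search(value):
            raise ValueError(f"illegal character in {tag!r}={value!r}")
        parts.append(f"{tag}{KV_SEP}{value}")
    return FIELD_SEP.join(parts) + ext


def parse_cache_name(name):
    """Recover the tag -> value mapping; assumes the name ends with an extension."""
    stem = name.rsplit(".", 1)[0]
    info = {}
    for part in stem.split(FIELD_SEP):
        tag, _, value = part.partition(KV_SEP)
        info[tag] = value
    return info
```

Because the parser returns a mapping, names whose pieces appear in a different order compare equal after parsing, which is what lets later modules ignore ordering.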

Step (4) includes: (4.1) completing development and testing of a module for automatic reading, comparison and recognition of cached data files, and of a module for importing, updating, deleting and rolling back graph data; (4.2) completing development and testing of a module for cached data file management and for log system updating and management; and (4.3) completing testing and optimization of the stability, availability, timeliness and accuracy of each of the above modules.

In step (4.1), repeatedly or similarly named data is found by reading and recognizing the key information in the names of cached data and comparing it with the log content; data repetition and validity are judged from the similarity of data fields and data contents in the cached data. The graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repeated data, and can be called and operated automatically, semi-automatically or manually. Graph data rollback revokes all of the latest update operations on the knowledge graph data and supports both manual and automatic rollback, which differ only in whether the parameters are input manually or automatically; accurate rollback is achieved by checking the name and content of the cached files. In step (4.2), cached data file management covers creating, copying, deleting and renaming data files, and the log system records the modified content, the modified object and the modification time of data files.
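The step (4.1) comparison of cached file names with log content could be approximated with a string-similarity ratio. This sketch uses Python's standard `difflib.SequenceMatcher`; the 0.9 threshold and the choice to compare whole names (rather than selected key fields) are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def find_suspect_names(new_names, logged_names, threshold=0.9):
    """Flag new cached file names that repeat or closely resemble logged names.

    Returns (new_name, logged_name, ratio) tuples; ratio 1.0 means an exact repeat.
    """
    suspects = []
    for new in new_names:
        for old in logged_names:
            ratio = SequenceMatcher(None, new, old).ratio()
            if ratio >= threshold:
                suspects.append((new, old, round(ratio, 3)))
    return suspects
```

The pairwise loop is quadratic; for large logs, an index on a key field (for example the uniqueness field) would narrow the candidates before any fuzzy comparison.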

Step (5) includes: (5.1) extracting, from the collected data, the entity attribute data defined by the fusion graph schema, and caching and naming it according to the data caching strategy; and (5.2) extracting, from the collected data, the associated relation attribute data defined by the fusion graph schema, and caching and naming it according to the data caching strategy.

Step (6) includes: (6.1) caching and accumulating, according to the caching strategy, the entity attribute data and associated relation attribute data extracted in Step (5), and recording the amount and volume of cached data in real time; and (6.2) proofreading and updating the cached data when the accumulated cached data satisfies certain constraints, and recording and processing existing data problems; wherein: the constraints comprise a constraint on the amount of cached data, a constraint on the size of cached data, a constraint on processing time, a constraint on the volume of processed data, and manually set constraints; the proofreading of cached data is realized through the data caching strategies, the naming rules and the graph data management module; the proofreading covers the similarity and correctness of file names and the repetition and correctness of data contents, and can be performed automatically or manually; and the data problems include repeated data, repeated naming, misnaming, erroneous data, missing data, and data anomalies.
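The trigger in step (6.2) is any one of several constraints being reached. A minimal sketch, assuming file count, byte size, elapsed time and a manual flag as the checked quantities (the default thresholds are placeholders, and the volume-of-processed-data constraint is omitted for brevity), might be:

```python
import time
from dataclasses import dataclass


@dataclass
class CacheTrigger:
    """Placeholder thresholds; reaching any one of them triggers proofreading."""
    max_files: int = 1000
    max_bytes: int = 100 * 1024 * 1024
    max_seconds: float = 3600.0
    manual: bool = False  # stands in for a manually set ("artificial") constraint

    def should_proofread(self, n_files, n_bytes, started_at, now=None):
        """Return True once any configured constraint is reached."""
        now = time.time() if now is None else now
        return (
            self.manual
            or n_files >= self.max_files
            or n_bytes >= self.max_bytes
            or now - started_at >= self.max_seconds
        )
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current time.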

In Step (7), the graph data management module performs automatic, bulk knowledge graph data generation and updating by recognizing the name contents of cached files; the secondary data proofreading and adjustment in Step (7) judges the rationality, validity and correctness of the data in the knowledge graph after updating, manually or by an automatic script, and determines from the result whether to proceed to Step (8), whether data adjustment is needed, and whether data rollback is needed. In Step (8), cached data backup and management: a secondary backup is performed on all or part of the cached data depending on the data and the hardware, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are recorded as remarks.
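The rollback decided on in Step (7) revokes all operations of the latest update. The sketch below models that with a toy in-memory node store (a hypothetical stand-in for a knowledge graph database) that saves, for each applied batch, the prior state of every node it touched, so the latest batch can be undone exactly. A real graph database would need its own export or transaction mechanism; no specific product is assumed here.

```python
class VersionedGraphStore:
    """Toy node store keeping, per applied batch, the prior state of touched nodes."""

    def __init__(self):
        self.nodes = {}    # node id -> attribute dict
        self.history = []  # list of (batch_name, {node_id: previous attrs or None})

    def apply_batch(self, batch_name, updates):
        """Apply {node_id: attrs} updates; attrs=None deletes the node."""
        previous = {}
        for node_id, attrs in updates.items():
            previous[node_id] = self.nodes.get(node_id)
            if attrs is None:
                self.nodes.pop(node_id, None)
            else:
                self.nodes[node_id] = dict(attrs)
        self.history.append((batch_name, previous))

    def rollback_latest(self):
        """Revoke every operation of the most recent batch and return its name."""
        batch_name, previous = self.history.pop()
        for node_id, old in previous.items():
            if old is None:
                self.nodes.pop(node_id, None)  # node did not exist before the batch
            else:
                self.nodes[node_id] = old
        return batch_name
```

Naming each batch after its cached file is one way to tie a rollback to the file name and content, as the method requires.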

The beneficial effects of the present disclosure are as follows: the caching mechanism increases the speed and accuracy of knowledge graph construction and generation when mass data is involved, reduces the difficulty of data proofreading and of log generation and management during knowledge graph construction, enables rollback of knowledge graph data, and facilitates data backup, encryption and transmission; the rational naming rules and functional modules reduce the difficulty of knowledge graph data management and enable automatic processing and comparison of cached files, as well as importing, updating and rollback of knowledge graph data, thereby reducing the difficulty of graph construction and management, increasing graph construction speed, meeting the requirements of life-cycle management of graph construction and utilization, and providing effective data and technical support for subsequent applications of and research on knowledge graphs.

Brief Description of the Drawing

Fig. 1 is a structure diagram of the present disclosure.

Embodiments

The technical solution of the present disclosure is further described below, but the scope of protection is not limited thereto.

As shown in Fig. 1, an operating method for knowledge graph construction based on a naming rule and a caching mechanism includes the following steps.

(1) Data collection: acquiring the data necessary for constructing a knowledge graph through interfaces and crawlers, specifically comprising: (1.1) acquiring, through interfaces and crawlers, the data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data; and (1.2) performing preliminary data cleansing and processing on the acquired multimodal data, and classifying and storing it rationally according to data format.

(2) Design of the knowledge graph schema: designing a graph schema used for guiding the mining and storage of knowledge graph entity data and associated data, specifically comprising: (2.1) defining the basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming conventions for data fields, data sources, and data form; (2.2) defining the concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; (2.3) defining the knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and the graph and data development environment; and (2.4) defining knowledge graph remarks, comprising: other tools used, data, and existing problems.

Further, remarks should be written for any information that affects the construction, utilization, management or expansion of the knowledge graph, so as to ensure smooth work and handover.

(3) Making a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, and determining the data range that needs to be cached, specifically comprising: (3.1) determining the range, storage location and storage mode of cached data;

Preferably, the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, and the cached data may be stored in a file system under a single path or in file systems under multiple paths; and the storage mode comprises structured, unstructured and semi-structured data storage.

(3.2) determining a backup strategy for the cached data, comprising: a naming rule for backup data, a backup location, a backup amount, and a strategy for managing and utilizing backup data;

Preferably, the data can be backed up locally or on a server.

(4) Making naming rules for cached folders and cached files, specifically comprising: (4.1) defining naming rules for the storage folders of entity and associated relation data; and (4.2) defining naming rules for the cached data corresponding to the entity and associated relation data.

Preferably, the storage name of the cached data contains: an entity or relation keyword or code; the name or code of the data uniqueness field; the name or code of the new entity or relation type; the data update time or code; the data processing mode or code; and other data-related descriptions or codes. The order of the information in the name is not limited; the pieces of information are separated, and recognized, by specific delimiter characters; and the name must satisfy the file naming requirements of the operating system.

(5) Development of a graph data management module: completing development and testing of a module for automatic reading, comparison and recognition of cached files, a module for importing, updating, deleting and rolling back graph data, and a cached file management module, specifically comprising: (5.1) completing the module for automatic reading, comparison and recognition of cached data files, and the module for importing, updating, deleting and rolling back graph data;

Preferably, repeatedly or similarly named data is found by reading and recognizing the key information in the names of cached data and comparing it with the log content; data repetition and validity are judged from the similarity of data fields and data contents in the cached data. The graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repeated data; it can be called and operated automatically, semi-automatically or manually, and its input parameters are designed to be concise and clear. Graph data rollback revokes all of the latest update operations on the knowledge graph data and supports both manual and automatic rollback, which differ only in whether the parameters are input manually or automatically; accurate rollback is achieved by checking the name and content of the cached files.

(5.2) completing the module for cached data file management and for log system updating and management; and

Preferably, cached data file management covers creating, copying, deleting and renaming data files, and the log system records the modified content, the modified object and the modification time of data files.

(5.3) completing testing and optimization of the stability, availability, timeliness and accuracy of each of the above modules.
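The log system of step (5.2) has to record the modified content, the modified object and the modification time. A minimal sketch, assuming an append-only JSON Lines file (the file format and the record keys are illustrative choices, not specified by the method):

```python
import json
from datetime import datetime, timezone


def log_change(log_path, obj, content, when=None):
    """Append one record holding the modified object, the modification and its time."""
    when = when or datetime.now(timezone.utc)
    record = {"object": obj, "content": content, "time": when.isoformat()}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(log_path):
    """Load every record back, oldest first."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per change keeps earlier records untouched, which suits problem tracing: the log can be replayed or compared against cached file names later.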

(6) Extraction of entity and relation data: extracting, from the collected data, the entity attribute data and associated relation attribute data required by the knowledge graph according to the graph schema plan,

specifically comprising:

(6.1) extracting, from the collected data, the entity attribute data defined by the fusion graph schema, and caching and naming it according to the data caching strategy, thereby ensuring the validity and uniqueness of the data; and

(6.2) extracting, from the collected data, the associated relation attribute data defined by the fusion graph schema, and caching and naming it according to the data caching strategy, thereby ensuring the validity and uniqueness of the data and its correspondence with the entity attribute data.
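Steps (6.1) and (6.2) require uniqueness of entity data and correspondence between relation data and entity data. As a sketch under the assumption that each entity record carries a unique key field and each relation record carries hypothetical `source` and `target` keys, the two checks could be written as:

```python
def dedupe_entities(records, unique_field):
    """Keep the first record for each unique key; drop records without a key."""
    kept, seen = [], set()
    for rec in records:
        key = rec.get(unique_field)
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def check_relations(relations, entity_ids):
    """Split relations into those whose endpoints are both known entities, and orphans."""
    matched, orphans = [], []
    for rel in relations:
        both_known = rel["source"] in entity_ids and rel["target"] in entity_ids
        (matched if both_known else orphans).append(rel)
    return matched, orphans
```

Orphaned relations are returned rather than discarded, so they can be recorded as data problems for the proofreading in the next step.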

(7) Data caching, updating and proofreading: storing the extracted entity and relation data according to the requirements of the caching strategy to obtain cached data, using the graph data management module for preliminary proofreading and updating of the data when the accumulated cached data satisfies certain conditions, and processing problematic data,

specifically comprising:

(7.1) caching and accumulating, according to the caching strategy, the entity and relation data extracted in Step (6), and recording the amount and volume of cached data in real time; and

(7.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems;

further, the constraints on cached data include a constraint on the amount of cached data, a constraint on the size of cached data, a constraint on processing time, a constraint on the volume of processed data, and manually set constraints; the proofreading of cached data is realized through the data caching strategies, the naming rules and the graph data management module; the proofreading covers the similarity and correctness of file names and the repetition and correctness of data contents, and can be performed automatically or manually; and the data problems include repeated data, repeated naming, misnaming, erroneous data, missing data, and data anomalies.

(8) Graph generation and secondary proofreading: using the graph data management module to automatically import the data for which preliminary proofreading and updating have been completed into a knowledge graph database, performing secondary data proofreading on the updated knowledge graph, and determining a data adjustment strategy according to the proofreading result, specifically comprising: (8.1) using the graph data management module to automatically or semi-automatically import single pieces or batches of cached data into a knowledge graph database, and to automatically generate and update the graph from them; and

Further, the graph data management module can perform automatic, bulk knowledge graph data generation and updating by recognizing the name contents of cached files, thereby increasing the speed of graph generation.

(8.2) performing a secondary proofreading of the knowledge graph data manually or by automatic script, covering the validity and correctness of the data.

Further, rationality, validity and correctness of data in the knowledge graph after data updating are judged by means of manual or automatic script, and it is determined according to the judgment whether to proceed to the subsequent step, whether data adjustment is needed, and whether data rollback is needed.
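Step (8.1) relies on recognizing the name contents of cached files to import data in bulk. Assuming hypothetical `kind` (entity or relation) and `op` (processing mode) tags encoded in each file name as `tag-value` pairs joined by `__`, a planner could group the files into import batches and schedule entity batches before relation batches, so that relation endpoints already exist when relations are imported:

```python
def plan_import(file_names, field_sep="__", kv_sep="-"):
    """Group cached files into import batches using tags parsed from their names.

    Entity batches are ordered before relation batches; within a kind, by op name.
    """
    batches = {}
    for name in file_names:
        stem = name.rsplit(".", 1)[0]
        tags = {}
        for part in stem.split(field_sep):
            tag, _, value = part.partition(kv_sep)
            tags[tag] = value
        key = (tags.get("kind", "unknown"), tags.get("op", "unknown"))
        batches.setdefault(key, []).append(name)
    rank = {"entity": 0, "relation": 1}
    return sorted(batches.items(), key=lambda item: (rank.get(item[0][0], 2), item[0][1]))
```

Each batch could then be passed to a bulk import call of whichever graph database is in use; that call is outside this sketch.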

There is further a step (9), i.e., cached data backup and management: a secondary backup is performed on all or part of the cached data depending on the data and the hardware, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are recorded as remarks, specifically comprising: (9.1) backing up, according to the backup strategy, the cached data that satisfies the backup requirements; and (9.2) managing the backup data of the cached data.
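For step (9), one possible backup routine copies a cache folder into a uniformly named backup folder and writes a manifest recording the backup time, backup person and remark. The folder-name pattern and the `MANIFEST.json` file are assumptions made for illustration; only the recorded items come from the method.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path


def backup_cache(cache_dir, backup_root, person, remark, when=None):
    """Copy cache_dir into a uniformly named backup folder with a manifest."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"backup__time-{stamp}__by-{person}"
    shutil.copytree(cache_dir, target)  # fails if the target folder already exists
    files = sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())
    manifest = {"time": when.isoformat(), "person": person, "remark": remark, "files": files}
    (target / "MANIFEST.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target
```

The file list is taken before the manifest is written, so the manifest describes only the backed-up cache content.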

The present disclosure is a graph construction, updating and rollback method based on a naming rule and a caching mechanism, including the steps of: acquiring the data necessary for constructing a knowledge graph through interfaces and crawlers; designing a graph schema used for guiding the mining and storage of knowledge graph entity data and associated data; determining a cached data storage location, a data storage mode and a data backup strategy, and determining the data range that needs to be cached; making naming rules for cached folders and cached files; completing development and testing of a module for automatic reading, comparison and recognition of cached files, a module for importing, updating, deleting and rolling back graph data, and a cached file management module; extracting, from the collected data, the entity attribute data and associated relation attribute data required by the knowledge graph according to the graph schema plan; storing the extracted entity and relation data according to the requirements of the caching strategy to obtain cached data, using the graph data management module for preliminary proofreading and updating of the data when the accumulated cached data satisfies certain conditions, and processing problematic data; using the graph data management module to automatically import the data for which preliminary proofreading and updating have been completed into a knowledge graph database, performing secondary data proofreading on the updated knowledge graph, and determining a data adjustment strategy according to the proofreading result; and completing backup and management of the cached data according to the cached data backup strategy.

Example

As stated above, the present disclosure is carried out as follows:

1) acquiring, through interfaces and crawlers, the data necessary for constructing a knowledge graph, such as conventional numerical data, text data, image data, video data and voice data, thereby forming multimodal data;

2) performing preliminary data cleansing and processing on the acquired multimodal data, and classifying and storing it rationally according to data format;

3) defining the basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming conventions for data fields, data sources, and data form; defining the concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data; defining the knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and the graph and data development environment; and defining knowledge graph remarks, comprising: other tools used, data, and existing problems;

4) determining a range, a storage location and a storage mode of cached data;

5) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;

6) defining naming rules for entity and associated relation data storage folders and cached data;

7) completing a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module; and completing a cached data file management and log system updating and management module;

8) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules;

9) extracting entity attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data;

10) extracting associated relation attribute data defined by fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy, thereby ensuring validity and uniqueness of data and correspondence of data and entity attribute data;

11) caching and accumulating the extracted entity and relation data according to the caching strategy, and recording the amount and volume of cached data in real time; and proofreading and updating the cached data when the accumulated cached data satisfies certain constraints, and recording and processing existing data problems;

12) using the graph data management module to automatically or semi-automatically import single pieces or batches of cached data into a knowledge graph database, and to automatically generate and update the graph; and performing a secondary proofreading of the knowledge graph data manually or by automatic script, covering the validity and correctness of the data; and

13) backing up, according to the backup strategy, the cached data that satisfies the backup requirements, and managing the backup data of the cached data.

To sum up, the present disclosure is a knowledge graph construction and management system, wherein all changes in the process of knowledge graph construction and management can be recorded and saved, and addition, updating and rollback of knowledge graph data can be realized via cached data and history records. The system is applicable to life-cycle management of knowledge graphs, to building high-quality knowledge graphs, and to automated knowledge graph construction systems, with effective backup and management of knowledge graph data at both large and small scales.

Claims


1. An operating method of knowledge graph construction based on a naming rule and a caching mechanism, comprising steps of:

(1) data collection: acquiring multimodal data for constructing a knowledge graph through interfaces and crawlers;

(2) establishment of knowledge graph schema: establishing a graph schema used for guiding mining and storage of knowledge graph entity data and associated data;

(3) determination of a caching strategy: determining a cached data storage location, a data storage mode and a data backup strategy, determining a data range that needs to be cached, and then constructing naming rules for cached folders and cached files;

(4) development of a graph data management module: completing development and test of a cached file automatic reading, comparison and recognition module, a graph data importing, updating, deleting and rollback module, and a cached file management module;

(5) extraction of entity and relation data: extracting entity attribute data and associated relation attribute data necessary for a knowledge graph from collected data according to contents of graph schema planning;

(6) data caching, updating and proofreading: storing extracted entity and relation data according to requirements of the caching strategy so as to obtain cached data, using the graph data management module for preliminary proofreading and updating of data when accumulation of cached data satisfies certain conditions, and processing problematic data;

(7) graph generation and secondary proofreading: using the graph data management module to automatically import data, for which preliminary proofreading and updating have been completed, into a knowledge graph database, performing secondary data proofreading for the updated knowledge graph, determining a data adjustment strategy according to the data proofreading result, and then generating a graph; and

(8) backup and management of cached data: completing backup and management of cached data according to the cached data backup strategy.


2. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (1) comprises:

(1.1) acquiring, through interfaces and crawlers, conventional numerical data, text data, image data, video data and voice data necessary for constructing a knowledge graph, and forming multimodal data; and

(1.2) performing preliminary data cleansing and data processing for acquired multimodal data, and classifying and storing data according to data formats.
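A minimal sketch of the classification in step (1.2), assuming the collected data arrive as files whose modality can be inferred from the file extension. `FORMAT_MAP` and `classify_by_format` are hypothetical names; a real system might inspect MIME types or file contents instead, and would then store each group in its own folder.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to data modality.
FORMAT_MAP = {
    ".csv": "numerical", ".xlsx": "numerical",
    ".txt": "text", ".json": "text", ".html": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
    ".wav": "voice", ".mp3": "voice",
}


def classify_by_format(paths: list[str]) -> dict[str, list[str]]:
    """Group collected files by modality according to their (case-insensitive) extension."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        # Unknown extensions fall into an "other" group for manual review.
        modality = FORMAT_MAP.get(Path(p).suffix.lower(), "other")
        groups[modality].append(p)
    return dict(groups)
```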

3. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (2) comprises:

(2.1) defining basic principles and conventions of knowledge graph construction, comprising: background introduction, introduction of graph uses, introduction of data requirements and graph requirements, data confidentiality agreement, data interpretation, explanation of specialized vocabulary, naming convention of data fields, data sources, and data form;

(2.2) defining concepts, uniqueness constraints, categories, domain definitions, attribute naming, attribute interpretation, data association manner, attribute constraints and association constraints of knowledge graph entity data and associated relation data;

(2.3) defining knowledge graph technology selection and graph schema, comprising: graph data storage technology, graph data retrieval and application technology, graph schema build-up, and graph and data development environment; and

(2.4) defining knowledge graph remarks, comprising: other tools used, data and existing problems.
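The uniqueness and attribute constraints of step (2.2) could be checked along the following lines. This is an illustrative sketch only: `EntityType`, `check_records`, and expressing attribute constraints as a tuple of mandatory fields are assumptions, not part of the claimed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """Hypothetical schema entry for one entity category."""
    name: str
    unique_key: str                 # uniqueness constraint: field that must be unique
    required: tuple[str, ...] = ()  # attribute constraint: fields that must be non-empty


def check_records(etype: EntityType, records: list[dict]) -> list[str]:
    """Return human-readable violations of the schema constraints."""
    problems: list[str] = []
    seen: set = set()
    for i, rec in enumerate(records):
        for attr in etype.required:
            if rec.get(attr) in (None, ""):
                problems.append(f"{etype.name}[{i}]: missing {attr}")
        key = rec.get(etype.unique_key)
        if key is None:
            problems.append(f"{etype.name}[{i}]: missing {etype.unique_key}")
        elif key in seen:
            problems.append(f"{etype.name}[{i}]: duplicate {etype.unique_key}={key}")
        else:
            seen.add(key)
    return problems
```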

4. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (3) comprises:

(3.1) determining a range, a storage location and a storage mode of cached data;


(3.2) determining a backup strategy for the cached data, comprising: a naming rule of backup data, backup location, backup amount, and a backup data management and utilization strategy;

(3.3) defining naming rules for entity and associated relation data storage folders; and

(3.4) defining naming rules of corresponding cached data of the entity and associated relation data.

5. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 4, wherein: in the Step (3.1), the range of cached data comprises full data caching and partial data caching; the storage location comprises a local file system, a local server and a cloud server, the cached data being stored under a single file system path or across multiple paths; and the storage mode comprises structured, unstructured and semi-structured data storage; in the Step (3.2), the data can be backed up locally or on a server; and in the Step (3.4), the storage name of the cached data contains an entity or relation keyword or code, the name or code of the data uniqueness field, the name or code of the new entity or relation type, the data update time or code, the data processing mode or code, and other data-related descriptions or codes, wherein the order of information in the name is not limited, the pieces of information are separated and recognized via specific characters between them, and it is ensured that the name meets the file naming requirements of the operating system.

6. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (4) comprises:

(4.1) completing development and test of a cached data file automatic reading, comparison and recognition module, and a graph data importing, updating, deleting and rollback module;

(4.2) completing development and test of a cached data file management and log system updating and management module; and

(4.3) completing tests and optimization of stability, availability, timeliness, and accuracy of each of the above-mentioned modules.

7. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 6, wherein: in the Step (4.1), repetitively or similarly named data is found by reading and recognizing key information in the name of the cached data and comparing that key information with the log content, and data repeatability and validity are judged from the similarity of data fields and data contents in the cached data; the graph data importing, updating, deleting and rollback module performs importing, creating, updating, deleting and rollback of one or more pieces of graph data, as well as automatic recognition and processing of repetitive data, and the module can be called and operated automatically, semi-automatically or manually; the graph data rollback revokes all of the latest updating operations on the knowledge graph data and supports both manual and automatic rollback, which differ only in whether the parameters are input manually or automatically, accurate rollback of data being realized by judging the name and content of the cached file; and in the Step (4.2), the cached data file management comprises creating, copying, deleting and renaming data files, and the log system needs to record the modified content, modification object and modification time of data files.
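The rollback behaviour of step (4.1), revoking all of the latest updating operations, and the log fields of step (4.2) can be illustrated with an in-memory toy store. This is a sketch under stated assumptions: `GraphStore`, `LogEntry` and treating each update batch as the unit of rollback are hypothetical, and a real deployment would issue equivalent operations against a graph database and persist the log alongside the cached files.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

_MISSING = object()  # marks "object did not exist before the change"


@dataclass
class LogEntry:
    obj_id: str             # modification object
    before: Any             # content before modification (_MISSING if newly created)
    after: Optional[dict]   # modified content (None if deleted)
    time: datetime          # modification time


@dataclass
class GraphStore:
    """Toy key-value stand-in for a graph database, with batch-level rollback."""
    data: dict = field(default_factory=dict)
    batches: list = field(default_factory=list)  # one list of LogEntry per update batch

    def apply_batch(self, changes: dict[str, Optional[dict]]) -> None:
        # A value of None deletes the object; anything else creates or updates it.
        now = datetime.now()
        log = []
        for obj_id, new in changes.items():
            log.append(LogEntry(obj_id, self.data.get(obj_id, _MISSING), new, now))
            if new is None:
                self.data.pop(obj_id, None)
            else:
                self.data[obj_id] = new
        self.batches.append(log)

    def rollback(self) -> bool:
        # Revoke every operation of the most recent batch, newest first.
        if not self.batches:
            return False
        for entry in reversed(self.batches.pop()):
            if entry.before is _MISSING:
                self.data.pop(entry.obj_id, None)
            else:
                self.data[entry.obj_id] = entry.before
        return True
```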

8. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (5) comprises:

(5.1) extracting entity attribute data defined by the fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy; and

(5.2) extracting associated relation attribute data defined by the fusion graph schema from the collected data, and performing data caching and naming according to the data caching strategy.
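Steps (5.1) and (5.2) might be realized as below, assuming entity records carry an `id` field and relation records carry `src` and `dst` fields referencing entity ids. The directory layout, the file names and `cache_extracted` are hypothetical. The sketch keeps one record per unique id and sets aside relations whose endpoints are not among the cached entities, reflecting the correspondence between relation and entity data described in step 10) of the description.

```python
import json
import tempfile
from pathlib import Path


def cache_extracted(cache_dir: Path, entities: list[dict], relations: list[dict],
                    uid: str = "id") -> list[dict]:
    """Write entity and relation records to separate cache folders; return rejected relations."""
    entity_dir = cache_dir / "entity"
    relation_dir = cache_dir / "relation"
    entity_dir.mkdir(parents=True, exist_ok=True)
    relation_dir.mkdir(parents=True, exist_ok=True)

    # Keep only the first record per unique id to preserve uniqueness.
    unique: dict = {}
    for e in entities:
        unique.setdefault(e[uid], e)
    (entity_dir / "entities.json").write_text(
        json.dumps(list(unique.values()), ensure_ascii=False), encoding="utf-8")

    # Cache only relations whose endpoints exist among the cached entities.
    kept, rejected = [], []
    for r in relations:
        (kept if r["src"] in unique and r["dst"] in unique else rejected).append(r)
    (relation_dir / "relations.json").write_text(
        json.dumps(kept, ensure_ascii=False), encoding="utf-8")
    return rejected


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    print(cache_extracted(demo_dir, [{"id": "P1"}, {"id": "O1"}],
                          [{"src": "P1", "dst": "O1"}, {"src": "P1", "dst": "X9"}]))
```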

9. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein the Step (6) comprises:

(6.1) according to the caching strategy, caching and accumulating the entity attribute data and associated relation attribute data extracted in Step (5), and recording amount and volume of cached data in real time; and

(6.2) proofreading and updating the cached data when accumulation of the cached data satisfies certain constraints, and recording and processing existing data problems; the constraints including constraints on the amount of cached data, the size of cached data, the processing time and the volume of processed data, as well as manually set conditions; the proofreading of cached data being realized via the data caching strategy, the naming rules and the graph data management module; the contents of the proofreading including the similarity and correctness of file names and the repetitiveness and correctness of data content, and the proofreading can be performed automatically or manually; and the data problems including repetitive data, repetitive naming, misnaming, erroneous data, missing data, and data anomalies.
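Two of the proofreading checks in step (6.2), file name similarity and data content repetitiveness, can be sketched with the Python standard library. Using `difflib.SequenceMatcher` for name similarity, SHA-256 digests for byte-identical content, and the `0.9` threshold are assumptions made for illustration; the disclosure does not specify the comparison method.

```python
import hashlib
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Flag pairs of cached-file names alike enough to suggest repetitive naming or misnaming."""
    return [(a, b) for a, b in combinations(sorted(names), 2)
            if SequenceMatcher(None, a, b).ratio() >= threshold]


def duplicate_contents(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-identical (repetitive data)."""
    groups: dict[str, list[str]] = {}
    for name, content in files.items():
        groups.setdefault(hashlib.sha256(content).hexdigest(), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Flagged pairs are candidates for manual or scripted review rather than automatic deletion, since two legitimately different entity types can still have similar names.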

10. The operating method of knowledge graph construction based on a naming rule and a caching mechanism according to claim 1, wherein: in the Step (7), the graph data management module performs automatic, bulk generation and updating of knowledge graph data by recognizing the name contents of cached files; in the Step (7), the secondary data proofreading and adjustment judge the rationality, validity and correctness of data in the knowledge graph after data updating by means of manual checks or automatic scripts, and determine, according to the judgment, whether to proceed to Step (8), whether data adjustment is needed, and whether data rollback is needed; and in the Step (8), cached data backup and management: a secondary backup is performed for all or part of the cached data according to the data and hardware conditions, backup folders and backup files are uniformly named according to the naming rules, and the backup time, backup person and backup content are recorded as remarks.
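The uniform naming and remarks for backups in Step (8) could look like the following sketch. The folder pattern `backup__time=...__by=...`, the `REMARK.json` file and `backup_cache` are hypothetical, and `person` is assumed to already be a file-name-safe identifier.

```python
import json
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_cache(files: list[Path], backup_root: Path, person: str,
                 content: str, when: datetime) -> Path:
    """Copy cached files into a uniformly named backup folder and record the remarks."""
    # Assumed folder naming: backup__time=<YYYYmmddTHHMMSS>__by=<person>
    folder = backup_root / f"backup__time={when:%Y%m%dT%H%M%S}__by={person}"
    folder.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier backup
    for f in files:
        shutil.copy2(f, folder / f.name)  # copy2 also preserves file timestamps
    remark = {"backup_time": when.isoformat(), "backup_person": person,
              "backup_content": content, "files": [f.name for f in files]}
    (folder / "REMARK.json").write_text(
        json.dumps(remark, ensure_ascii=False, indent=2), encoding="utf-8")
    return folder


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    sample = root / "kind=entity__type=Person.json"
    sample.write_text("[]", encoding="utf-8")
    print(backup_cache([sample], root / "backup", "editor", "Person entity cache", datetime.now()))
```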