CN114416102A - Data processing method, device and equipment based on knowledge graph script and storage medium

Info

Publication number: CN114416102A
Application number: CN202210103654.1A
Authority: CN
Inventors: 郑林; 丁军; 黄振; 张渝; 张涛; 聂庆; 贺芳; 王磬音; 谢秋学; 马青; 孙金; 赵秋慧; 常秀; 张悦; 陈添添; 王昊
Original assignee: Haiyizhi Information Technology Nanjing Co ltd; INDAA MEDIA INVESTMENT HOLDINGS Ltd; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Current assignee: Haiyizhi Information Technology Nanjing Co ltd; INDAA MEDIA INVESTMENT HOLDINGS Ltd; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2022-04-29

Abstract

The invention discloses a data processing method based on a knowledge graph script, characterized by comprising the following steps: constructing a knowledge graph script based on a knowledge graph service system; after receiving xml data, putting the xml data into a specified directory, wherein the xml data is a book; processing and parsing the xml data file through the knowledge graph script to form a BookData object; after the data in the BookData object has been parsed, adding the entities in the data into a knowledge graph, and then performing semantic and word similarity matching so that entities with the same characteristics are associated with each other. By this method, parsing of xml data and its import into the graph can be realized based on the knowledge graph script, so that operation is simple, the complexity of storing knowledge in the graph is reduced, and a highly reliable and well-supported personnel management means for power industry enterprises is finally formed.

Description

Data processing method, device and equipment based on knowledge graph script and storage medium

Technical Field

The present application relates to the field of data processing, and in particular, to a data processing method, apparatus, device, and storage medium based on a knowledge-graph script.

Background

At present, most enterprises in the power industry record the knowledge of systems such as posts, equipment, infrastructure, science and technology, marketing and power grids manually in historical files. This knowledge has to some extent been compiled into books, but most of the experience used for making judgments resides with the responsible personnel in each field. When all the information and knowledge must be linked together for analysis and the associations between related pieces of knowledge must be found, querying and retrieval become very difficult, because the knowledge is not stored systematically. Meanwhile, with the rapid development of the internet and information technology, large and complex information systems have emerged. The knowledge graph has become a powerful tool for addressing this problem: it can reduce reliance on traditional mentoring experience and on the traditional practice of searching documents to resolve problems.

In the prior art, a knowledge graph is generally constructed by technical means such as knowledge fusion and D2R mapping, which establish associations among data, finally yield the corresponding data streams, and import the knowledge in those data streams into the knowledge graph. If a user wants to perform operations such as data decomposition and storage into the knowledge graph on the basis of the constructed knowledge graph, technical means such as knowledge fusion and D2R mapping are likewise required. These means, however, suffer from technical problems such as the need for algorithmic judgment, complex knowledge mapping and difficult operation.

Therefore, how to realize the parsing of xml data and its import into the graph based on a knowledge graph script, so that operation is simple, the complexity of storing knowledge in the graph is reduced, and a highly reliable and well-supported personnel management means for power industry enterprises is finally formed, is a technical problem to be solved at present.

Disclosure of Invention

The invention discloses a data processing method based on a knowledge graph script, which is used for solving the problem that the prior art cannot realize the parsing of xml data and its import into the graph based on a knowledge graph script in a way that makes operation simple, reduces the complexity of storing knowledge in the graph, and finally forms a highly reliable and well-supported personnel management means for power industry enterprises. The method comprises the following steps:

constructing a knowledge graph script based on a knowledge graph service system;

after receiving xml data, putting the xml data into a specified directory, wherein the xml data is a book;

processing and parsing the xml data file through the knowledge graph script to form a BookData object;

after the data in the BookData object has been parsed, adding the entities in the data into a knowledge graph, and then performing semantic and word similarity matching so that entities with the same characteristics are associated with each other;

wherein the service system is obtained by splitting power resource information and specifically comprises a post manpower knowledge system, an equipment knowledge system, a capital construction knowledge system, a scientific and technological knowledge system, a marketing knowledge system, a power grid knowledge system and a legal knowledge system.

Optionally, processing and parsing the xml data file through the knowledge graph script to form a BookData object is specifically:

after the xml data is put into the designated directory, the knowledge graph script processes and analyzes according to the path and the type of the xml data file;

if the type of the xml data file is in a format needing decompression, the knowledge graph script decompresses the xml data file to the specified directory and then analyzes the xml data file according to the path;

and if the type of the xml data file is in a format which does not need to be decompressed, the xml data file is directly analyzed according to the path.

Optionally, parsing the xml data file according to the path specifically includes:

reading a self-defined file name under a folder through the knowledge graph script;

after forming a Document object by reading the data in a file input stream, reading each row of data by using xpath to finally form the BookData object;

and once the data in the BookData object has been parsed, the parsing of the xml data file is complete.

Optionally, the method further comprises:

defining entities, relations and attributes existing in the knowledge graph based on the knowledge graph service system;

and determining the entity type, the entity basic attribute and the relationship among the entities in the data based on the defined entities, relationships and attributes.

Optionally, the basic attributes of the entities at least include teaching materials, parts, chapters, modules, work types and stations, and the relationships between the entities at least include relationships between teaching materials, parts, chapters, modules, work types and stations.

Optionally, after the entities in the data are added into the knowledge graph, matching of semantics and word similarity is performed, so that the entities with the same characteristics are associated with each other, specifically:

adding the book entities into the knowledge graph through the interface of the knowledge graph, matching the title and abs of each book against the entries, and filtering by using the filter in java8, so as to establish the association relationship between the books and the entries;

and acquiring part and chapter data from the BookData object, adding the entities in the part and chapter data into the knowledge graph, and establishing the corresponding association relationships through matching of the title against the entries and/or matching of semantic and word similarity.

Correspondingly, the invention also discloses a data processing device based on the knowledge-graph script, which comprises:

the construction module is used for constructing a knowledge graph script based on a knowledge graph service system;

the receiving module is used for placing the xml data into a specified directory after receiving the xml data, wherein the xml data is a book;

the processing and parsing module is used for processing and parsing the xml data file through the knowledge graph script to form a BookData object;

the matching module is used for adding the entities in the data into a knowledge graph after the data in the BookData object has been parsed, and then performing semantic and word similarity matching, so that entities with the same characteristics are associated with each other;

Optionally, the processing and parsing module is specifically configured to:

after the xml data is put into the specified directory, the knowledge graph script carries out processing and loop parsing according to the path and the type of the xml data file;

In order to achieve the above object, according to yet another aspect of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.

In order to achieve the above object, according to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method.

Compared with the prior art, the invention has the following beneficial effects:

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a flow diagram illustrating a method for data processing based on knowledge-graph scripts according to an embodiment of the present application;

FIG. 2 is a diagram illustrating a knowledge-graph business architecture page according to an embodiment of the present application;

FIG. 3 is a knowledge graph system visualization page view according to an embodiment of the present application;

FIG. 4 is a display of a knowledge-graph editing page according to an embodiment of the present application;

FIG. 5 is a display of another knowledge-graph editing page according to an embodiment of the present application;

FIG. 6 is a block diagram of a data processing apparatus based on a knowledge-graph script according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

Fig. 1 is a schematic flow chart of a data processing method based on a knowledge-graph script according to an embodiment of the present invention, where the method includes:

S201, constructing a knowledge graph script based on a knowledge graph service system.

Specifically, a global knowledge classification system for the knowledge graph is established by modeling the paper-based knowledge accumulated about resources in the various fields of the power industry; knowledge is extracted from data of different sources and different structures and is stored in the knowledge graph. The knowledge graph script is constructed on the basis of the seven major business systems of the power industry. The knowledge graph service system is obtained by splitting power resource information and specifically comprises a post system, an equipment system, a capital construction system, a scientific and technological system, a marketing system, a power grid system and a legal system; a page display diagram of the knowledge graph service system is shown in fig. 2. The knowledge graph is constructed in multiple dimensions, from the perspectives of the teaching materials, parts, chapters and modules used in resource management in each major field, the related work types, and so on. Fig. 3 is a visualization page of the knowledge graph system, which is built on the knowledge graph. The knowledge graph system can analyze and mine the value of resource management knowledge in the power industry more deeply, raise the level of intelligence of resource management in each field, and realize the integrated application of different knowledge data in human resource management.

S202, after receiving the xml data, putting the xml data into a specified directory, wherein the xml data is a book.

Specifically, after the book xml data is received, the prepared book xml data needs to be put into a specified directory, for example under the /upload folder on the shared server. The complete book data includes the following: CHAPTER (chapters), COVER, EPUB (electronic version), MOBI (electronic version), PDF, SOURCE, and XML files.

It should be noted that, in the present application, the xml data is a book, including xml files of data such as books in the human resource field, capital construction books and power generation volumes. The scheme of the above preferred embodiment is only one specific implementation provided by the present application; data in any other form that is parsed through a knowledge graph script also falls within the protection scope of the present application.

S203, processing and parsing the xml data file through the knowledge graph script to form a BookData object.

Specifically, after the book xml data has been stored and arranged in the custom folder, the knowledge graph script processes and parses it according to the path and type of the xml data file, and a BookData object (BookData being a custom object name) is finally formed.

In order to process the xml data file, in some embodiments, processing and parsing the xml data file through the knowledge graph script to form a BookData object is specifically as follows:

After the xml data is put into the specified directory, the knowledge graph script processes and parses it according to the path and type of the xml data file. If the xml data file is of a type that needs decompression, for example the zip format, the script automatically reads it using ZipFile and decompresses it to the specified directory, and then parses the xml data file according to the path; if the xml data file is of a type that does not need decompression, the file is parsed directly according to the path.
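The type-dependent branch described above can be illustrated with a minimal sketch. It assumes that the type is recognized by the .zip file extension; the class name UploadPreparer, the method name prepare and the check against entries escaping the target directory are illustrative assumptions and are not taken from the application. Only the use of ZipFile comes from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: unpacks zip uploads and leaves other files untouched. */
final class UploadPreparer {

    /** Returns the path (file or directory) that should be parsed next. */
    static Path prepare(Path upload, Path targetDir) throws IOException {
        String name = upload.getFileName().toString().toLowerCase();
        if (!name.endsWith(".zip")) {
            return upload; // no decompression needed: parse directly by path
        }
        Files.createDirectories(targetDir);
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(upload.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Assumption: reject entries that would be written outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
        return root;
    }
}
```

In this sketch a zip upload yields the directory into which it was unpacked, while any other file is returned unchanged, so the caller can hand the result to the path-based parsing step in both cases.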

In order to parse an xml data file, in some embodiments, parsing the xml data file according to the path includes:

Specifically, the knowledge graph script reads the fileList (the file names under the folder, custom-defined on the server) and then parses the files in a loop. File parsing uses SAXReader (an xml stream reader) together with FileInputStream (a file input stream): the read method of SAXReader reads the data in the FileInputStream to form a Document object, and xpath is then used to read each row of data, for example the book title, table of contents, sections and chapters. Finally, a BookData object (a custom object name) is formed. Once the data in the BookData object has been parsed, the process of parsing the xml data file is complete.
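The stream-to-Document-to-xpath sequence can be sketched as follows. The application names SAXReader, which belongs to the third-party dom4j library; to keep the sketch self-contained it substitutes the JDK's own DocumentBuilder and XPath classes, which follow the same pattern. The xml layout (book, title, part, chapter elements) and the fields of ParsedBook are assumptions made for illustration, since the application does not disclose the schema or the fields of BookData.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for the BookData object; field names are assumptions. */
final class ParsedBook {
    String title;
    final List<String> chapterTitles = new ArrayList<>();
}

final class BookXmlParser {

    /** Builds a Document from the input stream, then reads fields with XPath. */
    static ParsedBook parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        XPath xpath = XPathFactory.newInstance().newXPath();

        ParsedBook book = new ParsedBook();
        // Assumed layout: <book><title/><part><chapter><title/></chapter></part></book>
        book.title = xpath.evaluate("/book/title", doc);
        NodeList chapters = (NodeList) xpath.evaluate(
                "/book/part/chapter/title", doc, XPathConstants.NODESET);
        for (int i = 0; i < chapters.getLength(); i++) {
            book.chapterTitles.add(chapters.item(i).getTextContent());
        }
        return book;
    }
}
```

A caller would open each file in the fileList as a FileInputStream and pass it to parse, collecting one ParsedBook per file.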

S204, after the data in the BookData object has been parsed, adding the entities in the data into a knowledge graph, and then performing semantic and word similarity matching, so that entities with the same characteristics are associated with each other.

Specifically, after the data in the BookData object has been parsed, the entities in the data are added into the knowledge graph and are then matched by semantic similarity and word similarity, so that entities with the same characteristics are associated with each other. Because xml data parsing and graph import are performed through the knowledge graph script, the complexity of storing knowledge in the graph can be reduced. The knowledge graph scripts are classified and written by business, which makes the usage scenarios clearer; the scripts are executed independently outside the system, which speeds up graph import and removes the need for operations such as knowledge fusion and the algorithmic judgment of D2R mapping; and the scripts are closely tied to the business, so that adjusting or adding to them is simpler.

In order to accurately determine the association relationship between the entities, in some embodiments, the method further includes:

Specifically, based on the knowledge graph service system and according to the provided data and the application requirements, the data in the knowledge graph is defined: the entities and relations in the knowledge graph, and the attributes of those entities and relations. Based on the defined entities, relations and attributes, the entity types, the basic attributes of the entities and the relations among the entities are determined for the xml data and for the data in the BookData object. Knowledge in the knowledge graph exists in the form of (head entity, relation, tail entity) and (entity, attribute, attribute value) triples.
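The two triple forms can be pictured with a small in-memory sketch. The class KgTripleStore, its method names, and the relation and attribute names used in the usage note are hypothetical; the application does not describe how the knowledge graph stores its triples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory store for the two triple forms. */
final class KgTripleStore {
    // Each element is {head entity, relation, tail entity}.
    private final List<String[]> relations = new ArrayList<>();
    // entity -> (attribute -> attribute value)
    private final Map<String, Map<String, String>> attributes = new HashMap<>();

    void addRelation(String head, String relation, String tail) {
        relations.add(new String[] {head, relation, tail});
    }

    void addAttribute(String entity, String attribute, String value) {
        attributes.computeIfAbsent(entity, k -> new HashMap<>()).put(attribute, value);
    }

    /** All tail entities linked to the head entity by the given relation. */
    List<String> tailsOf(String head, String relation) {
        return relations.stream()
                .filter(t -> t[0].equals(head) && t[1].equals(relation))
                .map(t -> t[2])
                .collect(Collectors.toList());
    }

    /** The attribute value, or null if the entity has no such attribute. */
    String attributeOf(String entity, String attribute) {
        return attributes.getOrDefault(entity, Collections.emptyMap()).get(attribute);
    }
}
```

For example, addRelation("Textbook A", "hasPart", "Part 1") records a (head entity, relation, tail entity) triple, and addAttribute("Textbook A", "isbn", "placeholder-isbn") records an (entity, attribute, attribute value) triple.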

In order to obtain the interrelations among the data, in some embodiments, the basic attributes of entities such as teaching materials, parts, chapters, modules, work types and posts are obtained, including the unified book number, the Chinese Library Classification number, the book classification, the ISBN and the like; and the relations among entities such as teaching materials, parts, chapters, modules, work types and posts are obtained, including parts, related entry knowledge and the like.

In order to provide a means for managing personnel in an electric power industry enterprise, in some embodiments, after entities in the data are added into a knowledge graph, semantic and word similarity matching is performed, so that the entities with the same characteristics are associated with each other, specifically:

Specifically, the knowledge graph system interface is called over http to request all entity information under the entries; the BookData objects are iterated over and their remaining fields, such as cover and pdfPath, are filled in; the cover page under the COVER directory is uploaded to FastDFS (a distributed storage system) and the returned path is filled in. Then addEntity (an api of the knowledge graph system) is called to add the book entity into the knowledge graph; the title and abs of the book are matched against the entries, with filtering done using the filter in java8, so as to establish the association relationship between the book and the entries (via the addRelation method, an api of the knowledge graph system). Part and chapter data are obtained from the BookData object, part and chapter entity data are created, and the corresponding association relationships are established according to the degree of match (semantic/word similarity) between the title and the entries or attributes. In this way, the present application parses the xml data and imports it into the graph by means of the knowledge graph script, and the parsed data is finally output to the knowledge graph. Fig. 4 and fig. 5 show knowledge graph pages. Through the associations among the attributes, entries, parts, chapters and so on in the knowledge graph, a user can enter relevant knowledge points to search; the retrieved content is then converted into data, and a problem-solving scheme that accords with practical judgment can finally be provided on the basis of the associations among the data, thereby providing a personnel management means for power industry enterprises. Meanwhile, the rich display page gives the user a clear visual presentation and provides a good retrieval and visualization platform for training new staff within the enterprise and for acquiring experience data.
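The matching of a book's title and abs against the entries with the java8 filter can be sketched as below. The application does not state the matching rule or the similarity measure, so two assumptions are made here: containment of the entry name in the title or abs for the filter step, and Jaccard similarity over character bigrams as one possible word similarity. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical matcher between a book and knowledge graph entries. */
final class EntryMatcher {

    /** Entries whose name occurs in the book's title or abstract (assumed rule). */
    static List<String> matchByContainment(String title, String abs, List<String> entryNames) {
        return entryNames.stream()
                .filter(name -> title.contains(name) || abs.contains(name))
                .collect(Collectors.toList());
    }

    /** Entries whose similarity to the title reaches the threshold. */
    static List<String> matchBySimilarity(String title, List<String> entryNames, double threshold) {
        return entryNames.stream()
                .filter(name -> similarity(title, name) >= threshold)
                .collect(Collectors.toList());
    }

    /** Jaccard similarity over character bigrams, in [0, 1] (assumed measure). */
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        if (union.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0; // strings too short to have bigrams
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return (double) common.size() / union.size();
    }

    private static Set<String> bigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }
}
```

Each entry returned by either method would then be linked to the book entity through the addRelation api. A semantic similarity model could replace the bigram measure without changing the filter structure.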

In order to further illustrate the technical idea of the present invention, the technical solution of the present invention will now be described with reference to specific application scenarios.

Step one: unstructured xml data parsing.

First, the prepared book xml data is put into a specified directory, for example under the /upload folder on the shared server. The complete book data includes the following: CHAPTER (chapters), COVER, EPUB (electronic version), MOBI (electronic version), PDF, SOURCE, and XML files.

After the book xml data has been stored and arranged in the custom folder, the knowledge graph script processes and parses it according to the path and type of the xml data file, and a BookData object (a custom object name) is finally formed. If the xml data file is of a type that needs decompression, for example the zip format, the script automatically reads it using ZipFile and decompresses it to the specified directory, and then parses the xml data file according to the path; if the xml data file is of a type that does not need decompression, the file is parsed directly according to the path. The parsing process is as follows:

The knowledge graph script reads the fileList (the file names under the folder, custom-defined on the server) and then parses the files in a loop. File parsing uses SAXReader (an xml stream reader) together with FileInputStream (a file input stream): the read method of SAXReader reads the data in the FileInputStream to form a Document object, and xpath is then used to read each row of data, for example the book title, table of contents, sections and chapters. Finally, a BookData object (a custom object name) is formed. Once the data in the BookData object has been parsed, the process of parsing the xml data file is complete.

Step two: importing the data in the parsed BookData object into the graph.

First, the knowledge graph system interface is called over http to request all entity information under the entries; the BookData objects are iterated over and their remaining fields, such as cover and pdfPath, are filled in; the cover page under the COVER directory is uploaded to FastDFS (a distributed storage system) and the returned path is filled in. Then addEntity (an api of the knowledge graph system) is called to add the book entity into the knowledge graph; the title and abs of the book are matched against the entries, with filtering done using the filter in java8, so as to establish the association relationship between the book and the entries (via the addRelation method, an api of the knowledge graph system). Part and chapter data are obtained from the BookData object, part and chapter entity data are created, and the corresponding association relationships are established according to the degree of match (semantic/word similarity) between the title and the entries or attributes.
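The order of calls in step two can be sketched with a stand-in client. Only the method names addEntity and addRelation come from the application; their signatures, the GraphClient interface, the in-memory implementation (used in place of the http interface), the entity type labels and the relation names hasPart and hasChapter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client interface; the signatures are assumptions. */
interface GraphClient {
    String addEntity(String type, String name);                     // returns the new entity id
    void addRelation(String fromId, String relation, String toId);
}

/** In-memory stand-in for the knowledge graph system's http interface. */
final class InMemoryGraphClient implements GraphClient {
    final Map<String, String> entities = new LinkedHashMap<>();
    final List<String> relations = new ArrayList<>();

    @Override
    public String addEntity(String type, String name) {
        String id = type + "#" + (entities.size() + 1);
        entities.put(id, name);
        return id;
    }

    @Override
    public void addRelation(String fromId, String relation, String toId) {
        relations.add(fromId + " -" + relation + "-> " + toId);
    }
}

final class BookImporter {

    /** Adds the book first, then its parts and chapters, linking each level to its parent. */
    static String importBook(GraphClient client, String bookTitle,
                             Map<String, List<String>> chaptersByPart) {
        String bookId = client.addEntity("book", bookTitle);
        for (Map.Entry<String, List<String>> part : chaptersByPart.entrySet()) {
            String partId = client.addEntity("part", part.getKey());
            client.addRelation(bookId, "hasPart", partId);
            for (String chapter : part.getValue()) {
                String chapterId = client.addEntity("chapter", chapter);
                client.addRelation(partId, "hasChapter", chapterId);
            }
        }
        return bookId;
    }
}
```

Keeping the client behind an interface lets the same import logic run against a test double or against a real http-backed implementation.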

In order to achieve the above technical object, an embodiment of the present application further provides a data processing apparatus based on a knowledge-graph script, as shown in fig. 6, the apparatus including:

a construction module 401, configured to construct a knowledge graph script based on a knowledge graph service system;

a receiving module 402, configured to, after receiving xml data, place the xml data into a specified directory, where the xml data is a book;

a processing and parsing module 403, configured to process and parse the xml data file through the knowledge graph script to form a BookData object;

a matching module 404, configured to, after the data in the BookData object has been parsed, add the entities in the data into a knowledge graph and perform semantic and word similarity matching, so that entities with the same characteristics are associated with each other;

In a specific application scenario of the present application, the processing and analyzing module is specifically configured to:

The present application further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.

According to yet another aspect of the application, there is also provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A data processing method based on a knowledge-graph script is characterized by comprising the following steps:

the service system is obtained by splitting power resource information, and specifically comprises a post system, an equipment system, a capital construction system, a scientific and technological system, a marketing system, a power grid system and a legal system.

2. The method of claim 1, wherein processing and parsing the xml data file through the knowledge graph script to form a BookData object is specifically:

3. The method according to claim 2, wherein parsing the xml data file according to the path specifically comprises:

4. The method of claim 1, further comprising:

5. The method as claimed in claim 4, wherein the entity basic attributes include at least textbook, part, chapter, module, work category and post, and the relationship between the entity and the entity includes at least the relationship between the textbook, part, chapter, module, work category and post.

6. The method of claim 1, wherein the matching of semantic and word similarity is performed after the entities in the data are added to the knowledge-graph, so that the entities with the same characteristics are related to each other, specifically:

7. A data processing apparatus based on a knowledge-graph script, the apparatus comprising:

8. The apparatus of claim 7, wherein the processing and parsing module is specifically configured to:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 6 are implemented by the processor when executing the computer program.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.