CN107330125B - Mass unstructured distribution network data integration method based on knowledge graph technology

Info

Publication number: CN107330125B
Application number: CN201710593929.3A
Authority: CN
Inventors: 曹敏; 邹京希; 唐立军; 赵旭; 周年荣; 魏玲; 沈鑫
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2017-07-20
Filing date: 2017-07-20
Publication date: 2020-06-30
Anticipated expiration: 2037-07-20
Also published as: CN107330125A

Abstract

The invention discloses a massive unstructured distribution network data integration method based on knowledge graph technology. A data acquisition unit acquires the unstructured distribution network data of each information system and performs quality analysis and data cleaning on it; a local data index based on a local knowledge graph is constructed from the processed unstructured distribution network data of each information system; the local data index based on the local knowledge graph is sent to a data management center through a big data connector; and the data management center constructs a global data index based on a global knowledge graph. Because the acquisition, quality analysis and cleaning of the distributed multi-source heterogeneous data are moved forward into each information system, the data fusion computation, storage pressure and data scheduling burden of the data management center are reduced. Integrating the data sources through the global data index based on the global knowledge graph makes the data easy to query and extract and reduces the workload of the data management center.

Description

Mass unstructured distribution network data integration method based on knowledge graph technology

Technical Field

The invention relates to the technical field of data fusion and integration, in particular to a massive unstructured distribution network data integration method based on a knowledge graph technology.

Background

The power grid comprises information systems such as a marketing system, a production system, a power distribution data acquisition and monitoring system, and electric energy meters. To enhance grid operation capability and to expand the capacity and quality of service to power customers, mass data from distribution network equipment must be acquired efficiently and quickly, identified and filtered effectively in combination with data from the marketing system, the production system and other business systems, and finally output as relevant data that benefits power operation and improves the service quality and service level offered to customers.

Distribution network data collected from the various information systems can be divided into two types: structured data, such as numeric or symbolic data, and unstructured data, such as user voice, images and texts. The existing method for integrating unstructured distribution network data is to establish a unified data center platform, copy the acquired unstructured data to the data center platform using technologies such as data adapters, and then clean and integrate the data, thereby meeting the need for frequent data exchange among departments.

However, on the one hand, this method generally performs centralized data cleaning in the data center, which results in a large cleaning workload and a low integration speed at the data center and cannot meet the integration requirements of massive unstructured data. On the other hand, the unstructured data of the information systems differ in business logic, data format and storage, so that once the data have been transmitted to the data center platform, classified storage of the mass data is hindered, data extraction and query become inconvenient, and the workload of the data center platform is greatly increased.

Disclosure of Invention

In order to solve the technical problems, the invention provides a massive unstructured distribution network data integration method based on a knowledge graph technology.

According to the embodiment of the invention, a massive unstructured distribution network data integration method based on a knowledge graph technology is provided, and the method comprises the following steps:

acquiring unstructured distribution network data of each information system by a data acquisition unit, and performing quality analysis and data cleaning treatment on the unstructured distribution network data of each information system respectively;

constructing, according to the processed unstructured distribution network data of each information system, a local data index based on a local knowledge graph, wherein the local data index based on the local knowledge graph comprises the local knowledge graph and the local data index table of each information system;

sending the local data index based on the local knowledge graph to a data management center through a big data connector;

and constructing a global data index based on the global knowledge graph by the data management center, wherein the global data index based on the global knowledge graph comprises the global knowledge graph and a global data index table.

Further, the step of constructing a local data index based on a local knowledge graph according to the processed unstructured distribution network data of each information system includes:

performing entity extraction on the processed unstructured distribution network data of each information system to obtain an entity library of the unstructured distribution network data of each information system, wherein the entity library comprises entity, class and attribute information of the unstructured distribution network data of each information system;

constructing the local knowledge graph according to the relation of each entity in the entity library;

and constructing a local data index table by taking the entity name of each entity in the entity library as a key word, wherein the local data index table comprises local index information corresponding to each entity in the entity library, and the local index information comprises attributes, examples, a belonging text, a data source name and a belonging database.
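As a non-limiting illustration, the local data index table described above can be sketched as a dictionary keyed by entity name. The record layout (`name`, `attributes`, `example`, `text`, `database`) and the names `LocalIndexEntry` and `build_local_index` are assumptions made for this sketch; the source does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class LocalIndexEntry:
    """One row of a local data index table (hypothetical layout)."""
    attributes: dict = field(default_factory=dict)
    examples: list = field(default_factory=list)
    texts: list = field(default_factory=list)
    data_source: str = ""
    databases: list = field(default_factory=list)


def build_local_index(entity_library: list[dict], data_source: str) -> dict[str, LocalIndexEntry]:
    """Build the index table using each entity name as the keyword."""
    index: dict[str, LocalIndexEntry] = {}
    for ent in entity_library:
        entry = index.setdefault(ent["name"], LocalIndexEntry(data_source=data_source))
        entry.attributes.update(ent.get("attributes", {}))
        # Collect examples, texts and databases without duplicates.
        for key, target in (("example", entry.examples),
                            ("text", entry.texts),
                            ("database", entry.databases)):
            value = ent.get(key)
            if value is not None and value not in target:
                target.append(value)
    return index


library = [
    {"name": "Transformer T1", "attributes": {"voltage": "10kV"},
     "text": "report.txt", "database": "db1"},
    {"name": "Transformer T1", "attributes": {"area": "A"},
     "example": "ex1", "database": "db1"},
]
local_index = build_local_index(library, "production system")
```

Two records for the same entity name collapse into one row, which is what allows the entity name to serve as the lookup keyword.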

Further, the step of constructing a global index of data based on the global knowledge-graph by the data management center comprises:

performing conflict detection on the local knowledge graphs of the various information systems, wherein the conflict detection comprises entity name conflict detection, superior-inferior relation conflict detection, single-value attribute conflict detection and multi-value attribute conflict detection;

if the local knowledge maps of the various information systems have conflicts, the conflicts are eliminated;

unifying local index information of each entity in the local data index table according to the entities, classes, attribute values and upper and lower relations of the local knowledge graph obtained in the process of detecting and eliminating conflicts, and constructing a global knowledge graph;

constructing a mapping relation between the global knowledge graph and the local knowledge graphs of the various information systems;

and according to the mapping relation and the local data index table, constructing a global data index table by taking the entity name of each entity in the entity library as a keyword, wherein the global data index table comprises global index information corresponding to each entity in the entity library, and the global index information comprises the relationship, the caused conflict, the local index information and the local knowledge graph.

Further, if there is a conflict between the local knowledge-graphs of the information systems, the step of eliminating the conflict comprises:

creating priorities of local knowledge maps of the various informatization systems;

if entity name conflict or superior-inferior relation conflict exists between the local knowledge maps of the information systems, selecting the entity name or superior-inferior relation of the local knowledge map with the highest priority as the entity name or superior-inferior relation of the global knowledge map, and modifying the corresponding entity name and superior-inferior relation of the local knowledge map;

traversing single-value attributes in each local knowledge graph, if a certain single-value attribute is detected to be multi-value, selecting the attribute value of the local knowledge graph with the highest priority as the attribute value of the attribute in the global knowledge graph, and modifying the attribute value of the corresponding local knowledge graph;

and if the multi-value attribute values of the local knowledge maps are detected to be inconsistent, combining the attribute values of all the local knowledge maps to form the attribute value of the global knowledge map, and modifying the corresponding attribute value of the local knowledge map.

Further, the step of performing entity extraction on the processed unstructured distribution network data of each information system includes:

judging whether the processed unstructured distribution network data of each information system is text data or not;

if the processed unstructured distribution network data of each information system are text data, extracting entity, class and attribute information according to a preset rule and a dictionary method;

if the processed unstructured distribution network data of the information systems are not text data, converting the processed unstructured distribution network data of the information systems into texts;

and segmenting the text, analyzing the syntactic structure of the text and the dependency relationship among words in a sentence by adopting a syntactic analysis algorithm based on natural language processing, and then extracting entity, class and attribute information.
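The branch between text and non-text data described above can be sketched as follows. This is a minimal illustration only: the dictionary `ENTITY_DICT`, the rule `ATTR_RULE` and the `to_text` converter are hypothetical stand-ins, and the word segmentation and syntactic analysis step is omitted, so converted text is simply passed through the same rule and dictionary lookup.

```python
import re
from typing import Callable

# Hypothetical dictionary mapping known entity names to their classes.
ENTITY_DICT = {"recloser": "equipment", "distribution transformer": "equipment"}
# Hypothetical preset rule: "<attribute>: <value>" pairs such as "voltage: 10kV".
ATTR_RULE = re.compile(r"(\w+)\s*:\s*([\w.]+)")


def extract_entities(data, is_text: bool,
                     to_text: Callable[[object], str] = str) -> list[dict]:
    """Return entity, class and attribute information found in one data item."""
    # Non-text data (voice, images) is converted to text first; `to_text`
    # stands in for a speech-recognition or OCR step the source does not name.
    text = data if is_text else to_text(data)
    lowered = text.lower()
    attributes = dict(ATTR_RULE.findall(lowered))
    return [
        {"name": name, "class": cls, "attributes": dict(attributes)}
        for name, cls in ENTITY_DICT.items()
        if name in lowered
    ]


from_text = extract_entities("Recloser R3 voltage: 10kV", is_text=True)
from_bytes = extract_entities(b"distribution transformer load: 0.8", is_text=False,
                              to_text=lambda raw: raw.decode("utf-8"))
```

Matching is done on lower-cased text, so extracted attribute values are lower-cased as well; a production extractor would preserve the original casing.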

Further, the step of constructing the local knowledge graph according to the relationship of each entity in the entity library includes:

computing the inner product over the subsequences of a given length in the character string sequences of the textualized unstructured distribution network data, and thereby calculating the similarity between sentences;

taking this string sequence kernel as the kernel of a support vector machine to carry out statistical learning, obtaining the relations among the entities in the entity library, and constructing the local knowledge graph as the triple shown in the following formula:

$$G_L = (E, R, S)$$

wherein $G_L$ is the local knowledge graph; $E = \{e_1, e_2, \ldots, e_{|E|}\}$ is the set of entities in the entity library, containing $|E|$ different entities in total; $R = \{r_1, r_2, \ldots, r_{|R|}\}$ is the set of entity relations in the entity library, containing $|R|$ different entity relations in total; and $S \subseteq E \times R \times E$ represents the set of triples in the local knowledge graph.
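The sentence similarity step can be illustrated with a simple string kernel. The sketch below assumes that "subsequence of a certain length" means a contiguous substring of length `p` (a p-spectrum kernel); gapped subsequence kernels are also possible and the source does not say which is intended. The function names and the default `p = 2` are choices made for this example.

```python
from collections import Counter
from math import sqrt


def spectrum(text: str, p: int) -> Counter:
    """Count every contiguous substring of length p in the text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int = 2) -> float:
    """Inner product of the p-spectra of two strings."""
    sa, sb = spectrum(a, p), spectrum(b, p)
    return float(sum(count * sb[sub] for sub, count in sa.items()))


def normalized_similarity(a: str, b: str, p: int = 2) -> float:
    """Kernel value scaled to [0, 1] so that long sentences do not dominate."""
    denom = sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0


k_same = spectrum_kernel("abab", "abab")      # spectra: ab x2, ba x1
k_diff = spectrum_kernel("abc", "xyz")
sim_same = normalized_similarity("abab", "abab")
```

A matrix of such kernel values over all sentence pairs could then be supplied to a support vector machine that accepts a precomputed kernel, which is one way to realize the statistical learning step.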

Further, the method for detecting entity name conflict comprises the following steps:

calculating the similarity between the entity A of one local knowledge graph and the entity B of other local knowledge graphs according to the following formula;

$$\mathrm{Sim}(A, B) = \mathrm{Dis}(L_A, L_B) + \mathrm{Dis}(S_A, S_B)$$

wherein $\mathrm{Sim}(A, B)$ is the similarity of the entity A and the entity B; $\mathrm{Dis}(L_A, L_B)$ is the distance between the class $L_A$ of the entity A and the class $L_B$ of the entity B; and $\mathrm{Dis}(S_A, S_B)$ is the distance between the attribute $S_A$ of the entity A and the attribute $S_B$ of the entity B;

if the similarity between the entity A and the entity B is larger than a threshold value, judging whether the entity names of the entity A and the entity B are the same;

and if the entity names of the entity A and the entity B are the same, the detection result is that entity name conflict exists.
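The similarity gate that precedes the name comparison can be sketched as below. The source does not define Dis(·,·), so `overlap` (a Jaccard overlap of two sets) is a hypothetical stand-in, as are the entity record layout and the threshold value. The sketch only selects the entity pairs whose similarity exceeds the threshold; the subsequent name comparison is then applied to those pairs as described above.

```python
def overlap(a: set, b: set) -> float:
    """Hypothetical stand-in for Dis(.,.): Jaccard overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(ent_a: dict, ent_b: dict) -> float:
    """Sim(A, B) = Dis(L_A, L_B) + Dis(S_A, S_B)."""
    class_term = overlap({ent_a["class"]}, {ent_b["class"]})
    attr_term = overlap(set(ent_a["attributes"]), set(ent_b["attributes"]))
    return class_term + attr_term


def similar_pairs(graph_a: list[dict], graph_b: list[dict],
                  threshold: float) -> list[tuple]:
    """Pairs (name in A, name in B, Sim) whose similarity exceeds the threshold."""
    return [
        (a["name"], b["name"], similarity(a, b))
        for a in graph_a
        for b in graph_b
        if similarity(a, b) > threshold
    ]


t1 = {"name": "T1", "class": "equipment", "attributes": {"voltage": 1, "area": 2}}
t1_alt = {"name": "Transformer 1", "class": "equipment",
          "attributes": {"voltage": 1, "area": 3}}
user = {"name": "User X", "class": "user", "attributes": {"tariff": 1}}
candidates = similar_pairs([t1], [t1_alt, user], threshold=1.5)
```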

Further, the method for detecting superior-inferior relation conflicts comprises the following steps:

extracting the superior-inferior relation graph of an entity A in a certain local knowledge graph;

searching the other local knowledge graphs for the set of entities that have a superior-inferior relation with the entity A, and extracting the superior-inferior relation graph of each entity in that set;

obtaining a combined superior-inferior relation graph according to the following formula:

$$G = G_A \cup G_{q_1} \cup G_{q_2} \cup \cdots \cup G_{q_n}$$

wherein $G$ is the combined superior-inferior relation graph; $G_A$ is the superior-inferior relation graph of the entity A; $G_{q_1}, G_{q_2}, \ldots, G_{q_n}$ are respectively the superior-inferior relation graphs of the entities in the superior-inferior relation entity set; and $n$ is the number of entities in the superior-inferior relation entity set;

repeatedly deleting every vertex whose in-degree is zero, together with its outgoing edges, from the combined superior-inferior relation graph until no vertex with zero in-degree remains;

if all nodes in the combined superior-inferior relation graph have been deleted, the detection result is that no superior-inferior relation conflict exists; and if at least one node remains in the combined superior-inferior relation graph, the detection result is that a superior-inferior relation conflict exists.
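The deletion procedure above amounts to a topological-sort test for cycles in the merged graph. A minimal sketch, assuming each relation graph is represented as a set of directed (superior, inferior) edges; the function names are illustrative only:

```python
from collections import defaultdict, deque


def merge_graphs(*graphs: set) -> set:
    """G = G_A ∪ G_q1 ∪ ... ∪ G_qn over sets of (superior, inferior) edges."""
    return set().union(*graphs)


def has_hierarchy_conflict(edges: set) -> bool:
    """True if nodes remain after repeatedly deleting zero in-degree vertices."""
    out_edges = defaultdict(set)
    in_degree = defaultdict(int)
    nodes = set()
    for superior, inferior in edges:
        nodes.update((superior, inferior))
        out_edges[superior].add(inferior)
        in_degree[inferior] += 1

    queue = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        # Deleting the vertex also deletes its outgoing edges.
        for inferior in out_edges[node]:
            in_degree[inferior] -= 1
            if in_degree[inferior] == 0:
                queue.append(inferior)
    # Any vertex that could not be deleted lies on or behind a cycle.
    return removed < len(nodes)


g_a = {("equipment", "switch"), ("switch", "recloser")}
g_consistent = {("equipment", "recloser")}
g_contradictory = {("recloser", "equipment")}
ok = has_hierarchy_conflict(merge_graphs(g_a, g_consistent))
bad = has_hierarchy_conflict(merge_graphs(g_a, g_contradictory))
```

In the second case one local graph places "recloser" above "equipment" while another places it below, so the merged graph contains a cycle and no vertex can be deleted.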

Further, the method further comprises: and updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the unstructured distribution network data of new equipment and/or new users.

Further, the step of updating the local knowledge-graph-based data local index and the global knowledge-graph-based data global index according to unstructured distribution network data of new devices and/or new users comprises:

acquiring unstructured distribution network data of new equipment and/or new users, and extracting entity, class and attribute information of the unstructured distribution network data of the new equipment and/or the new users;

judging whether the entity and the class of the unstructured distribution network data of the new equipment and/or the new user are matched with the entity and the class in a certain local knowledge graph;

if the judgment result is matching, fusing the entity of the unstructured distribution network data of the new equipment and/or the new user with the local knowledge map, updating the corresponding entity attribute and the superior-inferior relation between the entities, and updating the local data index table and the data global index based on the global knowledge map according to the fused local knowledge map;

and if the judgment result is not matched, creating a new entity and class, and updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the new entity and class.
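The update branch for new equipment or new users can be sketched as follows. The in-memory layout (a local graph keyed by entity name and class, with attributes and a set of inferior entities) and the function name are assumptions for illustration. Propagation to the global data index is left to the caller, which can use the returned action to decide what to refresh.

```python
def update_with_new_entity(local_graph: dict, local_index: dict,
                           new_entity: dict) -> str:
    """Fuse a new entity into the local graph, or create it if nothing matches."""
    key = (new_entity["name"], new_entity["class"])
    attributes = new_entity.get("attributes", {})
    inferiors = set(new_entity.get("inferiors", ()))

    if key in local_graph:
        # Match: fuse attributes and superior-inferior relations.
        local_graph[key]["attributes"].update(attributes)
        local_graph[key]["inferiors"] |= inferiors
        action = "fused"
    else:
        # No match: create a new entity and class.
        local_graph[key] = {"attributes": dict(attributes), "inferiors": inferiors}
        action = "created"

    # Refresh the local data index row from the (possibly fused) graph node.
    local_index[new_entity["name"]] = {
        "class": new_entity["class"],
        "attributes": dict(local_graph[key]["attributes"]),
    }
    return action


graph, index = {}, {}
first = update_with_new_entity(graph, index, {
    "name": "Meter 7", "class": "equipment", "attributes": {"voltage": "220V"}})
second = update_with_new_entity(graph, index, {
    "name": "Meter 7", "class": "equipment", "attributes": {"area": "A"}})
```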

According to the above technical scheme, in the massive unstructured distribution network data integration method based on knowledge graph technology, a big data connector and a data acquisition unit are arranged in each information system, such as the marketing system, the production system, the power distribution data acquisition and monitoring system and the electric energy meters, and the acquisition, quality analysis and cleaning of the distributed multi-source heterogeneous data are moved forward into each information system, which reduces the data fusion computation, storage pressure and data scheduling burden of the data management center. The data acquisition unit performs data sampling, quality analysis and data cleaning on the unstructured distribution network data of each information system, such as user voice, pictures and texts; the processed data is used to construct the local knowledge graph and the local data index table of each information system, which are transmitted to the data management center through the big data connector. The data management center detects and eliminates the conflicts among the local knowledge graphs and constructs a global knowledge graph and a global data index table applicable to all the data, so that the data sources are integrated through the global knowledge graph and the global data index table. When newly added data is integrated, the integration can be optimized with the global knowledge graph, and the local data index based on the local knowledge graph and the global data index based on the global knowledge graph are updated with the acquired unstructured distribution network data of new equipment and/or new users. As the integrated equipment and data increase, the constructed local and global knowledge graphs are continuously updated, which facilitates subsequent retrieval and query of mass distribution network data, big data analysis, and the like.

Drawings

FIG. 1 is a flow chart illustrating the construction of a distributed multi-source heterogeneous data index according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for integrating massive unstructured distribution network data based on knowledge graph technology according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating a method for constructing a local index of data based on a local knowledge-graph according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating a local data index table according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating local data indexing based on local knowledge-graphs, in accordance with an embodiment of the present invention;

FIG. 6 is a flowchart illustrating a method for building a global index of data based on a global knowledge-graph according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating a global data index table according to an embodiment of the present invention;

FIG. 8 is a flow diagram illustrating a method for eliminating conflicts between local knowledge-graphs, in accordance with an embodiment of the present invention;

FIG. 9 is a flowchart illustrating a method for extracting entities from unstructured distribution network data according to an embodiment of the present invention;

FIG. 10 is a flowchart illustrating a method for integrating massive unstructured distribution network data based on knowledge graph technology according to another embodiment of the present invention;

FIG. 11 is a flowchart illustrating a method for updating a knowledge graph according to another embodiment of the invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.

FIG. 1 is a flow chart of distributed multi-source heterogeneous data index construction according to an embodiment of the present invention. It involves a plurality of information systems, such as smart electric meters, a SCADA (Supervisory Control And Data Acquisition, i.e. power distribution data acquisition and monitoring) system, a marketing system, a production system, and the like. Each information system is equipped with a data acquisition unit and a big data connector. The data acquisition unit is configured to acquire the unstructured distribution network data of its information system, analyze the quality of the data and clean it, discovering and correcting recognizable errors in the data, including checking data consistency and processing invalid values and missing values. For example, the data acquisition unit acquires and processes smart electric meter data; remote measurement, remote control and remote regulation data of the SCADA system; user information data of the marketing system; equipment information data of the production system; and the like. The big data connector is used for transmitting the local data index based on the local knowledge graph to the data management center.
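By way of illustration only, the cleaning performed by a data acquisition unit might look like the sketch below. The record fields (`source`, `entity`, `payload`) and the three rules are assumptions for this example; the source names the kinds of checks (consistency, invalid values, missing values) but not their concrete form.

```python
# Hypothetical record layout: the source does not define a schema.
REQUIRED_FIELDS = ("source", "entity", "payload")


def clean_records(records: list[dict]) -> tuple[list[dict], dict]:
    """Drop records with missing, invalid or duplicated values and count each case."""
    cleaned, seen = [], set()
    report = {"missing": 0, "invalid": 0, "duplicate": 0}
    for rec in records:
        # Missing-value check: every required field must be present and non-empty.
        if any(not rec.get(name) for name in REQUIRED_FIELDS):
            report["missing"] += 1
            continue
        # Invalid-value check (assumed rule): the payload must be non-blank text.
        payload = rec["payload"]
        if not isinstance(payload, str) or not payload.strip():
            report["invalid"] += 1
            continue
        # Consistency check (assumed rule): identical source, entity and payload
        # are treated as a repeated record.
        key = (rec["source"], rec["entity"], payload.strip())
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        cleaned.append({**rec, "payload": payload.strip()})
    return cleaned, report


raw = [
    {"source": "SCADA", "entity": "FTU-1", "payload": " breaker open "},
    {"source": "SCADA", "entity": "FTU-1", "payload": "breaker open"},
    {"source": "SCADA", "entity": "", "payload": "x"},
    {"source": "marketing", "entity": "User X", "payload": 123},
]
cleaned, report = clean_records(raw)
```

Running this step inside each information system means that only cleaned records feed the local knowledge graph, which is the load reduction for the data management center that the description emphasizes.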

In the invention, the unstructured distribution network data of each information system is in a distributed multi-source heterogeneous form, and the processes of acquisition, quality analysis and data cleaning of the distributed multi-source heterogeneous data are preposed to each information system without corresponding operation of a data management center, so that the data fusion calculation amount, the storage pressure and the data scheduling burden of the data management center are favorably reduced.

As shown in fig. 2, a method for integrating massive unstructured distribution network data based on a knowledge graph technology according to an embodiment of the present invention includes:

Step S10, acquiring the unstructured distribution network data of each information system by the data acquisition unit, and performing quality analysis and data cleaning on the unstructured distribution network data of each information system respectively.

In the invention, the unstructured distribution network data of each information system come from different information systems, and the data structure and types are diversified, such as user voice data, image and/or text data, and the like, so that the unstructured distribution network data of each information system is in a distributed multi-source heterogeneous form, the processes of acquisition, quality analysis and data cleaning of the distributed multi-source heterogeneous data are prepositioned to each information system, and a data management center is not required to perform corresponding operation, thereby being beneficial to reducing the data fusion calculation amount, storage pressure and data scheduling burden of the data management center.

Step S20, constructing, according to the processed unstructured distribution network data of each information system, a local data index based on a local knowledge graph, wherein the local data index based on the local knowledge graph comprises the local knowledge graph and the local data index table of each information system.

In order to eliminate the difference of the data of each information system in business logic, data format and storage, the unstructured distribution network data of each information system needs to be abstracted into knowledge such as entities, attributes and relationships among the entities, a local knowledge graph and a local data index table are constructed, and thus a local data index based on the local knowledge graph is constructed.

Step S30, sending the local data index based on the local knowledge graph to a data management center through a big data connector.

The big data connector may be an Oracle big data connector or another standard database big data connector.

Step S40, constructing a global data index based on the global knowledge graph by the data management center, wherein the global data index based on the global knowledge graph comprises a global knowledge graph and a global data index table.

As shown in fig. 3, step S20 includes:

S201, performing entity extraction on the processed unstructured distribution network data of each information system to obtain an entity library of the unstructured distribution network data of each information system, wherein the entity library comprises entity, class and attribute information of the unstructured distribution network data of each information system.

S202, the local knowledge graph is constructed according to the upper and lower relations of each entity in the entity library.

The constructed local knowledge graph is not a general knowledge graph but a special knowledge graph aiming at the power distribution network, and the class refers to the classification of the entities, such as a user entity, an equipment entity and the like; the entity refers to an entity name under a certain class, such as a user name, an equipment name, a manufacturer name and the like; the attributes refer to information and data collected by an entity.

The equipment names mainly include overhead lines, cables, towers, distribution transformers, isolating switches, circuit breakers, reclosers, sectionalizers, pole-mounted load switches, ring main units, voltage regulators, reactive compensation capacitors, feeder terminal units (FTUs), data acquisition and monitoring terminal units (DTUs), distribution transformer monitoring terminal units (TTUs), remote terminal units (RTUs) and other accessory facilities.

The file information, the power failure information, the electricity price information, the electricity charge information, the user information returned by the mobile phone APP and the like extracted from each information system are taken as the attributes of the user entity; and taking the equipment file, the equipment type, the voltage grade, the affiliated station area, the position information, the GIS information, the electric energy meter data, the four-way circuit condition, the state information and the like as the attributes of the equipment entity.

S203, taking the entity name of each entity in the entity library as a keyword, and constructing a local data index table, wherein the local data index table comprises local index information corresponding to each entity in the entity library, and the local index information comprises attributes, examples, a text to which the entity belongs, a data source name and a database to which the entity belongs. The name of the data source is the name of the information system where the entity is located, the database to which the entity belongs is the database where the unstructured distribution network data corresponding to the entity is located, and the database can comprise a plurality of data blocks for storing data.

FIG. 4 is a schematic diagram of a local data index table according to an embodiment of the present invention. The first column of the table holds the entity name of each entity in the entity library of an information system; the entity names are used as keywords to list and distinguish the entities in the entity library. The remaining columns of each row list the attributes, examples, texts, data source name, database and other information corresponding to that row's entity.

FIG. 5 is a schematic diagram of a local data index based on a local knowledge graph according to an embodiment of the present invention, described here taking entity name 1 as an example. When the data of entity name 1 in text 2 needs to be indexed, the local knowledge graph and the local data index table of each information system are used to find that the database corresponding to entity name 1 in text 2 is database 1; the target data blocks are then located within database 1, where data block 1, data block 2 and data block n are the corresponding target data blocks, so that the needed unstructured data is retrieved. When the data of entity name 1 in example 1 needs to be indexed, the local knowledge graph and the local data index table of each information system are used to find that the corresponding database is database 2, which is dedicated to storing the data of entity name 1 in example 1. The target data required by a user can therefore be queried through the local data index based on the local knowledge graph conveniently, quickly and accurately.
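The two lookups walked through for FIG. 5 can be mirrored in a small sketch. The nested dictionaries below are a hypothetical encoding of the index table and of the databases; the block list assumed for database 2 is not stated in the source and is included only so that the example is complete.

```python
# Hypothetical local data index table mirroring the FIG. 5 walk-through.
LOCAL_INDEX = {
    "entity name 1": {
        "text": {"text 2": "database 1"},
        "example": {"example 1": "database 2"},
    }
}
# Hypothetical databases, each mapping an entity name to its data blocks.
DATABASES = {
    "database 1": {"entity name 1": ["data block 1", "data block 2", "data block n"]},
    "database 2": {"entity name 1": ["data block 1"]},  # assumed contents
}


def locate_blocks(entity: str, kind: str, label: str) -> list[str]:
    """Resolve (entity, text or example) to a database, then to its data blocks."""
    database = LOCAL_INDEX.get(entity, {}).get(kind, {}).get(label)
    if database is None:
        return []
    return DATABASES[database].get(entity, [])


blocks_for_text = locate_blocks("entity name 1", "text", "text 2")
blocks_for_example = locate_blocks("entity name 1", "example", "example 1")
```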

The local knowledge graphs that the information systems abstract from their unstructured distribution network data, and the underlying data sources, are mutually independent, forming 'information islands' of diverse systems and scattered information that are difficult to search and analyze centrally. A unified intermediary therefore needs to be established to share and integrate data among the application systems. Specifically, as shown in fig. 6, step S40 includes:

S401, performing conflict detection on the local knowledge graphs of the various information systems, wherein the conflict detection comprises entity name conflict detection, superior-inferior relation conflict detection, single-value attribute conflict detection and multi-value attribute conflict detection.

For entities extracted from different data sources such as the marketing system, the production system, the SCADA system and the smart electric meters, it is inevitable that different names refer to the same object or that the same name refers to different entities. Conflicts among the local knowledge graphs are therefore unavoidable during data integration, so conflict detection must be carried out on the local knowledge graphs in order to eliminate the conflicts, identify and merge equivalent entities, and remove redundant and contradictory knowledge, thereby forming an accurate global knowledge graph.

S402, if conflicts exist among the local knowledge maps of the various information systems, the conflicts are eliminated.

After the conflict between the local knowledge maps of the information systems is eliminated, the accurate global knowledge map can be generated, the unstructured distribution network data of the information systems can be integrated better, and the data management center can conveniently perform integrated management, query and index on the data.

And S403, unifying the local index information of each entity in the local data index table according to the entities, classes, attribute values and upper and lower relations of the local knowledge graph obtained in the process of detecting and eliminating the conflict, and constructing a global knowledge graph.

S404, constructing a mapping relation between the global knowledge graph and the local knowledge graphs of the information systems.

The indexes of the entity in all local knowledge maps are unified through the process of conflict detection and elimination among the local knowledge maps; and then, in the global scope, the indexes of the local knowledge maps in the global knowledge map are constructed, the data mapping relation of the cross-local knowledge maps is established, the information of the local knowledge maps, the caused conflicts and the like is added to each entity extracted by the data source on the basis of the local data index table, and the data indexes crossing the local knowledge maps are established, so that the data integration of the cross-system and the cross-database is realized.

S405, according to the mapping relation and the local data index table, a global data index table is constructed by taking the entity name of each entity in the entity library as a keyword. The global data index table comprises global index information corresponding to each entity in the entity library, and the global index information comprises the relationship, the caused conflict, the local index information and the local knowledge graph. FIG. 7 is a schematic diagram of the global data index table.
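One possible shape for the global data index table is sketched below. The inputs (`local_indexes` per information system, a `mapping` from local entity names to global names, and a `conflicts` record) and the function name are assumptions for illustration; the source lists what the global index information contains but not how it is stored.

```python
def build_global_index(local_indexes: dict, mapping: dict, conflicts: dict) -> dict:
    """Assemble global index rows keyed by global entity name.

    local_indexes: {system: {local_name: local_index_info}}
    mapping:       {(system, local_name): global_name}
    conflicts:     {global_name: [conflict descriptions]}
    """
    global_index: dict = {}
    for system, table in local_indexes.items():
        for local_name, info in table.items():
            # Entities without an explicit mapping keep their local name.
            global_name = mapping.get((system, local_name), local_name)
            row = global_index.setdefault(
                global_name,
                {"conflicts": list(conflicts.get(global_name, [])), "local": {}},
            )
            # Record which local knowledge graph holds the entity and its local row.
            row["local"][system] = {"local_name": local_name, "index": info}
    return global_index


local_indexes = {
    "marketing": {"T1": {"database": "db_m"}},
    "production": {"Transformer 1": {"database": "db_p"}},
}
mapping = {("marketing", "T1"): "Transformer 1"}
conflicts = {"Transformer 1": ["entity name conflict: T1 vs Transformer 1"]}
global_index = build_global_index(local_indexes, mapping, conflicts)
```

A query against the global name then fans out to every information system that stores data for the entity, which is the cross-system, cross-database integration the description refers to.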

As shown in fig. 8, step S402 includes:

S4021, creating priorities of the local knowledge graphs of the various information systems.

S4022, if entity name conflicts or superior-inferior relation conflicts exist among the local knowledge graphs of the information systems, selecting the entity name or the superior-inferior relation of the local knowledge graph with the highest priority as the entity name or the superior-inferior relation of the global knowledge graph, and modifying the entity name and the superior-inferior relation of the corresponding local knowledge graphs.

When an entity name conflict or a superior-inferior relation conflict is detected, the entity name or superior-inferior relation of the local knowledge graph with the highest priority is selected as the entity name or superior-inferior relation of the global knowledge graph and is incorporated into the global knowledge graph. The entity names and superior-inferior relations of the corresponding local knowledge graphs are modified accordingly, so that entity names and superior-inferior relations are globally consistent. Whenever the local knowledge graphs conflict, the entity names and superior-inferior relations of the global knowledge graph prevail.

S4023, traversing the single-value attributes in each local knowledge graph, if one single-value attribute is detected to be multi-value, selecting the attribute value of the local knowledge graph with the highest priority as the attribute value of the attribute in the global knowledge graph, and modifying the attribute value of the corresponding local knowledge graph.

When multiple values are detected for a single-value attribute, the value from the local knowledge graph with the highest priority is selected as the value of the attribute in the global knowledge graph, the attribute is incorporated into the global knowledge graph, and the attribute value of the corresponding local knowledge graphs is modified, so that the single-value attribute is globally consistent. Whenever the local knowledge graphs conflict, the attribute values of the global knowledge graph prevail.

S4024, if the multi-value attribute values of the local knowledge graphs are detected to be inconsistent, combining the attribute values of all the local knowledge graphs to form the attribute value of the global knowledge graph, and modifying the corresponding attribute value of the local knowledge graph.

For multi-value attributes, if the attribute values of the local knowledge graphs are inconsistent, the values of all local knowledge graphs are combined to form the attribute value of the global knowledge graph, and the attribute values of the corresponding local knowledge graphs are modified at the same time, so that the multi-value attribute is globally consistent. Whenever the local knowledge graphs conflict, the attribute values of the global knowledge graph prevail.
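The attribute rules of S4021, S4023 and S4024 can be illustrated with a short sketch. It assumes that priorities are integers in which a larger number means higher priority, and that each local knowledge graph reports the values of an attribute as a set. The function names `resolve_attribute` and `unify` and the sample values are hypothetical.

```python
def resolve_attribute(values_by_graph, priority, multi_valued):
    """Return the global value set of one attribute of one entity.

    values_by_graph: {graph_name: set of values reported by that local graph}
    priority:        {graph_name: int}, larger means higher priority (S4021)
    multi_valued:    True if the attribute may legitimately hold several values
    """
    distinct = set().union(*values_by_graph.values())
    if multi_valued:
        # S4024: combine the values of all local graphs.
        return distinct
    if len(distinct) <= 1:
        # The local graphs already agree, so there is no conflict.
        return distinct
    # S4023: keep the value of the highest-priority local graph.
    best = max(values_by_graph, key=lambda g: priority[g])
    return set(values_by_graph[best])


def unify(values_by_graph, priority, multi_valued):
    """Resolve the global value and write it back to every local graph."""
    global_value = resolve_attribute(values_by_graph, priority, multi_valued)
    updated_local = {g: set(global_value) for g in values_by_graph}
    return global_value, updated_local


# Made-up sample data: two systems disagree on a single-value attribute.
priority = {"production": 2, "marketing": 1}
rated_voltage = {"production": {"10 kV"}, "marketing": {"10.5 kV"}}
global_voltage, local_voltage = unify(rated_voltage, priority, multi_valued=False)

# Made-up sample data: a multi-value attribute is merged instead.
contacts = {"production": {"Team A"}, "marketing": {"Team B"}}
global_contacts, local_contacts = unify(contacts, priority, multi_valued=True)
```

Entity name and superior-inferior relation conflicts (S4022) would follow the same highest-priority rule as the single-value branch.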

As shown in fig. 9, step S201 includes:

S2011, judging whether the processed unstructured distribution network data of each information system is text data.

The unstructured distribution network data may take different forms, such as user voice, images and/or text, and the entity extraction method differs for each type of data.

S2012, if the processed unstructured distribution network data of the information systems are text data, extracting entity, class and attribute information according to preset rules and a dictionary method.

For text data with fixed formats in a production system, such as equipment archives, operation manuals and standards, the entity, class and attribute information is extracted by a method based on rules and a dictionary: power grid experts formulate entity extraction rules suited to the power grid industry, and the dictionary method is then used to extract entities in the text, such as equipment names, equipment types, person names, place names, organization names and specific times, together with their class and attribute information.

S2013, if the processed unstructured distribution network data of the information systems are not text data, converting the processed unstructured distribution network data of the information systems into text.

S2014, performing word segmentation on the text, analyzing the syntactic structure of the text and the dependency relationship among words in a sentence by adopting a syntactic analysis algorithm based on natural language processing, and then extracting entity, class and attribute information.

When the unstructured distribution network data are user voice data, they are converted into text by a speech conversion technology based on a hidden Markov model; when the unstructured distribution network data are images, the characters in the images are converted into text by an image recognition technology based on a support vector machine. The text is then segmented by a natural language word segmentation technology based on character string matching, after which a syntactic analysis algorithm of natural language processing analyzes the syntactic structure of each sentence and the dependency relationships among its words, and the entities, classes and attributes are identified.
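Word segmentation based on character string matching, combined with a dictionary lookup of entities, can be sketched as below. The sketch uses forward maximum matching, which is one common string-matching segmentation scheme; the specification does not state which variant is used, so this choice is an assumption. The dictionary contents and the function names are hypothetical, and the syntactic analysis step is not shown.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by greedy longest-prefix matching against a dictionary."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_entities(text, entity_classes):
    """Return (entity, class) pairs for every token found in the dictionary."""
    tokens = forward_max_match(text, set(entity_classes))
    return [(tok, entity_classes[tok]) for tok in tokens if tok in entity_classes]


# Hypothetical expert-built dictionary mapping entity strings to classes.
entity_classes = {
    "配电变压器": "equipment type",  # distribution transformer
    "变压器": "equipment type",      # transformer
    "昆明": "place name",            # Kunming
}
text = "昆明配电变压器故障"  # "Kunming distribution transformer fault"

tokens = forward_max_match(text, set(entity_classes))
entities = extract_entities(text, entity_classes)
```

Because the longest dictionary entry is tried first, the five-character term for "distribution transformer" is preferred over the shorter term for "transformer" contained in it.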

When the extraction of the entities, attributes and the like is completed, an entity library is obtained. On this basis, the relationship between two entities is identified by an entity relationship extraction technology using a support vector machine model based on a character string sequence kernel, and the relationships between the entities are established. That is, step S202 includes constructing the local knowledge graph as:

G_L = (E, R, S)

where G_L is the local knowledge graph; E is the set of entities in the entity library; R is the set of entity relations in the entity library; and S represents the set of triples in the local knowledge graph.

The basic forms of a triple mainly include (entity 1, relation, entity 2), (concept, attribute, attribute value) and the like. Through the triple set, a mapping can be established between any entity and the original data in which it is located; this mapping is realized by the local data index table. For each entity extracted from a data source, an index table is established with the entity name as the key word. The index table contains a series of data-related information, such as the attributes, data source name, affiliation, database, table, text, instances and local knowledge graph to which the entity belongs. Through the local data index table, data can be located quickly within a single distribution network information system, so that the data can be queried and extracted.
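The triple set and the local data index table described above can be sketched as a small data structure. This is an illustrative sketch only: the class `LocalKnowledgeGraph`, the function `build_local_index`, the field names and the sample records are hypothetical, and only a few of the index fields listed in the text are shown.

```python
from dataclasses import dataclass, field


@dataclass
class LocalKnowledgeGraph:
    name: str
    entities: set = field(default_factory=set)    # E
    relations: set = field(default_factory=set)   # R
    triples: set = field(default_factory=set)     # S, as (entity 1, relation, entity 2)

    def add_triple(self, head, relation, tail):
        self.entities.update((head, tail))
        self.relations.add(relation)
        self.triples.add((head, relation, tail))


def build_local_index(graph, sources):
    """Build a local data index table keyed by entity name.

    sources: {entity_name: dict describing where the original data lives}
    """
    index = {}
    for entity in graph.entities:
        location = sources.get(entity, {})
        index[entity] = {
            "data_source": location.get("data_source"),
            "database": location.get("database"),
            "table": location.get("table"),
            "attributes": location.get("attributes", {}),
            # Every triple in which the entity appears as head or tail.
            "relations": sorted(t for t in graph.triples if entity in (t[0], t[2])),
            "local_graph": graph.name,
        }
    return index


# Made-up sample data for one information system.
graph = LocalKnowledgeGraph("production")
graph.add_triple("Transformer T1", "part_of", "Feeder F3")
sources = {
    "Transformer T1": {
        "data_source": "equipment archive",
        "database": "pms",
        "table": "devices",
        "attributes": {"capacity": "400 kVA"},
    }
}
local_index = build_local_index(graph, sources)
```

Looking up `local_index["Transformer T1"]` then gives the database and table in which the original record can be found.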

In step S401, the method for detecting entity name conflict includes:

Sim(A, B) = Dis(L_A, L_B) + Dis(S_A, S_B)

where Sim(A, B) is the similarity of entity A and entity B; Dis(L_A, L_B) is the distance between the class L_A of entity A and the class L_B of entity B; and Dis(S_A, S_B) is the distance between the attributes S_A of entity A and the attributes S_B of entity B.

Indexes, namely the local data index tables, are established for the entities, entity classes and attributes in each local knowledge graph. Then, for an entity A in a given local knowledge graph, an entity B is searched for in the indexes of the other local knowledge graphs, and the similarity Sim(A, B) of entity A and entity B is calculated. If the class L_A and attributes S_A of entity A in the current local knowledge graph are very similar to the class L_B and attributes S_B of an entity B in another local knowledge graph, but the entity names differ, an entity name conflict is detected.
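The similarity check can be sketched as follows. The text does not define the distance function Dis or a decision threshold, so this sketch makes several assumptions: the class distance is 0 for identical classes and 1 otherwise, the attribute distance is the Jaccard distance between attribute-name sets, and a small sum (at or below a hypothetical `threshold`) counts as "very similar". All names and sample records are illustrative.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def sim(entity_a, entity_b):
    """Sim(A, B) = Dis(L_A, L_B) + Dis(S_A, S_B), under the assumed distances."""
    class_dist = 0.0 if entity_a["cls"] == entity_b["cls"] else 1.0
    attr_dist = jaccard_distance(entity_a["attributes"], entity_b["attributes"])
    return class_dist + attr_dist


def find_name_conflicts(entity_a, other_index, threshold=0.3):
    """Names of entities that look like entity_a but are named differently."""
    return [
        b["name"]
        for b in other_index
        if b["name"] != entity_a["name"] and sim(entity_a, b) <= threshold
    ]


# Made-up sample entities from two local knowledge graphs.
a = {"name": "Transformer T1", "cls": "transformer",
     "attributes": {"rated_voltage", "capacity", "location", "manufacturer"}}
b = {"name": "T1 transformer", "cls": "transformer",
     "attributes": {"rated_voltage", "capacity", "location", "manufacturer"}}
c = {"name": "Feeder F3", "cls": "feeder", "attributes": {"length"}}

name_conflicts = find_name_conflicts(a, [b, c])
```

Since Sim is written as a sum of distances, this sketch treats smaller values as more similar; a real implementation would need to confirm that reading and calibrate the threshold.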

In step S401, the method for detecting a superior-inferior relation conflict includes:

G = G_A ∪ G_q1 ∪ G_q2 ∪ … ∪ G_qn

where G is the combined superior-inferior relation graph; G_A is the superior-inferior relation graph of entity A; G_q1, G_q2, …, G_qn are respectively the superior-inferior relation graphs of the entities in the superior-inferior relation entity set; and n is the number of entities in the superior-inferior relation entity set.
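Forming the combined graph G can be sketched as a union of edge sets. The text here does not state how a conflict is read off the combined graph, so the cycle test below is only one plausible criterion, introduced as an assumption: a superior-inferior hierarchy that loops back on itself cannot be consistent. The function names and sample edges are hypothetical.

```python
def merge_hierarchies(*graphs):
    """Union of hierarchy graphs, each given as a set of (upper, lower) edges."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def has_cycle(edges):
    """Depth-first search for a directed cycle in a set of (upper, lower) edges."""
    children = {}
    for upper, lower in edges:
        children.setdefault(upper, set()).add(lower)
        children.setdefault(lower, set())
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {node: WHITE for node in children}

    def visit(node):
        state[node] = GREY
        for nxt in children[node]:
            if state[nxt] == GREY:
                return True        # back edge: the hierarchy loops
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[node] == WHITE and visit(node) for node in children)


# Made-up sample hierarchies: G_A and two graphs from the related entity set.
g_a = {("Substation S1", "Feeder F3")}
g_q1 = {("Feeder F3", "Transformer T1")}
g_q2 = {("Transformer T1", "Substation S1")}   # contradicts the other two

consistent = merge_hierarchies(g_a, g_q1)
conflicting = merge_hierarchies(g_a, g_q1, g_q2)
```

In this example the first union is a simple chain, while adding the third graph places the transformer above the substation that already sits above it.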

As shown in FIG. 10, in a massive unstructured distribution network data integration method based on the knowledge graph technology according to another embodiment of the present invention, after step S203, the method further includes:

S50, updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the unstructured distribution network data of new equipment and/or new users.

The data management center is responsible for maintaining and updating the global knowledge graph, the local knowledge graphs, the global data index table and the local data index tables, and for managing data exchange. Updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the unstructured distribution network data of new equipment and/or new users keeps the integrated data of the data management center timely and accurate; when new distribution network equipment or information systems are added, the integrated data adapt to the dynamic changes of the distribution network, and centralized management of the data is realized. When the data related to a certain entity need to be queried, the related data information and the database to which it belongs can be found through the global data index table, thereby realizing data integration across the information systems.

Specifically, as shown in fig. 11, step S50 includes:

S501, acquiring unstructured distribution network data of new equipment and/or new users, and extracting entity, class and attribute information of the unstructured distribution network data of the new equipment and/or the new users;

S502, judging whether the entity and the class of the unstructured distribution network data of the new equipment and/or the new users match an entity and a class in a certain local knowledge graph;

S503, if they match, fusing the entity of the unstructured distribution network data of the new equipment and/or the new users with the local knowledge graph, updating the corresponding entity attributes and the superior-inferior relations between the entities, and updating the local data index table and the global data index based on the global knowledge graph according to the fused local knowledge graph;

S504, if they do not match, creating a new entity and a new class, and updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the new entity and the new class.

The entities, classes and attributes in the global knowledge graph come from a plurality of local knowledge graphs, so the global knowledge graph is general and recognizes distribution network data well. Using the global knowledge graph and the local knowledge graphs, the entities and attributes of a newly added data source are extracted quickly, which improves the speed and accuracy of integrating the new data source and optimizes the data integration. For entities that the knowledge graph cannot identify, the corresponding entities, classes and attributes are extracted and matched against the classes and entities in the existing knowledge graph. If the matching degree is high, they are fused and the entity attributes and the superior-inferior relations between entities are updated; otherwise, a new class is created. The local data index based on the local knowledge graph and the global data index based on the global knowledge graph are then updated, thereby optimizing the knowledge graph.
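Steps S502 to S504 can be sketched as a single update routine. The sketch assumes a deliberately simple matching rule (same entity name and same class) in place of the matching degree mentioned above, and it represents the local knowledge graph and both index tables as plain dictionaries. The function name `integrate_new_entity`, the field names and the sample records are hypothetical.

```python
def integrate_new_entity(new_entity, local_graph, local_index, global_index):
    """Fuse or create an entity, then refresh the local and global indexes.

    new_entity:  {"name": str, "cls": str, "attributes": dict}
    local_graph: {"name": str, "classes": set, "entities": {name: {...}}}
    """
    name, cls = new_entity["name"], new_entity["cls"]
    graph_name = local_graph["name"]
    existing = local_graph["entities"].get(name)
    if existing is not None and existing["cls"] == cls:
        # S503: the entity and class match, so fuse the attributes.
        existing["attributes"].update(new_entity["attributes"])
        action = "fused"
    else:
        # S504: no match, so create the entity and its class.
        # (A same-name entity of another class is replaced in this sketch.)
        local_graph["classes"].add(cls)
        local_graph["entities"][name] = {
            "cls": cls,
            "attributes": dict(new_entity["attributes"]),
        }
        action = "created"

    merged = local_graph["entities"][name]
    # Refresh the local data index entry for this entity.
    local_index[name] = {
        "cls": merged["cls"],
        "attributes": dict(merged["attributes"]),
        "local_graph": graph_name,
    }
    # Refresh the global data index entry for this entity.
    entry = global_index.setdefault(name, {"local_index_info": {}, "local_graphs": []})
    entry["local_index_info"][graph_name] = local_index[name]
    if graph_name not in entry["local_graphs"]:
        entry["local_graphs"].append(graph_name)
    return action


# Made-up sample data.
graph = {
    "name": "production",
    "classes": {"transformer"},
    "entities": {
        "Transformer T1": {"cls": "transformer", "attributes": {"capacity": "400 kVA"}}
    },
}
local_index, global_index = {}, {}

first = integrate_new_entity(
    {"name": "Transformer T1", "cls": "transformer",
     "attributes": {"rated_voltage": "10 kV"}},
    graph, local_index, global_index)
second = integrate_new_entity(
    {"name": "Meter M7", "cls": "smart meter", "attributes": {}},
    graph, local_index, global_index)
```

Updating of superior-inferior relations between entities is omitted here; it would follow the same pattern of modifying the local graph first and then refreshing both indexes.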

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A massive unstructured distribution network data integration method based on knowledge graph technology is characterized by comprising the following steps:

constructing a data global index based on a global knowledge graph by the data management center, wherein the data global index based on the global knowledge graph comprises the global knowledge graph and a global data index table;

the step of constructing the global index of the data based on the global knowledge graph by the data management center comprises the following steps:

according to the mapping relation and the local data index table, taking the entity name of each entity in an entity library as a key word, and constructing a global data index table, wherein the global data index table comprises global index information corresponding to each entity in the entity library, and the global index information comprises the relationship, the caused conflict, the local index information and the local knowledge map;

the entity library comprises entities, classes and attribute information of the unstructured distribution network data of the information systems.

2. The method of claim 1, wherein the step of constructing a local knowledge-graph-based data local index based on the processed unstructured distribution network data of the information systems comprises:

performing entity extraction on the processed unstructured distribution network data of each information system to obtain an entity library of the unstructured distribution network data of each information system;

constructing the local knowledge graph according to the superior-inferior relation of each entity in the entity library;

3. The method of claim 1, wherein if there is a conflict between the local knowledge-graphs of the information systems, the step of eliminating the conflict comprises:

4. The method of claim 2, wherein the step of performing entity extraction on the processed unstructured distribution network data of the information systems comprises:

5. The method of claim 2, wherein the step of constructing the local knowledge-graph according to the relationship of each entity in the entity library comprises:

G_L = (E, R, S)

wherein G_L is the local knowledge graph; E is the set of entities in the entity library and comprises |E| different entities; R is the set of entity relations in the entity library and comprises |R| different entity relations; and S ⊆ E × R × E represents the set of triples in the local knowledge graph.

6. The method of claim 1, wherein the method for detecting entity name conflict comprises:

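Sim(A, B) = Dis(L_A, L_B) + Dis(S_A, S_B)

wherein Sim(A, B) is the similarity of the entity A and the entity B; Dis(L_A, L_B) is the distance between the class L_A of the entity A and the class L_B of the entity B; and Dis(S_A, S_B) is the distance between the attributes S_A of the entity A and the attributes S_B of the entity B;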

7. The method of claim 1, wherein the method for detecting a superior-inferior relation conflict comprises:

G = G_A ∪ G_q1 ∪ G_q2 ∪ … ∪ G_qn

wherein G is the combined superior-inferior relation graph; G_A is the superior-inferior relation graph of the entity A; G_q1, G_q2, …, G_qn are respectively the superior-inferior relation graphs of the entities in the superior-inferior relation entity set; and n is the number of the entities in the superior-inferior relation entity set;

8. The method of claim 2, further comprising: updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the unstructured distribution network data of new equipment and/or new users.

9. The method of claim 8, wherein the step of updating the local data index based on the local knowledge graph and the global data index based on the global knowledge graph according to the unstructured distribution network data of new equipment and/or new users comprises: