CN112363996A - Method, system, and medium for building a physical model of a power grid knowledge graph - Google Patents

Info

Publication number
CN112363996A
Authority
CN
China
Prior art keywords
source
objects
field
relationship
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011197189.XA
Other languages
Chinese (zh)
Other versions
CN112363996B (en)
Inventor
沈亮
杨帅
朱广新
廖小琦
王春梅
宜东海
吴桂栋
吴一
郝保聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN202011197189.XA
Publication of CN112363996A
Application granted
Publication of CN112363996B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system, and a medium for establishing a physical model for a power grid knowledge graph. The method comprises the following steps: determining a table schema for defining table objects and their fields; generating, based on a first data source, table information for all table objects according to the table schema, so as to generate a physical table set of the physical model; determining a relationship schema for defining a relationship between a source table object and a target table object; for each pair of source and target table objects remaining in a second data source after de-duplication, generating corresponding table relationship information according to the relationship schema, so as to generate a relationship set of the physical model; and building a physical model including table objects, fields, and relationships based on the physical table set and the relationship set. With this scheme, knowledge can be extracted from different data sources, gaps and omissions in the existing model can be identified and filled to compensate for its design shortcomings, a more reasonable management and control model can be provided to users, and information matching against a unified data model is supported.

Description

Method, system, and medium for building a physical model of a power grid knowledge graph
Technical Field
The present invention relates to knowledge-graph technology, and more particularly, to a method for building a physical model for a power grid knowledge-graph, and a corresponding system and computer-readable storage medium.
Background
With the further development of knowledge graph technology, knowledge graphs, with their strong semantic processing and knowledge organization capabilities, have laid a foundation for organizing large-scale knowledge bases and for intelligent applications. A knowledge graph is composed of a large number of entities and the associations between them. Through a knowledge graph, entities such as landmarks, people's names, cities, sports teams, buildings, geographic features, movies, celestial bodies, and works of art can be retrieved, together with information related to them. This is key to building intelligent applications that draw on the collective intelligence of the network and understand the world in a more human-like way. In a specific application setting, a domain knowledge graph must be built on a domain-specific ontology base to support intelligent information retrieval and the construction of intelligent applications for that domain. Building a knowledge graph for a specific domain requires not only general knowledge but also domain expertise. Because a domain knowledge graph must support practical engineering applications, its construction places higher demands on recognition rate, accuracy, and other related metrics than the construction of a general knowledge graph. To support the construction of domain-oriented large-scale knowledge bases and intelligent applications, information extraction techniques adapted to domain characteristics and methods for constructing domain knowledge graphs need to be studied.
In recent years, a large number of knowledge graphs with Chinese as the main language have appeared in China. They are mainly built from the structured information of online encyclopedias and Wikipedia and aim to use community effort to maintain schema standards for open-domain knowledge graphs. Knowledge graphs are constructed by manual editing and by automatic extraction, but automatic extraction methods rely mainly on the structured information in online encyclopedias and ignore unstructured text, even though most information on the Internet is presented as unstructured free text. In parallel with the development of linked data, a number of knowledge acquisition methods based on information extraction have been proposed for building open-domain knowledge graphs from free text. In 2007, Banko et al. at the University of Washington first proposed open-domain information extraction (OIE), which extracts entity-relationship triples, each consisting of a head entity, a relationship indicator, and a tail entity, directly from large-scale free text. Before OIE, many information extraction methods for free text had been proposed, but their main idea was to train a dedicated extractor for each target relationship. Such conventional methods cannot work efficiently when faced with the massive number of relationship types in Internet text: it is unrealistic to train an extractor for every target relationship and, more seriously, in many cases the relationship types in massive web text cannot be determined in advance.
In addition, knowledge resource classification, intelligent search, and cross-domain knowledge fusion and representation based on enterprise-level data models are still at an early stage. There is no intuitive, accessible model interface for the relevant managers and business personnel, and the ability to search logical links in data models and to perform static semantic analysis and evaluation on them is severely limited. A data model such as the State Grid Corporation's enterprise common data model (SG-CIM) is a comprehensive abstraction of company-wide data on the power grid, assets, finance, and other areas; it is not only very large but also spans a great many professional categories. As a result, problems remain in three respects, namely model deliverables, application, and support: (1) the quality of model design needs improvement: current design results still suffer from inconsistent levels of abstraction among some data objects, inaccurate entity relationships, incomplete data objects and attributes, incomplete de-duplication, incomplete data tracing, and mismatches between standard codes and the codes of source business systems; (2) the model mapping rate is low: individual units perform mapping and comparison against different versions of the physical model, so the average mapping rate is low; (3) tool support is lacking: existing data model management and control is mostly carried out offline, the process is cumbersome, communication is inefficient, the design results are abstract, personnel at all levels find the model difficult to understand, application capability is insufficient, and the quality of model application and iterative refinement cannot be guaranteed.
Therefore, there is a need to provide an improved solution to overcome the drawbacks of the existing data models.
Disclosure of Invention
The present invention is directed to a solution to the above technical problem.
Specifically, according to a first aspect of the present invention, there is provided a method for building a physical model for a power grid knowledge-graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set for all table objects included in the first data source, so as to generate a physical table set of the physical model that includes the table information set, wherein the table information indicates at least the table name, fields, table object source, and field source of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on the relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set for all relationships included in the second data source, so as to generate a relationship set of the physical model that includes the table relationship information set;
establishing, based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships.
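As a rough illustration of the final step, the Python sketch below combines a physical table set and a relationship set into one in-memory model. It is only one possible representation, not the patented implementation: the function name `build_physical_model`, the dictionary keys `name`, `fields`, `source`, `target`, and the extra `dangling` list (relationships whose endpoints are not in the table set) are all assumptions made for this example.

```python
from typing import Dict, List


def build_physical_model(table_set: List[dict], relation_set: List[dict]) -> Dict[str, list]:
    """Combine tables and relationships; also list relationships with unknown endpoints."""
    known_tables = {table["name"] for table in table_set}
    # A relationship is "dangling" if either endpoint is missing from the table set.
    dangling = [
        rel for rel in relation_set
        if rel["source"] not in known_tables or rel["target"] not in known_tables
    ]
    return {"tables": table_set, "relations": relation_set, "dangling": dangling}


# Hypothetical sample data, for illustration only.
table_set = [
    {"name": "T_ORDER", "fields": ["ORDER_ID", "CUST_ID"]},
    {"name": "T_CUSTOMER", "fields": ["CUST_ID"]},
]
relation_set = [
    {"source": "T_ORDER", "target": "T_CUSTOMER", "relationship": "N:1", "field": "CUST_ID"},
    {"source": "T_ORDER", "target": "T_METER", "relationship": "1:N", "field": "ORDER_ID"},
]
model = build_physical_model(table_set, relation_set)
```

Reporting dangling relationships rather than dropping them is a design choice of this sketch; the text does not say how such cases are handled.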
In one embodiment, the field is determined according to a predefined field schema based on the field-related information and the field source related information in the first data source, the field schema including the field name, field data type, field description, standard code, data storage format, hash column, responsible department, name of the data source system, table name in the data source system, field name in the data source system, and field type in the data source system.
In one embodiment, the table schema includes, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list.
In one embodiment, the relationship schema includes the table name of the source table object, the table name of the target table object, the association relationship between the source table object and the target table object, the association field between the source table object and the target table object, a subject area, and a secondary subject area.
In one embodiment, generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for one pair of source and target table objects and another pair in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between the source and target table objects are all the same, the relationship of the one pair is judged to be the same as the relationship of the other pair; for identical relationships, only one is normalized, and table relationship information for the corresponding source and target table objects is generated from it according to the relationship schema.
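The de-duplication rule in this embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each relationship row is a dict, the key names `entity1`, `entity2`, `relationship`, and `field` are chosen for the example, and the function names and sample rows are hypothetical. Two rows count as the same relationship only when all four attributes match; other attributes (here `area`) do not affect the comparison.

```python
from typing import Dict, Iterable, List, Tuple

# The four attributes that decide whether two relationships are the same.
KEY_FIELDS = ("entity1", "entity2", "relationship", "field")


def relation_key(rel: Dict[str, str]) -> Tuple[str, ...]:
    """Build the comparison key from the four identifying attributes."""
    return tuple(rel[name] for name in KEY_FIELDS)


def dedupe_relations(relations: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each distinct relationship, preserving order."""
    seen = set()
    unique = []
    for rel in relations:
        key = relation_key(rel)
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


# Hypothetical sample rows: the first two differ only in 'area'.
rows = [
    {"entity1": "T_ORDER", "entity2": "T_CUSTOMER", "relationship": "N:1", "field": "CUST_ID", "area": "marketing"},
    {"entity1": "T_ORDER", "entity2": "T_CUSTOMER", "relationship": "N:1", "field": "CUST_ID", "area": "finance"},
    {"entity1": "T_ORDER", "entity2": "T_METER", "relationship": "1:N", "field": "ORDER_ID", "area": "marketing"},
]
unique_rows = dedupe_relations(rows)
```

Keeping the first occurrence is an assumption of this sketch; the text only requires that one of the identical relationships be kept.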
In one embodiment, a library of table schemas for table objects and their fields is provided from which a table schema for defining table objects and their fields is determined.
In one embodiment, a library of relationship schemas representing relationships between table objects is provided, from which a relationship schema for defining relationships between source table objects and target table objects is determined.
In one embodiment, an alias library is provided for table objects, their fields, and the relationships among table objects. The alias library contains previously recorded aliases and their occurrence frequencies; the table objects, fields, and relationships appearing in the first data source and the second data source are recorded into the alias library and their occurrence frequencies are accumulated. The table objects, fields, and relationships that are displayed are those with the highest occurrence frequency.
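One way to picture the alias library is as a set of frequency counters. The Python sketch below is an illustration only: the class name `AliasLibrary`, the notion of a canonical `key` that groups aliases of the same item, and the sample aliases are assumptions for the example, since the text does not specify how aliases are grouped or stored.

```python
from collections import Counter
from typing import Dict


class AliasLibrary:
    """Tracks how often each alias of an item (table, field, or relationship) has been seen."""

    def __init__(self) -> None:
        # key identifying the item -> Counter mapping alias -> occurrence count
        self._counts: Dict[str, Counter] = {}

    def record(self, key: str, alias: str) -> None:
        """Record one occurrence of an alias and accumulate its frequency."""
        self._counts.setdefault(key, Counter())[alias] += 1

    def frequency(self, key: str, alias: str) -> int:
        """Return how many times the alias has been recorded for the item."""
        return self._counts.get(key, Counter())[alias]

    def display_name(self, key: str) -> str:
        """Return the alias with the highest occurrence frequency."""
        return self._counts[key].most_common(1)[0][0]


# Hypothetical usage: three observations of the same field under two names.
library = AliasLibrary()
for alias in ["CUST_ID", "CUSTOMER_ID", "CUST_ID"]:
    library.record("customer identifier", alias)
shown = library.display_name("customer identifier")
```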
According to a second aspect of the present invention, there is provided a system for building a physical model for a power grid knowledge-graph, comprising: a physical table set generating unit, a relation set generating unit and a processing unit,
wherein the physical table set generation unit is configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set for all table objects included in the first data source, so as to generate a physical table set of the physical model that includes the table information set, wherein the table information indicates at least the table name, fields, table object source, and field source of the table object;
wherein the relationship set generation unit is configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on the relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set for all relationships included in the second data source, so as to generate a relationship set of the physical model that includes the table relationship information set;
wherein the processing unit is configured to:
establishing, based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships.
In one embodiment, the field is determined according to a predefined field schema based on the field-related information and the field source related information in the first data source, the field schema including the field name, field data type, field description, standard code, data storage format, hash column, responsible department, name of the data source system, table name in the data source system, field name in the data source system, and field type in the data source system.
In one embodiment, the table schema includes, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list.
In one embodiment, the relationship schema includes the table name of the source table object, the table name of the target table object, the association relationship between the source table object and the target table object, the association field between the source table object and the target table object, a subject area, and a secondary subject area.
In one embodiment, generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for one pair of source and target table objects and another pair in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between the source and target table objects are all the same, the relationship of the one pair is judged to be the same as the relationship of the other pair; for identical relationships, only one is normalized, and table relationship information for the corresponding source and target table objects is generated from it according to the relationship schema.
In one embodiment, a library of table schemas for table objects and their fields is provided from which a table schema for defining table objects and their fields is determined.
In one embodiment, a library of relationship schemas representing relationships between table objects is provided, from which a relationship schema for defining relationships between source table objects and target table objects is determined.
In one embodiment, an alias library is provided for table objects, their fields, and the relationships among table objects. The alias library contains previously recorded aliases and their occurrence frequencies; the table objects, fields, and relationships appearing in the first data source and the second data source are recorded into the alias library and their occurrence frequencies are accumulated. The table objects, fields, and relationships that are displayed are those with the highest occurrence frequency.
According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, causes the above-described method for establishing a physical model for a power grid knowledge graph to be performed.
According to the scheme of the invention, data on table objects, fields, and relationships are obtained from a plurality of data sources and normalized, and a unified and complete data model for the power grid knowledge graph is established according to a predefined table schema, field schema, and relationship schema. With the method and system, knowledge can be extracted from different data sources, gaps and omissions in the existing model can be identified and filled to compensate for weaknesses in its design, a more reasonable management and control model can be provided to management and business personnel, and information matching and sharing against the company's unified data model are supported. In addition, on the basis of the existing data model, the invention can further promote implementation of the model standard and the construction of a complete system, lay a solid foundation for advancing data quality management, support the construction of a data middle platform and a business middle platform, and yield direct or indirect benefits in practical application.
Drawings
Non-limiting and non-exhaustive embodiments of the present invention are described by way of example with reference to the following drawings, in which:
FIG. 1 is a flow diagram schematically illustrating a method for building a physical model for a power grid knowledge-graph, in accordance with one embodiment of the present invention;
FIG. 2 is a flow diagram schematically illustrating the building of a physical table set of a physical model, in accordance with an embodiment of the present invention;
FIG. 3 is a flow diagram schematically illustrating the building of a relationship set of a physical model, in accordance with an embodiment of the present invention; and
FIG. 4 is a schematic diagram illustrating a system for building a physical model for a grid knowledge graph according to one embodiment of the invention.
Detailed Description
In order to make the above and other features and advantages of the present invention more apparent, the present invention is further described below with reference to the accompanying drawings. It is understood that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting.
As a first aspect of the invention, a method is provided for building a physical model for a power grid knowledge graph. Fig. 1 schematically shows a method S100 for building a physical model for a power grid knowledge graph according to an embodiment of the invention. As shown in fig. 1, S100 may include step S101, step S102, step S103, step S104, step S105, and step S106.
In step S101, a table schema for defining the table object and its fields is determined. A table schema may also be referred to herein as a table definition, which is used to define constituent members of a table object, and may include, for example, various suitable constituent members for distinguishing one table object from other table objects.
In one embodiment, the table schema may include, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list. The table name may include at least one of the English table name and the Chinese table name of the table object. For example, the table schema may be defined in json format, as follows:
{
'name': [table name (English), table name (Chinese)],
'area': subject area,
secondary subject area,
'type': table type,
'description': table description,
'department': responsible department,
'source system': name of the data source system,
'source table': [data source system table name (English), data source system table name (Chinese)],
'more': remarks,
'fields': [field list]
}
In one embodiment, the list of fields in the table schema is determined according to a predefined field schema based on the field-related information and the field source related information in the first data source. The field schema may include the field name, field data type, field description, standard code, data storage format, hash column, responsible department, name of the data source system, table name in the data source system, field name in the data source system, and field type in the data source system. A field schema may also be referred to herein as a field definition, which defines the constituent members of a field. For example, the field schema may be defined in json format, as follows:
{
'name': [field (English), field (Chinese)],
'datatype': field data type,
'description': field description,
standard code,
'storage format': data storage format,
hash column,
'department': responsible department,
'source system': [data source system English name, data source system Chinese name],
'source table': [data source system table name (English), data source system table name (Chinese)],
'source field': [data source system field (English), data source system field (Chinese)],
'source data type': data source field type,
'more': remarks
}
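The table schema and field schema above can be mirrored by typed records. The Python sketch below is a hedged illustration, not the patent's implementation: the class names `FieldInfo` and `TableInfo` are hypothetical, the attribute names use underscores instead of the spaced json keys, and the attribute names `second_area`, `standard_code`, and `hash_column` are assumptions, because the corresponding keys are not legible in the source.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldInfo:
    """One field of a table object, following the members of the field schema."""
    name: List[str]                 # [English field name, Chinese field name]
    datatype: str
    description: str = ""
    standard_code: str = ""         # attribute name assumed
    storage_format: str = ""
    hash_column: str = ""           # attribute name assumed
    department: str = ""
    source_system: List[str] = field(default_factory=list)
    source_table: List[str] = field(default_factory=list)
    source_field: List[str] = field(default_factory=list)
    source_data_type: str = ""
    more: str = ""


@dataclass
class TableInfo:
    """One table object, following the members of the table schema."""
    name: List[str]                 # [English table name, Chinese table name]
    area: str = ""
    second_area: str = ""           # attribute name assumed
    type: str = ""
    description: str = ""
    department: str = ""
    source_system: str = ""
    source_table: List[str] = field(default_factory=list)
    more: str = ""
    fields: List[FieldInfo] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the table and its nested fields, keeping Chinese text readable."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Hypothetical example values, for illustration only.
table = TableInfo(
    name=["T_CUSTOMER", "客户表"],
    area="marketing",
    fields=[FieldInfo(name=["CUST_ID", "客户编号"], datatype="VARCHAR(32)")],
)
table_json = table.to_json()
```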
In step S102, a first data source comprising a plurality of table objects is received, the first data source comprising table object related information, field-related information, table object source related information, and/or field source related information. The first data source may be broadly understood herein to encompass data sources in a variety of possible forms, including structured, semi-structured, and unstructured forms, such as relational databases, data silos, non-relational databases, document libraries, and various types of reports. Preferably, the first data source of the present invention comprises a data source in the form of an Excel document.
In step S103, for each table object, corresponding table information is generated according to the table schema based on the first data source, thereby obtaining a table information set for all table objects included in the first data source, so as to generate a physical table set of the physical model that includes the table information set, where the table information indicates at least the table name, fields, table object source, and field source of the table object. It should be understood that, for each table object, the table name may serve as the identification criterion of the table object: for example, "English table name + Chinese table name", the English table name alone, or the Chinese table name alone may be used. The physical table set of the physical model may be stored in any suitable file form as desired, such as a json storage file. In one embodiment, the json storage file form of the physical table set of the physical model is as follows:
[Figures BDA0002754364840000081, BDA0002754364840000091, and BDA0002754364840000101: json storage file form of the physical table set; images not reproduced in this text]
Step S103 is described in detail below with reference to FIG. 2.
As shown in FIG. 2, the Excel document serving as the first data source includes three parts: "data table information", which gives the table names and field-related information of all table objects; a "data table comparison table", which gives the table source information of each table object's source system; and a "field comparison table", which gives the field source information of each field's source system. The source systems may include, for example, data platforms covering various aspects of power knowledge. Because table names in this information are inconsistent in upper and lower case, the model is built on a case-insensitive basis and the English table names are uniformly converted to upper case. The specific process is as follows. First, "English table name + Chinese table name" is taken as the identification criterion for table objects, and rows are aggregated by it to obtain the table name information of each table object. Second, for each table object, the table name is normalized, for example by removing spaces and line breaks. Third, the field names, field data types, and so on of all fields of the table object are normalized, for example by removing spaces and line breaks, and arranged according to the field definition to obtain normalized fields; the field information is then supplemented from the field comparison table, for example by obtaining the name of the field's source system, the table name in the source system, and the field type in the source system, normalizing this information, and merging it into the field information. After all fields of the table object have been processed, the table and its field information are generated according to the table definition. Next, the table and its field information are supplemented from the data table comparison table, for example by obtaining the name of the table object's source system and the table name in the data source system, normalizing this information, and merging it into the information of the table object. Finally, all the merged information of the table object is arranged into the corresponding table information, which can be represented as a json string. These steps are repeated until all table objects have been processed, yielding a physical table set containing all table objects. The physical table set can be exported as a json storage file.
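The aggregation and normalization steps of step S103 can be sketched in Python as follows. This is a simplified illustration under several assumptions: the three parts of the Excel document are taken to be already loaded into plain Python structures, the row keys (`table_en`, `table_zh`, `field_en`, `field_zh`, `datatype`) and the function names are hypothetical, and the two comparison tables are assumed to be keyed by already-normalized names.

```python
import json
import re
from typing import Dict, List, Tuple


def clean(text: str) -> str:
    """Remove all whitespace, including spaces and line breaks."""
    return re.sub(r"\s+", "", text)


def build_table_set(rows: List[dict], table_sources: Dict[str, dict],
                    field_sources: Dict[Tuple[str, str], dict]) -> List[dict]:
    """Aggregate rows by (English name, Chinese name) and merge in source information."""
    tables: Dict[Tuple[str, str], dict] = {}
    for row in rows:
        # Case-insensitive identification: English table names are upper-cased.
        key = (clean(row["table_en"]).upper(), clean(row["table_zh"]))
        table = tables.setdefault(key, {"name": list(key), "fields": []})
        field_name = clean(row["field_en"])
        field_info = {
            "name": [field_name, clean(row["field_zh"])],
            "datatype": clean(row["datatype"]),
        }
        # Supplement the field from the field comparison table, if an entry exists.
        field_info.update(field_sources.get((key[0], field_name), {}))
        table["fields"].append(field_info)
    for (table_en, _), table in tables.items():
        # Supplement the table from the data table comparison table, if an entry exists.
        table.update(table_sources.get(table_en, {}))
    return list(tables.values())


# Hypothetical sample input with inconsistent case, spaces, and a line break.
rows = [
    {"table_en": "t_customer ", "table_zh": "客户表", "field_en": "cust_id\n",
     "field_zh": "客户编号", "datatype": "varchar (32)"},
    {"table_en": "T_CUSTOMER", "table_zh": "客户表", "field_en": "cust_name",
     "field_zh": "客户名称", "datatype": "varchar(64)"},
]
table_sources = {"T_CUSTOMER": {"source system": "CRM"}}
field_sources = {("T_CUSTOMER", "cust_id"): {"source field": ["ID", "编号"]}}

table_set = build_table_set(rows, table_sources, field_sources)
table_set_json = json.dumps(table_set, ensure_ascii=False, indent=2)
```

Both sample rows collapse into one table object because their English names match once whitespace is removed and case is ignored.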
In step S104, a relationship schema for defining the relationship between a source table object and a target table object is determined. A relationship schema may also be referred to herein as a relationship definition; it defines the constituent members of a relationship between a pair of table objects and may, for example, represent an association between a source table object and a target table object. The table name of the source table object and the table name of the target table object may each include at least one of the corresponding English table name and Chinese table name. For example, the relationship schema may be determined in json format, as follows:
{
'entity1': [source table object table name (English), source table object table name (Chinese)],
'entity2': [target table object table name (English), target table object table name (Chinese)],
'relationship': association relationship,
'field': association field,
'area': subject field,
secondary subject field
}
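As a hedged illustration, one relationship record following this schema could be built and serialized in Python as sketched below. The key `second_area` is a hypothetical name chosen here because the key of the last member is not legible in the schema above; the function name and the sample values are likewise assumptions.

```python
import json


def make_relationship(src_en, src_cn, dst_en, dst_cn,
                      relationship, field, area, second_area):
    """Return one relationship record shaped like the relationship schema."""
    return {
        "entity1": [src_en, src_cn],
        "entity2": [dst_en, dst_cn],
        "relationship": relationship,
        "field": field,
        "area": area,
        "second_area": second_area,  # hypothetical key name, see lead-in
    }


# Hypothetical sample values.
record = make_relationship("C_CONS", "用电客户", "C_MP", "计量点",
                           "1:n", "CONS_ID", "营销", "客户")
text = json.dumps(record, ensure_ascii=False)
print(text)
```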
In step S105, a second data source including relationship-related information on relationships between source table objects and target table objects is received, the second data source including a plurality of pairs of source table objects and target table objects. For each pair of source table object and target table object, table relationship information of the pair is generated according to the relationship schema based on the second data source, so as to obtain a table relationship information set of all relationships included in the second data source and thereby generate a relationship set of the physical model including the table relationship information set. The second data source may be broadly understood herein to encompass data sources in a variety of possible forms, including structured, semi-structured, and unstructured forms, such as relational databases, data silos, non-relational databases, document libraries, various types of reports, and the like. Preferably, the second data source of the present invention comprises a data source in the form of an excel file. The first data source and the second data source of the invention may comprise data from the various business systems of the power grid and time-series data collected on the smart grid, mainly company marketing data, quantitative acquisition data, operation and inspection data, and some graphic, image, and web page data; data in all three forms (structured, semi-structured, and unstructured) can be processed, mined for knowledge, and fused. The relationship set of the physical model may be stored in various suitable file forms, such as a json storage file, as desired. In one embodiment, the json storage file form of the relationship set of the physical model is as follows:
[Figure BDA0002754364840000121: json storage file form of the relationship set of the physical model; image not reproduced]
In one embodiment, step S105 may include: for one pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, the relationship of the one pair is judged to be the same as the relationship of the other pair; for identical relationships, only one of them is normalized, and table relationship information of the corresponding source table object and target table object is generated according to the relationship schema. That is, the table name of the source table object, the table name of the target table object, the association relationship between the source table object and the target table object, and the association field between the source table object and the target table object together serve as the identifier of a relationship.
Step S105 is described in detail below with reference to fig. 3.
As shown in fig. 3, the excel document serving as the second data source is read and its "association relationship" data are acquired. Because the table names in the association relationship data are inconsistent in upper and lower case, a case-insensitive principle is adopted when the model is established, and the English table names are uniformly converted to upper case. The specific process is as follows: the table name (English) of the source table object, the table name (Chinese) of the source table object, the table name (English) of the target table object, the table name (Chinese) of the target table object, the corresponding association relationship, and the corresponding association field are taken as the criterion for distinguishing relationships; these six items are aggregated for each relationship to obtain aggregation identifiers of a plurality of table relationships, and table relationships with the same aggregation identifier are assigned to the same aggregation group; for the table relationships of one aggregation group, only the first record is processed (so that the table relationships are deduplicated and redundancy is avoided), and its table names, association relationship, association field and the like are normalized, for example by removing spaces, line breaks, redundant hyphens, equals signs and the like; the normalized information is arranged into the corresponding table relationship information according to the relationship definition, and the table relationship information can be represented by a json string. These steps are repeated until all table relationships have been processed, thereby obtaining a relationship set containing all table relationships. The relationship set containing all table relationships may be exported as a json storage file.
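The six-item aggregation and first-record rule can be sketched as follows. This is a minimal illustration under stated assumptions: the row keys are hypothetical, "redundant" hyphens and equals signs are assumed to mean leading or trailing ones, and, following the order given in the text, the aggregation identifier is computed before normalization (only the English names are upper-cased).

```python
import re

KEY_ITEMS = ("src_en", "src_cn", "dst_en", "dst_cn", "relationship", "field")
ENGLISH_ITEMS = ("src_en", "dst_en")


def clean(text):
    """Remove whitespace, then strip leading/trailing hyphens and equals signs."""
    text = re.sub(r"\s+", "", text or "")
    return text.strip("-=")


def aggregation_key(row):
    """Six-item identifier; English table names are compared case-insensitively."""
    return tuple(row[k].upper() if k in ENGLISH_ITEMS else row[k] for k in KEY_ITEMS)


def dedupe_relationships(rows):
    """Keep and normalize only the first record of each aggregation group."""
    seen = set()
    relationship_set = []
    for row in rows:
        key = aggregation_key(row)
        if key in seen:
            continue  # same aggregation group: already represented
        seen.add(key)
        record = {k: clean(row[k]) for k in KEY_ITEMS}
        for k in ENGLISH_ITEMS:
            record[k] = record[k].upper()
        relationship_set.append(record)
    return relationship_set


# Hypothetical sample rows: the second differs from the first only in case.
r1 = {"src_en": "c_cons", "src_cn": "用电客户", "dst_en": "c_mp", "dst_cn": "计量点",
      "relationship": "1:n", "field": "cons_id = cons_id\n"}
r2 = dict(r1, src_en="C_CONS", dst_en="C_MP")
r3 = {"src_en": "C_MP", "src_cn": "计量点", "dst_en": "C_METER", "dst_cn": "电能表",
      "relationship": "1:1", "field": "--mp_id"}
relationship_set = dedupe_relationships([r1, r2, r3])
```

Computing the identifier on cleaned values instead would also merge rows that differ only in stray whitespace; that variant is a design choice not stated in the text.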
In step S106, a physical model including table objects, fields and relationships is built based on the physical table set of the physical model and the relationship set of the physical model.
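One way to picture step S106 is to treat table objects as nodes and relationships as edges. The sketch below assumes the record shapes used for illustration here (`table_en` on tables, `entity1`/`entity2` on relationships); the check for relationships whose endpoints are missing from the physical table set is an added assumption, not something the text prescribes.

```python
def build_physical_model(physical_table_set, relationship_set):
    """Combine the physical table set and relationship set into one structure."""
    tables = {t["table_en"]: t for t in physical_table_set}
    edges, dangling = [], []
    for rel in relationship_set:
        src, dst = rel["entity1"][0], rel["entity2"][0]
        # Assumption: a relationship is usable only if both tables are known.
        if src in tables and dst in tables:
            edges.append(rel)
        else:
            dangling.append(rel)
    return {"tables": tables, "relationships": edges, "dangling": dangling}


# Hypothetical sample data.
tables = [{"table_en": "C_CONS", "fields": []}, {"table_en": "C_MP", "fields": []}]
rels = [
    {"entity1": ["C_CONS", "用电客户"], "entity2": ["C_MP", "计量点"]},
    {"entity1": ["C_CONS", "用电客户"], "entity2": ["C_METER", "电能表"]},
]
model = build_physical_model(tables, rels)
```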
In one embodiment, the method of the present invention further comprises: calculating the similarity between pairs of table objects in the physical model based on the physical table set of the physical model, and deduplicating the pairs of table objects whose similarity exceeds a preset threshold, so as to generate a physical table set of the physical model with lower redundancy. The physical table set of the physical model can be matched against the corresponding logical model to check the consistency of the model. This improves the soundness and completeness of the static semantics of an existing model (for example, SG-CIM4.0, the enterprise common data model of State Grid Corporation of China), effectively reduces redundancy, converts non-spatial knowledge data that are difficult to observe into a spatial graph that is easier for personnel in related fields to grasp and understand, and provides an effective solution for associating and connecting cross-domain entities. At the same time, the strong semantic processing capability of knowledge graph technology in describing entities, attributes, and relationships is well demonstrated.
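The text does not specify how the similarity is computed. Purely as an illustration, the sketch below assumes Jaccard similarity over field names and a hypothetical threshold of 0.8, and keeps the first table of each similar pair.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections (assumed measure, not from the text)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def field_names(table):
    return [f["name"] for f in table["fields"]]


def dedupe_tables(tables, threshold=0.8):
    """Drop a table if its similarity to an already kept table exceeds the threshold."""
    kept = []
    for table in tables:
        if any(jaccard(field_names(table), field_names(k)) > threshold for k in kept):
            continue
        kept.append(table)
    return kept


# Hypothetical sample tables: the first two share all field names.
sample = [
    {"table_en": "C_CONS", "fields": [{"name": "CONS_ID"}, {"name": "CONS_NAME"}]},
    {"table_en": "C_CONS_BAK", "fields": [{"name": "CONS_ID"}, {"name": "CONS_NAME"}]},
    {"table_en": "C_MP", "fields": [{"name": "MP_ID"}, {"name": "CONS_ID"}]},
]
reduced = dedupe_tables(sample)
```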
As a second aspect of the invention, a system for building a physical model for a power grid knowledge graph is provided. Fig. 4 schematically illustrates a system 200 for building a physical model for a power grid knowledge graph according to one embodiment of the invention. The system 200 may include a physical table set generation unit 201, a relationship set generation unit 202, and a processing unit 203. The processing unit 203 is communicatively coupled with the physical table set generation unit 201 and the relationship set generation unit 202.
The physical table set generating unit 201 may be configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information at least indicates a table name, a field, a table object source and a field source of the table object.
The relationship set generation unit 202 may be configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and for each pair of source table object and target table object, generating table relationship information of the pair according to the relationship schema based on the second data source, so as to obtain a table relationship information set of all relationships included in the second data source and thereby generate a relationship set of the physical model including the table relationship information set.
In one embodiment, for one pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, the relationship of the one pair is determined to be the same as the relationship of the other pair; for identical relationships, only one of them is normalized, and table relationship information of the corresponding source table object and target table object is generated according to the relationship schema.
The processing unit 203 may be configured to: based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships is established.
In one embodiment, the field is determined according to a predefined field pattern based on the field related information and the field source related information in the first data source, the field pattern including a field name of the field, a field data type, a field description, a standard code, a data storage format, a hash column, a department of responsibility, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system.
In one embodiment, the table schema includes a table name, a subject field, a secondary subject field, a table type, a table description, a department of responsibility, a name of the data source system, a table name of the data source system, and a field list of the table object.
In one embodiment, the relationship schema includes a table name of the source table object, a table name of the target table object, an association between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
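The field schema and table schema listed above map naturally onto record types. The following Python dataclasses are a sketch only: every attribute name is a hypothetical English rendering of the corresponding schema member, and the default values are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldSchema:
    """Members of the field schema; attribute names are illustrative."""
    field_name: str
    data_type: str
    description: str = ""
    standard_code: str = ""
    storage_format: str = ""
    hash_column: str = ""
    responsible_department: str = ""
    source_system_name: str = ""
    source_table_name: str = ""
    source_field_name: str = ""
    source_field_type: str = ""


@dataclass
class TableSchema:
    """Members of the table schema; attribute names are illustrative."""
    table_name: str
    subject_field: str = ""
    secondary_subject_field: str = ""
    table_type: str = ""
    description: str = ""
    responsible_department: str = ""
    source_system_name: str = ""
    source_table_name: str = ""
    fields: List[FieldSchema] = field(default_factory=list)


# Hypothetical sample table with a single field.
table = TableSchema(table_name="C_CONS",
                    fields=[FieldSchema("CONS_ID", "VARCHAR2(16)")])
table_info = asdict(table)  # plain dict, ready for json serialization
```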
It will be appreciated that the specific features described herein in relation to the method for building a physical model for a power grid knowledge graph of the first aspect may similarly be applied, with similar extensions, to the system for building a physical model for a power grid knowledge graph of the second aspect. For the sake of brevity, they are not described again in detail.
It should be understood that the various elements of the system 200 for building a physical model for a power grid knowledge graph of the present invention may be implemented in whole or in part by software, hardware, firmware, or a combination thereof. The units may be embedded in a processor of the computer device in a hardware or firmware form or independent of the processor, or may be stored in a memory of the computer device in a software form for being called by the processor to execute operations of the units. Each of the units may be implemented as a separate component or module, or two or more units may be implemented as a single component or module.
It will be appreciated by those of ordinary skill in the art that the schematic diagram of the system 200 shown in fig. 4 is merely an illustrative block diagram of portions of structure associated with aspects of the present invention and does not constitute a limitation of the computer device, processor or computer program embodying aspects of the present invention. A particular computer device, processor or computer program may include more or fewer components or modules than shown in the figures, or may combine or split certain components or modules, or may have a different arrangement of components or modules.
In the present invention, a library of table schemas of table objects and fields thereof is provided, from which a table schema for defining the table objects and fields thereof is determined.
In the present invention, a library of relational patterns representing relationships between table objects is provided, from which relational patterns defining relationships between source table objects and target table objects are determined.
In the invention, an alias set library is provided for table objects, their fields, and the relationships between table objects. The alias set library contains previously recorded aliases and their occurrence counts; the names of the table objects, fields, and relationships between table objects that appear in the first data source and the second data source are recorded in the alias set library, and their occurrence counts are accumulated. The name displayed for a table object, field, or relationship between table objects is the one with the highest occurrence count.
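A frequency-counting alias library of this kind could look like the sketch below. The class name, the method names, and the idea of grouping aliases under a canonical key such as "table:C_CONS" are assumptions; the text does not say how aliases of the same object are linked.

```python
from collections import Counter, defaultdict


class AliasLibrary:
    """Records aliases per object and reports the most frequent one."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, key, alias):
        """Accumulate one occurrence of alias for the object identified by key."""
        self._counts[key][alias] += 1

    def frequency(self, key, alias):
        return self._counts[key][alias] if key in self._counts else 0

    def display_name(self, key):
        """Return the alias with the highest occurrence count, or None if unknown."""
        counter = self._counts.get(key)
        if not counter:
            return None
        return counter.most_common(1)[0][0]


# Hypothetical usage.
library = AliasLibrary()
for alias in ("C_CONS", "CONS", "C_CONS"):
    library.record("table:C_CONS", alias)
```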
In a preferred embodiment, for the alias set library of table objects, the alias set library of their fields, and the alias set library of relationships between table objects, a tag is set for each record to distinguish different acquisitions. In this way, alias libraries from different sources, such as different departments, may be merged; if two records have the same tag, they are considered to come from the same acquisition and their counts are not accumulated. The tag includes, for example, a date, a time, and a random sequence. The date is in an 8-digit format such as 20201030, the time is accurate to the minute or second, such as 1830 or 183025, and the random sequence is a random number of 6 to 10 digits used for verification. By recording acquisition dates, changes over time in the names of table objects, their fields, and the relationships between table objects can be tracked; the most popular and most widely used name is generally displayed, which helps standardize naming.
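The tagging and merging rule can be sketched as follows, assuming each record is an (alias, tag) pair counted once and that the three tag parts are joined with hyphens; the separator, the function names, and the sample data are all assumptions.

```python
import random
from collections import Counter
from datetime import datetime


def make_tag(now=None, rng=None):
    """Build a tag: 8-digit date, time to the second, 6 to 10 random digits."""
    now = now or datetime.now()
    rng = rng or random.Random()
    length = rng.randint(6, 10)
    suffix = "".join(rng.choice("0123456789") for _ in range(length))
    return f"{now:%Y%m%d}-{now:%H%M%S}-{suffix}"


def merge_alias_records(*libraries):
    """Merge libraries; records sharing alias and tag are counted only once."""
    seen = set()
    counts = Counter()
    for records in libraries:
        for alias, tag in records:
            if (alias, tag) in seen:
                continue  # same acquisition: do not accumulate again
            seen.add((alias, tag))
            counts[alias] += 1
    return counts


# Hypothetical libraries from two departments sharing one acquisition.
dept_a = [("C_CONS", "20201030-1830-123456"), ("CONS", "20201030-1831-654321")]
dept_b = [("C_CONS", "20201030-1830-123456"), ("C_CONS", "20201031-0900-111111")]
merged = merge_alias_records(dept_a, dept_b)
tag = make_tag(datetime(2020, 10, 30, 18, 30, 25), random.Random(0))
```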
As a third aspect of the invention, a computer-readable storage medium is provided, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of the first aspect of the invention. In one embodiment, the computer program is distributed across a plurality of computer devices or processors coupled by a network such that the computer program is stored, accessed, and executed by one or more computer devices or processors in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor or by two or more computer devices or processors. One or more method steps/operations may be performed by one or more computer devices or processors, and one or more other method steps/operations may be performed by one or more other computer devices or processors. One or more computer devices or processors may perform a single method step/operation, or perform two or more method steps/operations.
It will be appreciated by those skilled in the art that all or part of the steps of the method for establishing a physical model for a power grid knowledge graph of the present invention may be carried out by a computer program instructing associated hardware, such as a computer device or a processor; the computer program may be stored in a non-transitory computer-readable storage medium and, when executed, performs the steps of the method of the present invention. Any reference herein to memory, storage, databases, or other media may include non-volatile and/or volatile memory, as appropriate. Examples of non-volatile memory include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage device, hard disk, solid state disk, and the like. Examples of volatile memory include random access memory (RAM), external cache memory, and the like.
The respective technical features described above may be arbitrarily combined. Although not all possible combinations of features are described, any combination of features should be considered to be covered by the present specification as long as there is no contradiction between such combinations.
While the present invention has been described in connection with the embodiments, it is to be understood by those skilled in the art that the foregoing description and drawings are merely illustrative and not restrictive of the invention, and that the invention is not limited to the disclosed embodiments. Various modifications and variations are possible without departing from the spirit of the invention.

Claims (10)

1. A method for building a physical model for a power grid knowledge graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information at least indicates a table name, a field, a table object source and a field source of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and for each pair of source table object and target table object, generating table relationship information of the pair according to the relationship schema based on the second data source, so as to obtain a table relationship information set of all relationships included in the second data source and thereby generate a relationship set of the physical model including the table relationship information set;
based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships is established.
2. The method of claim 1, wherein the field is determined according to a predefined field schema based on field-related information and field source related information in the first data source, the field schema including a field name of the field, a field data type, a field description, a standard code, a data storage format, a hash column, a department of responsibility, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system.
3. The method of claim 1, wherein the table schema comprises a table name, a subject field, a secondary subject field, a table type, a table description, a department of responsibility, a name of the data source system, a table name of the data source system, and a field list of the table object.
4. The method of claim 1, wherein the relationship schema comprises a table name of a source table object, a table name of a target table object, an association relationship between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
5. The method of claim 4, wherein generating table relationship information of a pair of source table object and target table object according to the relationship schema based on the second data source comprises: for one pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, judging the relationship of the one pair to be the same as the relationship of the other pair, and, for identical relationships, normalizing only one of them and generating table relationship information of the corresponding source table object and target table object according to the relationship schema.
6. A system for building a physical model for a power grid knowledge graph, comprising: a physical table set generating unit, a relation set generating unit and a processing unit,
wherein the physical table set generation unit is configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information at least indicates a table name, a field, a table object source and a field source of the table object;
wherein the relationship set generation unit is configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and for each pair of source table object and target table object, generating table relationship information of the pair according to the relationship schema based on the second data source, so as to obtain a table relationship information set of all relationships included in the second data source and thereby generate a relationship set of the physical model including the table relationship information set;
wherein the processing unit is configured to:
based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships is established.
7. The system of claim 6, wherein the field is determined according to a predefined field schema based on field-related information and field source related information in the first data source, the field schema including a field name of the field, a field data type, a field description, a standard code, a data storage format, a hash column, a department of responsibility, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system.
8. The system of claim 6, wherein the table schema comprises a table name, a subject field, a secondary subject field, a table type, a table description, a department of responsibility, a name of the data source system, a table name of the data source system, and a field list of the table object, and the relationship schema comprises a table name of a source table object, a table name of a target table object, an association relationship between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
9. The system of claim 8, wherein generating table relationship information of a pair of source table object and target table object according to the relationship schema based on the second data source comprises: for one pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, determining the relationship of the one pair to be the same as the relationship of the other pair, and, for identical relationships, normalizing only one of them and generating table relationship information of the corresponding source table object and target table object according to the relationship schema.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 5.
CN202011197189.XA 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph Active CN112363996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011197189.XA CN112363996B (en) 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph

Publications (2)

Publication Number Publication Date
CN112363996A true CN112363996A (en) 2021-02-12
CN112363996B CN112363996B (en) 2023-10-24

Family

ID=74512400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011197189.XA Active CN112363996B (en) 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph

Country Status (1)

Country Link
CN (1) CN112363996B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019825A (en) * 2017-07-25 2019-07-16 华为技术有限公司 A kind of method and device for analyzing data semantic
US20190354544A1 (en) * 2011-02-22 2019-11-21 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and search engines
CN111159365A (en) * 2019-11-26 2020-05-15 国网湖南省电力有限公司 Method, system and storage medium for implementing intelligent question-answering system of scheduling model body

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘俊楠;刘海砚;陈晓慧;郭漩;郭文月;朱新铭;赵清波;: "面向多源地理空间数据的知识图谱构建", 地球信息科学学报, no. 07 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535739A (en) * 2021-09-16 2021-10-22 国网浙江省电力有限公司信息通信分公司 Data market layer table establishing method based on power grid energy data
CN113535739B (en) * 2021-09-16 2021-12-07 国网浙江省电力有限公司信息通信分公司 Data market layer table establishing method based on power grid energy data
CN114168608A (en) * 2021-12-16 2022-03-11 中科雨辰科技有限公司 Data processing system for updating knowledge graph

Also Published As

Publication number Publication date
CN112363996B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN111708773B (en) Multi-source scientific and creative resource data fusion method
CN110223168B (en) Label propagation anti-fraud detection method and system based on enterprise relationship map
CN110109908B (en) Analysis system and method for mining potential relationship of person based on social basic information
CN109726393B (en) Policy analysis system and method based on natural language processing technology
CN103605651A (en) Data processing showing method based on on-line analytical processing (OLAP) multi-dimensional analysis
CN111191125A (en) Data analysis method based on tagging
CN104573130A (en) Entity resolution method based on group calculation and entity resolution device based on group calculation
CN113204603B (en) Category labeling method and device for financial data assets
CN108241867B (en) Classification method and device
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN111192176A (en) Online data acquisition method and device supporting education informatization assessment
CN112363996B (en) Method, system and medium for establishing physical model of power grid knowledge graph
CN115794803B (en) Engineering audit problem monitoring method and system based on big data AI technology
CN111078512A (en) Alarm record generation method and device, alarm equipment and storage medium
CN106980639B (en) Short text data aggregation system and method
CN111190880A (en) Database detection method and device and computer readable storage medium
CN116881430B (en) Industrial chain identification method and device, electronic equipment and readable storage medium
Paraschiv et al. A unified graph-based approach to disinformation detection using contextual and semantic relations
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
CN112907358A (en) Loan user credit scoring method, loan user credit scoring device, computer equipment and storage medium
CN112214615A (en) Policy document processing method and device based on knowledge graph and storage medium
CN114547346B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN112364177B (en) Method, system and medium for establishing logic model of power grid knowledge graph
CN111026705B (en) Building engineering file management method, system and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant