CN112363996B - Method, system and medium for establishing physical model of power grid knowledge graph - Google Patents

Method, system and medium for establishing physical model of power grid knowledge graph

Info

Publication number
CN112363996B
Authority
CN
China
Prior art keywords
source
field
relationship
data source
table object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011197189.XA
Other languages
Chinese (zh)
Other versions
CN112363996A (en)
Inventor
沈亮
杨帅
朱广新
廖小琦
王春梅
宜东海
吴桂栋
吴一
郝保聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN202011197189.XA
Publication of CN112363996A
Application granted
Publication of CN112363996B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/211: Schema design and management
    • G06F16/212: Schema design and management with details for data modelling support
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system, and a medium for establishing a physical model for a power grid knowledge graph. The method comprises: determining a table schema for defining table objects and their fields; generating, based on a first data source, table information for all table objects according to the table schema, so as to generate a physical table set of the physical model; determining a relationship schema for defining a relationship between a source table object and a target table object; for each de-duplicated pair of source and target table objects in a second data source, generating corresponding table relationship information according to the relationship schema, so as to generate a relationship set of the physical model; and establishing, based on the physical table set and the relationship set, a physical model that includes table objects, fields, and relationships. With this scheme, knowledge can be extracted from different data sources, and gaps in the existing model can be detected and filled to compensate for its design shortcomings, thereby providing users with a more reasonable management and control model and supporting information matching for a unified data model.

Description

Method, system and medium for establishing physical model of power grid knowledge graph
Technical Field
The present invention relates to knowledge graph technology, and more particularly, to a method for establishing a physical model for a power grid knowledge graph, and a corresponding system and computer-readable storage medium.
Background
As knowledge graph technology has developed, its strong semantic processing and knowledge organization capabilities have laid a foundation for organizing large-scale knowledge bases and building intelligent applications. A knowledge graph is composed of a large number of entities and the associations between them. Entities such as landmarks, people, cities, sports teams, buildings, geographic features, movies, celestial bodies, and works of art can be retrieved through a knowledge graph, together with information related to them. This is key to building intelligent applications that draw on the collective intelligence of the network and understand the world more as a person does. In specific application settings, a domain knowledge graph must be constructed from the ontology library of that domain to support domain-oriented intelligent information retrieval and intelligent applications. Constructing a knowledge graph for a specific domain requires not only general knowledge but also domain expertise. Because a domain knowledge graph must support real engineering applications, it places higher demands on indicators such as recognition rate and accuracy than a general knowledge graph does. To meet the needs of domain-oriented large-scale knowledge bases and intelligent applications, information extraction techniques adapted to the characteristics of the domain, and methods for constructing domain knowledge graphs, need to be studied.
In recent years, a large number of Chinese-language knowledge graphs have been introduced in China. They are constructed mainly from the structured information of Baidu Baike and Wikipedia, and aim to maintain schema standards for open-domain knowledge graphs through community effort. These knowledge graphs are built through manual editing and automatic extraction, but the automatic extraction relies mainly on structured information in online encyclopedias and ignores unstructured text, even though most information on the Internet is presented as unstructured free text. In parallel with the development of linked data, many knowledge acquisition methods based on information extraction have been proposed for constructing open-domain knowledge graphs from free text. In 2007, Banko et al. at the University of Washington first proposed open information extraction (OIE), which extracts entity relationship triples (head entity, relationship indicator, tail entity) directly from large-scale free text. Before OIE, a number of information extraction methods for free text had been proposed, but their main idea was to train a dedicated extractor for each target relationship. Such conventional methods cannot work efficiently given the massive number of relationship categories in Internet text: training an extractor for each target relationship is impractical, and, more seriously, in many cases the relationship types in massive network text cannot be defined in advance.
In addition, knowledge resource classification, intelligent search, and cross-domain knowledge fusion and representation based on enterprise-level data models are still at an early stage. Intuitive, accessible model interfaces for management and business staff are lacking, and the ability to search logical links in the data model and to perform static semantic analysis and evaluation is severely limited. Data models such as the State Grid Corporation's public data model (SG-CIM), as a comprehensive abstraction of the company's enterprise-level power grid, assets, finances, and so on, are not only very large but also span many specialized categories, so the following problems remain in model results, application, and support: (1) model design quality still needs improvement: in current model design results, some data objects are abstracted to inconsistent degrees, entity relationships are inaccurate, data objects and attributes are incomplete, de-duplication is incomplete, data tracing is incomplete, and standard codes do not correspond to source-end business system codes; (2) the model mapping rate is low: each unit performs mapping comparison against a different version of the physical model, so the average mapping rate is low; (3) existing data model management and control is mostly performed offline, with complex processes and inefficient communication; model design results are abstract and hard for personnel at all levels to understand, application capability is insufficient, and the quality of model application and iterative improvement cannot be guaranteed.
Accordingly, there is a need to provide an improved solution to overcome the drawbacks of the existing data models.
Disclosure of Invention
The invention aims to provide a solution to the above technical problems.
Specifically, according to a first aspect of the present invention, there is provided a method for building a physical model for a power grid knowledge graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model comprising the table information set, wherein the table information indicates at least the table name, fields, table object source, and field source of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on relationships between source table objects and target table objects, the second data source comprising a plurality of pairs of source and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source, so as to generate a relationship set of the physical model comprising the table relationship information set; and
establishing, based on the physical table set and the relationship set of the physical model, a physical model that includes table objects, fields, and relationships.
In one embodiment, the fields are determined according to a predefined field schema based on the field-related information and field source-related information in the first data source, the field schema including a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system.
In one embodiment, the table schema includes a table name of the table object, a subject area, a secondary subject area, a table type, a table description, a responsible department, a name of the data source system, a table name of the data source system, and a field list.
In one embodiment, the relationship schema includes a table name of the source table object, a table name of the target table object, an association relationship between the source and target table objects, an association field between the source and target table objects, a subject area, and a secondary subject area.
In one embodiment, generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source includes: for any two pairs of source and target table objects in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships, and the association fields are all the same, judging that the two pairs represent the same relationship; and, for identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
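The de-duplication rule above can be illustrated with a minimal Python sketch. It assumes that each relationship is held as a dictionary whose keys ('entity1', 'entity2', 'relation', 'field') follow the example relationship schema given in the detailed description; the function names `relation_key` and `deduplicate` are hypothetical and are not part of the patent.

```python
def relation_key(rel):
    """Identity of a relationship (assumed representation): source table
    names, target table names, association relationship, association field."""
    return (
        tuple(rel["entity1"]),  # [English name, Chinese name] of the source table
        tuple(rel["entity2"]),  # [English name, Chinese name] of the target table
        rel["relation"],
        rel["field"],
    )


def deduplicate(relations):
    """Keep one record per distinct relationship identity, preserving order."""
    seen = set()
    unique = []
    for rel in relations:
        key = relation_key(rel)
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique
```

Two records that differ only in subject area are treated as the same relationship here, because the rule compares only the four members listed above.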
In one embodiment, a table schema library for table objects and their fields is provided, from which the table schema for defining the table objects and their fields is determined.
In one embodiment, a relationship schema library representing relationships between table objects is provided, from which the relationship schema for defining relationships between source and target table objects is determined.
In one embodiment, an alias set library is provided for table objects, their fields, and the relationships among table objects. The alias set library contains previously recorded aliases and their occurrence frequencies. The table objects, fields, and relationships appearing in the first and second data sources are recorded in the alias set library, and their occurrence frequencies are accumulated; the table object, field, or relationship that is displayed is the one with the highest occurrence frequency.
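A minimal sketch of such an alias set library follows, assuming Python and assuming that aliases are grouped under some canonical identifier. The text does not specify how aliases are grouped, so the `object_id` parameter and the `AliasLibrary` class are hypothetical illustrations only.

```python
from collections import Counter, defaultdict


class AliasLibrary:
    """Records aliases per object and accumulates their occurrence counts."""

    def __init__(self):
        # object_id -> Counter mapping each alias to its occurrence frequency
        self._counts = defaultdict(Counter)

    def record(self, object_id, alias):
        """Record one occurrence of an alias seen in a data source."""
        self._counts[object_id][alias] += 1

    def display_name(self, object_id):
        """Return the most frequent alias, or None if nothing was recorded."""
        counts = self._counts.get(object_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

The same structure could hold aliases for tables, fields, or relationships, since only the grouping identifier differs.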
According to a second aspect of the present invention, there is provided a system for building a physical model for a grid knowledge graph, comprising: a physical table set generating unit, a relation set generating unit and a processing unit,
wherein the physical table set generating unit is configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model comprising the table information set, wherein the table information indicates at least the table name, fields, table object source, and field source of the table object;
wherein the relationship set generating unit is configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship-related information on relationships between source table objects and target table objects, the second data source comprising a plurality of pairs of source and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source, so as to generate a relationship set of the physical model comprising the table relationship information set;
wherein the processing unit is configured to:
based on the set of physical tables of the physical model and the set of relationships of the physical model, a physical model is established that includes table objects, fields, and relationships.
In one embodiment, the fields are determined according to a predefined field schema based on the field-related information and field source-related information in the first data source, the field schema including a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system.
In one embodiment, the table schema includes a table name of the table object, a subject area, a secondary subject area, a table type, a table description, a responsible department, a name of the data source system, a table name of the data source system, and a field list.
In one embodiment, the relationship schema includes a table name of the source table object, a table name of the target table object, an association relationship between the source and target table objects, an association field between the source and target table objects, a subject area, and a secondary subject area.
In one embodiment, generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source includes: for any two pairs of source and target table objects in the second data source, if the table names of the source table objects, the table names of the target table objects, the association relationships, and the association fields are all the same, judging that the two pairs represent the same relationship; and, for identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
In one embodiment, a table schema library for table objects and their fields is provided, from which the table schema for defining the table objects and their fields is determined.
In one embodiment, a relationship schema library representing relationships between table objects is provided, from which the relationship schema for defining relationships between source and target table objects is determined.
In one embodiment, an alias set library is provided for table objects, their fields, and the relationships among table objects. The alias set library contains previously recorded aliases and their occurrence frequencies. The table objects, fields, and relationships appearing in the first and second data sources are recorded in the alias set library, and their occurrence frequencies are accumulated; the table object, field, or relationship that is displayed is the one with the highest occurrence frequency.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the above-described method for building a physical model for a grid knowledge graph to be performed.
According to the scheme of the invention, data on table objects, fields, and relationships is acquired from a plurality of data sources and normalized, and a unified and complete data model for the power grid knowledge graph is established according to a predefined table schema, a predefined field schema, and a predefined relationship schema. With the method and system, knowledge can be extracted from different data sources, and gaps in the existing model can be detected and filled to compensate for its design shortcomings; at the same time, a more reasonable management and control model can be provided for management and business personnel, and information matching and sharing for the company's unified data model are supported. In addition, the invention can further promote implementation of the model standard and the construction of a complete system based on the existing data model, lay a solid foundation for further data quality management, support the construction of the data center and the business center, and yield direct or indirect benefits in practical application.
Drawings
Non-limiting and non-exhaustive embodiments of the present invention are described by way of example with reference to the following drawings, wherein:
FIG. 1 is a flow chart schematically illustrating a method for building a physical model for a grid knowledge graph, in accordance with one embodiment of the present invention;
FIG. 2 is a flow diagram schematically illustrating the creation of a physical table set of a physical model according to one embodiment of the invention;
FIG. 3 is a flow diagram schematically illustrating the creation of a relationship set for a physical model according to one embodiment of the invention; and
FIG. 4 is a schematic diagram illustrating a system for building a physical model for a grid knowledge graph, in accordance with an embodiment of the invention.
Detailed Description
To further clarify the above and other features and advantages of the present invention, a further description of the invention will be rendered by reference to the appended drawings. It should be understood that the specific embodiments presented herein are for purposes of explanation to those skilled in the art and are intended to be illustrative only and not limiting.
As a first aspect of the invention, a method for building a physical model for a power grid knowledge graph is provided. Fig. 1 schematically shows a method S100 for building a physical model for a grid knowledge graph, according to one embodiment of the invention. As shown in fig. 1, S100 may include step S101, step S102, step S103, step S104, step S105, and step S106.
In step S101, a table schema for defining a table object and its fields is determined. The table schema may also be referred to herein as a table definition, which is used to define constituent members of a table object, for example, may include various suitable constituent members for distinguishing one table object from other table objects.
In one embodiment, the table schema may include a table name of the table object, a subject area, a secondary subject area, a table type, a table description, a responsible department, a name of the data source system, a table name of the data source system, and a field list. The table name may include at least one of the English table name and the Chinese table name of the table object. For example, the table schema may be defined in JSON format as follows:
{
'name': [table name (English), table name (Chinese)],
'area': subject area,
'secondary area': secondary subject area,
'type': table type,
'description': table description,
'division': responsible department,
'source system': data source system name,
'source table': [data source system table name (English), data source system table name (Chinese)],
'more': remarks,
'fields': field list
}
In one embodiment, the field list in the table schema is determined according to a predefined field schema based on the field-related information and field source-related information in the first data source. The field schema may include a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name of the data source system, a field name of the data source system, and a field type of the data source system. The field schema may also be referred to herein as a field definition, which defines the constituent members of a field. For example, the field schema may be defined in JSON format as follows:
{
'name': [field name (English), field name (Chinese)],
'datatype': field data type,
'description': field description,
standard code,
'storage format': data storage format,
'hash column': hash column,
'division': responsible department,
'source system': [data source system name (English), data source system name (Chinese)],
'source table': [data source system table name (English), data source system table name (Chinese)],
'source field': [data source system field (English), data source system field (Chinese)],
'source datatype': data source system field type,
'more': remarks
}
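As an illustration only, the table and field definitions above could be represented with Python dataclasses and serialized to JSON. This is a sketch under stated assumptions: the attribute names use underscores where the listed keys contain spaces, the class names `FieldInfo` and `TableInfo` and the helper `to_json` are hypothetical, the value types are guesses, and the standard-code member is omitted because its key is not legible in the listing.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List


@dataclass
class FieldInfo:
    name: List[str]          # [English name, Chinese name]
    datatype: str
    description: str = ""
    storage_format: str = ""
    hash_column: str = ""    # value type is an assumption
    division: str = ""       # responsible department
    source_system: List[str] = dataclasses.field(default_factory=list)
    source_table: List[str] = dataclasses.field(default_factory=list)
    source_field: List[str] = dataclasses.field(default_factory=list)
    source_datatype: str = ""
    more: str = ""           # remarks


@dataclass
class TableInfo:
    name: List[str]          # [English name, Chinese name]
    area: str = ""           # subject area
    secondary_area: str = ""
    type: str = ""
    description: str = ""
    division: str = ""
    source_system: str = ""
    source_table: List[str] = dataclasses.field(default_factory=list)
    more: str = ""
    fields: List[FieldInfo] = dataclasses.field(default_factory=list)


def to_json(table: TableInfo) -> str:
    """Serialize one table record, keeping Chinese text readable."""
    return json.dumps(dataclasses.asdict(table), ensure_ascii=False, indent=2)
```

Nesting `FieldInfo` records inside `TableInfo.fields` mirrors the way the field list is embedded in the table definition.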
In step S102, a first data source comprising a plurality of table objects is received, the first data source comprising table object-related information, field-related information, table object source-related information, and/or field source-related information. The first data source is to be understood broadly herein as encompassing data sources in structured, semi-structured, and unstructured forms, such as relational databases, data warehouses, non-relational databases, document libraries, and various kinds of reports. Preferably, the first data source of the present invention comprises a data source in the form of an Excel document.
In step S103, for each table object, corresponding table information is generated according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model comprising the table information set, where the table information indicates at least the table name, fields, table object source, and field source of the table object. It should be understood that a table name may be used as the identification criterion for each table object: for example, "English table name + Chinese table name", the English table name alone, or the Chinese table name alone. The physical table set of the physical model may be stored in any suitable file form as required, such as a JSON storage file. In one embodiment, the JSON storage file form for the physical table set of the physical model is as follows:
Step S103 is described in detail below in conjunction with FIG. 2.
As shown in FIG. 2, the Excel document serving as the first data source includes three parts: "data table information", a "data table lookup table", and a "field lookup table". The "data table information" indicates the table names and field-related information of all table objects, the "data table lookup table" indicates the source-related information of each table object in its source system, and the "field lookup table" indicates the source-related information of each field in its source system. The source system may include, for example, data platforms covering various aspects of power knowledge. Because the letter case of table names in this information is inconsistent, the model is built on a case-insensitive basis, and the English table names are converted to a uniform case.
The specific process is as follows. First, "English table name + Chinese table name" is taken as the identification criterion for table objects, and the records are aggregated by "English table name + Chinese table name" to obtain the table name information of each table object. Second, for each table object, the table name is normalized, for example by removing spaces and line breaks. Third, the field names, field data types, and so on of all fields of the table object are normalized, for example by removing spaces and line breaks, and organized according to the field definition to obtain normalized fields. Each field is then supplemented from the field lookup table: for example, the name of the field's source system, the table name in the source system, and the field type in the source system are obtained from the field lookup table, normalized, and integrated into the field information. After all fields of the table object have been processed, the table and its field information are generated according to the table definition. The table and its field information are then supplemented from the data table lookup table: for example, the name of the table object's source system and the table name in the data source system are obtained from the data table lookup table, normalized, and integrated into the information of the table object. Finally, all the integrated information of the table object is organized into the corresponding table information, which can be represented as a JSON string. These steps are repeated until all table objects have been processed, yielding a physical table set containing all table objects. The physical table set may be exported as a JSON storage file.
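The process described above can be sketched in Python as follows. This is only an illustration under several assumptions: the rows of the "data table information" sheet are taken to be already loaded as dictionaries with the hypothetical keys `table_en`, `table_cn`, `field_en`, `field_cn`, and `datatype`; the two lookup tables are taken to be plain dictionaries; and upper case is chosen as the uniform case for English table names, which the text does not specify.

```python
import re


def normalize(text):
    """Remove spaces and line breaks, as in the normalization step."""
    return re.sub(r"\s+", "", text or "")


def table_key(english, chinese):
    """Identity of a table object: English name (case-insensitive) + Chinese name."""
    return (normalize(english).upper(), normalize(chinese))


def build_physical_tables(table_rows, table_lookup, field_lookup):
    """Aggregate rows by table identity and enrich them from the lookup tables.

    table_lookup: English table name -> source information of the table
    field_lookup: (English table name, English field name) -> source information of the field
    """
    tables = {}
    for row in table_rows:
        key = table_key(row["table_en"], row["table_cn"])
        table = tables.setdefault(key, {"name": list(key), "fields": []})
        field_en = normalize(row["field_en"])
        field = {
            "name": [field_en, normalize(row["field_cn"])],
            "datatype": normalize(row["datatype"]),
        }
        # Supplement the field with its source information, if any is known.
        field.update(field_lookup.get((key[0], field_en), {}))
        table["fields"].append(field)
    for key, table in tables.items():
        # Supplement the table with its source information, if any is known.
        table.update(table_lookup.get(key[0], {}))
    return list(tables.values())
```

The returned list can be passed to `json.dumps(..., ensure_ascii=False)` to export the physical table set as a JSON file.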
In step S104, a relationship schema for defining a relationship between the source table object and the target table object is determined. A relationship schema may also be referred to herein as a relationship definition; it defines the constituent members of a relationship between a pair of table objects and may, for example, represent a correlation between a source table object and a target table object. The table names of the source table object and the target table object may include at least one of the corresponding English table name and Chinese table name. For example, the relationship schema may be determined in JSON format as follows:
{
'entity1': [source table object table name (English), source table object table name (Chinese)],
'entity2': [target table object table name (English), target table object table name (Chinese)],
'relation': association relationship,
'field': association field,
'area': topic field,
'secondary area': secondary topic field
}
In step S105, a second data source is received that contains relationship-related information about relationships between source table objects and target table objects. The second data source includes a plurality of pairs of source and target table objects. For each pair, table relationship information is generated from the second data source according to the relationship schema, yielding a table relationship information set covering all relationships in the second data source; the relationship set of the physical model comprises this table relationship information set. The second data source is to be understood broadly herein and encompasses structured, semi-structured, and unstructured data sources, such as relational databases, data warehouses, non-relational databases, document libraries, and various types of reports. Preferably, the second data source of the present invention is an Excel file. The first data source and the second data source may include data from the business systems of the power grid and time-series data collected on the smart grid, mainly company marketing data, quantitative collection data, operation and inspection data, and graphic, image, and web page data; knowledge processing, extraction, and normalized integration can be performed on all three forms of data (structured, semi-structured, and unstructured). The relationship set of the physical model may be stored in any suitable file form as required, such as a JSON storage file. In one embodiment, the JSON storage file for the relationship set of the physical model is in the form of:
In one embodiment, step S105 may include the following. For one pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of their source table objects, the table names of their target table objects, the association relationships between source and target table objects, and the association fields between source and target table objects are all the same, the relationship of the first pair is judged to be the same as the relationship of the other pair. For identical relationships, only one is normalized and used to generate the table relationship information of the corresponding source and target table objects according to the relationship schema. That is, the table name of the source table object, the table name of the target table object, the association relationship, and the association field together serve as the identifier of a relationship.
Step S105 is described in detail below in conjunction with Fig. 3.
As shown in Fig. 3, the Excel document serving as the second data source is read and its "association relationship" data is acquired. Because table names in the association data are inconsistently cased, model building is case-insensitive: the English table names are converted to a uniform case. The specific process is as follows. The table name (English) of the source table object, the table name (Chinese) of the source table object, the table name (English) of the target table object, the table name (Chinese) of the target table object, the association relationship, and the association field are used as the discrimination criteria for each relationship. These six items are aggregated for each relationship to obtain an aggregation identifier, and table relationships with the same aggregation identifier are placed in the same aggregation group. Within an aggregation group, only the first record is processed, which deduplicates the table relationships and avoids redundancy: its table names, association relationship, association field, and so on are normalized, for example by removing spaces, line breaks, redundant bars, and equal signs. The normalized information is organized according to the relationship definition into the corresponding table relationship information, which can be represented as a JSON string. These steps are repeated until all table relationships have been processed, yielding a relationship set containing all table relationships. The relationship set may be exported as a JSON storage file.
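The aggregation and deduplication of table relationships can be illustrated as follows. This is a sketch under assumptions: the input keys (`source_en`, `target_cn`, and so on) and the cleaning rule (drop whitespace, then strip leading and trailing `-`, `=`, `|`) are hypothetical, and for simplicity the aggregation identifier is computed from already-cleaned values rather than normalizing only after grouping.

```python
import json
import re

def clean(text):
    """Drop whitespace, then strip stray bars and equal signs at the ends."""
    return re.sub(r"\s+", "", str(text or "")).strip("-=|")

def relation_key(row):
    """Six-item aggregation identifier; English names compared case-insensitively."""
    return (
        clean(row["source_en"]).lower(),
        clean(row["source_cn"]),
        clean(row["target_en"]).lower(),
        clean(row["target_cn"]),
        clean(row["relation"]),
        clean(row["field"]),
    )

def build_relation_set(rows):
    """Keep the first record of every aggregation group, in input order."""
    seen = {}
    for row in rows:
        key = relation_key(row)
        if key in seen:
            continue  # same aggregation group: later duplicates are skipped
        src_en, src_cn, tgt_en, tgt_cn, relation, field = key
        seen[key] = {
            "entity1": [src_en, src_cn],
            "entity2": [tgt_en, tgt_cn],
            "relation": relation,
            "field": field,
            "area": row.get("area", ""),
            "secondary area": row.get("secondary_area", ""),
        }
    return list(seen.values())

# Hypothetical rows: the first two describe the same relationship.
rows = [
    {"source_en": "T_METER", "source_cn": "电表", "target_en": "t_user",
     "target_cn": "用户", "relation": "N:1", "field": "user_id=", "area": "marketing"},
    {"source_en": "t_meter ", "source_cn": "电表", "target_en": "T_USER",
     "target_cn": "用户", "relation": "N:1\n", "field": "user_id"},
    {"source_en": "t_meter", "source_cn": "电表", "target_en": "t_station",
     "target_cn": "台区", "relation": "N:1", "field": "station_id"},
]
relation_set = build_relation_set(rows)
relation_set_json = json.dumps(relation_set, ensure_ascii=False, indent=2)
```

Using the tuple of six items as a dictionary key makes the "same aggregation identifier" test a constant-time lookup, and the dictionary's insertion order preserves the "first record wins" rule.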
In step S106, a physical model including table objects, fields, and relationships is built based on the set of physical tables of the physical model and the set of relationships of the physical model.
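The description does not prescribe how the two sets are combined, so the following is only one possible sketch. It assumes the model is a plain dictionary holding the tables (keyed by lower-cased English table name) and the relationships, and that relationships whose endpoints are missing from the physical table set are reported separately; both choices are hypothetical.

```python
def build_physical_model(physical_tables, relation_set):
    """Combine the two sets; relations with unknown endpoints are set aside."""
    tables = {t["table_en"].lower(): t for t in physical_tables}
    relations, dangling = [], []
    for rel in relation_set:
        source = rel["entity1"][0].lower()
        target = rel["entity2"][0].lower()
        if source in tables and target in tables:
            relations.append(rel)
        else:
            dangling.append(rel)
    return {"tables": tables, "relations": relations}, dangling

# Hypothetical inputs in the shapes used by the table and relationship schemas.
physical_tables = [
    {"table_en": "t_meter", "table_cn": "电表", "fields": [{"name": "user_id"}]},
    {"table_en": "t_user", "table_cn": "用户", "fields": [{"name": "id"}]},
]
relation_set = [
    {"entity1": ["t_meter", "电表"], "entity2": ["t_user", "用户"],
     "relation": "N:1", "field": "user_id"},
    {"entity1": ["t_meter", "电表"], "entity2": ["t_station", "台区"],
     "relation": "N:1", "field": "station_id"},
]
model, dangling = build_physical_model(physical_tables, relation_set)
```

Reporting dangling relationships instead of silently dropping them gives a simple consistency check between the two data sources.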
In one embodiment, the method of the present invention further comprises: calculating the similarity between pairs of table objects in the physical model based on the physical table set, and deduplicating pairs of table objects whose similarity exceeds a predetermined threshold, to generate a physical table set with low redundancy. The physical table set of the physical model can be matched against a corresponding logical model to check the consistency of the model. This improves the soundness and completeness of the static semantics of an existing model (for example, the State Grid Corporation of China enterprise public data model SG-CIM 4.0) and effectively reduces redundancy. Finally, non-spatial knowledge data that is difficult to observe is converted into a spatial graph, which helps personnel in the relevant fields to grasp and understand it and provides an effective solution for associating and connecting entities across domains. At the same time, the strong semantic processing capability of knowledge graph technology in describing entities, attributes, and relationships is well demonstrated.
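The description does not name a similarity measure. As an illustration only, the sketch below assumes Jaccard similarity over lower-cased field names and a threshold of 0.8; the measure, the threshold, and the "keep the first table seen" policy are all assumptions.

```python
def field_names(table):
    """Lower-cased set of field names of a table object."""
    return {f["name"].lower() for f in table["fields"]}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def deduplicate(tables, threshold=0.8):
    """Drop every table whose similarity to an already kept table exceeds the threshold."""
    kept = []
    for table in tables:
        names = field_names(table)
        if any(jaccard(names, field_names(k)) > threshold for k in kept):
            continue
        kept.append(table)
    return kept

def make_table(name, fields):
    """Build a minimal table object for the example."""
    return {"table_en": name, "fields": [{"name": f} for f in fields]}

tables = [
    make_table("a", ["id", "name", "addr", "type", "status"]),
    make_table("b", ["ID", "Name", "Addr", "Type", "Status"]),  # same fields as "a"
    make_table("c", ["id", "voltage"]),
]
low_redundancy = deduplicate(tables)
```

Here tables "a" and "b" have a similarity of 1.0 and are merged into one entry, while "c" shares only one of six distinct field names with "a" and is kept.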
As a second aspect of the present invention, a system for building a physical model for a grid knowledge graph is provided. Fig. 4 schematically illustrates a system 200 for building a physical model for a grid knowledge graph, in accordance with an embodiment of the invention. The system 200 may include a physical table set generation unit 201, a relation set generation unit 202, and a processing unit 203. The processing unit 203 is communicatively coupled with the physical table set generation unit 201 and the relation set generation unit 202.
The physical table set generating unit 201 may be configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, so as to obtain a table information set of all the table objects included in the first data source, and generating a physical table set of the physical model comprising the table information set, wherein the table information at least indicates the table name, the field, the table object source and the field source of the table object.
The relation set generating unit 202 may be configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship related information of a relationship between a source table object and a target table object, the second data source comprising a plurality of pairs of source table objects and target table objects, for each pair of source table objects and target table objects, generating table relationship information of the pair of source table objects and target table objects according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships comprised by the second data source, to generate a relationship set of the physical model comprising the table relationship information set.
In one embodiment, for one pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of their source table objects, the table names of their target table objects, the association relationships between source and target table objects, and the association fields between source and target table objects are all the same, the relationship of the first pair is judged to be the same as the relationship of the other pair. For identical relationships, only one is normalized and used to generate the table relationship information of the corresponding source and target table objects according to the relationship schema.
The processing unit 203 may be configured to: based on the set of physical tables of the physical model and the set of relationships of the physical model, a physical model is established that includes table objects, fields, and relationships.
In one embodiment, the fields are determined in a predefined field pattern based on field related information and field source related information in the first data source, the field pattern including a field name of a field, a field data type, a field description, a standard code, a data storage format, a hash column, a responsibility department, a name of a data source system, a table name of a data source system, a field name of a data source system, and a field type of a data source system.
In one embodiment, the table schema includes a table name of a table object, a topic field, a secondary topic field, a table type, a table description, a responsibility department, a name of a data source system, a table name of a data source system, and a field list.
In one embodiment, the relationship schema includes a table name of a source table object, a table name of a target table object, an association relationship between the source table object and the target table object, an association field between the source table object and the target table object, a topic field, and a secondary topic field.
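The three definitions in the preceding embodiments can be written down as typed records. The following sketch uses Python dataclasses; the attribute names are assumed English renderings of the listed items, and the defaults are illustrative.

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class FieldSchema:
    """Field pattern: one entry per item listed in the field embodiment."""
    name: str
    data_type: str
    description: str = ""
    standard_code: str = ""
    storage_format: str = ""
    hash_column: str = ""
    department: str = ""
    source_system: str = ""
    source_table: str = ""
    source_field: str = ""
    source_field_type: str = ""

@dataclass
class TableSchema:
    """Table schema: table-level attributes plus the field list."""
    name: str
    area: str = ""
    secondary_area: str = ""
    table_type: str = ""
    description: str = ""
    department: str = ""
    source_system: str = ""
    source_table: str = ""
    fields: List[FieldSchema] = field(default_factory=list)

@dataclass
class RelationSchema:
    """Relationship schema between a source and a target table object."""
    source_table: str
    target_table: str
    relation: str
    association_field: str
    area: str = ""
    secondary_area: str = ""

# Hypothetical example values.
table = TableSchema(
    name="t_meter",
    area="marketing",
    fields=[FieldSchema(name="id", data_type="varchar")],
)
record = asdict(table)  # nested dict, ready for JSON serialization
```

`asdict` converts nested dataclasses recursively, so a table with its field list maps directly to the JSON string representation mentioned in the description.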
It will be appreciated that the specific features described above in relation to the method for building a physical model for a grid knowledge graph of the first aspect apply equally, with corresponding extensions, to the system for building a physical model for a grid knowledge graph of the second aspect. For brevity, they are not described again in detail.
It should be appreciated that the various elements of the system 200 for building a physical model for a grid knowledge graph of the present invention may be implemented in whole or in part by software, hardware, firmware, or a combination thereof. The units may each be embedded in the processor of the computer device in hardware or firmware or separate from the processor, or may be stored in the memory of the computer device in software for the processor to call to perform the operations of the units. Each of the units may be implemented as a separate component or module, or two or more units may be implemented as a single component or module.
It will be appreciated by those of ordinary skill in the art that the schematic diagram of the system 200 shown in FIG. 4 is merely an exemplary illustrative block diagram of some of the structures associated with aspects of the present invention and is not intended to limit the computer device, processor or computer program embodying aspects of the present invention. A particular computer device, processor, or computer program may include more or fewer components or modules than those shown in the figures, or may combine or split certain components or modules, or may have a different arrangement of components or modules.
In the present invention, a library of table schemas for table objects and their fields is provided, and the table schema for defining the table objects and fields thereof is determined from this library.
In the present invention, a library of relationship schemas representing relationships between table objects is provided, and the relationship schema for defining a relationship between a source table object and a target table object is determined from this library.
The invention provides an alias collection library for table objects, fields, and relationships between table objects. The alias collection library contains previously recorded aliases and their occurrence frequencies. The names of the table objects, fields, and relationships between table objects that appear in the first data source and the second data source are recorded in the alias library, and their occurrence frequencies are accumulated. The name displayed for a table object, field, or relationship between table objects is the alias with the highest occurrence frequency.
In a preferred embodiment, in the alias collection libraries for table objects, for their fields, and for relationships between table objects, each record is given a tag that distinguishes different acquisitions. In this way, alias libraries from different sources, for example different departments, can be merged: if two records have the same tag, they are considered to come from the same acquisition and their counts are not accumulated. The tag comprises, for example, a date, a time, and a random sequence. The date is an 8-digit number such as 20201030, the time is accurate to the minute or second, such as 1830 or 183025, and the random sequence is, for example, a 6- to 10-digit random number used for verification. Recording the acquisition date makes it possible to track how the names of table objects, their fields, and relationships between table objects change over time; the library generally displays the most popular, most frequently used name, which has a standardizing effect toward unified naming.
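A tagged alias library of this kind can be sketched as follows. The class name `AliasLibrary`, the notion of an `object_id` identifying the thing being named, the tag layout `date-time-random`, and the rule that an alias's frequency is the number of distinct tags are assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import datetime

def make_tag(now, rng):
    """Tag = 8-digit date, time to the second, 6-digit random sequence."""
    return f"{now:%Y%m%d}-{now:%H%M%S}-{rng.randrange(10**6):06d}"

class AliasLibrary:
    def __init__(self):
        # object id -> alias -> set of acquisition tags
        self._tags = defaultdict(lambda: defaultdict(set))

    def record(self, object_id, alias, tag):
        """Record one occurrence; a repeated tag does not count twice."""
        self._tags[object_id][alias].add(tag)

    def frequency(self, object_id, alias):
        """Number of distinct acquisitions in which the alias appeared."""
        return len(self._tags.get(object_id, {}).get(alias, ()))

    def display_name(self, object_id):
        """Alias with the highest frequency, or None if nothing is recorded."""
        aliases = self._tags.get(object_id)
        if not aliases:
            return None
        return max(aliases, key=lambda a: len(aliases[a]))

    def merge(self, other):
        """Merge another library; records sharing a tag are not accumulated."""
        for object_id, aliases in other._tags.items():
            for alias, tags in aliases.items():
                self._tags[object_id][alias] |= tags

# Two hypothetical departmental libraries that share one acquisition tag.
dept_a = AliasLibrary()
dept_a.record("meter", "t_meter", "20201030-1830-111111")
dept_a.record("meter", "t_meter", "20201031-0900-222222")
dept_a.record("meter", "meter_tab", "20201101-1000-333333")
dept_b = AliasLibrary()
dept_b.record("meter", "t_meter", "20201031-0900-222222")  # same acquisition as above
dept_b.record("meter", "meter_tab", "20201102-1000-444444")
dept_b.record("meter", "meter_tab", "20201103-1000-555555")

merged = AliasLibrary()
merged.merge(dept_a)
merged.merge(dept_b)
tag = make_tag(datetime(2020, 10, 30, 18, 30, 25), random.Random(0))
```

Storing tags in a set makes the "same tag, no accumulation" rule automatic: merging is a set union, so the shared acquisition is counted once.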
As a third aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method of the first aspect of the present invention. In one embodiment, the computer program is distributed over a plurality of computer devices or processors coupled by a network such that the computer program is stored, accessed, and executed by one or more computer devices or processors in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor, or by two or more computer devices or processors. One or more method steps/operations may be performed by one or more computer devices or processors, and one or more other method steps/operations may be performed by one or more other computer devices or processors. One or more computer devices or processors may perform a single method step/operation or two or more method steps/operations.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the method for building a physical model for a grid knowledge graph of the present invention may be accomplished by a computer program instructing related hardware, such as a computer device or processor. The computer program may be stored in a non-transitory computer readable storage medium and, when executed, performs the steps of the method of the present invention. Any reference herein to memory, storage, database, or other medium may include non-volatile and/or volatile memory, as the case may be. Examples of non-volatile memory include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage, hard disk, solid state disk, and the like. Examples of volatile memory include Random Access Memory (RAM), external cache memory, and the like.
The technical features described above may be arbitrarily combined. Although not all possible combinations of features are described, any combination of features should be considered to be covered by the description provided that such combinations are not inconsistent.
While the invention has been described in conjunction with embodiments, it will be understood by those skilled in the art that the foregoing description and drawings are illustrative only and that the invention is not limited to the disclosed embodiments. Various modifications and variations are possible without departing from the spirit of the invention.

Claims (10)

1. A method for building a physical model for a power grid knowledge graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, so as to obtain a table information set of all table objects included in the first data source, and generating a physical table set of the physical model comprising the table information set, wherein the table information at least indicates a table name, a field, a table object source and a field source of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship related information of a relationship between a source table object and a target table object, the second data source comprising a plurality of pairs of source table objects and target table objects, for each pair of source table objects and target table objects, generating table relationship information of the pair of source table objects and target table objects according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships comprised by the second data source, to generate a relationship set of the physical model comprising the table relationship information set;
based on the set of physical tables of the physical model and the set of relationships of the physical model, a physical model is established that includes table objects, fields, and relationships.
2. The method of claim 1, wherein the fields are determined in a predefined field pattern based on field related information and field source related information in the first data source, the field pattern comprising a field name of a field, a field data type, a field description, a standard code, a data storage format, a hash column, a responsibility department, a name of a data source system, a table name of a data source system, a field name of a data source system, and a field type of a data source system.
3. The method of claim 1, the table schema comprising a table name of a table object, a topic field, a secondary topic field, a table type, a table description, a responsibility department, a name of a data source system, a table name of a data source system, and a field list.
4. The method of claim 1, the relationship schema comprising a table name of a source table object, a table name of a target table object, an association relationship between a source table object and a target table object, an association field between a source table object and a target table object, a topic field, and a secondary topic field.
5. The method of claim 4, wherein generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for a pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the respective source table objects, the table names of the respective target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, judging that the relationship between the one pair of source table object and target table object is the same as the relationship between the other pair of source table object and target table object; and, for identical relationships, normalizing only one relationship and generating the table relationship information of the corresponding source table object and target table object according to the relationship schema.
6. A system for building a physical model for a grid knowledge graph, comprising: a physical table set generating unit, a relation set generating unit and a processing unit,
wherein the physical table set generating unit is configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, so as to obtain a table information set of all table objects included in the first data source, and generating a physical table set of the physical model comprising the table information set, wherein the table information at least indicates a table name, a field, a table object source and a field source of the table object;
wherein the relation set generating unit is configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising relationship related information of a relationship between a source table object and a target table object, the second data source comprising a plurality of pairs of source table objects and target table objects, for each pair of source table objects and target table objects, generating table relationship information of the pair of source table objects and target table objects according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships comprised by the second data source, to generate a relationship set of the physical model comprising the table relationship information set;
wherein the processing unit is configured to:
based on the set of physical tables of the physical model and the set of relationships of the physical model, a physical model is established that includes table objects, fields, and relationships.
7. The system of claim 6, wherein the fields are determined in a predefined field pattern based on field related information and field source related information in the first data source, the field pattern comprising a field name of a field, a field data type, a field description, a standard code, a data storage format, a hash column, a responsibility department, a name of a data source system, a table name of a data source system, a field name of a data source system, and a field type of a data source system.
8. The system of claim 6, the table schema comprising a table name of a table object, a topic field, a secondary topic field, a table type, a table description, a responsibility department, a name of a data source system, a table name of a data source system, and a field list, the relationship schema comprising a table name of a source table object, a table name of a target table object, an association between a source table object and a target table object, an association field between a source table object and a target table object, a topic field, and a secondary topic field.
9. The system of claim 8, wherein generating table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for a pair of source table object and target table object and another pair of source table object and target table object in the second data source, if the table names of the respective source table objects, the table names of the respective target table objects, the association relationships between the source table object and the target table object, and the association fields between the source table object and the target table object are all the same, judging that the relationship between the one pair of source table object and target table object is the same as the relationship between the other pair of source table object and target table object; and, for identical relationships, normalizing only one relationship and generating the table relationship information of the corresponding source table object and target table object according to the relationship schema.
10. A computer readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, implements the method of any of claims 1 to 5.
CN202011197189.XA 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph Active CN112363996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011197189.XA CN112363996B (en) 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph

Publications (2)

Publication Number Publication Date
CN112363996A CN112363996A (en) 2021-02-12
CN112363996B true CN112363996B (en) 2023-10-24

Family

ID=74512400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011197189.XA Active CN112363996B (en) 2020-10-30 2020-10-30 Method, system and medium for establishing physical model of power grid knowledge graph

Country Status (1)

Country Link
CN (1) CN112363996B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535739B (en) * 2021-09-16 2021-12-07 国网浙江省电力有限公司信息通信分公司 Data market layer table establishing method based on power grid energy data
CN114168608B (en) * 2021-12-16 2022-07-15 中科雨辰科技有限公司 Data processing system for updating knowledge graph

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019825A (en) * 2017-07-25 2019-07-16 华为技术有限公司 A kind of method and device for analyzing data semantic
CN111159365A (en) * 2019-11-26 2020-05-15 国网湖南省电力有限公司 Method, system and storage medium for implementing intelligent question-answering system of scheduling model body

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222052B2 (en) * 2011-02-22 2022-01-11 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
面向多源地理空间数据的知识图谱构建;刘俊楠;刘海砚;陈晓慧;郭漩;郭文月;朱新铭;赵清波;;地球信息科学学报(07);全文 *

Also Published As

Publication number Publication date
CN112363996A (en) 2021-02-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant