CN116383335A - Integration method and system for multi-source heterogeneous power data set - Google Patents

Integration method and system for multi-source heterogeneous power data set

Info

Publication number
CN116383335A
Authority
CN
China
Prior art keywords
ontology
local
similarity
metadata
power domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310663877.8A
Other languages
Chinese (zh)
Inventor
粟海斌
刘珺
詹柱
刘斌
欧阳宏剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fangxin Technology Co ltd
Original Assignee
Fangxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fangxin Technology Co ltd
Priority to CN202310663877.8A
Publication of CN116383335A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Water Supply & Treatment (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an integration method and system for a multi-source heterogeneous power data set. The method comprises: constructing a local ontology; generating a power domain ontology on the basis of the local ontology; extracting local metadata based on the local ontology and forming a mapping relation between the local ontology and the local metadata; and integrating the local metadata into global metadata under the guidance of the power domain ontology. The invention avoids storing and transmitting large volumes of data and can greatly reduce data storage and network bandwidth costs. It also addresses the problem that, because departments differ in business and division of labor, application systems are isolated from one another and data cannot be exchanged, which produces information islands and data redundancy; the value of the data can thus be fully developed and utilized.

Description

Integration method and system for multi-source heterogeneous power data set
Technical Field
The invention relates to the technical field of power data management and control, and particularly discloses an integration method and system for a multi-source heterogeneous power data set.
Background
Ontology was originally a concept in philosophy: an objective description of what exists in the real world. In the field of artificial intelligence, Neches et al. defined an ontology as the basic terms and relations that make up the vocabulary of a topic area, together with the rules for combining those terms and relations to define extensions of the vocabulary. In short, an ontology is a specification of concepts. Because an ontology specifies concepts in this way, ontology construction rules are used to establish the power domain ontology: starting from the local ontologies, the local ontologies are integrated using ontology mapping rules to build the power domain ontology, which provides a basis for metadata integration in heterogeneous environments.
Metadata is data about data; it can record how data is distributed in a database. With the rapid development of network technology, metadata has evolved from a means of describing and indexing data into one of the tools and methods necessary for data presentation, data conversion, data management, and data usage throughout the information transfer process. In a multi-source heterogeneous power data environment, data formats, content, and quality differ greatly. Metadata provides a uniform logical representation of heterogeneous data sources, which resolves the heterogeneity among the data sources and provides a unified infrastructure for data integration; this infrastructure is used to describe the integrated data and the data sources.
Data integration is the process of logically or physically consolidating data from several scattered data sources into one data set. Its goal is data sharing and information exchange, and its core task is to integrate distributed heterogeneous data sources that are related to one another. To better address heterogeneity and related problems, the metadata is integrated into global metadata, so that the data a user needs can be found more quickly and the quality and applicability of the data integration are improved.
At present, the informatization of the power system in China is gradually moving from an application development stage centered on the digital power system to a system integration stage centered on the intelligent power system. Each power system has developed various application systems that use information technology to collect, process, and apply information on power generation, power transmission, billing, office work, and the like. However, because departments differ in business and division of labor, the application systems are isolated from one another and data cannot be exchanged, which produces information islands and data redundancy, and the value of the data cannot be fully developed and utilized.
In recent years, many scholars have achieved considerable research results in heterogeneous data integration. The current difficulty, however, is that multi-source heterogeneous data integration incurs relatively high storage and transmission costs and places heavy demands on network bandwidth.
Therefore, the high storage cost, high transmission cost, and heavy network bandwidth demands of existing multi-source heterogeneous data integration methods are a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an integration method and system for a multi-source heterogeneous power data set, and aims to solve the technical problems of high storage cost and transmission cost and high requirement on network bandwidth in the existing multi-source heterogeneous data integration method.
One aspect of the invention relates to an integration method for a multi-source heterogeneous power data set, comprising the following steps:
a local ontology construction step of constructing a local ontology;
a power domain ontology construction step of generating a power domain ontology on the basis of the local ontology;
a local metadata extraction step of extracting local metadata based on the local ontology and forming a mapping relation between the local ontology and the local metadata;
and a metadata integration step of integrating the local metadata into global metadata under the guidance of the power domain ontology.
Further, the local ontology construction step includes:
comprehensively analyzing the local data sources to obtain the schema of the database;
and establishing an ER model of the database, obtaining a local ontology on the basis of the ER model, and forming a relation diagram between the local data source and the local ontology, wherein the ER model captures the relationships between entities.
Further, the electric power domain ontology construction step includes:
on the basis of the mapping relations obtained between the local ontologies, finding similar concepts in each local ontology;
abstracting classes that express the same concept into one class in the power domain ontology; abstracting the same attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting the relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, placing that class and its attributes directly at the corresponding position in the power domain ontology.
Further, in the power domain ontology construction step, a current ontology mapping algorithm is adopted to establish an ontology mapping.
Further, the current ontology mapping algorithm includes:
respectively calculating the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_1]
wherein the quantity shown in Figure SMS_2 is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity according to the following formula:
[Formula image: Figure SMS_3]
wherein the quantity shown in Figure SMS_4 is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; Leave(m1) denotes the leaves of m1; strongalink(x, y) denotes a strong link, i.e., the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_5]
wherein the quantity shown in Figure SMS_6 is the comprehensive similarity, the quantity shown in Figure SMS_7 is a weighting coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
comparing the comprehensive similarity with a threshold thaccept and, if the comprehensive similarity is greater than thaccept, establishing a relation between the two elements, thereby forming the mapping relation between the power domain ontology and the local ontology.
Another aspect of the invention relates to an integration system for a multi-source heterogeneous power data set, comprising:
a local ontology construction module for constructing a local ontology;
a power domain ontology construction module for generating a power domain ontology on the basis of the local ontology;
a local metadata extraction module for extracting local metadata based on the local ontology and forming a mapping relation between the local ontology and the local metadata;
and a metadata integration module for integrating the local metadata into global metadata under the guidance of the power domain ontology.
Further, the local ontology construction module includes:
an analysis unit for comprehensively analyzing the local data sources to obtain the schema of the database;
and an establishing unit for establishing an ER model of the database, obtaining a local ontology on the basis of the ER model, and forming a relation diagram between the local data source and the local ontology, wherein the ER model captures the relationships between entities.
Further, the power domain ontology construction module includes:
a searching unit for finding similar concepts in each local ontology on the basis of the mapping relations obtained between the local ontologies;
and an abstraction unit for abstracting classes that express the same concept into one class in the power domain ontology; abstracting the same attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting the relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, placing that class and its attributes directly at the corresponding position in the power domain ontology.
Further, in the local metadata extraction module, a current ontology mapping algorithm is used to establish an ontology mapping.
Further, the current ontology mapping algorithm includes:
respectively calculating the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_8]
wherein the quantity shown in Figure SMS_9 is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity according to the following formula:
[Formula image: Figure SMS_10]
wherein the quantity shown in Figure SMS_11 is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; Leave(m1) denotes the leaves of m1; strongalink(x, y) denotes a strong link, i.e., the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_12]
wherein the quantity shown in Figure SMS_13 is the comprehensive similarity, the quantity shown in Figure SMS_14 is a weighting coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
comparing the comprehensive similarity with a threshold thaccept and, if the comprehensive similarity is greater than thaccept, establishing a relation between the two elements, thereby forming the mapping relation between the power domain ontology and the local ontology.
The beneficial effects obtained by the invention are as follows:
The invention provides an integration method and system for a multi-source heterogeneous power data set, in which a local ontology is constructed; a power domain ontology is generated on the basis of the local ontology; local metadata is extracted based on the local ontology and a mapping relation between the local ontology and the local metadata is formed; and the local metadata is integrated into global metadata under the guidance of the power domain ontology. The method and system avoid storing and transmitting large volumes of data and can greatly reduce data storage and network bandwidth costs. They also address the problem that, because departments differ in business and division of labor, application systems are isolated from one another and data cannot be exchanged, which produces information islands and data redundancy; the value of the data can thus be fully developed and utilized.
Drawings
FIG. 1 is a flow chart of an embodiment of an integration method for multi-source heterogeneous power data set according to the present invention;
FIG. 2 is a flow chart of an embodiment of the local ontology construction step shown in FIG. 1;
FIG. 3 is a diagram of the relationship between a local data source and a local ontology in the present invention;
FIG. 4 is a flowchart illustrating an embodiment of the power domain ontology construction step shown in FIG. 1;
FIG. 5 is a diagram illustrating a metadata structure according to the present invention;
FIG. 6 is a functional block diagram of one embodiment of an integration system for multi-source heterogeneous power data sets provided by the present invention;
FIG. 7 is a functional block diagram of an embodiment of the local ontology-building module shown in FIG. 6;
FIG. 8 is a functional block diagram of an embodiment of the power domain ontology-building module shown in FIG. 6.
Reference numerals illustrate:
10. local ontology construction module; 20. power domain ontology construction module; 30. local metadata extraction module; 40. metadata integration module; 11. analysis unit; 12. establishing unit; 21. searching unit; 22. abstraction unit.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1 and 2, a first embodiment of the present invention proposes an integration method for multi-source heterogeneous power data sets, including the following steps:
Step S100, a local ontology construction step, in which a local ontology is constructed.
The data source is analyzed and a local ontology is constructed. Before the local ontology is constructed, the data source is comprehensively analyzed to obtain the schema of the database; an ER (Entity Relationship) model of the database is then established, and the local ontology is obtained on that basis. Analyzing the data source means analyzing its logical structure to determine the concepts required to build the ontology. The logical structure of the database can be represented by an ER diagram. The data model is a representation of the real world consisting of a set of basic objects, called entities, and the relationships among those objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects and is described by a set of attributes. A relationship is an association between entities. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set. By analyzing the database schema and the ontology concepts, the relation between the ER model and the local ontology can be obtained; FIG. 3 illustrates the relation between a local data source and the local ontology.
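The derivation of a local ontology from an ER model can be sketched as follows. This is a minimal illustration, not the patented procedure: the text does not spell out the correspondence shown in FIG. 3, so the sketch assumes that each entity set becomes a class, ordinary attributes become data properties, and foreign-key relationships become object properties. The names Table, OntologyClass, and build_local_ontology, and the sample tables, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """One entity set of the ER model (hypothetical structure)."""
    name: str
    columns: dict[str, str]        # column name -> SQL type
    primary_key: str
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table


@dataclass
class OntologyClass:
    """One class of the local ontology (hypothetical structure)."""
    name: str
    data_properties: dict[str, str]    # property name -> type
    key_property: str
    object_properties: dict[str, str]  # property name -> target class


def build_local_ontology(tables: list[Table]) -> dict[str, OntologyClass]:
    """Assumed mapping: entity set -> class, attribute -> data property,
    foreign-key relationship -> object property."""
    ontology = {}
    for t in tables:
        # Columns that are not foreign keys are treated as plain data properties.
        data_props = {c: ty for c, ty in t.columns.items() if c not in t.foreign_keys}
        ontology[t.name] = OntologyClass(
            name=t.name,
            data_properties=data_props,
            key_property=t.primary_key,
            object_properties=dict(t.foreign_keys),
        )
    return ontology


tables = [
    Table("Substation", {"id": "INT", "name": "VARCHAR(64)"}, "id"),
    Table(
        "Transformer",
        {"id": "INT", "rating_kva": "FLOAT", "substation_id": "INT"},
        "id",
        {"substation_id": "Substation"},
    ),
]
local_ontology = build_local_ontology(tables)
```

In this sketch the Transformer class ends up with one object property pointing at Substation, which mirrors the relationship between the two entity sets.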
Step S200, a power domain ontology construction step, in which the power domain ontology is generated on the basis of the local ontology.
On the basis of the mapping relations obtained between the local ontologies, similar concepts in each local ontology are found. Classes that express the same concept are abstracted into one class in the power domain ontology; the same attributes of the same class are abstracted into attributes of the corresponding class in the power domain ontology; the relations between classes are abstracted into relations in the power domain ontology; and, if a class appears in only one data source, the class and its attributes are placed directly at the corresponding position in the power domain ontology.
Step S300, a local metadata extraction step, in which the local metadata is extracted based on the local ontology and a mapping relation between the local ontology and the local metadata is formed.
The local metadata is extracted based on the local ontology, and the mapping relation between the local ontology and the local metadata is formed. The metadata extraction rules are as follows:
Rule 1: Each class in the local ontology corresponds to one metadata item, where metadata = protocol type: host name/root directory/class name.
Rule 2: The object properties in the local ontology are mapped to foreign key attributes.
Rule 3: The primary key among the data properties in the local ontology is mapped to the metadata identification.
Rule 4: The data properties in the local ontology other than the primary key are mapped to general attributes.
Rule 5: The number of records in the table corresponding to each class in the local ontology is added to the corresponding metadata item as a metadata entry.
Rule 6: The type, length, and similar characteristics of a class in the local ontology are mapped to the tag attributes of the metadata.
Step S400, a metadata integration step, in which the local metadata is integrated into global metadata under the guidance of the power domain ontology.
Integration relies on the mapping relations obtained during construction of the power domain ontology and during metadata extraction from the local ontology, namely Onto Meta Mapping and Oto Mapping. The metadata integration steps for the power domain are as follows:
Step S410: set the global metadata to empty and input the local metadata. Search, according to Onto Meta Mapping and Oto Mapping, whether corresponding global metadata exists; if so, go to step S420; if not, go to step S430.
Step S420: add the information of the local metadata directly to the global metadata by reference to the mapping relation.
Step S430: find the global metadata corresponding to the local metadata according to the mapping relation, and establish the correspondence between the global metadata and the local metadata. Check whether the attribute of the metadata exists in the global metadata; if so, go to step S440; if not, go to step S450.
Step S440: for general attributes and foreign key attributes, establish the mapping by reference to the mapping relation; tag attributes such as type take the maximum value of the tag attribute of the global metadata; metadata entry values are accumulated.
Step S450: add the attribute of the local metadata to the global metadata by reference to the mapping relation, and establish the mapping.
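The integration loop of steps S410 to S450 can be sketched as below. Several points are assumptions: a single dictionary named mapping stands in for Onto Meta Mapping and Oto Mapping, the tag attribute is reduced to a length field, and the branch order follows one plausible reading of step S410 (a local record with no global counterpart is added directly, while one with a counterpart is merged attribute by attribute). The step labels in the comments are therefore indicative only.

```python
def integrate(global_md: dict, local_md: dict, mapping: dict) -> None:
    """Merge one local metadata record into global_md in place.

    mapping: local record name -> global class name (a hypothetical
    stand-in for Onto Meta Mapping and Oto Mapping).
    """
    target = mapping[local_md["name"]]
    if target not in global_md:
        # Assumed S420 branch: no global counterpart yet, add the record directly.
        global_md[target] = {
            "sources": [local_md["name"]],
            "attributes": {a: dict(t) for a, t in local_md["attributes"].items()},
            "entries": local_md["entries"],
        }
        return
    # Assumed S430 branch: a counterpart exists, record the correspondence.
    g = global_md[target]
    g["sources"].append(local_md["name"])
    for attr, tags in local_md["attributes"].items():
        if attr in g["attributes"]:
            # S440: the tag attribute keeps the larger value.
            g_tags = g["attributes"][attr]
            g_tags["length"] = max(g_tags["length"], tags["length"])
        else:
            # S450: the attribute is new to the global metadata, so add it.
            g["attributes"][attr] = dict(tags)
    # S440: metadata entry values accumulate across sources.
    g["entries"] += local_md["entries"]


global_md = {}  # S410: global metadata starts empty
mapping = {"billing.Meter": "Meter", "ops.Meter": "Meter"}
integrate(
    global_md,
    {"name": "billing.Meter", "attributes": {"serial": {"length": 16}}, "entries": 1200},
    mapping,
)
integrate(
    global_md,
    {
        "name": "ops.Meter",
        "attributes": {"serial": {"length": 32}, "firmware": {"length": 8}},
        "entries": 300,
    },
    mapping,
)
```

After both calls, the single global Meter item lists two sources, carries the longer serial length, gains the firmware attribute, and holds the summed entry count.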
Further, FIG. 2 is a flowchart of an embodiment of step S100 shown in FIG. 1. In this embodiment, step S100 includes:
Step S110: comprehensively analyze the local data sources to obtain the schema of the database.
The data source is comprehensively analyzed to obtain the schema of the database, from which an ER model of the database is established and the local ontology is obtained. Analyzing the data source means analyzing its logical structure to determine the concepts required to build the ontology.
Step S120: establish an ER model of the database, obtain a local ontology on the basis of the ER model, and form a relation diagram between the local data source and the local ontology, wherein the ER model captures the relationships between entities.
An ER model of the database is established; the logical structure of the database can be represented by an ER diagram. The local ontology is obtained on the basis of the ER model.
Preferably, referring to FIG. 4, which is a flowchart of an embodiment of step S200 shown in FIG. 1, step S200 in this embodiment includes:
Step S210: on the basis of the mapping relations obtained between the local ontologies, find similar concepts in each local ontology.
Step S220: abstract classes that express the same concept into one class in the power domain ontology; abstract the same attributes of the same class into attributes of the corresponding class in the power domain ontology; abstract the relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, place that class and its attributes directly at the corresponding position in the power domain ontology.
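Steps S210 and S220 amount to merging classes that have been judged equivalent and carrying over classes that occur in only one source. The sketch below assumes that the equivalences have already been found and are passed in as a table named same_as; relations between classes are omitted for brevity. All names and sample data are hypothetical.

```python
def build_domain_ontology(local_ontologies: dict, same_as: dict) -> dict:
    """Merge local ontologies into one domain ontology.

    local_ontologies: source name -> {class name -> set of attribute names}
    same_as: (source, class name) -> domain class name, for classes judged
             to express the same concept; unlisted classes keep their name.
    """
    domain = {}
    for source, classes in local_ontologies.items():
        for cls, attrs in classes.items():
            target = same_as.get((source, cls), cls)
            # The set union collapses attributes shared by equivalent classes;
            # a class seen in only one source is copied over unchanged.
            domain.setdefault(target, set()).update(attrs)
    return domain


local_ontologies = {
    "billing": {"Customer": {"id", "name"}, "Invoice": {"id", "amount"}},
    "metering": {"Client": {"id", "address"}},
}
same_as = {
    ("billing", "Customer"): "Customer",
    ("metering", "Client"): "Customer",
}
domain = build_domain_ontology(local_ontologies, same_as)
```

Here Customer and Client collapse into one domain class with the union of their attributes, while Invoice, which appears in only one source, is placed into the domain ontology as is.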
Further, in the integration method for multi-source heterogeneous power data sets provided in this embodiment, in step S300, a current ontology mapping algorithm is used to establish an ontology mapping. The ontology mapping algorithm comprises the following steps:
respectively calculating the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_15]    (1)
In formula (1), the quantity shown in Figure SMS_16 is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity according to the following formula:
[Formula image: Figure SMS_17]    (2)
In formula (2), the quantity shown in Figure SMS_18 is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; Leave(m1) denotes the leaves of m1; strongalink(x, y) denotes a strong link, i.e., the similarity of x and y exceeds a threshold.
The comprehensive similarity between concepts is calculated from the semantic similarity and the structural similarity of each concept according to the following formula:
[Formula image: Figure SMS_19]    (3)
In formula (3), the quantity shown in Figure SMS_20 is the comprehensive similarity, the quantity shown in Figure SMS_21 is a weighting coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
the comprehensive similarity is compared with a threshold thaccept and, if the comprehensive similarity is greater than thaccept, a relation between the two elements is established, thereby forming the mapping relation between the power domain ontology and the local ontology.
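Because formulas (1) to (3) survive only as image placeholders, the following sketch is an assumed instance of the scheme rather than a reconstruction of the patent's formulas. It assumes that the comprehensive similarity is a convex combination w * sim + (1 - w) * lism, that name similarity can be approximated by a string ratio, that lism reduces to the name similarity (the category factor is omitted), and that sim is the share of leaves having a strong link to a leaf on the other side. The names ns, leaves, strong link, and thaccept also appear in the Cupid schema-matching literature; whether the patent's formulas coincide with Cupid's is not established here. The threshold values are hypothetical.

```python
from difflib import SequenceMatcher


def ns(a: str, b: str) -> float:
    """Name similarity in [0, 1]; a string-ratio stand-in (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaf_similarity(leaves1: list, leaves2: list, th_link: float = 0.8) -> float:
    """Share of leaves that have a strong link to a leaf on the other side."""
    def strong(x: str, others: list) -> bool:
        # A strong link exists when the similarity exceeds the threshold.
        return any(ns(x, y) > th_link for y in others)

    linked = [x for x in leaves1 if strong(x, leaves2)]
    linked += [y for y in leaves2 if strong(y, leaves1)]
    total = len(leaves1) + len(leaves2)
    return len(linked) / total if total else 0.0


def combined_similarity(sim: float, lism: float, w: float = 0.5) -> float:
    """Weighted combination of the two similarities (assumed convex form)."""
    return w * sim + (1 - w) * lism


def accept(m1: str, m2: str, leaves1: list, leaves2: list,
           w: float = 0.5, thaccept: float = 0.6) -> bool:
    """Establish a mapping only if the combined similarity exceeds thaccept."""
    lism = ns(m1, m2)                        # simplified: category factor omitted
    sim = leaf_similarity(leaves1, leaves2)  # leaf and strong-link based measure
    return combined_similarity(sim, lism, w) > thaccept


same = accept("Customer", "customer", ["id", "name"], ["id", "name"])
different = accept("Customer", "Transformer", ["id"], ["rating"])
```

With these settings the two Customer concepts are mapped to each other, whereas Customer and Transformer are not, because they share no strongly linked leaves and half of their name similarity alone cannot exceed the threshold.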
Compared with the prior art, the integration method for the multi-source heterogeneous power data set provided by this embodiment constructs a local ontology; generates a power domain ontology on the basis of the local ontology; extracts local metadata based on the local ontology and forms a mapping relation between the local ontology and the local metadata; and integrates the local metadata into global metadata under the guidance of the power domain ontology. The method avoids storing and transmitting large volumes of data and can greatly reduce data storage and network bandwidth costs. It also addresses the problem that, because departments differ in business and division of labor, application systems are isolated from one another and data cannot be exchanged, which produces information islands and data redundancy; the value of the data can thus be fully developed and utilized.
FIG. 6 is a functional block diagram of an embodiment of the integration system for a multi-source heterogeneous power data set provided by the present invention. In this embodiment, the integration system includes a local ontology construction module 10, a power domain ontology construction module 20, a local metadata extraction module 30, and a metadata integration module 40. The local ontology construction module 10 is configured to construct a local ontology; the power domain ontology construction module 20 is configured to generate a power domain ontology on the basis of the local ontology; the local metadata extraction module 30 is configured to extract local metadata based on the local ontology and to form a mapping relation between the local ontology and the local metadata; and the metadata integration module 40 is configured to integrate the local metadata into global metadata under the guidance of the power domain ontology.
The local ontology construction module 10 analyzes the data sources and constructs local ontologies. Before the local ontology is constructed, the data source is comprehensively analyzed to obtain the schema of the database; an ER (Entity Relationship) model of the database is then established, and the local ontology is obtained on that basis. Analyzing the data source means analyzing its logical structure to determine the concepts required to build the ontology. The logical structure of the database can be represented by an ER diagram. The data model is a representation of the real world consisting of a set of basic objects, called entities, and the relationships among those objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects and is described by a set of attributes. A relationship is an association between entities. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set. By analyzing the database schema and the ontology concepts, the relation between the ER model and the local ontology can be obtained; FIG. 3 illustrates the relation between a local data source and the local ontology.
The power domain ontology construction module 20 finds similar concepts in each local ontology on the basis of the mapping relations obtained between the local ontologies. Classes that express the same concept are abstracted into one class in the power domain ontology; the same attributes of the same class are abstracted into attributes of the corresponding class in the power domain ontology; the relations between classes are abstracted into relations in the power domain ontology; and, if a class appears in only one data source, the class and its attributes are placed directly at the corresponding position in the power domain ontology.
The local metadata extraction module 30 extracts the local metadata based on the local ontology and forms the mapping relation between the local ontology and the local metadata. The metadata extraction rules are as follows:
rule 1: for the class correspondence in the local ontology to be metadata, metadata = protocol type: the host name/root directory/class name.
Rule 2: the object properties in the local ontology are mapped as foreign key properties.
Rule 3: the primary key mapping in the data attribute in the local ontology is a metadata identification.
Rule 4: the other attributes except the primary key in the data attributes in the local ontology are mapped into general attributes.
Rule 5: the number of records in the class-corresponding table in each local ontology is added as a metadata entry into the corresponding metadata entry.
Rule 6: the type, length, etc. of the class in the local ontology will be generated to map to the tag properties of the metadata.
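The six extraction rules can be sketched in code as follows. This is an illustrative sketch under stated assumptions rather than the embodiment's implementation: the data structures, the function `extract_metadata`, the example host and directory, and the choice to record type and length per data property for Rule 6 are all hypothetical. The Rule 1 string follows the format as written in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProperty:
    name: str
    dtype: str
    length: int
    is_primary_key: bool = False


@dataclass
class OntologyClass:
    name: str
    data_properties: List[DataProperty]
    # object property name -> name of the referenced class
    object_properties: Dict[str, str] = field(default_factory=dict)
    record_count: int = 0  # rows in the table behind this class


def extract_metadata(cls: OntologyClass, protocol: str, host: str, root: str) -> Dict:
    keys = [p.name for p in cls.data_properties if p.is_primary_key]
    general = [p.name for p in cls.data_properties if not p.is_primary_key]
    return {
        # Rule 1: metadata = protocol type: host name/root directory/class name
        "metadata": f"{protocol}:{host}/{root}/{cls.name}",
        # Rule 2: object properties -> foreign key attributes
        "foreign_keys": dict(cls.object_properties),
        # Rule 3: primary key -> metadata identifier
        "identifier": keys,
        # Rule 4: remaining data properties -> general attributes
        "general_attributes": general,
        # Rule 5: record count added as a metadata entry
        "record_count": cls.record_count,
        # Rule 6 (assumed per property): type and length -> tag attributes
        "tags": {p.name: {"type": p.dtype, "length": p.length}
                 for p in cls.data_properties},
    }


# Hypothetical class taken from one local ontology.
transformer = OntologyClass(
    name="Transformer",
    data_properties=[DataProperty("transformer_id", "int", 11, True),
                     DataProperty("rated_kva", "float", 8)],
    object_properties={"locatedIn": "Substation"},
    record_count=1200,
)
meta = extract_metadata(transformer, "jdbc", "db01.example", "assets")
```

Keeping each rule as one key of the returned dictionary makes the mapping between the local ontology and the local metadata easy to trace back rule by rule.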
The metadata integration module 40 integrates the local metadata according to the two mapping relationships obtained during the construction of the power domain ontology and the metadata extraction from the local ontologies: Onto Meta Mapping and Oto Mapping.
Further, referring to FIG. 7, which is a schematic functional block diagram of an embodiment of the local ontology construction module shown in FIG. 6: in this embodiment, the local ontology construction module 10 includes an analysis unit 11 and an establishing unit 12. The analysis unit 11 is configured to comprehensively analyze the local data source to obtain the database schema; the establishing unit 12 is configured to establish an ER model of the database, obtain a local ontology on the basis of the ER model, and form a relationship diagram between the local data source and the local ontology, where the relationships between entities are established in the ER model.
The analysis unit 11 comprehensively analyzes the data source to obtain the database schema, from which the ER model of the database is established and the local ontology is obtained. Analyzing the data source means analyzing its logical structure to determine the concepts needed to build the ontology.
The establishing unit 12 establishes the ER model of the database, whose logical structure can be represented by an ER diagram, and obtains the local ontology on the basis of the ER model.
Preferably, FIG. 8 is a schematic functional block diagram of an embodiment of the power domain ontology construction module shown in FIG. 6. In this embodiment, the power domain ontology construction module 20 includes a searching unit 21 and an abstraction unit 22. The searching unit 21 is configured to find similar concepts in the local ontologies on the basis of the mapping relationships obtained between the local ontologies. The abstraction unit 22 is configured to abstract classes representing the same concept into one class in the power domain ontology; abstract identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstract relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, place the class and its attributes directly at the corresponding position in the power domain ontology.
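The abstraction performed by the abstraction unit 22 can be sketched as a merge over the local ontologies. This is a minimal sketch, not the embodiment's implementation: it assumes that the similar concepts have already been found and are supplied as a lookup table, and that the attributes of merged classes are combined as a set so that identical attributes appear once. The function `build_domain_ontology`, the source names, and the class names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

LocalOntology = Dict[str, object]  # {"classes": {name: set(attrs)}, "relations": [(a, rel, b)]}


def build_domain_ontology(
    local_ontologies: Dict[str, LocalOntology],
    same_concept: Dict[Tuple[str, str], str],
) -> Dict[str, object]:
    """same_concept maps (source, local class name) to the shared domain class name."""
    classes: Dict[str, Set[str]] = {}
    relations: Set[Tuple[str, str, str]] = set()

    for source, onto in local_ontologies.items():
        def domain_name(local_name: str, src: str = source) -> str:
            # A class found in only one source keeps its own name.
            return same_concept.get((src, local_name), local_name)

        # Classes of the same concept collapse into one domain class;
        # identical attributes are stored once because attrs are a set.
        for cls, attrs in onto["classes"].items():
            classes.setdefault(domain_name(cls), set()).update(attrs)

        # Relations between classes become relations between domain classes.
        for a, rel, b in onto["relations"]:
            relations.add((domain_name(a), rel, domain_name(b)))

    return {"classes": classes, "relations": relations}


# Two hypothetical local ontologies describing overlapping equipment data.
domain = build_domain_ontology(
    {
        "scada": {"classes": {"Xfmr": {"id", "rated_kva"}, "Feeder": {"id"}},
                  "relations": [("Xfmr", "connectedTo", "Feeder")]},
        "asset": {"classes": {"Transformer": {"id", "manufacturer"}},
                  "relations": []},
    },
    same_concept={("scada", "Xfmr"): "Transformer"},
)
```

In this example `Xfmr` and `Transformer` are merged into a single `Transformer` class, while `Feeder`, which appears in only one source, is copied over unchanged.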
Further, in the integrated system for multi-source heterogeneous power data sets provided in this embodiment, the local metadata extraction module 30 uses the Cupid ontology mapping algorithm to establish the ontology mapping. The Cupid ontology mapping algorithm comprises the following steps: calculating the structural similarity of each concept, where the structural similarity of each concept is:
[Equation image: Figure SMS_22] (4)
In formula (4), [symbol image: Figure SMS_23] is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity, where the semantic similarity is:
[Equation image: Figure SMS_24] (5)
In formula (5), [symbol image: Figure SMS_25] is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; strongalink(x, y) is a strong link, i.e., the similarity of x and y exceeds a threshold.
The comprehensive similarity between concepts is then calculated from the semantic similarity and the structural similarity of each concept, where the comprehensive similarity is:
[Equation image: Figure SMS_26] (6)
In formula (6), [symbol image: Figure SMS_27] is the comprehensive similarity, [symbol image: Figure SMS_28] is the weight coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
The comprehensive similarity is compared with the threshold thaccept; if the comprehensive similarity is greater than thaccept, a relationship between the two elements is established, thereby forming the mapping relationship between the power domain ontology and the local ontology.
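The three similarity steps and the threshold test can be sketched as follows. Because formulas (4) to (6) are available here only as images, the functional forms below are assumptions modeled on the cited Cupid schema-matching paper: name similarity scaled by the best category similarity for lism, the fraction of leaves with at least one strong link for sim, and a weighted sum for the comprehensive similarity. The character-based `ns`, the parameter `th_strong`, and the example values are hypothetical; only `thaccept`, `sim`, `lism`, `ns`, and `strongalink` are names taken from the text.

```python
from difflib import SequenceMatcher
from typing import Sequence, Set


def ns(a: str, b: str) -> float:
    """Name similarity (assumed: character-level match ratio in [0, 1])."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lism(m1: str, m2: str, c1: Sequence[str], c2: Sequence[str]) -> float:
    """Structural similarity in the text's terminology (assumed form):
    name similarity scaled by the best similarity between categories."""
    best_category = max((ns(x, y) for x in c1 for y in c2), default=0.0)
    return ns(m1, m2) * best_category


def strongalink(x: str, y: str, th_strong: float = 0.5) -> bool:
    """Strong link: the similarity of x and y exceeds a threshold."""
    return ns(x, y) > th_strong


def sim(leaves1: Set[str], leaves2: Set[str], th_strong: float = 0.5) -> float:
    """Semantic similarity in the text's terminology (assumed form):
    share of leaves that have a strong link to a leaf on the other side."""
    total = len(leaves1) + len(leaves2)
    if total == 0:
        return 0.0
    linked1 = sum(1 for x in leaves1
                  if any(strongalink(x, y, th_strong) for y in leaves2))
    linked2 = sum(1 for y in leaves2
                  if any(strongalink(x, y, th_strong) for x in leaves1))
    return (linked1 + linked2) / total


def comprehensive_similarity(sim_value: float, lism_value: float, weight: float) -> float:
    """Assumed weighted sum of the two similarities."""
    return weight * sim_value + (1 - weight) * lism_value


# Hypothetical concepts from a local ontology and the power domain ontology.
score = comprehensive_similarity(
    sim({"id", "voltage"}, {"id", "voltage"}),
    lism("Transformer", "Transformer", ["equipment"], ["equipment"]),
    weight=0.5,
)
thaccept = 0.8
mapped = score > thaccept  # a mapping is established only above the threshold
```

Raising `thaccept` produces fewer but more reliable mappings, while the weight controls how much the leaf-level evidence counts relative to the names.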
Compared with the prior art, the integrated system for multi-source heterogeneous power data sets provided by this embodiment uses the local ontology construction module 10, the power domain ontology construction module 20, the local metadata extraction module 30 and the metadata integration module 40 to construct local ontologies; generate a power domain ontology on the basis of the local ontologies; extract local metadata based on the local ontologies and form the mapping relationship between each local ontology and its local metadata; and, under the guidance of the power domain ontology, integrate the local metadata into global metadata. The system avoids storing and transmitting large volumes of data and can greatly reduce the cost of data storage and network bandwidth. It also addresses the problem that, because businesses and departmental divisions differ, application systems are isolated from one another, data cannot be shared, and information islands and data redundancy arise, so that the value of the data can be fully developed and utilized.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. An integration method for a multi-source heterogeneous power data set, characterized by comprising the following steps:
a local ontology construction step of constructing a local ontology;
a power domain ontology construction step of generating a power domain ontology on the basis of the local ontology;
a local metadata extraction step of extracting local metadata based on the local ontology and forming a mapping relationship between the local ontology and the local metadata;
a metadata integration step of integrating the local metadata into global metadata under the guidance of the power domain ontology;
wherein the local ontology construction step comprises:
comprehensively analyzing the local data source to obtain the database schema;
and establishing an ER model of the database, obtaining the local ontology on the basis of the ER model, and forming a relationship diagram between the local data source and the local ontology, wherein the relationships between entities are established in the ER model.
2. The integration method for multi-source heterogeneous power data sets according to claim 1, wherein the power domain ontology construction step comprises:
finding similar concepts in the local ontologies on the basis of the mapping relationships obtained between the local ontologies;
abstracting classes representing the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
3. The integration method for multi-source heterogeneous power data sets according to claim 1, wherein the power domain ontology construction step uses the Cupid ontology mapping algorithm to establish an ontology mapping.
4. The integration method for multi-source heterogeneous power data sets according to claim 3, wherein the Cupid ontology mapping algorithm comprises:
calculating the structural similarity of each concept, wherein the structural similarity of each concept is:
[Equation image: Figure QLYQS_1]
wherein [symbol image: Figure QLYQS_2] is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity, wherein the semantic similarity is:
[Equation image: Figure QLYQS_3]
wherein [symbol image: Figure QLYQS_4] is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; strongalink(x, y) is a strong link, i.e., the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept, wherein the comprehensive similarity is:
[Equation image: Figure QLYQS_5]
wherein [symbol image: Figure QLYQS_6] is the comprehensive similarity, [symbol image: Figure QLYQS_7] is the weight coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
and comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than the threshold thaccept, establishing a relationship between the two elements, thereby forming the mapping relationship between the power domain ontology and the local ontology.
5. An integrated system for a multi-source heterogeneous power data set, comprising:
a local ontology construction module (10) for constructing a local ontology;
a power domain ontology construction module (20) for generating a power domain ontology on the basis of the local ontology;
a local metadata extraction module (30) for extracting local metadata based on the local ontology and forming a mapping relationship between the local ontology and the local metadata;
a metadata integration module (40) for integrating the local metadata into global metadata under the guidance of the power domain ontology;
wherein the local ontology construction module (10) comprises:
an analysis unit (11) for comprehensively analyzing the local data source to obtain the database schema;
and an establishing unit (12) for establishing an ER model of the database, obtaining the local ontology on the basis of the ER model, and forming a relationship diagram between the local data source and the local ontology, wherein the relationships between entities are established in the ER model.
6. The integrated system for multi-source heterogeneous power data sets according to claim 5, wherein the power domain ontology construction module (20) comprises:
a searching unit (21) for finding similar concepts in the local ontologies on the basis of the mapping relationships obtained between the local ontologies;
an abstraction unit (22) for abstracting classes representing the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
7. The integrated system for multi-source heterogeneous power data sets according to claim 5, wherein the local metadata extraction module (30) uses the Cupid ontology mapping algorithm to establish an ontology mapping.
8. The integrated system for multi-source heterogeneous power data sets according to claim 7, wherein the Cupid ontology mapping algorithm comprises:
calculating the structural similarity of each concept, wherein the structural similarity of each concept is:
[Equation image: Figure QLYQS_8]
wherein [symbol image: Figure QLYQS_9] is the structural similarity of each concept; m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity, wherein the semantic similarity is:
[Equation image: Figure QLYQS_10]
wherein [symbol image: Figure QLYQS_11] is the semantic similarity; m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; strongalink(x, y) is a strong link, i.e., the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept, wherein the comprehensive similarity is:
[Equation image: Figure QLYQS_12]
wherein [symbol image: Figure QLYQS_13] is the comprehensive similarity, [symbol image: Figure QLYQS_14] is the weight coefficient, sim is the semantic similarity, and lism is the structural similarity of each concept;
and comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than the threshold thaccept, establishing a relationship between the two elements, thereby forming the mapping relationship between the power domain ontology and the local ontology.
CN202310663877.8A 2023-06-06 2023-06-06 Integration method and system for multi-source heterogeneous power data set Withdrawn CN116383335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310663877.8A CN116383335A (en) 2023-06-06 2023-06-06 Integration method and system for multi-source heterogeneous power data set

Publications (1)

Publication Number Publication Date
CN116383335A true CN116383335A (en) 2023-07-04

Family

ID=86971704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310663877.8A Withdrawn CN116383335A (en) 2023-06-06 2023-06-06 Integration method and system for multi-source heterogeneous power data set

Country Status (1)

Country Link
CN (1) CN116383335A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971951A (en) * 2024-04-02 2024-05-03 北京大数据先进技术研究院 Heterogeneous registry-oriented digital object metadata interoperation method, device, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1633092A (en) * 2004-11-25 2005-06-29 Wuhan University Distributed GIS space information integration apparatus and method based on mobile Agent and GML

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
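Jayant Madhavan et al.: "Generic Schema Matching with Cupid", Microsoft Research *
Feng Yong; Zhang Liying; Gu Zhaoxu; Ma Ji: "Metadata integration method for multi-source heterogeneous data environments in universities" (面向高校多源异构数据环境的元数据集成方法), Journal of Liaoning University (Natural Science Edition), no. 02, pages 135-141 *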

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20230704)