CN116383335A - Integration method and system for multi-source heterogeneous power data set - Google Patents
- Publication number
- CN116383335A (application CN202310663877.8A)
- Authority
- CN
- China
- Prior art keywords
- ontology
- local
- similarity
- metadata
- power domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses an integration method and system for a multi-source heterogeneous power data set. The method constructs local ontologies; generates a power domain ontology on the basis of the local ontologies; extracts local metadata based on each local ontology and forms a mapping relation between the local ontology and the local metadata; and, under the guidance of the power domain ontology, integrates the local metadata into global metadata. The invention avoids the storage and transmission of large volumes of data and can greatly reduce the cost of data storage and network bandwidth. It also addresses the problems that arise because departments differ in business and division of labor: application systems are isolated from one another, data cannot be exchanged, and information islands and data redundancy result. The value of the data can thereby be fully developed and utilized.
Description
Technical Field
The invention relates to the technical field of power data management and control, and in particular to an integration method and system for a multi-source heterogeneous power data set.
Background
Ontology was originally a concept in philosophy, where it denotes an objective description of what exists in the real world. In the field of artificial intelligence, Neches et al. defined an ontology as follows: an ontology consists of the basic terms and relations that make up the vocabulary of a subject area, together with the rules for combining these terms and relations to define extensions of that vocabulary. In short, an ontology is a conceptual specification. Because an ontology describes concepts in a standardized way, ontology construction rules are used here to establish the power domain ontology: local ontologies are established first and are then integrated using ontology mapping rules, yielding the power domain ontology and providing a basis for metadata integration in heterogeneous environments.
Metadata is data about data, with which the distribution of data in a database can be recorded. With the rapid development of network technology, metadata has evolved from a means of describing and indexing data into one of the tools and methods needed for data presentation, data conversion, data management, and data use throughout the information transfer process. In a multi-source heterogeneous power data environment, data formats, content, and quality differ greatly. Metadata gives heterogeneous data sources a uniform logical representation, which resolves the heterogeneity among the data sources and provides a unified infrastructure for data integration; that infrastructure describes both the integrated data and the data sources.
Data integration is the process of logically or physically concentrating data from several scattered data sources into one data set, with the goal of data sharing and information exchange. Its core task is to integrate distributed heterogeneous data sources that are related to one another. To better handle heterogeneity and related problems, metadata is integrated to form global metadata, so that the data a user needs can be found more quickly and the quality and applicability of the data integration are improved.
At present, the informatization of the power system in China is gradually moving from a stage of developing separate applications centered on the digital power system to a stage of system integration centered on the intelligent power system. Each power system has developed a variety of application systems that use information technology to collect, process, and apply information on power generation, power transmission, billing, office work, and so on. However, because departments differ in business and division of labor, the application systems are isolated from one another and data cannot be exchanged, which produces information islands and data redundancy and prevents the value of the data from being fully developed and utilized.
In recent years, many scholars have achieved significant research results in heterogeneous data integration, but the current difficulty is that the storage cost and transmission cost of multi-source heterogeneous data integration are relatively high and the demands on network bandwidth are very high.
Therefore, the high storage cost, high transmission cost, and high network bandwidth requirements of existing multi-source heterogeneous data integration methods constitute a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an integration method and system for a multi-source heterogeneous power data set, and aims to solve the technical problems of high storage cost and transmission cost and high requirement on network bandwidth in the existing multi-source heterogeneous data integration method.
One aspect of the invention relates to an integration method for a multi-source heterogeneous power data set, comprising the following steps:
a local ontology construction step, namely constructing a local ontology;
a power domain ontology construction step, namely generating a power domain ontology on the basis of the local ontology;
a local metadata extraction step, namely extracting local metadata based on the local ontology, and forming a mapping relation between the local ontology and the local metadata;
and a metadata integration step, namely integrating the local metadata into global metadata under the guidance of the power domain ontology.
Further, the local ontology construction step includes:
comprehensively analyzing the local data sources to obtain the schema of the database;
and establishing an ER (entity-relationship) model of the database, obtaining a local ontology on the basis of the ER model, and forming a relation diagram between the local data source and the local ontology, wherein the ER model captures the relationships between entities.
Further, the power domain ontology construction step includes:
on the basis of obtaining the mapping relation between the local ontologies, finding similar concepts in each local ontology;
abstracting classes that represent the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting the relations between classes into relations in the power domain ontology; if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
Further, in the power domain ontology construction step, a current ontology mapping algorithm is adopted to establish the ontology mapping.
Further, the current ontology mapping algorithm includes:
calculating the structural similarity of each concept, wherein, in the structural similarity formula, m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 belong, respectively; and ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity, wherein, in the semantic similarity formula, m1 and m2 are the corresponding concepts in the two ontologies; leaves(m1) denotes the leaves of m1; and stronglink(x, y) denotes a strong link, that is, the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept, wherein, in the comprehensive similarity formula, the semantic similarity sim and the structural similarity lism of each concept are combined using a weighting coefficient;
comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than thaccept, establishing a relation between the two elements, thereby forming the mapping relation between the power domain ontology and the local ontology.
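The formulas referred to in these steps do not appear in the text. The block below is a hedged reconstruction, not the patent's own notation. It assumes the algorithm follows the Cupid schema-matching algorithm of Madhavan, Bernstein and Rahm, whose symbol names (ns, leaves, stronglink, thaccept) match the definitions given here. The names Sim and ω are assumptions introduced for illustration; lism and sim are the names the text uses for the structural and semantic terms.

```latex
\begin{align}
\mathrm{lism}(m_1, m_2) &= \mathrm{ns}(m_1, m_2)\cdot
  \max_{c_1 \in C_1,\; c_2 \in C_2} \mathrm{ns}(c_1, c_2) \tag{1}\\[4pt]
\mathrm{sim}(m_1, m_2) &=
  \frac{\bigl|\{x \in \mathrm{leaves}(m_1) \mid \exists\, y \in \mathrm{leaves}(m_2):\ \mathrm{stronglink}(x, y)\}
        \cup
        \{y \in \mathrm{leaves}(m_2) \mid \exists\, x \in \mathrm{leaves}(m_1):\ \mathrm{stronglink}(x, y)\}\bigr|}
       {\bigl|\mathrm{leaves}(m_1) \cup \mathrm{leaves}(m_2)\bigr|} \tag{2}\\[4pt]
\mathrm{Sim}(m_1, m_2) &= \omega \cdot \mathrm{sim}(m_1, m_2) + (1 - \omega)\cdot \mathrm{lism}(m_1, m_2),
  \qquad 0 \le \omega \le 1 \tag{3}
\end{align}
```

Under this reading, a mapping between m1 and m2 is established when Sim(m1, m2) > thaccept.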
Another aspect of the invention relates to an integration system for a multi-source heterogeneous power data set, comprising:
a local ontology construction module, used for constructing a local ontology;
a power domain ontology construction module, used for generating a power domain ontology on the basis of the local ontology;
a local metadata extraction module, used for extracting local metadata based on the local ontology and forming a mapping relation between the local ontology and the local metadata;
and a metadata integration module, used for integrating the local metadata into global metadata under the guidance of the power domain ontology.
Further, the local ontology construction module includes:
an analysis unit, used for comprehensively analyzing the local data sources to obtain the schema of the database;
an establishing unit, used for establishing an ER model of the database, obtaining a local ontology on the basis of the ER model, and forming a relation diagram between the local data source and the local ontology, wherein the ER model captures the relationships between entities.
Further, the power domain ontology construction module includes:
a searching unit, used for finding similar concepts in each local ontology on the basis of obtaining the mapping relation between the local ontologies;
an abstraction unit, used for abstracting classes that represent the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting the relations between classes into relations in the power domain ontology; and, if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
Further, in the local metadata extraction module, a current ontology mapping algorithm is used to establish an ontology mapping.
Further, the current ontology mapping algorithm includes:
calculating the structural similarity of each concept, wherein, in the structural similarity formula, m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 belong, respectively; and ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity, wherein, in the semantic similarity formula, m1 and m2 are the corresponding concepts in the two ontologies; leaves(m1) denotes the leaves of m1; and stronglink(x, y) denotes a strong link, that is, the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts from the semantic similarity and the structural similarity of each concept, wherein, in the comprehensive similarity formula, the semantic similarity sim and the structural similarity lism of each concept are combined using a weighting coefficient;
comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than thaccept, establishing a relation between the two elements, thereby forming the mapping relation between the power domain ontology and the local ontology.
The beneficial effects obtained by the invention are as follows:
The invention provides an integration method and system for a multi-source heterogeneous power data set. A local ontology is constructed; a power domain ontology is generated on the basis of the local ontology; local metadata is extracted based on the local ontology, and a mapping relation between the local ontology and the local metadata is formed; and, under the guidance of the power domain ontology, the local metadata is integrated into global metadata. The method and system avoid the storage and transmission of large volumes of data and can greatly reduce the cost of data storage and network bandwidth. They also address the problems that arise because departments differ in business and division of labor: application systems are isolated from one another, data cannot be exchanged, and information islands and data redundancy result. The value of the data can thereby be fully developed and utilized.
Drawings
FIG. 1 is a flow chart of an embodiment of an integration method for multi-source heterogeneous power data set according to the present invention;
FIG. 2 is a flow chart of an embodiment of the local ontology construction step shown in FIG. 1;
FIG. 3 is a diagram of the relationship between a local data source and a local ontology in the present invention;
FIG. 4 is a flowchart illustrating an embodiment of the power domain ontology construction step shown in FIG. 1;
FIG. 5 is a diagram illustrating a metadata structure according to the present invention;
FIG. 6 is a functional block diagram of one embodiment of an integrated system for multi-source heterogeneous power data sets provided by the present invention;
FIG. 7 is a functional block diagram of an embodiment of the local ontology-building module shown in FIG. 6;
FIG. 8 is a functional block diagram of an embodiment of the power domain ontology-building module shown in FIG. 6.
Reference numerals illustrate:
10. local ontology construction module; 20. power domain ontology construction module; 30. local metadata extraction module; 40. metadata integration module; 11. analysis unit; 12. establishing unit; 21. searching unit; 22. abstraction unit.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1 and FIG. 2, a first embodiment of the present invention proposes an integration method for a multi-source heterogeneous power data set, including the following steps:
Step S100, a local ontology construction step, in which a local ontology is constructed.
The data source is analyzed and the local ontology is constructed. Before the local ontology is constructed, the data source is comprehensively analyzed to obtain the schema of the database, from which an ER (entity-relationship) model of the database is established; the local ontology is then obtained on that basis. Analyzing the data source means examining its logical structure to determine the concepts required to build the ontology. The logical structure of the database may be represented by an ER diagram. The data model is an abstraction of the real world, made up of a set of basic objects, called entities, and the relationships among these objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects and is described by a set of attributes. A relationship is an association between entities. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set. By analyzing the data schema and the ontology concepts, the correspondence between the ER model and the local ontology can be obtained; FIG. 3 illustrates the relationship between the local data sources and the local ontology.
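The derivation of a local ontology from a database schema can be sketched as follows. This is a minimal illustration, not the patented procedure: the `Table` and `OntologyClass` types, the function `build_local_ontology`, and the sample tables are all hypothetical, and the sketch assumes that each table becomes a class, each foreign-key column becomes an object property, and each remaining column becomes a data property.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical description of one relational table (one ER entity set)."""
    name: str
    columns: dict                 # column name -> SQL type
    primary_key: str
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table


@dataclass
class OntologyClass:
    """Hypothetical local-ontology class derived from one table."""
    name: str
    key: str
    data_properties: dict = field(default_factory=dict)    # attribute -> type
    object_properties: dict = field(default_factory=dict)  # attribute -> target class


def build_local_ontology(tables):
    """Assumed mapping: table -> class, foreign key -> object property,
    every other column -> data property."""
    ontology = {}
    for table in tables:
        cls = OntologyClass(name=table.name, key=table.primary_key)
        for column, sql_type in table.columns.items():
            if column in table.foreign_keys:
                cls.object_properties[column] = table.foreign_keys[column]
            else:
                cls.data_properties[column] = sql_type
        ontology[table.name] = cls
    return ontology


# Illustrative schema; the table and column names are invented for the example.
tables = [
    Table("Substation", {"id": "INT", "name": "VARCHAR(64)"}, "id"),
    Table(
        "Transformer",
        {"id": "INT", "rated_kva": "FLOAT", "substation_id": "INT"},
        "id",
        {"substation_id": "Substation"},
    ),
]
local_ontology = build_local_ontology(tables)
```

Keeping foreign keys separate from ordinary columns at this stage makes it straightforward to tell object properties from data properties when metadata is extracted later.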
Step S200, a power domain ontology construction step, in which the power domain ontology is generated on the basis of the local ontology.
On the basis of obtaining the mapping relation between the local ontologies, similar concepts in each local ontology are found. Classes that represent the same concept are abstracted into one class in the power domain ontology; identical attributes of the same class are abstracted into attributes of the corresponding class in the power domain ontology; the relations between classes are abstracted into relations in the power domain ontology; and if a class appears in only one data source, the class and its attributes are placed directly at the corresponding position in the power domain ontology.
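The merging of local ontologies into a domain ontology can be sketched as below. The function `build_domain_ontology`, the `same_concept` table, and the sample data are hypothetical. The sketch assumes that the mapping between local ontologies has already been reduced to a lookup from (source, class) to a shared concept name, and that attributes with the same name are merged once while the remaining attributes are kept; the text does not say how attributes that appear in only one source of a shared class are treated, so keeping them is an assumption.

```python
def build_domain_ontology(local_ontologies, same_concept):
    """Merge local ontologies into one domain ontology.

    local_ontologies: source name -> {class name -> set of attribute names}
    same_concept:     (source, class) -> domain concept name, for classes that
                      were found to denote the same concept (assumed input).
    Returns the domain ontology and the local-to-domain class mapping.
    """
    domain = {}
    class_mapping = {}
    for source, classes in local_ontologies.items():
        for cls, attributes in classes.items():
            # A class that matches nothing else keeps its own name, which
            # places it directly into the domain ontology.
            concept = same_concept.get((source, cls), cls)
            domain.setdefault(concept, set()).update(attributes)
            class_mapping[(source, cls)] = concept
    return domain, class_mapping


# Illustrative input; source, class and attribute names are invented.
local_ontologies = {
    "dispatch_db": {"Transformer": {"id", "rated_kva"}},
    "asset_db": {"XFMR": {"id", "vendor"}, "Warehouse": {"id", "city"}},
}
same_concept = {
    ("dispatch_db", "Transformer"): "Transformer",
    ("asset_db", "XFMR"): "Transformer",
}
domain_ontology, class_mapping = build_domain_ontology(local_ontologies, same_concept)
```

In this example the two transformer classes collapse into one domain class, while `Warehouse`, which occurs in a single source, is carried over unchanged.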
Step S300, a local metadata extraction step, in which local metadata is extracted based on the local ontology, and a mapping relation between the local ontology and the local metadata is formed.
Local metadata is extracted based on the local ontology, and the mapping relation between the local ontology and the local metadata is formed. The metadata extraction rules are as follows:
Rule 1: each class in the local ontology corresponds to one item of metadata, where metadata = protocol type: host name/root directory/class name.
Rule 2: the object properties in the local ontology are mapped to foreign key attributes.
Rule 3: the primary key among the data properties in the local ontology is mapped to the metadata identifier.
Rule 4: the data properties in the local ontology other than the primary key are mapped to general attributes.
Rule 5: the number of records in the table corresponding to each class in the local ontology is added to the corresponding metadata as a metadata entry.
Rule 6: the type, length, and similar properties of each class in the local ontology are mapped to tag attributes of the metadata.
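The six extraction rules can be sketched as one function. This is an illustrative sketch under stated assumptions: the function `extract_local_metadata`, the dictionary layout of a class, the field names of the result, and the `protocol://host/root/class` form of the identifier are all hypothetical choices, since the text only gives the identifier as "protocol type: host name/root directory/class name".

```python
def extract_local_metadata(cls, protocol, host, root, record_count):
    """Apply Rules 1 to 6 to one local-ontology class.

    cls is a plain dict with the (assumed) keys: name, key, data_properties,
    object_properties, tags.
    """
    return {
        # Rule 1: one metadata item per class, addressed by its location.
        "id": f"{protocol}://{host}/{root}/{cls['name']}",
        # Rule 2: object properties become foreign key attributes.
        "foreign_keys": dict(cls["object_properties"]),
        # Rule 3: the primary key becomes the metadata identifier.
        "identifier": cls["key"],
        # Rule 4: remaining data properties become general attributes.
        "general": {
            name: value
            for name, value in cls["data_properties"].items()
            if name != cls["key"]
        },
        # Rule 5: the record count of the backing table is a metadata entry.
        "entries": record_count,
        # Rule 6: type, length and similar properties become tag attributes.
        "tags": dict(cls.get("tags", {})),
    }


# Illustrative class; all names and values are invented for the example.
transformer_class = {
    "name": "Transformer",
    "key": "id",
    "data_properties": {"id": "INT", "rated_kva": "FLOAT"},
    "object_properties": {"substation_id": "Substation"},
    "tags": {"length": 32},
}
local_metadata = extract_local_metadata(
    transformer_class, "jdbc", "db01.example", "dispatch", record_count=10
)
```

Only descriptive information leaves the source system here; the table rows themselves are never copied, which is the property the method relies on to reduce storage and bandwidth.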
Step S400, a metadata integration step, in which the local metadata is integrated into global metadata under the guidance of the power domain ontology.
Integration uses the two mapping relations obtained during the construction of the power domain ontology and the extraction of metadata from the local ontology: Onto Meta Mapping and Oto Mapping. The metadata integration steps in the power domain are as follows:
Step S410, the global metadata is initialized to empty and the local metadata is input; whether corresponding global metadata exists is determined according to Onto Meta Mapping and Oto Mapping. If so, go to step S420; if not, go to step S430.
Step S420, the information of the local metadata is added directly to the global metadata with reference to the mapping relation.
Step S430, the global metadata corresponding to the local metadata is found according to the mapping relation, and a correspondence between the global metadata and the local metadata is established; whether the attribute of the metadata exists in the global metadata is then checked. If so, go to step S440; if not, go to step S450.
Step S440, the general attributes and the foreign key attributes are mapped with reference to the mapping relation; tag attributes such as type take the maximum value of the corresponding tag attribute in the global metadata; and the metadata entries are accumulated.
Step S450, the attribute of the local metadata is added to the global metadata with reference to the mapping relation, and the mapping is established.
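Steps S410 to S450 can be sketched as a single loop. This is a hedged illustration, not the patented implementation. It assumes that the two mapping relations have been combined into one hypothetical lookup, `concept_of`, from a local metadata id to a domain concept. It also assumes that the direct-add branch (S420) is taken when no global metadata exists yet for the concept, because that is the reading under which step S430 can find a matching global item. The function name, field names, and sample records are invented.

```python
def integrate_metadata(local_items, concept_of):
    """Fold local metadata items into global metadata keyed by domain concept."""
    global_meta = {}  # S410: the global metadata starts out empty
    for item in local_items:
        concept = concept_of[item["id"]]
        existing = global_meta.get(concept)
        if existing is None:
            # S420 (assumed branch): nothing to merge with, so add directly.
            global_meta[concept] = {
                "sources": [item["id"]],
                "general": dict(item["general"]),
                "foreign_keys": dict(item["foreign_keys"]),
                "tags": dict(item["tags"]),
                "entries": item["entries"],
            }
            continue
        # S430: record the correspondence between global and local metadata.
        existing["sources"].append(item["id"])
        for group in ("general", "foreign_keys"):
            for attribute, value in item[group].items():
                # S440 keeps an attribute that already exists;
                # S450 adds one that does not.
                existing[group].setdefault(attribute, value)
        for tag, value in item["tags"].items():
            # S440: tag attributes take the maximum value.
            existing["tags"][tag] = max(existing["tags"].get(tag, value), value)
        # S440: metadata entries (record counts) accumulate.
        existing["entries"] += item["entries"]
    return global_meta


# Illustrative local metadata from two sources describing the same concept.
local_items = [
    {"id": "jdbc://db01/dispatch/Transformer", "general": {"rated_kva": "FLOAT"},
     "foreign_keys": {"substation_id": "Substation"}, "tags": {"length": 32}, "entries": 10},
    {"id": "jdbc://db02/asset/XFMR", "general": {"vendor": "VARCHAR"},
     "foreign_keys": {}, "tags": {"length": 64}, "entries": 5},
]
concept_of = {
    "jdbc://db01/dispatch/Transformer": "Transformer",
    "jdbc://db02/asset/XFMR": "Transformer",
}
global_metadata = integrate_metadata(local_items, concept_of)
```

In the example, the two local items merge into one global item whose general attributes are the union of both sources, whose `length` tag is the larger of the two values, and whose entry count is the sum.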
Further, FIG. 2 is a flowchart of an embodiment of step S100 shown in FIG. 1. In this embodiment, step S100 includes:
Step S110, the local data sources are comprehensively analyzed to obtain the schema of the database.
The data source is comprehensively analyzed to obtain the schema of the database, from which an ER model of the database is established and the local ontology is obtained. Analyzing the data source means examining its logical structure to determine the concepts required to build the ontology.
Step S120, an ER model of the database is established, a local ontology is obtained on the basis of the ER model, and a relation diagram between the local data source and the local ontology is formed, wherein the ER model captures the relationships between entities.
An ER model of the database is established; the logical structure of the database can be represented by an ER diagram. The local ontology is obtained on the basis of the ER model.
Preferably, FIG. 4 is a flowchart of an embodiment of step S200 shown in FIG. 1. In this embodiment, step S200 includes:
Step S210, on the basis of obtaining the mapping relation between the local ontologies, similar concepts in each local ontology are found.
Step S220, classes that represent the same concept are abstracted into one class in the power domain ontology; identical attributes of the same class are abstracted into attributes of the corresponding class in the power domain ontology; the relations between classes are abstracted into relations in the power domain ontology; and if a class appears in only one data source, the class and its attributes are placed directly at the corresponding position in the power domain ontology.
Further, in the integration method for a multi-source heterogeneous power data set provided in this embodiment, a current ontology mapping algorithm is used in step S300 to establish the ontology mapping. The ontology mapping algorithm comprises the following steps:
The structural similarity of each concept is calculated according to formula (1).
In formula (1), m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 belong, respectively; and ns(m1, m2) is the name similarity of m1 and m2.
The semantic similarity is calculated according to formula (2).
In formula (2), m1 and m2 are the corresponding concepts in the two ontologies; leaves(m1) denotes the leaves of m1; and stronglink(x, y) denotes a strong link, that is, the similarity of x and y exceeds a threshold.
The comprehensive similarity between concepts is calculated from the semantic similarity and the structural similarity of each concept according to formula (3).
In formula (3), the semantic similarity sim and the structural similarity lism of each concept are combined using a weighting coefficient to give the comprehensive similarity.
The comprehensive similarity is compared with a threshold thaccept; if the comprehensive similarity is greater than thaccept, a relation between the two elements is established, forming the mapping relation between the power domain ontology and the local ontology.
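The similarity computation can be sketched as follows. Because formulas (1) to (3) themselves do not appear in the text, the sketch assumes forms that are consistent with the symbol definitions: the structural term scales the name similarity of the two concepts by the best name similarity between their categories, the semantic term is the fraction of leaves that have a strong link to a leaf on the other side, and the comprehensive similarity is a weighted sum of the two. The name similarity uses Python's `difflib.SequenceMatcher` as a stand-in, and the function names, default weight, and thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher


def ns(a, b):
    """Stand-in name similarity in [0, 1]; the real measure is not specified."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structural_similarity(m1, m2, categories1, categories2):
    """Assumed form of formula (1): name similarity scaled by the best
    similarity between the categories of the two concepts."""
    best_category = max(
        (ns(c1, c2) for c1 in categories1 for c2 in categories2), default=0.0
    )
    return ns(m1, m2) * best_category


def semantic_similarity(leaves1, leaves2, link_threshold=0.8):
    """Assumed form of formula (2): share of leaves with a strong link."""
    def stronglink(x, y):
        return ns(x, y) > link_threshold

    linked1 = {x for x in leaves1 if any(stronglink(x, y) for y in leaves2)}
    linked2 = {y for y in leaves2 if any(stronglink(x, y) for x in leaves1)}
    all_leaves = set(leaves1) | set(leaves2)
    return len(linked1 | linked2) / len(all_leaves) if all_leaves else 0.0


def comprehensive_similarity(semantic, structural, weight=0.5):
    """Assumed form of formula (3): weighted sum of the two similarities."""
    return weight * semantic + (1 - weight) * structural


def is_mapped(score, th_accept=0.5):
    """A mapping is established only if the score exceeds the threshold."""
    return score > th_accept


# Illustrative concepts; names, categories and leaves are invented.
structural = structural_similarity(
    "Transformer", "transformer", {"equipment"}, {"equipment"}
)
semantic = semantic_similarity({"id", "voltage"}, {"id", "voltage", "vendor"})
score = comprehensive_similarity(semantic, structural, weight=0.5)
mapped = is_mapped(score, th_accept=0.5)
```

Raising the weight favors agreement among leaf attributes over agreement in names and categories, which may suit sources that name the same equipment differently but record the same fields.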
Compared with the prior art, the integration method for a multi-source heterogeneous power data set provided by this embodiment constructs a local ontology; generates a power domain ontology on the basis of the local ontology; extracts local metadata based on the local ontology and forms a mapping relation between the local ontology and the local metadata; and, under the guidance of the power domain ontology, integrates the local metadata into global metadata. The method avoids the storage and transmission of large volumes of data and can greatly reduce the cost of data storage and network bandwidth. It also addresses the problems that arise because departments differ in business and division of labor: application systems are isolated from one another, data cannot be exchanged, and information islands and data redundancy result. The value of the data can thereby be fully developed and utilized.
FIG. 6 is a functional block diagram of an embodiment of the integration system for a multi-source heterogeneous power data set provided by the present invention. In this embodiment, the integration system includes a local ontology construction module 10, a power domain ontology construction module 20, a local metadata extraction module 30, and a metadata integration module 40. The local ontology construction module 10 is configured to construct a local ontology; the power domain ontology construction module 20 is configured to generate a power domain ontology on the basis of the local ontology; the local metadata extraction module 30 is configured to extract local metadata based on the local ontology and form a mapping relation between the local ontology and the local metadata; and the metadata integration module 40 is configured to integrate the local metadata into global metadata under the guidance of the power domain ontology.
The local ontology construction module 10 analyzes the data source and constructs the local ontology. Before the local ontology is constructed, the data source is comprehensively analyzed to obtain the schema of the database, from which an ER (entity-relationship) model of the database is established; the local ontology is then obtained on that basis. Analyzing the data source means examining its logical structure to determine the concepts required to build the ontology. The logical structure of the database may be represented by an ER diagram. The data model is an abstraction of the real world, made up of a set of basic objects, called entities, and the relationships among these objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects and is described by a set of attributes. A relationship is an association between entities. The set of all entities of the same type is called an entity set, and the set of all relationships of the same type is called a relationship set. By analyzing the data schema and the ontology concepts, the correspondence between the ER model and the local ontology can be obtained; FIG. 3 illustrates the relationship between the local data sources and the local ontology.
The power domain ontology construction module 20 finds similar concepts in each local ontology on the basis of obtaining the mapping relation between the local ontologies. Classes that represent the same concept are abstracted into one class in the power domain ontology; identical attributes of the same class are abstracted into attributes of the corresponding class in the power domain ontology; the relations between classes are abstracted into relations in the power domain ontology; and if a class appears in only one data source, the class and its attributes are placed directly at the corresponding position in the power domain ontology. The local metadata extraction module 30 extracts local metadata based on the local ontology and forms the mapping relation between the local ontology and the local metadata. The metadata extraction rules are as follows:
Rule 1: Each class in the local ontology corresponds to one piece of metadata, where metadata = protocol type://host name/root directory/class name.
Rule 2: Object properties in the local ontology are mapped to foreign key attributes.
Rule 3: The primary key among the data properties in the local ontology is mapped to the metadata identifier.
Rule 4: Data properties other than the primary key are mapped to general attributes.
Rule 5: The number of records in the table corresponding to each class of the local ontology is added to the corresponding metadata as a metadata entry.
Rule 6: The type, length, and similar characteristics of a class in the local ontology are mapped to tag properties of the metadata.
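The six rules above can be sketched as a single function that turns one ontology class into one metadata record. This is only an illustration under stated assumptions: the dictionary layout, the field names (`locator`, `foreign_keys`, `identifier`, `general_attributes`, `record_count`, `tags`), the `://` separator in the locator and the sample values are all hypothetical.

```python
def extract_local_metadata(class_name, class_def, object_properties, source, record_count):
    """Build one metadata record for one local-ontology class (Rules 1 to 6)."""
    data_props = class_def["data_properties"]  # {property name: {"type": ..., "length": ...}}
    primary_key = class_def["primary_key"]
    return {
        # Rule 1: protocol type, host name, root directory and class name.
        "locator": f'{source["protocol"]}://{source["host"]}/{source["root"]}/{class_name}',
        # Rule 2: object properties that start at this class become foreign keys.
        "foreign_keys": [name for (name, src, _target) in object_properties if src == class_name],
        # Rule 3: the primary key becomes the metadata identifier.
        "identifier": primary_key,
        # Rule 4: the remaining data properties become general attributes.
        "general_attributes": [p for p in data_props if p != primary_key],
        # Rule 5: the record count of the underlying table is stored as an entry.
        "record_count": record_count,
        # Rule 6: type, length and similar characteristics become tag properties.
        "tags": {p: dict(spec) for p, spec in data_props.items()},
    }


# Hypothetical sample class, for illustration only.
transformer = {
    "primary_key": "transformer_id",
    "data_properties": {
        "transformer_id": {"type": "int", "length": 10},
        "capacity": {"type": "float", "length": 8},
    },
}
object_properties = [("locatedIn", "Transformer", "Substation")]
source = {"protocol": "jdbc", "host": "db1.example", "root": "scada"}
metadata = extract_local_metadata("Transformer", transformer, object_properties, source, 1200)
```

Only descriptive information about the table is recorded; the table rows themselves stay in the source system.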
The metadata integration module 40 integrates the local metadata according to the mapping relationships obtained during the construction of the power domain ontology and the metadata extraction from the local ontologies: onto Meta Mapping and Oto Mapping.
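A minimal sketch of the integration step follows, assuming that local metadata records are grouped under the power domain class to which their local class has been mapped. The record layout, the `class_to_domain` lookup, the choice to sum record counts and the sample values are assumptions made for illustration; the patent text does not specify them.

```python
def integrate_metadata(local_metadata, class_to_domain):
    """Group local metadata records under their power-domain class."""
    global_metadata = {}
    for md in local_metadata:
        domain_class = class_to_domain.get((md["source"], md["class"]))
        if domain_class is None:
            continue  # no mapping to the domain ontology, so the record is skipped
        entry = global_metadata.setdefault(domain_class, {"locators": [], "record_count": 0})
        entry["locators"].append(md["locator"])
        entry["record_count"] += md["record_count"]
    return global_metadata


# Hypothetical sample records from two systems, for illustration only.
local_metadata = [
    {"source": "ems", "class": "Transformer",
     "locator": "jdbc://ems.example/scada/Transformer", "record_count": 1200},
    {"source": "pms", "class": "Xfmr",
     "locator": "jdbc://pms.example/assets/Xfmr", "record_count": 300},
    {"source": "pms", "class": "Scratch",
     "locator": "jdbc://pms.example/assets/Scratch", "record_count": 5},
]
class_to_domain = {("ems", "Transformer"): "Transformer", ("pms", "Xfmr"): "Transformer"}
global_metadata = integrate_metadata(local_metadata, class_to_domain)
```

The global entry keeps locators that point back to each source, so the underlying data does not have to be copied or moved.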
Further, referring to FIG. 7, which is a schematic functional block diagram of an embodiment of the local ontology construction module shown in FIG. 6, the local ontology construction module 10 in this embodiment includes an analysis unit 11 and an establishing unit 12. The analysis unit 11 is configured to analyze the local data source comprehensively to obtain the database schema; the establishing unit 12 is configured to establish an ER model of the database, obtain a local ontology based on the ER model, and form a relationship diagram between the local data source and the local ontology, where the ER model records the relationships between entities.
The analysis unit 11 analyzes the data source comprehensively to obtain the database schema, from which the ER model of the database is established and the local ontology is derived. Analyzing a data source means analyzing its logical structure to determine the concepts required to build the ontology.
The establishing unit 12 establishes the ER model of the database, whose logical structure can be represented by an ER diagram, and obtains the local ontology based on the ER model.
Preferably, referring to FIG. 8, which is a schematic functional block diagram of an embodiment of the power domain ontology construction module shown in FIG. 6, the power domain ontology construction module 20 in this embodiment includes a search unit 21 and an abstraction unit 22. The search unit 21 is configured to find similar concepts in the local ontologies on the basis of the mapping relationships obtained between them. The abstraction unit 22 is configured to abstract classes representing the same concept into one class in the power domain ontology; to abstract identical attributes of the same class into attributes of the corresponding class in the power domain ontology; to abstract relationships between classes into relationships in the power domain ontology; and, if a class appears in only one data source, to place the class and its attributes directly at the corresponding position in the power domain ontology.
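The work of the search unit and the abstraction unit can be sketched as follows. The sketch assumes that the similar concepts have already been found and are supplied as a lookup table `concept_of`, and it reads "identical attributes of the same class" as the set of attribute names shared by every source, which is one possible interpretation. Relationships between classes are omitted for brevity; all names and sample data are hypothetical.

```python
def build_domain_ontology(local_ontologies, concept_of):
    """Merge local classes into power-domain classes.

    local_ontologies: {source: {class name: set of attribute names}}
    concept_of: {(source, class name): domain class name} for matched concepts
    """
    grouped = {}
    for source, classes in local_ontologies.items():
        for class_name, attributes in classes.items():
            # A class without a match keeps its own name in the domain ontology.
            domain_class = concept_of.get((source, class_name), class_name)
            grouped.setdefault(domain_class, []).append(set(attributes))

    domain = {}
    for domain_class, attribute_sets in grouped.items():
        # One source: the attributes are copied as they are.
        # Several sources: only the attributes they share are kept.
        domain[domain_class] = set.intersection(*attribute_sets)
    return domain


# Hypothetical sample ontologies from two systems, for illustration only.
local_ontologies = {
    "ems": {"Transformer": {"id", "capacity", "vendor"}, "Feeder": {"id", "length"}},
    "pms": {"Xfmr": {"id", "capacity", "install_date"}},
}
concept_of = {("ems", "Transformer"): "Transformer", ("pms", "Xfmr"): "Transformer"}
domain_ontology = build_domain_ontology(local_ontologies, concept_of)
```

Here `Feeder` appears in only one source and is carried over unchanged, while the two transformer classes collapse into one domain class.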
Further, in the integration system for multi-source heterogeneous power data sets provided in this embodiment, the local metadata extraction module 30 uses the Cupid ontology mapping algorithm to establish the ontology mapping. The Cupid ontology mapping algorithm comprises the following steps. First, the structural similarity of each concept is calculated, as follows:
Formula (4) defines the structural similarity of each concept, where m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; and ns(m1, m2) is the name similarity of m1 and m2.
Next, the semantic similarity is calculated, as follows:
Formula (5) defines the semantic similarity, where m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; and strongalink(x, y) denotes a strong link, i.e. the similarity of x and y exceeds a threshold.
Then, according to the semantic similarity and the structural similarity of each concept, the comprehensive similarity between concepts is calculated, as follows:
Formula (6) defines the comprehensive similarity, where the weighting coefficient sets the proportion of each term, sim is the semantic similarity, and lism is the structural similarity calculated for each concept.
Finally, the comprehensive similarity is compared with a threshold thaccept; if the comprehensive similarity is greater than thaccept, a relationship between the two elements is established, forming the mapping relationship between the power domain ontology and the local ontology.
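Formulas (4) to (6) are not reproduced in this text, so the following sketch does not claim to restate them. It assumes the forms commonly given for the Cupid schema matcher listed under the non-patent citations: a name-based similarity scaled by the best category match, a leaf-based similarity equal to the fraction of leaves that have a strong link to the other side, and a comprehensive similarity formed as a convex combination of the two. The use of `difflib.SequenceMatcher` for name similarity and all function names are assumptions.

```python
from difflib import SequenceMatcher


def ns(a, b):
    """Name similarity in [0, 1]; difflib's ratio is used here as a stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_based_similarity(m1, m2, categories1, categories2):
    """ns(m1, m2) scaled by the best similarity between the two category sets."""
    best = max((ns(c1, c2) for c1 in categories1 for c2 in categories2), default=0.0)
    return ns(m1, m2) * best


def leaf_based_similarity(leaves1, leaves2, leaf_sim, th_accept):
    """Fraction of leaves that have a strong link to some leaf on the other side."""
    def strong_link(x, y):
        return leaf_sim(x, y) > th_accept

    linked1 = [x for x in leaves1 if any(strong_link(x, y) for y in leaves2)]
    linked2 = [y for y in leaves2 if any(strong_link(x, y) for x in leaves1)]
    total = len(leaves1) + len(leaves2)
    return (len(linked1) + len(linked2)) / total if total else 0.0


def comprehensive_similarity(weight, first, second):
    """Weighted combination of two similarities; weight lies in [0, 1]."""
    return weight * first + (1.0 - weight) * second


def is_mapped(score, th_accept):
    """A mapping is established only when the score exceeds the threshold."""
    return score > th_accept


# Hypothetical sample concepts, for illustration only.
leaf_score = leaf_based_similarity(
    ["id", "capacity"], ["id", "install_date"], leaf_sim=ns, th_accept=0.8
)
name_score = name_based_similarity("Transformer", "transformer", ["equipment"], ["equipment"])
score = comprehensive_similarity(0.5, leaf_score, name_score)
accepted = is_mapped(score, th_accept=0.6)
```

The weight and the threshold are tuning parameters; the values 0.5, 0.8 and 0.6 above are arbitrary.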
Compared with the prior art, the integration system for multi-source heterogeneous power data sets provided by this embodiment uses the local ontology construction module 10, the power domain ontology construction module 20, the local metadata extraction module 30 and the metadata integration module 40 to construct local ontologies; to generate a power domain ontology on the basis of the local ontologies; to extract local metadata based on the local ontologies and form the mapping relationship between the local ontologies and the local metadata; and to integrate the local metadata into global metadata under the guidance of the power domain ontology. The system avoids storing and transmitting large volumes of data and can greatly reduce the cost of data storage and network bandwidth. It also addresses the problem that differences in business and departmental division leave application systems isolated from one another, so that data cannot be shared and information islands and data redundancy arise; the value of the data can therefore be fully developed and used.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. An integration method for multi-source heterogeneous power data sets, characterized by comprising the following steps:
a local ontology construction step of constructing a local ontology;
a power domain ontology construction step of generating a power domain ontology on the basis of the local ontology;
a local metadata extraction step of extracting local metadata based on the local ontology and forming a mapping relationship between the local ontology and the local metadata;
a metadata integration step of integrating the local metadata into global metadata under the guidance of the power domain ontology;
wherein the local ontology construction step comprises:
comprehensively analyzing the local data sources to obtain the database schema;
and establishing an ER model of the database, obtaining a local ontology on the basis of the ER model, and forming a relationship diagram between the local data source and the local ontology, wherein the ER model records the relationships between entities.
2. The integration method for multi-source heterogeneous power data sets according to claim 1, wherein the power domain ontology construction step comprises:
finding similar concepts in the local ontologies on the basis of the mapping relationships obtained between the local ontologies;
abstracting classes representing the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting relationships between classes into relationships in the power domain ontology; and, if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
3. The integration method for multi-source heterogeneous power data sets according to claim 1, wherein the power domain ontology construction step adopts a Cupid ontology mapping algorithm to establish an ontology mapping.
4. The integration method for multi-source heterogeneous power data sets according to claim 3, wherein the Cupid ontology mapping algorithm comprises:
calculating the structural similarity of each concept, wherein, in the formula for the structural similarity of each concept, m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; and ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity,
wherein, in the formula for the semantic similarity, m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; and strongalink(x, y) denotes a strong link, i.e. the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts according to the semantic similarity and the structural similarity of each concept, wherein, in the formula for the comprehensive similarity, the weighting coefficient sets the proportion of each term, sim is the semantic similarity, and lism is the structural similarity calculated for each concept;
and comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than thaccept, establishing a relationship between the two elements to form a mapping relationship between the power domain ontology and the local ontology.
5. An integration system for multi-source heterogeneous power data sets, comprising:
a local ontology construction module (10) for constructing a local ontology;
a power domain ontology construction module (20) for generating a power domain ontology on the basis of the local ontology;
a local metadata extraction module (30) for extracting local metadata based on the local ontology and forming a mapping relationship between the local ontology and the local metadata;
a metadata integration module (40) for integrating the local metadata into global metadata under the guidance of the power domain ontology;
wherein the local ontology construction module (10) comprises:
an analysis unit (11) for comprehensively analyzing the local data sources to obtain the database schema;
and an establishing unit (12) for establishing an ER model of the database, obtaining a local ontology on the basis of the ER model, and forming a relationship diagram between the local data source and the local ontology, wherein the ER model records the relationships between entities.
6. The integration system for multi-source heterogeneous power data sets according to claim 5, wherein the power domain ontology construction module (20) comprises:
a search unit (21) for finding similar concepts in the local ontologies on the basis of the mapping relationships obtained between the local ontologies;
an abstraction unit (22) for abstracting classes representing the same concept into one class in the power domain ontology; abstracting identical attributes of the same class into attributes of the corresponding class in the power domain ontology; abstracting relationships between classes into relationships in the power domain ontology; and, if a class appears in only one data source, placing the class and its attributes directly at the corresponding position in the power domain ontology.
7. The integration system for multi-source heterogeneous power data sets according to claim 5, wherein the local metadata extraction module (30) adopts a Cupid ontology mapping algorithm to establish an ontology mapping.
8. The integration system for multi-source heterogeneous power data sets according to claim 7, wherein the Cupid ontology mapping algorithm comprises:
calculating the structural similarity of each concept, wherein, in the formula for the structural similarity of each concept, m1 and m2 are the corresponding concepts in the two ontologies; C1 and C2 are the categories to which m1 and m2 respectively belong; and ns(m1, m2) is the name similarity of m1 and m2;
calculating the semantic similarity,
wherein, in the formula for the semantic similarity, m1 and m2 are the corresponding concepts in the two ontologies; leave(m1) denotes the leaves of m1; and strongalink(x, y) denotes a strong link, i.e. the similarity of x and y exceeds a threshold;
calculating the comprehensive similarity between concepts according to the semantic similarity and the structural similarity of each concept, wherein, in the formula for the comprehensive similarity, the weighting coefficient sets the proportion of each term, sim is the semantic similarity, and lism is the structural similarity calculated for each concept;
and comparing the comprehensive similarity with a threshold thaccept, and, if the comprehensive similarity is greater than thaccept, establishing a relationship between the two elements to form a mapping relationship between the power domain ontology and the local ontology.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310663877.8A CN116383335A (en) | 2023-06-06 | 2023-06-06 | Integration method and system for multi-source heterogeneous power data set |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383335A (en) | 2023-07-04 |
Family
ID=86971704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310663877.8A Withdrawn CN116383335A (en) | 2023-06-06 | 2023-06-06 | Integration method and system for multi-source heterogeneous power data set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116383335A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117971951A (en) * | 2024-04-02 | 2024-05-03 | 北京大数据先进技术研究院 | Heterogeneous registry-oriented digital object metadata interoperation method, device, equipment and medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1633092A (en) * | 2004-11-25 | 2005-06-29 | 武汉大学 | Distributed GIS space information integration apparatus and method based on mobile Agent and GML |
Non-Patent Citations (2)
Title |
---|
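Jayant Madhavan et al.: "Generic Schema Matching with Cupid", Microsoft Research * |
Feng Yong; Zhang Liying; Gu Zhaoxu; Ma Ji: "Metadata integration method for multi-source heterogeneous data environments in universities", Journal of Liaoning University (Natural Science Edition), no. 02, pages 135-141 * |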
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20230704 |