CN115269578A - Data index-based comprehensive energy enterprise data management method and system - Google Patents

Data index-based comprehensive energy enterprise data management method and system

Info

Publication number
CN115269578A
CN115269578A CN202210920861.6A CN202210920861A CN115269578A CN 115269578 A CN115269578 A CN 115269578A CN 202210920861 A CN202210920861 A CN 202210920861A CN 115269578 A CN115269578 A CN 115269578A
Authority
CN
China
Prior art keywords
data
index
attribute
source
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210920861.6A
Other languages
Chinese (zh)
Inventor
高云龙
于瑞雪
李夏光
刘海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoneng Wangxin Technology Beijing Co ltd
Xuzhou Tianlu Zhongkuang Mining Technology Co ltd
China Shenhua Energy Co Ltd
Original Assignee
Guoneng Wangxin Technology Beijing Co ltd
Xuzhou Tianlu Zhongkuang Mining Technology Co ltd
China Shenhua Energy Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoneng Wangxin Technology Beijing Co ltd, Xuzhou Tianlu Zhongkuang Mining Technology Co ltd, China Shenhua Energy Co Ltd
Priority to CN202210920861.6A
Publication of CN115269578A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data index-based comprehensive energy enterprise data management method and system. The method comprises the following steps: S1, creating and defining a corresponding attribute table set according to a business analysis target; S2, constructing an index data model based on the attribute table set; S3, acquiring a plurality of data sources and performing data fusion according to the index data model to generate a target service table, wherein data acquisition, data conversion, data verification and data fusion operations are performed by a fusion engine during the data fusion process; and S4, storing the target service table in a database buffer table and prompting for confirmation, then storing the confirmed data in a service database and publishing it. The invention enables personnel with the relevant permissions to participate in data construction, cleaning and governance, defines the conversion rules and verification mechanism from multi-source data to target data, improves the quality and efficiency of data processing, and enhances the stability and standardization of data assets.

Description

Data index-based comprehensive energy enterprise data management method and system
Technical Field
The invention relates to the field of energy enterprise data management, in particular to a data index-based comprehensive energy enterprise data management method and system.
Background
Enterprise digital transformation is the process of informatizing, digitizing and making intelligent the processes and data involved in enterprise management and business operation. It aims to close the loop between turning business into data and turning data into business, and thereby to achieve sustainable allocation of business resources and more efficient, automated decision making. In the prior art, data development is approached from the perspective of technical research and development, whereas energy enterprise data involves a large number of data indexes and data analysis indexes and comes from multiple sources.
Disclosure of Invention
The invention aims to solve the technical problems identified in the background, and provides a data index-based comprehensive energy enterprise data management method and system that define the conversion rules, standards and verification mechanisms from multi-source data to target data, enable all personnel with the relevant permissions to participate in data construction, cleaning and governance, ensure the accuracy of the data processing rules and the quality of the data, enhance the stability and standardization of data assets, and increase the speed and utilization efficiency of data fusion.
The purpose of the invention is realized by the following technical scheme:
A data index-based comprehensive energy enterprise data management method comprises the following steps:
S1, creating and defining a corresponding attribute table set according to a business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table;
S2, constructing an index data model based on the attribute table set;
S3, acquiring a plurality of data sources and performing data fusion according to the index data model to generate a target service table, wherein data acquisition, data conversion, data verification and data fusion operations are performed by a fusion engine during the data fusion process;
and S4, storing the target service table in a database buffer table and prompting for confirmation, then storing the confirmed data in a service database and publishing it.
To better implement the invention, the following preferred technical scheme is provided: in S1, a data lineage directed graph is constructed from the attributes of the tables in the attribute table set; and in S2, the index data definition of the index data model comprises field definitions, internal and external source definitions of the data, conversion rules for loading data from a data source into the warehouse, the current data version and access permissions.
The invention provides the following preferred technical scheme: in S3, a data value-taking rule is set for data acquisition under the index data model, the data value-taking rule being a description language defined at acquisition and storage time, and a confidence score table is set for data acquisition according to the origin of each data source; a validation constraint rule is set for data verification under the index data model, and the validation constraint rule comprises the degree of matching with the data lineage directed graph.
The invention provides the following preferred technical scheme: in S4, the column attributes of the target service table correspond to the attribute table set, the row attributes correspond to the fused data in the target service table, and the database buffer table gives a confirmation prompt according to the duplicate, missing and erroneous data found by data self-inspection. For duplicate data, a confidence probability table is built according to the data sources, and confidence probabilities are assigned to the data from different data sources. For missing data, the associated original data collected from the data source is provided for confirmation and addition, and a behavior recording rule table is created to record the data source and the position of the original data. For erroneous data, the original data of the column attribute and the row attribute collected from the data source is provided for confirmation and correction, and a behavior recording rule table is created to record the mapping between the data source, the original data column attribute and the row attribute.
The invention provides the following preferred technical scheme: the data fusion method in step S3 is as follows:
data sources of different origins are treated as different data sets; according to the column attributes of the target service table, a support vector machine model takes the data elements in the different data sets as support vector samples and measures their similarity with a kernel function $K(\cdot)$; a weighted sum value $M$ is calculated according to the following formula, and a decision is made according to the value of $M$:
$M = \operatorname{sgn}\left(\sum_i a_i y_i K(x_i, x) + b\right)$, where $a_i y_i$ is the weight, $K(x_i, x)$ is the nonlinear transformation of the support vector $x_i$, $x_i$ is the input vector, and $b$ is a parameter of the support vector machine model.
The invention provides the following preferred technical scheme: incremental changes in the plurality of data sources are continuously monitored; an update engine updates the incremental data according to the data lineage directed graph and writes the updated incremental data into the database buffer table, which prompts for confirmation of the update; and the confirmed incremental data is loaded into the business database accordingly.
The invention provides the following preferred technical scheme: in step S4, the confirmation operation is subject to access permissions, confirmer permissions and auditor permissions.
A data index-based comprehensive energy enterprise data governance system comprises:
a basic information definition module, used for creating and defining a corresponding attribute table set according to the business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table, and for defining the basic attribute information of each table in the attribute table set;
an index data model, constructed on the basis of the attribute table set;
a target service table generation module, used for acquiring a plurality of data sources and performing data fusion according to the index data model to generate a target service table, wherein data acquisition, data conversion, data verification and data fusion operations are performed by a fusion engine during the data fusion process;
and a confirmation and publishing module, used for storing the target service table in the database buffer table and prompting for confirmation, then storing the confirmed data in the service database and publishing it.
Preferably, a data lineage directed graph is constructed from the attributes of the tables in the attribute table set of the basic information definition module, and the index data definition of the index data model comprises field definitions, internal and external source definitions of the data, conversion rules for loading data from a data source into the warehouse, the current data version and access permissions.
Preferably, the index data model comprises a data value-taking rule, the data value-taking rule being a description language defined at acquisition and storage time, and the index data model sets a confidence score table according to the origin of each data source during data acquisition; the index data model further comprises a validation constraint rule, and the validation constraint rule comprises the degree of matching with the data lineage directed graph.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) Starting from the business analysis target, the invention defines index data for business data and defines the conversion rules, standards and verification mechanisms from multi-source data to target data, so that personnel with the relevant permissions can participate in data construction, cleaning and governance, which ensures the accuracy of the data processing rules and the quality of the data. Because personnel associated with the enterprise can participate in governing the data assets, the influence of data source changes on products is reduced, the stability and standardization of the data assets are enhanced, and the speed and utilization efficiency of data fusion are increased.
(2) During multi-source data fusion, preliminary automatic merging is performed on row data and column data respectively, based on the index data definition. For example, rows are merged by aligning primary keys, and data points are selected and verified based on confidence. When a conflict occurs, personnel with data permissions confirm the decision and a fault-tolerant table is formed, so that future data processing becomes more automated and standardized, the speed and quality of data fusion increase, and the data is enriched and trustworthy.
(3) The target table is defined by index data and a directed acyclic graph is constructed for each table. When a multi-source data table changes (in its fields or values), the influence of the change on existing data is identified automatically and a change notice is sent to the personnel with the relevant permissions, which reduces the influence of data source changes on products.
Drawings
FIG. 1 is a method flow chart of the comprehensive energy enterprise data management method of the invention;
FIG. 2 is a block diagram of the integrated energy enterprise data management system of the present invention;
FIG. 3 is a schematic diagram of attributes of an exemplary set of attribute tables and an exemplary index data model;
FIG. 4 is a flow chart of the fusion and governance of multi-source data in the embodiment;
FIG. 5 is a schematic diagram of a support vector machine model in an embodiment;
FIG. 6 is a schematic diagram of the index definition, rule definition, data editing and other functions of the integrated energy enterprise data management system.
Detailed Description
The present invention is described in further detail below with reference to the following embodiment:
Embodiment
As shown in FIGS. 1 to 6, a data index-based comprehensive energy enterprise data management method includes the following steps:
S1, creating and defining a corresponding attribute table set according to a business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table. As shown in FIG. 3, different business analysis targets require different basic attributes, so the basic attribute information required by the business is generally defined according to the business analysis target. For example, if the business analysis target is power generation (recorded in a main data table), the required basic attribute tables include Table 1 (power generation unit, which relates to the unit dimension in the dimension table; the dimension table also includes a time dimension whose attributes include year, month and day), Table 2 (index table, whose attributes include report period, energy composition and the like), Table 3 (power generation type, whose attributes include thermal power, hydroelectric power, wind power, solar energy, biomass, geothermal energy and tidal energy), and Table 4 (numerical table). FIG. 3 also lists the index definitions of the attributes of each table in the attribute table set.
In some embodiments, in step S1, a data lineage directed graph is constructed from the attributes of the tables in the attribute table set; in step S2, the index data definition of the index data model comprises field definitions, internal and external source definitions of the data, conversion rules for loading data from a data source into the warehouse, the current data version and access permissions. For the index data definition, refer to the index definition, conversion rule definition and constraint rule definition listed in FIG. 3.
S2, constructing an index data model based on the attribute table set. An example definition of the index data model is as follows (FIG. 6 illustrates the index definition, rule definition, data editing and other functions of the comprehensive energy enterprise data management system):
org_c: unit code
prt_pd: report period
idx_v: value
idx_t: time dimension
idx_u: unit
dimension: dimension
buz_t_c: topic domain
pk: whether the field is part of the primary key; the system automatically checks records for duplicates according to the pk combination;
fft: the fault-tolerant table corresponding to the field, used to standardize field values;
etl: the value-taking logic of the field; multiple data sources can be defined;
source: data source and mapping field definitions;
check: constraint and verification logic, such as not null (notNull), a value that must be contained in another table (ref), and boundary interval values (gt=10, lt=20);
search: whether the field can be searched, by query (query) or by text search (index);
confidence: data quality evaluation, that is, the confidence of the value; the user can set it to 1, or the confidence probability can be determined after multi-source comparison; for cross-organization data, the data may be verified through a data security algorithm (e.g., federated machine learning);
ver: version definition, used to manage whether the version is released to the golden copy;
auth: access rights definition, including read only (readonly), editable (edit) and exportable (export).
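As a rough illustration of how such a field definition and its check rules might be represented, the following Python sketch models a few of the attributes listed above. The class name `FieldDef`, the function `validate`, and the dictionary layout of `check` are assumptions made for this example; the patent does not specify a concrete data structure.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldDef:
    """One field of an index data model (hypothetical layout)."""
    name: str
    pk: bool = False                            # part of the primary-key combination
    check: dict = field(default_factory=dict)   # e.g. {"notNull": True, "gt": 10, "lt": 20}
    confidence: float = 1.0                     # value confidence, 1 = confirmed
    ver: int = 1                                # current version
    auth: tuple = ("readonly",)                 # readonly / edit / export


def validate(field_def: FieldDef, value: Any) -> list:
    """Return the names of the check rules that the value violates."""
    rules = field_def.check
    if value is None:
        return ["notNull"] if rules.get("notNull") else []
    errors = []
    if "gt" in rules and not value > rules["gt"]:
        errors.append("gt")
    if "lt" in rules and not value < rules["lt"]:
        errors.append("lt")
    if "ref" in rules and value not in rules["ref"]:
        errors.append("ref")                    # value must exist in the referenced set
    return errors


idx_v = FieldDef(name="idx_v", check={"notNull": True, "gt": 10, "lt": 20})
print(validate(idx_v, 15))    # []
print(validate(idx_v, 25))    # ['lt']
print(validate(idx_v, None))  # ['notNull']
```

Here the `ref` rule is modelled as a set of allowed values; in a real system it would more likely be a lookup against another table.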
Preferably, in addition to structured data generated by business systems, data fused from multiple internal and external data sources requires editing and confirmation of table records and column values before it is used in products and analysis, to ensure that the data is of high quality and trustworthy.
S3, acquiring a plurality of data sources and performing data fusion according to the index data model to generate a target service table, wherein data acquisition, data conversion, data verification and data fusion operations are performed by a fusion engine during the data fusion process. As shown in FIG. 4, the overall data flow covers external and internal multi-channel data sources. The external data sources include landed databases, API integration, crawler collection and the like; for each kind of data source, full and incremental processing, data auditing, data sustainability, and alternative or supplementary data sources need to be considered. The internal data sources include user behavior and transaction data accumulated in business systems, internal enterprise documents accumulated over a long period, and the like. The acquired data is stored using data lake technology; a data lake can store structured data (such as tables in a relational database), semi-structured data (such as CSV, XML and JSON), unstructured data (such as documents, PDFs, images, audio and video) and the like. In this embodiment, the architecture of FIG. 4 may be used to acquire, store and manage data; the architecture includes an access layer, data storage, data processing, rights management, data classification and data quality. When mining business data for a specific enterprise purpose, it is often necessary to know where the data came from, how it is processed in a standardized way, who can access and change each data set, and which models the data has been used to build.
In some embodiments, in step S3, a data value-taking rule is set for data acquisition under the index data model, the data value-taking rule being a description language defined at acquisition and storage time (see the example in FIG. 3), and a confidence score table is set for data acquisition according to the origin of each data source; a validation constraint rule is set for data verification under the index data model, and the validation constraint rule comprises the degree of matching with the data lineage directed graph.
In this embodiment, after the main data table corresponding to the target business is determined, index data modeling needs to be performed for each data table. Specifically, as shown in FIG. 3, the index data definition includes: field definitions, primary key definitions, internal and external source definitions of the data, conversion rules for loading data from a data source into the warehouse, the current data version, access permissions and the like. For the data value-taking rule, business personnel or data processing and analysis personnel can define a description language similar to a stored procedure so that values are taken from a data source automatically. This embodiment can also define constraint rules for a field and automatically verify the accuracy of the data through the constraint rules, or use a data confidence score. Based on the index data model definition, preliminary automatic merging is performed on the row data and the column data of the data sources respectively. For example, rows are merged by aligning primary keys, and data points are selected and verified according to the data confidence score. When a conflict occurs, data processing personnel, business personnel or other relevant personnel make a manual decision, and a fault-tolerant table is formed so that future data processing becomes more automated and standardized. This increases the speed and quality of data fusion and ensures that the data is enriched and trustworthy.
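The merging behaviour described above (align rows on the primary key, pick each data point by source confidence, escalate conflicts to a person) can be sketched as follows. This is a minimal illustration under stated assumptions: the function `fuse`, the record layout, and the source names and scores in `SOURCE_CONFIDENCE` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical confidence score table, keyed by data source.
SOURCE_CONFIDENCE = {"erp": 0.9, "crawler": 0.6}


def fuse(records, pk_fields, source_confidence):
    """Merge rows that share a primary key.

    For each column the value from the most trusted source wins. If equally
    trusted sources disagree, the column is left empty and the conflict is
    returned so that a person can decide.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec["data"][f] for f in pk_fields)
        groups[key].append(rec)

    merged, conflicts = {}, []
    for key, recs in groups.items():
        row = {}
        columns = {col for r in recs for col in r["data"]}
        for col in sorted(columns):
            candidates = [
                (source_confidence[r["source"]], r["data"][col])
                for r in recs
                if r["data"].get(col) is not None
            ]
            if not candidates:
                row[col] = None
                continue
            best = max(score for score, _ in candidates)
            values = {v for score, v in candidates if score == best}
            if len(values) > 1:
                conflicts.append((key, col, sorted(values, key=str)))
                row[col] = None      # left for manual confirmation
            else:
                row[col] = values.pop()
        merged[key] = row
    return merged, conflicts


records = [
    {"source": "erp", "data": {"org_c": "A01", "prt_pd": "2022-06", "idx_v": 120.0}},
    {"source": "crawler", "data": {"org_c": "A01", "prt_pd": "2022-06",
                                   "idx_v": 118.5, "idx_u": "GWh"}},
]
merged, conflicts = fuse(records, ["org_c", "prt_pd"], SOURCE_CONFIDENCE)
print(merged[("A01", "2022-06")]["idx_v"])  # 120.0
print(conflicts)                            # []
```

In this sketch the lower-confidence source still contributes columns that the trusted source lacks (`idx_u`), which is one plausible reading of column-wise merging.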
S4, storing the target service table in a database buffer table and prompting for confirmation (by business personnel, data personnel, auditors or other relevant personnel), then storing the confirmed data in the service database and publishing it. Preferably, the confirmation operation is subject to access permissions, confirmer permissions and auditor permissions.
In some embodiments, in step S4, the column attributes of the target service table correspond to the attribute table set, the row attributes correspond to the fused data in the target service table, and the database buffer table gives a confirmation prompt according to the duplicate, missing and erroneous data found by data self-inspection. For duplicate data, a confidence probability table is built according to the data sources, and confidence probabilities are assigned to the data from different data sources to facilitate deletion and merging. For missing data, the associated original data collected from the data source is provided for confirmation and addition, and a behavior recording rule table is created to record the data source and the position of the original data; the value of the column attribute field can be adjusted, or a new value source and value-taking rule can be edited, and this is recorded in the behavior recording rule table so that confidence-based recommendations can be made for subsequent confirmations. For erroneous data, the original data of the column attribute and the row attribute collected from the data source is provided for confirmation and correction, and a behavior recording rule table is created to record the mapping between the data source, the original data column attribute and the row attribute.
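The self-inspection step can be pictured as labelling each buffered row with the kinds of issue that require confirmation. The sketch below is only one possible reading; the function `self_check`, its parameters, and the sample validator are assumptions for illustration.

```python
def self_check(rows, pk_fields, required, validators):
    """Label each buffered row with the issues that need confirmation.

    Returns one list per row containing ("duplicate", index_of_first_row),
    ("missing", column) and ("error", column) entries.
    """
    first_seen = {}
    report = []
    for i, row in enumerate(rows):
        issues = []
        key = tuple(row.get(f) for f in pk_fields)
        if key in first_seen:
            issues.append(("duplicate", first_seen[key]))
        else:
            first_seen[key] = i
        for col in required:
            if row.get(col) is None:
                issues.append(("missing", col))
        for col, is_valid in validators.items():
            value = row.get(col)
            if value is not None and not is_valid(value):
                issues.append(("error", col))
        report.append(issues)
    return report


rows = [
    {"org_c": "A01", "prt_pd": "2022-06", "idx_v": 120.0},
    {"org_c": "A01", "prt_pd": "2022-06", "idx_v": None},
    {"org_c": "B02", "prt_pd": "2022-06", "idx_v": -5.0},
]
report = self_check(
    rows,
    pk_fields=["org_c", "prt_pd"],
    required=["idx_v"],
    validators={"idx_v": lambda v: v >= 0},  # assumed rule: generation is non-negative
)
for issues in report:
    print(issues)
# []
# [('duplicate', 0), ('missing', 'idx_v')]
# [('error', 'idx_v')]
```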
In order to monitor incremental data and update data in a timely manner, the invention continuously monitors incremental changes in the plurality of data sources; an update engine updates the incremental data according to the data lineage directed graph and writes the updated incremental data into the database buffer table, which prompts for confirmation of the update; business personnel, data personnel, auditors or other personnel with the relevant permissions perform the confirmation, and the confirmed incremental data is loaded into the business database accordingly.
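One way to picture how a lineage directed graph lets an update engine find what an upstream change touches is a breadth-first traversal from the changed field. The graph contents and the names `LINEAGE` and `affected_fields` below are made up for illustration; the patent does not describe the traversal itself.

```python
from collections import deque

# Hypothetical lineage edges: upstream field -> fields derived from it.
LINEAGE = {
    "src_a.gen_mwh": ["power_generation.idx_v"],
    "src_b.unit_name": ["dim_unit.org_c"],
    "dim_unit.org_c": ["power_generation.org_c"],
    "power_generation.idx_v": ["report.total_generation"],
}


def affected_fields(lineage, changed):
    """Return every field downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(affected_fields(LINEAGE, "src_a.gen_mwh"))
# ['power_generation.idx_v', 'report.total_generation']
```

The returned list is what would be placed in the buffer table for confirmation before the business database is updated.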
In some embodiments, the data fusion method in step S3 of the present invention is as follows:
data sources of different origins are treated as different data sets; according to the column attributes of the target service table, a support vector machine model takes the data elements in the different data sets as support vector samples and measures their similarity with a kernel function $K(\cdot)$; a weighted sum value $M$ is calculated according to the following formula, and a decision is made according to the value of $M$:
$M = \operatorname{sgn}\left(\sum_i a_i y_i K(x_i, x) + b\right)$, where $a_i y_i$ is the weight, $K(x_i, x)$ is the nonlinear transformation of the support vector $x_i$, $x_i$ is the input vector, and $b$ is a parameter of the support vector machine model. As shown in FIG. 5, for the input vector $X = (x_1, x_2, \ldots, x_i)$, $K(x_i, x)$ is obtained through the transformation of the support vector kernel function $K(\cdot)$. The support vector machine model uses an analysis method for multi-source data fusion in which data from different sources are regarded as different data sets. The decision process of the SVM is as follows: first, a number of data elements are taken as input samples and compared for similarity against a series of templates $x$, where the template samples are the support vectors determined during training and the similarity measure is the kernel function $K(\cdot)$; second, after the samples pass through the kernel function, the scores obtained from comparison with all the support vector samples are weighted and summed, the weight being the product $a_i y_i$ of the coefficient $a$ of each support vector obtained during training and its class label $y$; and finally, a decision is made according to the weighted sum value $M$.
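The decision rule can be illustrated with a few lines of Python. The sketch follows the standard SVM convention in which the $x_i$ are the support vectors and $x$ is the sample being classified. The RBF kernel, the value of `gamma`, and the toy support vectors, coefficients and bias are all assumptions for this example: the patent does not name a kernel, and the numbers are not the result of any training.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Similarity between two vectors; the RBF choice is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svm_decide(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """Evaluate M = sgn(sum_i a_i * y_i * K(x_i, x) + b)."""
    total = sum(
        a * y * kernel(sv, x)
        for sv, a, y in zip(support_vectors, alphas, labels)
    ) + b
    return 1 if total >= 0 else -1   # a sum of exactly zero is mapped to +1 here


# Toy values for illustration only.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
bias = 0.0

print(svm_decide((0.1, 0.2), support_vectors, alphas, labels, bias))  # 1
print(svm_decide((1.9, 2.1), support_vectors, alphas, labels, bias))  # -1
```

A sample close to the first support vector receives the label of that vector because its kernel similarity dominates the weighted sum.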
A data index-based comprehensive energy enterprise data governance system comprises:
a basic information definition module, used for creating and defining a corresponding attribute table set according to the business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table, and for defining the basic attribute information of each table in the attribute table set. Preferably, a data lineage directed graph is constructed from the attributes of the tables in the attribute table set of the basic information definition module, and the index data definition of the index data model comprises field definitions, internal and external source definitions of the data, conversion rules for loading data from a data source into the warehouse, the current data version and access permissions.
An index data model, constructed on the basis of the attribute table set. Preferably, the index data model comprises a data value-taking rule, the data value-taking rule being a description language defined at acquisition and storage time, and the index data model sets a confidence score table according to the origin of each data source during data acquisition; the index data model further comprises a validation constraint rule, and the validation constraint rule comprises the degree of matching with the data lineage directed graph.
An example definition of the basic information definition module and the index data model (see FIG. 6) is as follows:
org_c: unit code
prt_pd: report period
idx_v: value
idx_t: time dimension
idx_u: unit
dimension: dimension
buz_t_c: topic domain
pk: whether the field is part of the primary key; the system automatically checks records for duplicates according to the pk combination;
fft: the fault-tolerant table corresponding to the field, used to standardize field values;
etl: the value-taking logic of the field; multiple data sources can be defined;
source: data source and mapping field definitions;
check: constraint and verification logic, such as not null (notNull), a value that must be contained in another table (ref), and boundary interval values (gt=10, lt=20);
search: whether the field can be searched, by query (query) or by text search (index);
confidence: data quality evaluation, that is, the confidence of the value; the user can set it to 1, or the confidence probability can be determined after multi-source comparison; for cross-organization data, the data may be verified through a data security algorithm (e.g., federated machine learning);
ver: version definition, used to manage whether the version is released to the golden copy;
auth: access rights definition, including read only (readonly), editable (edit) and exportable (export).
Preferably, in addition to structured data generated by business systems, data fused from multiple internal and external data sources requires editing and confirmation of table records and column values before it is used in products and analysis, to ensure that the data is of high quality and trustworthy. A data table is opened in the project; after the index data model is defined for the first time, it is saved and a data view is generated. The user switches to the corresponding data view page, which displays the data list, searches by the fields for which the search attribute is set, and clicks the record to be edited and confirmed to enter the data editing page. The system first fuses the data from multiple data sources according to the index data definition, for example by merging multi-source data on the primary key information. If merging fails during this process because the primary keys are similar but not identical, business personnel or data maintenance personnel can manually confirm whether to merge the records or delete one of them. For example, for service table A, data from two data sources only needs to be deduplicated according to the primary key information of table A; however, because the data sources define some field values differently, different primary keys may correspond to the same record. In this case, business personnel or data maintenance personnel can manually confirm whether to merge the records or delete one of them, map the differing field values to a unified standard field value, and store the mapping in the business data fault-tolerant table so that programs can identify and standardize the values automatically in the future.
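The role of the fault-tolerant table, remembering manually confirmed mappings so that later runs can standardize values automatically, might look like the following sketch. The class `FaultTolerantTable`, its methods, and the whitespace and case folding used as the lookup key are assumptions for illustration; the patent does not say how variants are matched.

```python
class FaultTolerantTable:
    """Stores manually confirmed mappings from raw field values to standard ones."""

    def __init__(self):
        self._mapping = {}

    @staticmethod
    def _key(raw):
        # Assumed normalization: collapse whitespace and ignore case.
        return " ".join(raw.split()).casefold()

    def confirm(self, raw, standard):
        """Record a mapping that a person has confirmed."""
        self._mapping[self._key(raw)] = standard

    def standardize(self, raw):
        """Return the standard value, or None if manual confirmation is still needed."""
        return self._mapping.get(self._key(raw))


table = FaultTolerantTable()
table.confirm("Hydro  Power", "hydroelectric power")

print(table.standardize("hydro power"))  # hydroelectric power
print(table.standardize("wind"))         # None
```

A `None` result corresponds to the case in the text where a person has to confirm the mapping before it can be reused.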
The index data model includes: (1) column names: lists the important fields of business table A that are to be edited and confirmed; (2) data source: lists the value that each data source provides for each field; (3) to-commit (pre-master commit): a value is generated automatically according to the value rule of each field in the index data definition and is then verified; if the verification rule is not met, for example when the values from data sources A and B differ, manual confirmation is prompted; (4) editing: the value of any field can be modified manually; (5) confidence probability (confidence): in the early stage, the confidence probability of each field value can be estimated by the user, and the confidence of confirmed data is set to 1. After a period of confirmation, the platform compares the value of each table data field from each data source with the finally confirmed value to obtain the accuracy SP of each source, namely: SP = number of identical values / total number of comparisons for the field. When data are extracted, if the field appears in more than 2 data sources, the co-occurrence probability OP at the data time point is calculated, namely: OP = number of co-occurrences / number of data sources. The final confidence probability = SP × OP. After every field has been confirmed, the whole record is saved, and after saving the version number is incremented by 1.
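The confidence calculation (SP, OP, and their product) can be sketched in Python as follows. The function names and data layout are hypothetical, and "number of co-occurrences" is read here as the number of sources that report the same value as the candidate at the same time point, which is an interpretation rather than something the text spells out. The sketch also computes OP for any number of sources instead of applying the "more than 2 data sources" condition.

```python
from typing import Dict, List, Tuple


def source_accuracy(history: List[Tuple[object, object]]) -> float:
    """SP: identical values / total comparisons for one field of one source.

    `history` holds (source value, finally confirmed value) pairs.
    """
    if not history:
        return 0.0
    same = sum(1 for source_value, confirmed in history if source_value == confirmed)
    return same / len(history)


def co_occurrence(values_by_source: Dict[str, object], candidate: object) -> float:
    """OP: sources reporting `candidate` / number of data sources (assumed reading)."""
    if not values_by_source:
        return 0.0
    hits = sum(1 for value in values_by_source.values() if value == candidate)
    return hits / len(values_by_source)


def confidence(
    history: List[Tuple[object, object]],
    values_by_source: Dict[str, object],
    source: str,
) -> float:
    """Final confidence probability = SP * OP for the value reported by `source`."""
    candidate = values_by_source[source]
    return source_accuracy(history) * co_occurrence(values_by_source, candidate)


if __name__ == "__main__":
    past = [(1, 1), (2, 2), (3, 4), (5, 5)]          # source A was right 3 times out of 4
    current = {"A": 10, "B": 10, "C": 12, "D": 10}   # 3 of 4 sources agree with A
    print(confidence(past, current, "A"))
```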
The target service table generation module is used for acquiring a plurality of data sources to perform data fusion according to the index data model to generate a target service table, and data acquisition, data conversion, data verification and data fusion operation are performed through a fusion engine in the data fusion process;
and the confirming and releasing module is used for storing the target service table into the database buffer table and prompting to confirm, and storing the confirmed data into the service database and releasing the data.
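The buffer, confirm, and release flow of the confirming and releasing module can be sketched as a small Python class. All names here (`BufferedRecord`, `ConfirmAndPublish`, `stage`, `confirm`, `publish`) are hypothetical, and in-memory dictionaries stand in for the database buffer table and the service database; the version increment on confirmation follows the "version number is incremented by 1 after saving" behavior described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BufferedRecord:
    key: str
    values: Dict[str, object]
    confirmed: bool = False
    version: int = 0


@dataclass
class ConfirmAndPublish:
    # Stand-ins for the database buffer table and the service database.
    buffer: Dict[str, BufferedRecord] = field(default_factory=dict)
    service_db: Dict[str, BufferedRecord] = field(default_factory=dict)

    def stage(self, key: str, values: Dict[str, object]) -> None:
        """Store a fused record in the buffer table, awaiting confirmation."""
        previous = self.service_db.get(key)
        version = previous.version if previous else 0
        self.buffer[key] = BufferedRecord(key, dict(values), version=version)

    def pending(self) -> List[str]:
        """Keys that still need a confirmation prompt."""
        return [k for k, r in self.buffer.items() if not r.confirmed]

    def confirm(self, key: str) -> None:
        """Mark a buffered record as confirmed and bump its version."""
        record = self.buffer[key]
        record.confirmed = True
        record.version += 1

    def publish(self) -> List[str]:
        """Move every confirmed record from the buffer into the service database."""
        published = []
        for key in list(self.buffer):
            if self.buffer[key].confirmed:
                self.service_db[key] = self.buffer.pop(key)
                published.append(key)
        return published
```

Unconfirmed records stay in the buffer, so nothing reaches the service database without an explicit confirmation step.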
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (10)

1. A data index-based comprehensive energy enterprise data management method, characterized by comprising the following steps:
S1, creating and defining a corresponding attribute table set according to a business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table;
S2, constructing an index data model based on the attribute table set;
S3, acquiring a plurality of data sources and performing data fusion according to the index data model to generate a target service table, wherein data acquisition, data conversion, data verification and data fusion operations are performed by a fusion engine during the data fusion process;
S4, storing the target service table in a database buffer table and prompting for confirmation, then storing the confirmed data in a service database and publishing the data.
2. The data index-based comprehensive energy enterprise data management method according to claim 1, characterized in that: in S1, a blood-related (i.e., data lineage) directed graph is constructed corresponding to the attributes of each table in the attribute table set; and the index data definition of the index data model in step S2 comprises field definitions, internal and external source definitions of the data, conversion rules for acquiring data from a data source and loading it into storage, the current data version, and access rights.
3. The data index-based comprehensive energy enterprise data management method according to claim 1, characterized in that: in S3, a data value rule is set for the data acquisition of the index data model, the data value rule being a description language defined at acquisition and storage time, and a confidence score table is set for the data acquisition of the index data model according to the origin of each data source; and a verification constraint rule is set for the data verification of the index data model, the verification constraint rule comprising the matching degree of the blood-related directed graph.
4. The data index-based comprehensive energy enterprise data management method according to claim 1, characterized in that: in S4, the column attributes of the target service table correspond to the attribute table set, the row attributes correspond to the fused data in the target service table, and the database buffer table gives a confirmation prompt according to the duplicate, missing and erroneous data found by data self-inspection; for duplicate data, a confidence probability table is established according to the data sources, and confidence probabilities are assigned to the data of the different data sources; for missing data, the associated original data collected from the data source are provided for confirmation and addition, and a behavior recording rule table is created to record the data source and the position of the original data; for erroneous data, the original data of the column attribute and the row attribute collected from the data source are provided for confirmation and correction, and a behavior recording rule table is created to record the mapping relation among the data source, the original data column attribute and the row attribute.
5. The data index-based comprehensive energy enterprise data management method according to claim 1, characterized in that: the data fusion method in step S2 is as follows:
taking data sources from different sources as different data sets, carrying out similarity measurement of a kernel function K () by taking data elements in the different data sets as support vector samples through a support vector machine model according to column attributes of a target service table, calculating a weighted sum value M according to the following formula, and carrying out decision making according to the size of the weighted sum value M:
$M = \operatorname{sgn}\left(\sum_i a_i y_i K(x_i, x) + b\right)$, where $a_i y_i$ is the weight value, $K(x_i, x)$ is the nonlinear transformation of the support vector $x_i$, $x_i$ is the input vector, and $b$ represents a parameter of the support vector machine model.
6. The data index-based comprehensive energy enterprise data management method according to claim 2, characterized in that: data increment changes of the plurality of data sources are continuously monitored; an update engine updates the incremental data according to the blood-related directed graph; the updated incremental data are input into the database buffer table, which prompts for update confirmation; and the confirmed incremental data are loaded into the service database accordingly.
7. The data index-based comprehensive energy enterprise data management method according to claim 1, characterized in that: in step S4, the confirmation operation is provided with access authority, personnel authority and auditor authority.
8. A data index-based comprehensive energy enterprise data management system, characterized by comprising:
the basic information definition module is used for creating and defining a corresponding attribute table set according to the business analysis target, wherein the attribute table set comprises a main data table, an index table and a dimension table, and basic attribute information of each table in the attribute table set is defined;
the index data model is constructed on the basis of the attribute table set;
the target service table generation module is used for acquiring a plurality of data sources to perform data fusion according to the index data model to generate a target service table, and data acquisition, data conversion, data verification and data fusion operation are performed through a fusion engine in the data fusion process;
and the confirming and releasing module is used for storing the target service table into the database buffer table and prompting to confirm, and storing the confirmed data into the service database and releasing.
9. The data index-based comprehensive energy enterprise data management system according to claim 8, characterized in that: a blood-related directed graph is constructed corresponding to the attributes of each table in the attribute table set of the basic information definition module, and the index data definition of the index data model comprises field definitions, internal and external source definitions of the data, conversion rules for acquiring data from a data source and loading it into storage, the current data version, and access rights.
10. The data index-based comprehensive energy enterprise data management system according to claim 8, characterized in that: the index data model comprises a data value rule, the data value rule being a description language defined at acquisition and storage time, and a confidence score table is set according to the origin of each data source during data acquisition; the index data model further comprises a verification constraint rule, and the verification constraint rule comprises the matching degree of the blood-related directed graph.
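The decision function recited in claim 5 has the form of the standard kernel support vector machine decision rule, which can be sketched in Python as follows. The claim does not say which kernel is used, so the RBF kernel, its `gamma` value, the treatment of a zero score as +1, and all sample numbers below are assumptions for illustration; `x` is read as the data element being evaluated.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """Assumed kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision(
    x: Vector,
    support_vectors: Sequence[Vector],
    alphas: Sequence[float],
    labels: Sequence[int],
    b: float,
    kernel: Callable[[Vector, Vector], float] = rbf_kernel,
) -> int:
    """M = sgn(sum_i a_i * y_i * K(x_i, x) + b).

    A score of exactly zero is mapped to +1 here (an arbitrary choice).
    """
    score = sum(
        a * y * kernel(sv, x)
        for a, y, sv in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    svs = [(0.0, 0.0), (2.0, 2.0)]   # hypothetical support vectors
    print(decision((0.1, 0.1), svs, alphas=[1.0, 1.0], labels=[1, -1], b=0.0))
```

In a fusion setting, the sign of M could be used to decide whether a data element from one source is treated as matching a column attribute of the target service table; that mapping is an interpretation and is not stated in the claim.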
CN202210920861.6A 2022-08-02 2022-08-02 Data index-based comprehensive energy enterprise data management method and system Pending CN115269578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210920861.6A CN115269578A (en) 2022-08-02 2022-08-02 Data index-based comprehensive energy enterprise data management method and system

Publications (1)

Publication Number Publication Date
CN115269578A true CN115269578A (en) 2022-11-01

Family

ID=83747651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210920861.6A Pending CN115269578A (en) 2022-08-02 2022-08-02 Data index-based comprehensive energy enterprise data management method and system

Country Status (1)

Country Link
CN (1) CN115269578A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101599126A (en) * | 2009-04-22 | 2009-12-09 | 哈尔滨工业大学 | Utilize the support vector machine classifier of overall intercommunication weighting
US20150286969A1 (en) * | 2014-04-08 | 2015-10-08 | Northrop Grumman Systems Corporation | System and method for providing a scalable semantic mechanism for policy-driven assessment and effective action taking on dynamically changing data
CN114356933A (en) * | 2022-01-04 | 2022-04-15 | 执中数据科技(苏州)有限责任公司 | Enterprise data management method and device based on metadata

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221101