CN117520352A - Multi-source heterogeneous data management system and method for complex industrial process - Google Patents

Multi-source heterogeneous data management system and method for complex industrial process

Info

Publication number
CN117520352A
Authority
CN
China
Prior art keywords
data
industrial
module
industrial data
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410002248.5A
Other languages
Chinese (zh)
Inventor
杨灵运
赵千川
杨文峰
李鑫
赵紫怡
王明慧
肖应强
王雄
陈竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Casicloud Technology Co ltd
Tsinghua University
Original Assignee
Guizhou Casicloud Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Casicloud Technology Co ltd, Tsinghua University filed Critical Guizhou Casicloud Technology Co ltd
Priority to CN202410002248.5A priority Critical patent/CN117520352A/en
Publication of CN117520352A publication Critical patent/CN117520352A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data management, in particular to a multi-source heterogeneous data management system and method for a complex industrial process. The system comprises a data acquisition module, a data checking module, a data planning module, a data auditing module and a data calculation module. The data acquisition module is used for acquiring and preprocessing industrial data generated by each node and adding a unique data identifier to the industrial data according to its data source node; the data checking module is used for probing the industrial data, generating an industrial data probing report and applying rule-based judgment to the report; the data planning module is used for dividing the industrial data, after preliminary quality problem processing, into logical data levels according to the data application requirements; the data auditing module is used for judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its level and type; and the data calculation module is used for acquiring the data application requirements and collecting data from the data resource pool according to those requirements for processing and calculation to generate an application data set.

Description

Multi-source heterogeneous data management system and method for complex industrial process
Technical Field
The invention relates to the technical field of data management, in particular to a multi-source heterogeneous data management system and method for a complex industrial process.
Background
Complex industrial processes generate huge volumes of historical and real-time spatio-temporal big data, which fall into three parts. The first part is business data related to enterprise operation, which mainly originates from the enterprise's internal information management systems, including PLM, ERP, MES, SCM, CRM and the like. Such data, covering products, processes, production, purchasing, orders, services and the like, is the core data asset of the enterprise and is predominantly structured. The second part is production line equipment interconnection data, which mainly refers to the working conditions (such as pressure, temperature, vibration and stress), running states, environmental parameters and the like of production lines, equipment and logistics during production. It is generally collected from equipment PLCs, SCADA and some external sensors, is mainly time-series data, and has a large volume and a high collection frequency. The third part is enterprise external data, including working condition and operation and maintenance data after the product is delivered to the user, as well as a large amount of data from the external environment such as the Internet, markets, the environment, supply chains and online communities. Product operation and service data is mainly structured and can be fused with the enterprise's internal business data.
The data generated in a complex industrial process is large in volume, comes from many sources, is widely distributed, and has many types and complex structures. Its quality is often poor because of factors such as sensors, human operation, system errors, heterogeneity across sources and network transmission. Poor data quality affects the decisions and operations of enterprises, so how to manage complex industrial process data in a standardized and unified way is a problem to be solved urgently.
Disclosure of Invention
The technical problem solved by the invention is to provide a multi-source heterogeneous data management system and a multi-source heterogeneous data management method for a complex industrial process, which can provide unified and normative data management for massive complex data generated in the industrial process.
The basic scheme provided by the invention is as follows: a multi-source heterogeneous data management system for a complex industrial process comprises a data acquisition module, a data checking module, a data planning module, a data auditing module and a data calculation module;
the data acquisition module is used for acquiring and preprocessing industrial data generated by each node, and adding unique data identification to the industrial data according to the data source node, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data;
the data checking module is used for probing the industrial data, generating an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and processing the industrial data that has preliminary quality problems according to preset rules;
the data planning module is used for dividing the industrial data, after preliminary quality problem processing, into logical data levels according to the data application requirements, establishing corresponding databases for storage, and adding classification labels from different dimensions to form a data resource pool;
the data auditing module is used for judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its level and type, and screening out of the data resource pool the industrial data that fails any one of accuracy, integrity, consistency and timeliness;
the data calculation module is used for acquiring data application requirements, and collecting data from the data resource pool according to the data application requirements for processing and calculation to generate an application data set.
The principle of the invention is as follows. First, industrial data generated by each node of the industrial process is collected, preprocessed, and marked with a unique data identifier. The industrial data comprises business data related to enterprise operation, production line equipment interconnection data and enterprise external data, so that all data generated by the industrial process is collected. Next, the collected industrial data is probed and analyzed to generate a probing report and find the preliminary quality problems in the raw data. These problems are then processed, for example by removing abnormal values, filling missing values and converting data formats, to improve data quality; data from different sensors, systems or networks is fused to eliminate redundancy and noise and improve reliability. Data probing also gives a preliminary picture of the existing data resources, field conditions, data quality and the like, so that the meaning of the data is understood. The data is then planned: data planning unifies the logical and physical storage of the industrial data, and data resources are classified and databases built under an organization scheme with unified data definition standards and standardized processes, according to the data application requirements. Corresponding physical databases are divided according to the logical data levels, and classification labels are defined for the data in each physical database to form a data resource pool. The industrial data in the data resource pool is then judged for accuracy, integrity, consistency and timeliness to further ensure data quality, and different judgment requirements are set for different types of data at different levels so that computing power is used efficiently.
When a data application requirement is acquired, the industrial data in the data resource pool that has passed the data quality judgment is processed and used, which ensures high data quality when the industrial data is used.
Compared with the prior art, this scheme completes data quality inspection in two stages. In the first stage, a preliminary quality inspection is performed on the raw data just acquired: preliminary quality problems that can be found by simple rule judgment are identified, existing abnormal values and missing values are processed, and redundant data is fused. The industrial data is then stored by level in different databases and divided by type within each level to form a data resource pool. In the second stage, the industrial data in the data resource pool is judged for accuracy, integrity, consistency and timeliness, and industrial data that fails any one of them is screened out. Different requirements are thus applied to industrial data according to its level and type, so that the resulting industrial data meets the requirements of its level. At the same time, quality is controlled over the whole process from data acquisition to generation of the data resource pool, which guards against data abnormalities caused by factors such as sensor failure, personnel entry errors and system faults, and ensures that the data is authentic and reliable.
Further, the data checking module comprises a probing configuration module and a data probing module;
the probing configuration module is used for configuring the data sources to be probed and the probing rules;
the data probing module is used for probing one or more of the data set, the data fields and the data quality of the industrial data in the selected data sources according to the probing rules.
By configuring the data sources to be probed and the probing rules, the data set, data fields and data quality of the industrial data in the selected data sources are probed according to the probing rules. Data set probing covers the total data volume, data update status and business meaning; data field probing covers field format and value distribution; and data quality probing covers null values.
Further, the data planning module comprises a hierarchy dividing module;
the hierarchical division module is used for dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
The industrial data is divided into different levels: raw data is unprocessed data that has not been modified, cleaned or screened; basic data is data divided according to its basic information after the raw data is processed; subject data is data divided according to its subject after the raw data is processed; and knowledge data is knowledge-type data such as rules and basic knowledge.
Further, the data planning module comprises a type classification module, wherein the type classification module comprises a security classification module, a source classification module and an influence classification module;
the security classification module is used for classifying the industrial data according to secret level, wherein the secret levels comprise public, general, confidential and highly confidential;
the source classification module is used for classifying industrial data according to data sources, wherein the data sources comprise an internal system, internal equipment, an external unit and the Internet;
the influence classification module is used for classifying the industrial data according to the influence grade of its associated events, wherein the event influence grades comprise primary influence events, secondary influence events and tertiary influence events.
The industrial data is classified along different dimensions according to application requirements: classification labels can be added according to the secret level, source and influence of the industrial data, and this multi-dimensional classification allows the industrial data to be used in various ways.
Further, the system also comprises a data cataloging module;
the data cataloging module is used for cataloging the industrial data according to the level division and the type division to generate an industrial data resource catalog, and putting the cataloged industrial data into the data resource pool.
Cataloging each class at each level makes the data in the data resource pool easier to view and the industrial data in it easier to select.
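As a rough illustration of the cataloging idea, the sketch below groups table names first by data level and then by classification label. The entry layout (`table`, `level`, `labels`) and the function name `build_catalog` are assumptions made for this example; the patent does not specify a catalog format.

```python
from collections import defaultdict


def build_catalog(entries):
    """Group table names by level, then by (dimension, label) pair.

    Each entry is a dict with hypothetical keys:
    'table' (str), 'level' (str) and 'labels' (dict of dimension -> label).
    """
    catalog = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        for dimension, label in sorted(entry["labels"].items()):
            catalog[entry["level"]][(dimension, label)].append(entry["table"])
    # Convert to plain dicts so the catalog is easy to print or serialize.
    return {level: dict(groups) for level, groups in catalog.items()}


entries = [
    {"table": "orders", "level": "basic data",
     "labels": {"secret": "confidential", "source": "internal system"}},
    {"table": "line1_temperature", "level": "raw data",
     "labels": {"secret": "general", "source": "internal equipment"}},
    {"table": "suppliers", "level": "basic data",
     "labels": {"secret": "confidential", "source": "external unit"}},
]
catalog = build_catalog(entries)
```

Looking up `catalog["basic data"][("secret", "confidential")]` then lists every basic-data table that carries the confidential label, which is the kind of view-and-select convenience the catalog is meant to provide.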
The invention also discloses a multi-source heterogeneous data management method for a complex industrial process, which comprises the following steps:
S100: collecting and preprocessing industrial data generated by each node, adding a unique data identifier to the industrial data according to its data source node, and storing the industrial data, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data;
S200: probing the industrial data, generating an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and processing the industrial data that has preliminary quality problems according to preset rules;
S300: dividing the industrial data, after preliminary quality problem processing, into logical data levels according to the data application requirements, establishing corresponding databases for storage, and adding classification labels from different dimensions to form a data resource pool;
S400: judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its level and type, and screening out of the data resource pool the industrial data that fails any one of accuracy, integrity, consistency and timeliness;
S500: acquiring data application requirements, and collecting data from the data resource pool according to the data application requirements for processing and calculation to generate an application data set.
Further, the step S200 includes the steps of:
S210: configuring the data sources to be probed and the probing rules;
S220: probing one or more of the data set, the data fields and the data quality of the industrial data in the selected data sources according to the probing rules.
Further, the step S300 includes the steps of:
S310: dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
Further, the step S300 further includes the steps of:
S321: classifying the industrial data according to secret level, wherein the secret levels comprise public, general, confidential and highly confidential;
S322: classifying the industrial data according to data source, wherein the data sources comprise internal systems, internal equipment, external units and the Internet;
S323: classifying the industrial data according to the influence grade of its associated events, wherein the event influence grades comprise primary influence events, secondary influence events and tertiary influence events.
Further, the method also comprises the following steps:
S600: cataloging the industrial data according to the level division and the type division to generate an industrial data resource catalog, and putting the cataloged industrial data into the data resource pool.
Drawings
FIG. 1 is a logic block diagram of an embodiment of a multi-source heterogeneous data management system for a complex industrial process according to the present invention;
FIG. 2 is a logic block diagram of a data checking module in an embodiment of a multi-source heterogeneous data management system for a complex industrial process according to the present invention;
FIG. 3 is a logic block diagram of a data planning module in an embodiment of a multi-source heterogeneous data management system for a complex industrial process according to the present invention.
Detailed Description
The following is a further detailed description of the embodiments:
An embodiment is substantially as shown in FIG. 1:
A multi-source heterogeneous data management system for a complex industrial process comprises a data acquisition module, a data checking module, a data planning module, a data auditing module, a data cataloging module and a data calculation module.
The data acquisition module is used for acquiring and preprocessing industrial data generated by each node, and adding a unique data identifier to the industrial data according to its data source node, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data. Specifically, the data acquisition module acquires the industrial data generated by each node. The data sources comprise business data from the enterprise's internal information systems; production line equipment interconnection data, such as production line, equipment, physical working condition and operating environment data collected by various sensors during production; and enterprise external data acquired from external systems and the Internet.
In this embodiment, the following data acquisition modes are supported:
1. Relational database collection: supports incremental and full data collection strategies, supports mainstream domestic and foreign databases such as MySQL, Oracle, Sybase, SQL Server, PostgreSQL, KingbaseES, Dameng (DM) and GBase, and is suitable for collecting data from structured relational databases.
2. File data collection: supports collecting files from file servers (FTP, SFTP, Samba, NFS, file directories), parsing and processing them, and loading them into the target storage. File parsing supports XML, JSON, CSV, Excel, text, Word and other formats. It is suitable for collecting structured and unstructured file data.
3. Interface data collection: supports two modes, active collection and reported collection. Active collection supports REST and SOAP, and the extraction of text message data in JSON, CSV and XML can be customized. Reported collection configures an interface through which third-party applications report data directly. It is suitable for collecting data through interfaces.
4. Message queue data collection: receives and parses message queue data and loads it into the target storage. Common message queues such as RabbitMQ, Kafka and ActiveMQ are supported. It is suitable for data transmitted via message queues.
5. Business data entry collection: a data collection form is configured, and users fill in data directly and report it to the database. It is suitable for collecting small batches of business data by manual entry.
6. NoSQL data collection: collects data from commonly used NoSQL databases (Elasticsearch, MongoDB) and loads it into the target storage. It is suitable for commonly used NoSQL databases.
When the industrial data is acquired, a unique data identifier is added to the industrial data according to the data source.
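The patent does not say how the unique data identifier is constructed. One possible scheme, shown here only as a minimal sketch, combines the source node name with a per-source sequence number; the class name `IdentifierAllocator`, the `data_id` field and the identifier format are all assumptions.

```python
import itertools


class IdentifierAllocator:
    """Tags each record with an identifier derived from its source node.

    Hypothetical format: '<source node>:<8-digit per-source sequence number>'.
    """

    def __init__(self):
        self._counters = {}  # source node -> running counter

    def tag(self, source_node, payload):
        counter = self._counters.setdefault(source_node, itertools.count(1))
        data_id = f"{source_node}:{next(counter):08d}"
        # Return a new record; the collected payload itself is not modified.
        return {"data_id": data_id, "source_node": source_node, **payload}


allocator = IdentifierAllocator()
first = allocator.tag("plc-line1", {"temperature": 71.5})
second = allocator.tag("plc-line1", {"temperature": 71.5})
order = allocator.tag("erp", {"order_no": "A-1"})
```

Because the sequence is kept per source node, two identical readings from the same node still receive different identifiers, and the identifier alone reveals which node produced the record.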
The data checking module is used for probing the collected industrial data, generating an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and processing the industrial data that has preliminary quality problems according to preset rules.
The data checking module comprises a probing configuration module and a data probing module.
The probing configuration module is used for configuring the data sources to be probed and the probing rules.
The data probing module is used for probing one or more of the data set, the data fields and the data quality of the industrial data in the selected data sources according to the probing rules. Data set probing covers the total data volume and update status, data field probing covers field format and value distribution, and data quality probing covers whether fields are null.
Specifically, a new data probing configuration is first created: the data source name and database type are selected and the data source description is edited. The probing rules are then configured. For data set probing, for example, the applicable data sources are configured, including the Schema, table exclusion rules, whether data volume statistics are enabled, whether scheduled connection tests are enabled, and whether data probing is enabled. After the probing rules are configured, they are executed and a probing report is output. The report is presented as a table; in this embodiment, the header includes sequence number, original table name, original database name, Schema, data volume (records), storage volume (estimated), number of fields and probing time. As another example, for field probing, the applicable tables, applicable fields and probing rules, such as null rate probing and repeated value probing, are configured. The output probing report includes field name, Chinese name, field type, field constraint and probing result. In this example the probing rule is null rate probing, and for a field in which null values are found the probing result records that the null rate rule is hit. In this scheme, common basic data quality problems that can be found by simple rule judgment, such as abnormal values and redundant data, can be detected.
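Field probing of this kind can be sketched in a few lines. The example below computes a null rate and a repeated-value count per field and flags fields that exceed a threshold. The report keys, the `max_null_rate` parameter and the treatment of empty strings as nulls are assumptions for illustration, not the patent's report format.

```python
from collections import Counter


def probe_fields(rows, max_null_rate=0.0):
    """Return one report row per field with null rate and repeated-value count.

    rows is a list of dicts; field values are assumed to be hashable.
    None and the empty string are both counted as null.
    """
    field_names = sorted({name for row in rows for name in row})
    report = []
    for name in field_names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        null_rate = (len(values) - len(present)) / len(rows) if rows else 0.0
        # A value seen n times contributes n - 1 repeats.
        repeated = sum(n - 1 for n in Counter(present).values() if n > 1)
        report.append({
            "field": name,
            "null_rate": null_rate,
            "repeated_values": repeated,
            "null_rule_hit": null_rate > max_null_rate,
        })
    return report


rows = [
    {"id": 1, "temp": 20.5},
    {"id": 2, "temp": None},
    {"id": 2, "temp": 20.5},
    {"id": 4},
]
report = probe_fields(rows, max_null_rate=0.3)
```

Here the `temp` field is missing or null in half of the rows, so it hits the null rate rule, while `id` does not; both fields contain one repeated value.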
After data probing is completed, the industrial data with preliminary data anomalies is given simple processing, such as removing outliers, filling missing values, converting data formats, fusing data from different sensors, systems and networks, and eliminating data redundancy and noise. This can be done with existing data processing algorithms and is not described further here.
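The patent leaves the choice of processing algorithm open. As one simple possibility, the sketch below drops readings outside a preset range and fills missing readings with the median of the valid ones; the function name `clean_series`, the range bounds and the median fill are assumptions, and other existing algorithms would serve equally well.

```python
from statistics import median


def clean_series(values, low, high):
    """Drop out-of-range readings, then fill gaps with the median of valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    if not valid:
        return []  # nothing trustworthy to keep or to fill from
    fill = median(valid)
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(fill)   # fill a missing value
        elif low <= v <= high:
            cleaned.append(v)      # keep a reading inside the preset range
        # readings outside the range are treated as outliers and dropped
    return cleaned


readings = [20.0, None, 21.0, 500.0, 22.0]
cleaned = clean_series(readings, low=0.0, high=100.0)
```

In this run the 500.0 reading is removed as an outlier and the missing reading is replaced by 21.0, the median of the three valid values.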
The data planning module is used for dividing the industrial data, after preliminary quality problem processing, into logical data levels according to the data application requirements, establishing corresponding databases for storage, and adding classification labels from different dimensions to the industrial data in each database to form a data resource pool.
The data planning module comprises a hierarchy dividing module.
The hierarchical division module is used for dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
Data planning includes logical data planning and physical data planning. Logical data planning can be performed according to the data hierarchy; in this embodiment the data hierarchy is planned as raw data, basic data, subject data and knowledge data. For example, the hierarchy domains under the raw data level include internal systems, external systems and equipment sensors. The hierarchy domains under the basic data level include factory basic data, factory level data, factory verification data, factory security data, factory related data, production order data, incident survey data and the like.
The data planning module comprises a type classification module, wherein the type classification module comprises a security classification module, a source classification module and an influence classification module.
The security classification module is used for classifying the industrial data according to secret level. This classification dimension defines the secret level of the data, and data involving secrets is subject to security management according to management requirements. The secret levels comprise public, general, confidential and highly confidential. Public refers to data that can be disclosed externally; general refers to data that can be disclosed to internal units; confidential refers to data containing secret information that can be disclosed only within a limited range; and highly confidential refers to secret information that only some authorized personnel may view and know. In this embodiment, secret levels may be preset according to specific requirements. For example, contract order data may be preset to a given secret level, so that industrial data belonging to a contract order is assigned that level when it is acquired; a recipe may be set as confidential data, and sales may be set as public data.
And the source classification module is used for classifying the industrial data according to data sources, wherein the data sources comprise an internal system, internal equipment, an external unit and the Internet.
The influence classification module is used for classifying the industrial data according to the influence grade of its associated events. This classification dimension reflects the potential influence on industrial production, economic benefit and the like after industrial data of different categories is tampered with, destroyed or illegally used. The event influence grades comprise primary influence events, secondary influence events and tertiary influence events. A primary influence event means that, after the industrial data is tampered with, destroyed or illegally used, the influence on the normal operation of industrial control systems, equipment, industrial Internet platforms and the like is small, the negative influence on the enterprise or the direct economic loss is small, and the cost of recovering the industrial data or eliminating the negative influence is small. A secondary influence event means that, after the industrial data is tampered with, destroyed or illegally used, a major or serious production safety accident or sudden environmental event is likely to be caused; a large negative influence on the enterprise or a large direct economic loss results; the cascading effect is obvious, with the influence reaching multiple industries, regions or multiple enterprises within an industry, or lasting a long time; or a large amount of supplier and customer resources may be illegally obtained or a large amount of personal information leaked. A tertiary influence event means that, after the industrial data is tampered with, destroyed or illegally used, a particularly serious production safety accident or sudden environmental event is likely to be caused, or a particularly huge direct economic loss results, with serious influence on the national economy, industry development, public interests, social order and even national security.
In this embodiment, the setting is also performed in a preset manner according to specific requirements.
Logical data planning of the industrial data is completed by dividing it into levels and classifying it. Physical databases are then established so that data of different levels and classifications is stored in different databases, which completes the physical data planning and forms the data resource pool. When new data is added, a target table is added in the corresponding database.
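The combination of level division, multi-dimensional labels and per-level storage can be pictured with a small in-memory model. This is only a sketch: the enum member names, the `Labels` and `ResourcePool` classes, and the use of one dictionary per level in place of a physical database are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    RAW = "raw data"
    BASIC = "basic data"
    SUBJECT = "subject data"
    KNOWLEDGE = "knowledge data"


class SecretLevel(Enum):
    PUBLIC = 1
    GENERAL = 2
    CONFIDENTIAL = 3
    HIGHEST = 4  # the most restricted of the four secret levels


class Source(Enum):
    INTERNAL_SYSTEM = "internal system"
    INTERNAL_EQUIPMENT = "internal equipment"
    EXTERNAL_UNIT = "external unit"
    INTERNET = "Internet"


class Impact(Enum):
    PRIMARY = 1
    SECONDARY = 2
    TERTIARY = 3


@dataclass(frozen=True)
class Labels:
    level: Level
    secret: SecretLevel
    source: Source
    impact: Impact


class ResourcePool:
    """One store per data level, standing in for one physical database each."""

    def __init__(self):
        self.databases = {level: {} for level in Level}

    def add(self, table, record, labels):
        # setdefault creates the target table the first time it is needed.
        store = self.databases[labels.level]
        store.setdefault(table, []).append({"record": record, "labels": labels})

    def select(self, **wanted):
        """Yield (table, record) pairs whose labels match every given dimension."""
        for store in self.databases.values():
            for table, rows in store.items():
                for row in rows:
                    if all(getattr(row["labels"], k) == v for k, v in wanted.items()):
                        yield table, row["record"]


pool = ResourcePool()
pool.add("production_orders", {"order_no": "A-1"},
         Labels(Level.BASIC, SecretLevel.CONFIDENTIAL, Source.INTERNAL_SYSTEM, Impact.PRIMARY))
pool.add("line1_temperature", {"value": 71.5},
         Labels(Level.RAW, SecretLevel.GENERAL, Source.INTERNAL_EQUIPMENT, Impact.SECONDARY))
equipment_rows = list(pool.select(source=Source.INTERNAL_EQUIPMENT))
```

Keeping the labels on each stored record is what lets the same pool be queried along any of the classification dimensions, for example by source as in the last line.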
The data auditing module is used for judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its level and type, and screening out of the data resource pool the industrial data that fails any one of accuracy, integrity, consistency and timeliness.
The industrial data at each level and in each class of the data resource pool is audited in turn, and its accuracy, integrity, consistency and timeliness are judged against the requirements for that level and type.
1. The accuracy of the data reflects the authenticity and reliability of the statistical data, and the accuracy of the data is influenced by two factors, namely the reliability of data sources and the accuracy of data recording and processing. In this embodiment, accuracy of industrial data is determined by three manners of external data verification, data mutual verification and data rationality determination, wherein the external data verification is to industrial data with a plurality of data sources, when the industrial data is verified for accuracy, the same type of data extracted from original data to the plurality of data sources is compared with each other, and when the data source ratio is higher, the accuracy of the data is determined to pass. The data mutual verification is to compare the industrial data after data processing, for example, the average value of the same index before and after data processing, and the huge difference before and after data processing is generated and does not accord with logic, so that the data processing is indicated to be problematic. And judging the rationality of the data, and if the value which is not in the reasonable range appears, the data accuracy is not met by presetting the reasonable range of the data or setting the reasonable range according to the data distribution.
2. And judging whether the data integrity value data is complete or not, wherein the data integrity is closely related to the data accuracy, and only the data integrity can ensure the data accuracy. In this embodiment, the data integrity is judged by an external data certificate. And comparing the data of the plurality of data sources to judge whether the industrial data is complete.
3. Judging data consistency, wherein the data consistency refers to whether the same data is consistent among different links or different event points, and the data consistency is an important representation of data quality. In this embodiment, the consistency of data is determined by a KDE (kernel density estimation) distribution map, and the distribution characteristics of the data sample itself can be intuitively seen by the kernel density estimation map.
4. And (3) judging timeliness, wherein the data timeliness value is the speed of data generation, acquisition and processing and the response timeliness. In the scheme, the timeliness of the data is judged through the time stamp during the generation, collection and processing of the data.
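The four judgments above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions rather than the implementation of this embodiment: all function names, the bandwidth, and the idea of summarising two KDE curves by their shared area are hypothetical choices, since the embodiment only states that a KDE distribution plot is inspected.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def source_agreement(values):
    """Accuracy: most common value across sources and the share of sources reporting it."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def in_reasonable_range(value, low, high):
    """Accuracy: rationality check against a preset range."""
    return low <= value <= high


def missing_keys(own_keys, other_sources):
    """Integrity: record keys seen in any other source but absent from our copy."""
    return set().union(*other_sources) - set(own_keys)


def gaussian_kde(sample, bandwidth):
    """Consistency: return a density function estimated with a Gaussian kernel."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def kde_overlap(sample_a, sample_b, bandwidth=1.0, steps=400):
    """Consistency: area shared by the two KDE curves (near 0 = disjoint, near 1 = identical)."""
    points = list(sample_a) + list(sample_b)
    low, high = min(points) - 3 * bandwidth, max(points) + 3 * bandwidth
    dx = (high - low) / steps
    fa, fb = gaussian_kde(sample_a, bandwidth), gaussian_kde(sample_b, bandwidth)
    grid = (low + (i + 0.5) * dx for i in range(steps))  # midpoint rule
    return sum(min(fa(x), fb(x)) for x in grid) * dx


def is_timely(generated_at, processed_at, max_delay):
    """Timeliness: processing finished within max_delay of generation."""
    return processed_at - generated_at <= max_delay
```

For instance, `source_agreement([20.5, 20.5, 21.0])` reports that two of three sources agree on 20.5, and the caller decides whether that share is high enough for the data's hierarchy and classification.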
For industrial data of different hierarchies and different classifications, different requirements can be set for accuracy, integrity, consistency and timeliness. For example, for industrial data whose security level is public, the accuracy and timeliness requirements can be low, whereas for industrial data whose impact classification is a serious event, the accuracy, integrity, consistency and timeliness requirements are all high. Once different accuracy, integrity, consistency and timeliness requirements have been set for industrial data of different hierarchies and classifications, the quality of the industrial data in each database is judged against the corresponding requirements, so that system computing power is allocated reasonably and unnecessary resource consumption is avoided.
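One way to express such differentiated requirements is a small lookup of audit profiles, sketched below in Python. The profile fields, the threshold values and the selection rule are assumptions made for illustration only; the embodiment states just that requirements may differ by hierarchy and classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRequirement:
    min_source_agreement: float  # accuracy: share of sources that must agree
    max_missing_ratio: float     # integrity: tolerated share of missing records
    min_kde_overlap: float       # consistency: required similarity of distributions
    max_delay_seconds: int       # timeliness: tolerated processing delay


# Hypothetical profiles; real values would be preset per deployment.
RELAXED = AuditRequirement(0.5, 0.10, 0.6, 3600)
STRICT = AuditRequirement(0.9, 0.0, 0.9, 60)


def requirement_for(security_level, impact_level):
    """Use the strict profile for highly sensitive or high-impact data, else the relaxed one."""
    if impact_level == "tertiary" or security_level == "top secret":
        return STRICT
    return RELAXED
```

Keeping the profiles as data rather than code means the expensive checks run with tight thresholds only where the classification calls for them.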
The data calculation module is used for acquiring a data application requirement, and collecting data from the data resource pool according to the requirement for processing and calculation to generate an application data set. Specifically, a model is established in advance to extract data features from the industrial data in each database. When a data application requirement arises, semantic analysis is performed on it, the semantic analysis result is matched against the extracted data features, and the data corresponding to the matching features is extracted to form a new data set, which serves as the application data set.
The data cataloging module is used for cataloging the industrial data according to the hierarchy division and the type division, generating an industrial data resource catalog, and placing the cataloged industrial data into the data resource pool. Cataloging the data makes it convenient to view.
This embodiment also discloses a multi-source heterogeneous data management method for a complex industrial process, which comprises the following steps:
S100: collecting and preprocessing industrial data generated by each node, and adding a unique data identifier to the industrial data according to its source node, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data;
S200: probing the industrial data to generate an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and handling the industrial data that has preliminary quality problems according to preset rules;
S300: dividing the industrial data, after the preliminary quality problems have been handled, into logical data hierarchies according to the data application requirements, adding classification labels from different dimensions, and establishing corresponding databases for storage to form a data resource pool;
S400: judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its hierarchy and type, and screening out of the data resource pool any industrial data that fails any one of accuracy, integrity, consistency and timeliness;
S500: acquiring a data application requirement, and collecting data from the data resource pool according to the requirement for processing and calculation to generate an application data set.
Step S200 comprises the following steps:
S210: configuring the data sources to be probed and the probing rules;
S220: probing one or more of the data sets, data fields and data quality of the industrial data in the selected data sources according to the probing rules.
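Steps S210 and S220 can be pictured with the following Python sketch, which profiles the fields of a batch of records and then applies one configured rule to the resulting report. The record layout, the statistics collected and the `max_null_ratio` rule are hypothetical; the method itself does not prescribe which statistics a probing report contains.

```python
def probe_fields(records):
    """Build a simple probing report: per-field null ratio and distinct-value count.

    Assumes `records` is a non-empty list of dicts with hashable values.
    """
    fields = {name for record in records for name in record}
    report = {}
    for name in sorted(fields):
        values = [record.get(name) for record in records]
        present = [v for v in values if v is not None]
        report[name] = {
            "null_ratio": 1 - len(present) / len(values),
            "distinct": len(set(present)),
        }
    return report


def find_issues(report, max_null_ratio):
    """Rule judgment: list fields whose null ratio exceeds the configured limit."""
    return [name for name, stats in report.items() if stats["null_ratio"] > max_null_ratio]
```

A field missing from a record is counted as null, so a sparsely populated column surfaces as a preliminary quality problem in the same way as explicit null values.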
Step S300 comprises the following step:
S310: dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
Step S300 further comprises the following steps:
S321: classifying the industrial data according to security level, wherein the security levels comprise public, general, confidential and top secret;
S322: classifying the industrial data according to data source, wherein the data sources comprise an internal system, internal equipment, an external unit and the Internet;
S323: grading the impact of events associated with the industrial data, wherein the event impacts comprise primary impact events, secondary impact events and tertiary impact events.
The method further comprises the following step:
S600: cataloging the industrial data according to the hierarchy division and the type division, generating an industrial data resource catalog, and placing the cataloged industrial data into the data resource pool.
Embodiment two
This embodiment differs from the first embodiment in that the type classification module further comprises an importance level module.
The importance level module is used for identifying the importance level of the industrial data according to its classification in each dimension and generating an importance score for each dimension. In the security level dimension, the importance levels from high to low are top secret, confidential, general and public; in the data source dimension, the importance levels from high to low are internal system, internal equipment, external unit and the Internet; in the event impact dimension, the importance levels from high to low are tertiary impact event, secondary impact event and primary impact event. A comprehensive score is calculated from the preset weight of each dimension and the importance score of each dimension.
Specifically, importance is scored dimension by dimension. In the security level dimension, top secret industrial data scores 4 points, confidential 3 points, general 2 points and public 1 point. In the source dimension, an internal system scores 4 points, internal equipment 3 points, an external unit 2 points and the Internet 1 point. In the impact dimension, a tertiary impact event scores 3 points, a secondary impact event 2 points and a primary impact event 1 point.
When an item of industrial data has a security level of confidential, a source of internal equipment and an impact of secondary impact event, the importance scores of its dimensions can be recorded as MM3-LY3-YX2.
The comprehensive score is then calculated according to the weight of each dimension. For example, with a security level dimension weight of 0.3, a source dimension weight of 0.2 and an impact dimension weight of 0.5, the comprehensive score is calculated from these weights and the importance score of each dimension.
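The scoring can be written out as a short Python sketch. It assumes that the comprehensive score is a weighted sum of the dimension scores, which is the natural reading of the text but is not stated explicitly; the dictionary keys and function names are likewise hypothetical.

```python
# Dimension scores as given in this embodiment.
SECURITY_SCORE = {"top secret": 4, "confidential": 3, "general": 2, "public": 1}
SOURCE_SCORE = {"internal system": 4, "internal equipment": 3, "external unit": 2, "internet": 1}
IMPACT_SCORE = {"tertiary": 3, "secondary": 2, "primary": 1}

# Example weights from the text: security 0.3, source 0.2, impact 0.5.
WEIGHTS = {"security": 0.3, "source": 0.2, "impact": 0.5}


def dimension_scores(security, source, impact):
    """Look up the importance score of each classification dimension."""
    return SECURITY_SCORE[security], SOURCE_SCORE[source], IMPACT_SCORE[impact]


def score_code(security, source, impact):
    """Format the dimension scores in the MM/LY/YX notation used above."""
    mm, ly, yx = dimension_scores(security, source, impact)
    return f"MM{mm}-LY{ly}-YX{yx}"


def comprehensive_score(security, source, impact, weights=WEIGHTS):
    """Weighted sum of the dimension scores (assumed aggregation rule)."""
    mm, ly, yx = dimension_scores(security, source, impact)
    return weights["security"] * mm + weights["source"] * ly + weights["impact"] * yx
```

Under this assumption, top secret data from an internal system with a tertiary impact event scores 0.3 × 4 + 0.2 × 4 + 0.5 × 3 = 3.5, the maximum possible with these weights, while public Internet data with a primary impact event scores 1.0, the minimum.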
The data auditing module is further used for: when screened-out industrial data has a comprehensive score higher than a preset threshold, immediately re-acquiring the industrial data to fill the gap; and when screened-out industrial data has a comprehensive score lower than the preset threshold, recording it in a supplementary form, periodically re-acquiring the industrial data recorded in the supplementary form to supplement it, and removing the industrial data from the supplementary form once it has been acquired.
Specifically, after industrial data is screened out, it is handled according to its comprehensive score. If the score is high, the data needs to be supplemented immediately. If the score is low, the screened-out data is recorded, and data with low comprehensive scores is re-acquired periodically in batches, which reduces system load.
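The routing between immediate and deferred re-acquisition can be sketched as follows in Python. The class name, the `reacquire` callback and the treatment of a score exactly equal to the threshold (deferred here) are assumptions for illustration; the text does not specify the boundary case or how re-acquisition is triggered.

```python
class ReacquisitionRouter:
    """Route screened-out data to immediate or periodic re-acquisition."""

    def __init__(self, threshold, reacquire):
        self.threshold = threshold
        # Hypothetical callback: takes a data identifier, returns True once the data is obtained.
        self.reacquire = reacquire
        self.supplementary_form = []

    def handle_screened_out(self, data_id, score):
        """High scores are re-acquired at once; the rest wait in the supplementary form."""
        if score > self.threshold:
            self.reacquire(data_id)
        else:
            self.supplementary_form.append(data_id)

    def run_periodic_batch(self):
        """Re-acquire everything in the form and keep only the entries that are still missing."""
        self.supplementary_form = [
            data_id for data_id in self.supplementary_form if not self.reacquire(data_id)
        ]
```

Calling `run_periodic_batch` from a scheduler at a fixed interval batches the low-importance requests, which is where the reduction in system load comes from.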
This embodiment also discloses a multi-source heterogeneous data management method for a complex industrial process, which differs from the first embodiment in that step S300 further comprises the following step:
S330: identifying the importance level of the industrial data according to its classification in each dimension and generating an importance score for each dimension, wherein in the security level dimension the importance levels from high to low are top secret, confidential, general and public, in the data source dimension the importance levels from high to low are internal system, internal equipment, external unit and the Internet, and in the event impact dimension the importance levels from high to low are tertiary impact event, secondary impact event and primary impact event; and calculating a comprehensive score from the preset weight of each dimension and the importance score of each dimension;
Step S400 further comprises the following step:
S410: when screened-out industrial data has a comprehensive score higher than a preset threshold, immediately re-acquiring the industrial data to fill the gap; when screened-out industrial data has a comprehensive score lower than the preset threshold, recording it in a supplementary form, periodically re-acquiring the industrial data recorded in the supplementary form to supplement it, and removing the industrial data from the supplementary form once it has been acquired.
The foregoing is merely an embodiment of the present invention. Specific structures and characteristics that are common knowledge in the art are not described in detail here. A person of ordinary skill in the art knows all the common technical knowledge in the technical field of the invention before the application date or the priority date, has access to all the prior art in the field, and is able to apply the conventional experimental means available before that date. With the teaching of this application, such a person can perfect and implement this scheme in combination with his or her own abilities, and some typical known structures or known methods should not become an obstacle to implementing this application. It should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention, and they do not affect the effect of implementing the invention or the utility of the patent. The protection scope of the present application is defined by the claims, and the specific embodiments and other descriptions in the specification can be used to interpret the claims.

Claims (8)

1. A multi-source heterogeneous data management system for a complex industrial process, characterized in that the system comprises a data acquisition module, a data checking module, a data planning module, a data auditing module and a data calculation module;
the data acquisition module is used for collecting and preprocessing industrial data generated by each node, and adding a unique data identifier to the industrial data according to its source node, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data;
the data checking module is used for probing the industrial data to generate an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and handling the industrial data that has preliminary quality problems according to preset rules;
the data planning module is used for dividing the industrial data, after the preliminary quality problems have been handled, into logical data hierarchies according to the data application requirements, adding classification labels from different dimensions, and establishing corresponding databases for storage to form a data resource pool;
the data auditing module is used for judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its hierarchy and type, and screening out of the data resource pool any industrial data that fails any one of accuracy, integrity, consistency and timeliness;
the data calculation module is used for acquiring a data application requirement, and collecting data from the data resource pool according to the requirement for processing and calculation to generate an application data set;
the data planning module comprises a type classification module, wherein the type classification module comprises a security level classification module, a source classification module and an impact classification module;
the security level classification module is used for classifying the industrial data according to security level, wherein the security levels comprise public, general, confidential and top secret;
the source classification module is used for classifying the industrial data according to data source, wherein the data sources comprise an internal system, internal equipment, an external unit and the Internet;
the impact classification module is used for grading the impact of events associated with the industrial data, wherein the event impacts comprise primary impact events, secondary impact events and tertiary impact events;
the importance level module is used for identifying the importance level of the industrial data according to its classification in each dimension and generating an importance score for each dimension, wherein in the security level dimension the importance levels from high to low are top secret, confidential, general and public, in the data source dimension the importance levels from high to low are internal system, internal equipment, external unit and the Internet, and in the event impact dimension the importance levels from high to low are tertiary impact event, secondary impact event and primary impact event; and for calculating a comprehensive score from the preset weight of each dimension and the importance score of each dimension;
the data auditing module is further used for: when screened-out industrial data has a comprehensive score higher than a preset threshold, immediately re-acquiring the industrial data to fill the gap; when screened-out industrial data has a comprehensive score lower than the preset threshold, recording it in a supplementary form, periodically re-acquiring the industrial data recorded in the supplementary form to supplement it, and removing the industrial data from the supplementary form once it has been acquired.
2. The multi-source heterogeneous data management system for a complex industrial process according to claim 1, wherein the data checking module comprises a probing configuration module and a data probing module;
the probing configuration module is used for configuring the data sources to be probed and the probing rules;
the data probing module is used for probing one or more of the data sets, data fields and data quality of the industrial data in the selected data sources according to the probing rules.
3. The multi-source heterogeneous data management system for a complex industrial process according to claim 1, wherein the data planning module comprises a hierarchy division module;
the hierarchy division module is used for dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
4. The multi-source heterogeneous data management system for a complex industrial process according to claim 1, wherein the system further comprises a data cataloging module;
the data cataloging module is used for cataloging the industrial data according to the hierarchy division and the type division, generating an industrial data resource catalog, and placing the cataloged industrial data into the data resource pool.
5. A multi-source heterogeneous data management method for a complex industrial process, characterized by comprising the following steps:
S100: collecting and preprocessing industrial data generated by each node, and adding a unique data identifier to the industrial data according to its source node, wherein the industrial data comprises business data, equipment interconnection data and enterprise external data;
S200: probing the industrial data to generate an industrial data probing report, applying rule-based judgment to the report to identify preliminary quality problems in the industrial data, and handling the industrial data that has preliminary quality problems according to preset rules;
S300: dividing the industrial data, after the preliminary quality problems have been handled, into logical data hierarchies according to the data application requirements, adding classification labels from different dimensions, and establishing corresponding databases for storage to form a data resource pool;
S400: judging whether the industrial data in the data resource pool meets the accuracy, integrity, consistency and timeliness requirements of its hierarchy and type, and screening out of the data resource pool any industrial data that fails any one of accuracy, integrity, consistency and timeliness;
S500: acquiring a data application requirement, and collecting data from the data resource pool according to the requirement for processing and calculation to generate an application data set;
step S300 further comprises the following steps:
S321: classifying the industrial data according to security level, wherein the security levels comprise public, general, confidential and top secret;
S322: classifying the industrial data according to data source, wherein the data sources comprise an internal system, internal equipment, an external unit and the Internet;
S323: grading the impact of events associated with the industrial data, wherein the event impacts comprise primary impact events, secondary impact events and tertiary impact events;
step S300 further comprises the following step:
S330: identifying the importance level of the industrial data according to its classification in each dimension and generating an importance score for each dimension, wherein in the security level dimension the importance levels from high to low are top secret, confidential, general and public, in the data source dimension the importance levels from high to low are internal system, internal equipment, external unit and the Internet, and in the event impact dimension the importance levels from high to low are tertiary impact event, secondary impact event and primary impact event; and calculating a comprehensive score from the preset weight of each dimension and the importance score of each dimension;
step S400 further comprises the following step:
S410: when screened-out industrial data has a comprehensive score higher than a preset threshold, immediately re-acquiring the industrial data to fill the gap; when screened-out industrial data has a comprehensive score lower than the preset threshold, recording it in a supplementary form, periodically re-acquiring the industrial data recorded in the supplementary form to supplement it, and removing the industrial data from the supplementary form once it has been acquired.
6. The multi-source heterogeneous data management method for a complex industrial process according to claim 5, wherein step S200 comprises the following steps:
S210: configuring the data sources to be probed and the probing rules;
S220: probing one or more of the data sets, data fields and data quality of the industrial data in the selected data sources according to the probing rules.
7. The multi-source heterogeneous data management method for a complex industrial process according to claim 6, wherein step S300 comprises the following step:
S310: dividing the industrial data according to a data hierarchy, wherein the data hierarchy comprises raw data, basic data, subject data and knowledge data.
8. The multi-source heterogeneous data management method for a complex industrial process according to claim 7, wherein the method further comprises the following step:
S600: cataloging the industrial data according to the hierarchy division and the type division, generating an industrial data resource catalog, and placing the cataloged industrial data into the data resource pool.
CN202410002248.5A 2024-01-02 2024-01-02 Multi-source heterogeneous data management system and method for complex industrial process Pending CN117520352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410002248.5A CN117520352A (en) 2024-01-02 2024-01-02 Multi-source heterogeneous data management system and method for complex industrial process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410002248.5A CN117520352A (en) 2024-01-02 2024-01-02 Multi-source heterogeneous data management system and method for complex industrial process

Publications (1)

Publication Number Publication Date
CN117520352A true CN117520352A (en) 2024-02-06

Family

ID=89766755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410002248.5A Pending CN117520352A (en) 2024-01-02 2024-01-02 Multi-source heterogeneous data management system and method for complex industrial process

Country Status (1)

Country Link
CN (1) CN117520352A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699175A (en) * 2021-01-15 2021-04-23 广州汇智通信技术有限公司 Data management system and method thereof
CN114911908A (en) * 2022-06-01 2022-08-16 国家石油天然气管网集团有限公司 Method and device for pipe network data security management
CN115274122A (en) * 2022-07-04 2022-11-01 中国信息通信研究院 Health medical data management method, system, electronic device and storage medium
CN115422173A (en) * 2022-08-17 2022-12-02 天元大数据信用管理有限公司 Data management method and system in financial credit field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination