CN114003596A

CN114003596A - Multi-source heterogeneous data processing system and method based on industrial system

Info

Publication number: CN114003596A
Application number: CN202111355901.9A
Authority: CN
Inventors: 许丰娟; 李俊; 郝志强; 高建磊; 李耀兵; 江浩; 巩天宇; 赵千; 李赟
Original assignee: China Industrial Control Systems Cyber Emergency Response Team
Current assignee: China Industrial Control Systems Cyber Emergency Response Team
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-02-01
Anticipated expiration: 2041-11-16
Also published as: CN114003596B

Abstract

In the multi-source heterogeneous data processing system and method based on an industrial system, an edge computing module takes over part of the computing tasks (data cleaning, screening and encryption of the preprocessed data), which relieves the computing load on the cloud data center. Heterogeneous data are encoded over multiple parallel paths into a unified identifier, which simplifies subsequent computation and increases processing speed. The edge computing module screens data on the basis of this unified encoding, which greatly reduces the data storage overhead of the cloud data center; this screening is also an efficient form of data cleaning and further reduces the computing burden of the cloud data center. In addition, vulnerability data are detected in real time and uploaded directly to the cloud data center, which meets the timeliness requirement of anomaly alarms.

Description

Multi-source heterogeneous data processing system and method based on industrial system

Technical Field

The invention relates to the technical field of industrial data processing, in particular to a multisource heterogeneous data processing system and method based on an industrial system.

Background

Enterprises place increasingly strict requirements on the speed, timeliness and professionalism of industrial data acquisition. In traditional industrial informatization, data are acquired on site and transmitted mainly within a local area network. Industrial data are now steadily migrating to public clouds, high-speed transmission of data to the cloud is challenging, and traditional wireless data acquisition technology struggles to deliver the high precision and low latency that industrial scenarios require, so the real-time monitoring requirements of highly automated production processes cannot be met.

With the continuous development of industrial automation and internet applications, and especially the development and application of 5G technology, the industrial internet has become an inevitable trend of modern industry, and the volume of data generated at industrial sites is growing geometrically. Industrial data are the foundation of the industrial internet and the core of its applications and control. However, this volume of data makes analysis and application difficult, especially when current data processing equipment lags far behind. Meanwhile, to keep the industrial system running normally and stably, the historical data that must be recorded are becoming more diverse. If these data are stored directly, or sent from the network edge to a data center for processing, a large amount of storage space is wasted, and querying, transmitting and retrieving the data become cumbersome. A means of screening and compressing the data is therefore urgently needed to solve these problems in the prior art.

Disclosure of Invention

The invention aims to provide a multisource heterogeneous data processing system and method based on an industrial system, which can greatly shorten the waiting time, improve the processing efficiency and the analysis efficiency of data and further solve the problems of data real-time performance and reliability caused by a large number of heterogeneous devices and networks on the site of an industrial internet.

In order to achieve the purpose, the invention provides the following scheme:

an industrial system based multi-source heterogeneous data processing system comprising:

the multi-path data acquisition terminal is used for acquiring data of each device in the industrial system; the devices in the industrial system include: industrial host equipment, production control equipment, network equipment, safety equipment, office equipment and industrial auxiliary equipment;

the acquisition preprocessing terminal is connected with the multi-path data acquisition terminal and is used for preprocessing the acquired data of each device in the industrial system; the preprocessing includes: encoding, classification and vulnerability data detection;

the edge calculation module is connected with the acquisition preprocessing terminal and is used for carrying out data cleaning, screening and encryption processing on the preprocessed data;

and the cloud data center is respectively connected with the acquisition preprocessing terminal and the edge computing module and is used for storing the preprocessed data and the data subjected to data cleaning, screening and encryption processing.

Preferably, the acquisition preprocessing terminal includes:

the encoding unit is connected with the multi-path data acquisition end and is used for encoding the acquired data of each device in the industrial system to obtain encoded data;

the classification unit is connected with the coding unit and is used for classifying the coded data to obtain classified data; the classification data includes: control data, network data, platform data, log data, traffic data, asset data, tool data, production data, or vulnerability data;

the cache unit comprises a plurality of buffer areas, is respectively connected with the classification unit and the edge calculation module, and is used for caching the classification data, transmitting the cached classification data to the edge calculation module when any one of the buffer areas is full, and simultaneously clearing the cached data in the full buffer area;

and the vulnerability detection unit is connected with the classification unit and the cloud data center and is used for detecting whether vulnerability data exist in the classification data, encrypting the existing vulnerability data and uploading the encrypted vulnerability data to the cloud data center when the vulnerability data exist, and simultaneously generating an alarm signal.

Preferably, the system further comprises:

the alarm module is connected with the vulnerability detection unit and used for receiving the alarm signal and then sending an alarm; the mode of receiving the alarm signal is a short message, an email or an alarm mode.

Preferably, the plurality of buffers includes: a production data cache region, a control data cache region, a log data cache region, a network data cache region, a traffic data cache region, an asset data cache region, a tool data cache region, a platform data cache region, and a vulnerability data cache region.
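The cache unit described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the class name `CacheUnit`, the `capacity` parameter (records per buffer) and the `send` callback standing in for the link to the edge computing module are all assumptions introduced here.

```python
from typing import Callable, Dict, List

# Category names follow the nine buffers listed in the text.
CATEGORIES = (
    "production", "control", "log", "network", "traffic",
    "asset", "tool", "platform", "vulnerability",
)


class CacheUnit:
    """Per-category buffers; a buffer is flushed downstream as soon as it is full."""

    def __init__(self, capacity: int, send: Callable[[str, List[dict]], None]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # hypothetical: number of records per buffer
        self.send = send          # hypothetical hand-off to the edge computing module
        self.buffers: Dict[str, List[dict]] = {c: [] for c in CATEGORIES}

    def put(self, category: str, record: dict) -> None:
        buf = self.buffers[category]  # raises KeyError for an unknown category
        buf.append(record)
        if len(buf) >= self.capacity:
            # Transmit a copy of the full buffer, then clear only that buffer.
            # If send() raises, the buffer is left untouched.
            self.send(category, list(buf))
            buf.clear()
```

Only the buffer that filled up is transmitted and cleared; the other categories keep accumulating independently.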

Preferably, the edge calculation module includes:

the data cleaning unit is connected with the acquisition preprocessing terminal and is used for cleaning the preprocessed data;

the data supplementing unit is connected with the data cleaning unit and used for supplementing the cleaned data by adopting an interpolation method to obtain supplemented data; the interpolation method comprises the following steps: random interpolation and linear interpolation;

the data screening unit is connected with the data cleaning unit and used for screening the supplementary data by adopting a distribution measurement-based downsampling method to obtain useful data;

and the encryption unit is connected with the data screening unit and is used for encrypting the useful data.

Preferably, the data screening unit includes:

the data distance determining subunit is connected with the data supplementing unit and is used for measuring the distance between any two data in the supplementing data by adopting the Euclidean distance;

the distribution metric determining subunit is connected with the data distance determining subunit and used for determining the distribution metric of each data according to the distance based on the neighborhood of each data in the supplementary data; the neighborhood is a hyper-sphere formed by taking any data point in the supplementary data as a center and taking a preset value as a radius;

the data sorting subunit is connected with the distribution metric determining subunit and is used for sorting the data in the supplementary data in a descending order based on the distribution metric to obtain sorted data;

the first judging subunit is connected with the data sorting subunit and is used for judging whether the distribution metric of each data in the sorted data is greater than a preset threshold, so as to obtain a first judgment result;

the first useful data determining subunit is connected with the first judging subunit and is used for retaining the data corresponding to the distribution metric and judging the data as useful data when the first judgment result is that the distribution metric is greater than the preset threshold;

the second judging subunit is connected with the first judging subunit and is used for judging whether the data corresponding to the distribution metric is in the neighborhood of the existing useful data when the first judgment result is that the distribution metric is smaller than or equal to the preset threshold, so as to obtain a second judgment result;

a second useful data determining subunit, connected to the second judging subunit, and configured to determine, when the second judgment result indicates that the data corresponding to the distribution metric is not in a neighborhood of existing useful data, that the data corresponding to the distribution metric is useful data;

and the redundant data determining subunit is connected with the second judging subunit and is used for determining that the data corresponding to the distribution metric is redundant data when the second judgment result is that the data corresponding to the distribution metric is in the neighborhood of the existing useful data.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

In the multi-source heterogeneous data processing system based on an industrial system, the edge computing module takes over part of the computing tasks, which relieves the computing load on the cloud data center. Heterogeneous data are encoded over multiple parallel paths into a unified identifier, which simplifies subsequent computation and increases processing speed. The edge computing module screens data on the basis of this unified encoding, which greatly reduces the data storage overhead of the cloud data center; this screening is also an efficient form of data cleaning and further reduces the computing burden of the cloud data center. In addition, vulnerability data are detected in real time and uploaded directly to the cloud data center, which meets the timeliness requirement of anomaly alarms.

Corresponding to the multi-source heterogeneous data processing system based on the industrial system, the invention also provides a multi-source heterogeneous data processing method based on the industrial system, and the method comprises the following steps:

collecting data of each device in an industrial system; the devices in the industrial system include: industrial host equipment, production control equipment, network equipment, safety equipment, office equipment and industrial auxiliary equipment;

preprocessing the acquired data of each device in the industrial system; the preprocessing includes: encoding, classification and vulnerability data detection;

carrying out data cleaning, screening and encryption processing on the preprocessed data;

and storing the preprocessed data and the data subjected to data cleaning, screening and encryption.

Preferably, the preprocessing the acquired data of each device in the industrial system specifically includes:

encoding the acquired data of each device in the industrial system to obtain encoded data;

classifying the coded data to obtain classified data; the classification data includes: control data, network data, platform data, log data, traffic data, asset data, tool data, production data, or vulnerability data;

caching the classified data, transmitting the cached classified data to the edge computing module when the cache is full, and simultaneously clearing the cached data in the full cache region;

and detecting whether vulnerability data exists in the classified data, encrypting the existing vulnerability data and uploading the encrypted vulnerability data to the cloud data center when the vulnerability data exists, and generating an alarm signal at the same time.

Preferably, the data cleaning, screening and encrypting the preprocessed data specifically includes:

carrying out data cleaning on the preprocessed data;

supplementing the cleaned data by adopting an interpolation method to obtain supplemented data;

screening the supplementary data by adopting a distribution measurement-based downsampling method to obtain useful data;

and encrypting the useful data.

Preferably, the screening of the supplementary data by using a downsampling method based on distribution metric to obtain useful data specifically includes:

measuring the distance between any two data in the supplementary data by adopting a Euclidean distance;

determining distribution measurement of each data according to the distance based on the neighborhood of each data in the supplementary data; the neighborhood is a hyper-sphere formed by taking any data point in the supplementary data as a center and taking a preset value as a radius;

sorting the data in the supplementary data in a descending order based on the distribution measurement to obtain sorted data;

judging whether the distribution metric of each data in the sorted data is larger than a preset threshold to obtain a first judgment result;

when the first judgment result is that the distribution metric is larger than the preset threshold, retaining data corresponding to the distribution metric and judging the data to be useful data;

when the first judgment result is that the distribution metric is less than or equal to the preset threshold, judging whether the data corresponding to the distribution metric is in the neighborhood of the existing useful data or not to obtain a second judgment result;

when the second judgment result is that the data corresponding to the distribution metric is not in the neighborhood of the existing useful data, determining that the data corresponding to the distribution metric is useful data;

and when the second judgment result is that the data corresponding to the distribution metric is in the neighborhood of the existing useful data, determining that the data corresponding to the distribution metric is redundant data.

The technical effect achieved by the multisource heterogeneous data processing method based on the industrial system provided by the invention is the same as that achieved by the multisource heterogeneous data processing system based on the industrial system provided by the invention, so that the detailed description is omitted.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.

FIG. 1 is a schematic diagram of an industrial system based multi-source heterogeneous data processing system according to the present invention;

FIG. 2 is a flowchart of a multi-source heterogeneous data processing method based on an industrial system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in FIG. 1, the multi-source heterogeneous data processing system based on an industrial system provided by the present invention includes: a multi-path data acquisition terminal, an acquisition preprocessing terminal, an edge computing module and a cloud data center.

The multi-path data acquisition end is used for acquiring data of each device in the industrial system. The multi-channel data acquisition end comprises a plurality of data acquisition devices, and the data acquisition devices acquire data of various different types on an industrial field. Industrial field device objects include industrial host devices, production control devices, network devices, security devices, office devices, industrial auxiliary devices, and the like.

The acquisition preprocessing terminal is connected with the multi-path data acquisition terminal and is used for preprocessing the acquired data of each device in the industrial system. The preprocessing includes: encoding, classification and vulnerability data detection. The acquisition preprocessing terminal classifies and caches the encoded data, and after the vulnerability data cache region is full, the vulnerability data are encrypted and uploaded directly to the cloud data center.

The edge calculation module is connected with the acquisition preprocessing terminal and is used for carrying out data cleaning, screening and encryption processing on the preprocessed data. The edge calculation module has certain calculation capacity, and performs data cleaning processing on uniformly coded data, and the specific content is as follows:

The edge computing module performs data cleaning on the uniformly encoded data and supplements missing values by interpolation; the specific methods include random interpolation and Newton interpolation. Abnormal data beyond the value range are deleted directly.
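The range check in this cleaning step could look like the sketch below. The sample layout `(timestamp, value)`, the use of `None` for a missing value, and the function name `drop_out_of_range` are assumptions made for illustration; the source does not specify a data format.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical sample layout: (timestamp, value); None marks a missing value.
Sample = Tuple[float, Optional[float]]


def drop_out_of_range(samples: Iterable[Sample], low: float, high: float) -> List[Sample]:
    """Delete samples whose value lies outside [low, high].

    Missing values are kept so that a later interpolation step can fill them.
    """
    kept = []
    for t, v in samples:
        if v is None or low <= v <= high:
            kept.append((t, v))
    return kept
```

Keeping the missing entries in place, rather than dropping them together with the outliers, preserves the positions that interpolation needs to fill.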

The edge computing module computes useful data and redundant data by adopting a distribution measurement-based downsampling method for the cleaned data, stores the useful data into a data cache region of the edge computing module, encrypts the data after the cache region is full, and uploads the encrypted data to a cloud data center.

Both the edge computing module and the acquisition preprocessing terminal encrypt data before uploading it, so as to ensure the security of the data.

And the cloud data center is respectively connected with the acquisition preprocessing terminal and the edge computing module and is used for storing the preprocessed data and the data subjected to data cleaning, screening and encryption processing so as to facilitate subsequent analysis and decision.

As another embodiment of the present invention, the acquisition preprocessing terminal adopted by the present invention may be configured to include: an encoding unit, a classification unit, a cache unit comprising a plurality of buffers, and a vulnerability detection unit.

The coding unit is connected with the multi-path data acquisition end and used for coding the acquired data of each device in the industrial system to obtain coded data.

The classification unit is connected with the coding unit and is used for classifying the coded data to obtain classified data. The classification data includes: control data, network data, platform data, log data, traffic data, asset data, tool data, production data, or vulnerability data. The vulnerability data refers to data which has security threat to the industrial system or causes abnormal operation of the industrial system.

The cache unit comprising a plurality of buffer areas is respectively connected with the classification unit and the edge calculation module, and is used for caching the classification data, transmitting the cached classification data to the edge calculation module when any buffer area is full, and simultaneously clearing the cached data in the full buffer area. Wherein the plurality of buffers include: a production data cache region, a control data cache region, a log data cache region, a network data cache region, a traffic data cache region, an asset data cache region, a tool data cache region, a platform data cache region, and a vulnerability data cache region.

The vulnerability detection unit is connected with the classification unit and the cloud data center, and is used for detecting whether vulnerability data exists in the classification data, encrypting the existing vulnerability data and uploading the encrypted vulnerability data to the cloud data center when the vulnerability data exists, and meanwhile generating an alarm signal.

As another embodiment of the present invention, the multi-source heterogeneous data processing system based on the industrial system provided above may further include an alarm module.

The alarm module is connected with the vulnerability detection unit and used for receiving the alarm signal and then sending out an alarm. The mode of receiving the alarm signal is a short message, an email or an alarm mode.

As another embodiment of the present invention, the edge computing module adopted in the foregoing may include: a data cleaning unit, a data supplementing unit, a data screening unit and an encryption unit.

The data cleaning unit is connected with the acquisition preprocessing terminal and is used for cleaning the preprocessed data.

The data supplementing unit is connected with the data cleaning unit and is used for supplementing the cleaned data by adopting an interpolation method to obtain supplemented data. The interpolation method comprises the following steps: random interpolation and linear interpolation.

The data screening unit is connected with the data cleaning unit and is used for screening the supplementary data by adopting a distribution measurement-based downsampling method to obtain useful data.

The encryption unit is connected with the data screening unit and is used for encrypting the useful data.

Further, the data screening unit includes: a data distance determining subunit, a distribution metric determining subunit, a data sorting subunit, a first judging subunit, a first useful data determining subunit, a second judging subunit, a second useful data determining subunit and a redundant data determining subunit.

The data distance determining subunit is connected with the data supplementing unit and is used for measuring the distance between any two data in the supplementary data by adopting the Euclidean distance.

The distribution metric determining subunit is connected with the data distance determining subunit, and the distribution metric determining subunit is used for determining the distribution metric of each data according to the distance based on the neighborhood of each data in the supplementary data. The neighborhood is a hyper-sphere formed by taking any data point in the supplementary data as a center and taking a preset value as a radius.

The data sorting subunit is connected with the distribution metric determining subunit, and the data sorting subunit is used for sorting the data in the supplementary data in a descending order based on the distribution metric to obtain the sorted data.

The first judging subunit is connected with the data sorting subunit, and is used for judging whether the distribution metric of each data in the sorted data is greater than a preset threshold, so as to obtain a first judgment result.

The first useful data determining subunit is connected with the first judging subunit, and is used for retaining the data corresponding to the distribution metric and judging the data as useful data when the first judgment result is that the distribution metric is greater than the preset threshold.

The second judging subunit is connected with the first judging subunit, and is used for judging whether the data corresponding to the distribution metric is in the neighborhood of the existing useful data when the first judgment result is that the distribution metric is less than or equal to the preset threshold, so as to obtain a second judgment result.

And the second useful data determining subunit is connected with the second judging subunit, and the second useful data determining subunit is used for determining that the data corresponding to the distribution metric is useful data when the second judgment result is that the data corresponding to the distribution metric is not in the neighborhood of the existing useful data.

The redundant data determining subunit is connected with the second judging subunit, and is used for determining that the data corresponding to the distribution metric is redundant data when the second judgment result is that the data corresponding to the distribution metric is in the neighborhood of the existing useful data.

Corresponding to the multi-source heterogeneous data processing system based on the industrial system, the invention also provides a multi-source heterogeneous data processing method based on the industrial system, as shown in fig. 2, the method comprises the following steps:

Step 100: Collect data of each device in the industrial system. The devices in the industrial system include: industrial host equipment, production control equipment, network equipment, security equipment, office equipment and industrial auxiliary equipment.

Step 101: Preprocess the acquired data of each device in the industrial system. The preprocessing includes: encoding, classification and vulnerability data detection. This step may be implemented as follows:

Step 1011: Encode the acquired data of each device in the industrial system to obtain encoded data.

Step 1012: Classify the encoded data to obtain classified data. The classified data include: control data, network data, platform data, log data, traffic data, asset data, tool data, production data, or vulnerability data.

Step 1013: Cache the classified data, transmit the cached classified data to the edge computing module when a cache region is full, and clear the cached data in the full cache region.

Step 1014: Detect whether vulnerability data exist in the classified data; when vulnerability data exist, encrypt them, upload the encrypted vulnerability data to the cloud data center, and generate an alarm signal at the same time.

Step 102: Perform data cleaning, screening and encryption on the preprocessed data. This step may be implemented as follows:

Step 1021: Perform data cleaning on the preprocessed data.

Step 1022: Supplement the cleaned data by interpolation to obtain supplementary data.

Step 1023: Screen the supplementary data by a downsampling method based on a distribution metric to obtain useful data.

Step 1024: Encrypt the useful data.

Step 103: Store the preprocessed data and the data subjected to data cleaning, screening and encryption.
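The ordering of steps 100 to 103 can be outlined as a small driver function. This is only a structural sketch: `run_pipeline` and its four callable parameters are hypothetical names, and each stage is injected by the caller instead of being implemented here.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    collect: Callable[[], Iterable[dict]],
    preprocess: Callable[[Iterable[dict]], List[dict]],
    edge_process: Callable[[List[dict]], List[dict]],
    store: Callable[[List[dict]], None],
) -> None:
    """Run the four method steps in order; every stage is supplied by the caller."""
    raw = collect()                      # Step 100: acquire device data
    preprocessed = preprocess(raw)       # Step 101: encode, classify, detect
    useful = edge_process(preprocessed)  # Step 102: clean, screen, encrypt
    store(preprocessed)                  # Step 103: store the preprocessed data
    store(useful)                        # Step 103: store the edge-processed data
```

Passing the stages as parameters keeps the sketch independent of any particular encoding, screening or encryption choice.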

As another embodiment, the implementation process of the step 1023 may be:

The distance between any two data in the supplementary data is measured by adopting the Euclidean distance.

A distribution metric for each data is determined from the distance based on a neighborhood of each data in the supplemental data. The neighborhood is a hyper-sphere formed by taking any data point in the supplementary data as a center and taking a preset value as a radius.

The data in the supplementary data are sorted in descending order based on the distribution metric to obtain sorted data.

Whether the distribution metric of each data in the sorted data is greater than a preset threshold is judged to obtain a first judgment result.

And when the first judgment result is that the distribution metric is larger than the preset threshold, retaining the data corresponding to the distribution metric and judging the data to be useful data.

And when the first judgment result is that the distribution metric is less than or equal to the preset threshold, judging whether the data corresponding to the distribution metric is in the neighborhood of the existing useful data or not, and obtaining a second judgment result.

And when the second judgment result is that the data corresponding to the distribution metric is not in the neighborhood of the existing useful data, determining that the data corresponding to the distribution metric is the useful data.

And when the second judgment result is that the data corresponding to the distribution metric is in the neighborhood of the existing useful data, determining the data corresponding to the distribution metric as redundant data.
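The two-stage decision procedure above can be sketched as follows. The distribution metric of each point is taken as an input list `rho`, because this sketch makes no assumption about how the metric is computed; the function name `screen` and the parameter names `eps` (neighborhood radius) and `q` (threshold) are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def screen(points: List[Point], rho: List[float], eps: float, q: float
           ) -> Tuple[List[int], List[int]]:
    """Return (useful, redundant) index lists for the given points.

    rho[i] is the precomputed distribution metric of points[i].
    """
    if len(points) != len(rho):
        raise ValueError("points and rho must have the same length")
    # Visit points in descending order of distribution metric.
    order = sorted(range(len(points)), key=lambda i: rho[i], reverse=True)
    useful: List[int] = []
    redundant: List[int] = []
    for i in order:
        if rho[i] > q:
            # First judgment: metric above the threshold, keep the point.
            useful.append(i)
        elif all(math.dist(points[i], points[j]) > eps for j in useful):
            # Second judgment: outside every existing useful neighborhood.
            useful.append(i)
        else:
            # Inside the neighborhood of an existing useful point.
            redundant.append(i)
    return useful, redundant
```

For example, with points `(0, 0)`, `(0.1, 0)`, `(5, 5)`, `(5.1, 5)`, metrics `0.9, 0.2, 0.3, 0.1`, `eps = 0.5` and `q = 0.5`, the first and third points are kept and the other two are marked redundant.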

The following provides a specific embodiment, which is used to explain the specific implementation process of the multi-source heterogeneous data processing system and method based on the industrial system, and in the practical application process, the implementation process is not limited to the algorithm adopted in the following embodiments.

Step 1: The multi-path data acquisition terminal comprises a plurality of data acquisition devices, which acquire data from the various industrial field devices. The device objects include industrial host equipment, production control equipment, network equipment, safety equipment, office equipment, industrial auxiliary equipment and the like. The data acquisition devices transmit the data to the acquisition preprocessing terminal.

Step 2: The acquisition preprocessing terminal uniformly encodes the obtained data, then classifies and caches it.

Step 2.1: the acquisition preprocessing terminal collects data sent by the data acquisition equipment and uniformly encodes the obtained data.
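One way to picture uniform encoding over several acquisition paths is the sketch below. Everything specific in it is an assumption: the envelope fields, the identifier layout `<source>:<device>:<timestamp>`, the raw keys `device` and `ts`, and the use of one thread-pool task per path. The source does not describe the encoding scheme itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def encode(source: str, raw: Dict[str, object]) -> Dict[str, object]:
    """Wrap one raw reading in a common envelope with a unified identifier."""
    # Hypothetical identifier layout: <source>:<device>:<timestamp>
    uid = f"{source}:{raw['device']}:{raw['ts']}"
    return {"uid": uid, "source": source, "payload": raw}


def encode_channel(source: str, batch: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Encode every reading received on one acquisition path."""
    return [encode(source, raw) for raw in batch]


def encode_all(channels: Dict[str, List[Dict[str, object]]]) -> List[Dict[str, object]]:
    """Encode all acquisition paths in parallel, one worker task per path."""
    if not channels:
        return []
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = [pool.submit(encode_channel, s, b) for s, b in channels.items()]
        # Results are gathered in submission order, so output order is stable.
        return [rec for f in futures for rec in f.result()]
```

After this step every record, whatever its origin, exposes the same `uid`, `source` and `payload` fields to the classification stage.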

Step 2.2: The acquisition preprocessing terminal performs preliminary classification and caching of the encoded data. The data are divided into control data, network data, platform data, log data, traffic data, asset data, tool data, production data and vulnerability data, and each class is stored in its corresponding cache region of the acquisition preprocessing terminal: a control data cache region, a network data cache region, a platform data cache region, a log data cache region, a traffic data cache region, an asset data cache region, a tool data cache region, a production data cache region and a vulnerability data cache region. When any cache region is full, its data are sent to the edge computing module; after the data have been sent successfully, the cache region is emptied and waits for new data.

Step 2.3: if the vulnerability data is detected, the vulnerability data is encrypted and then directly uploaded to a cloud data center, and abnormal information is sent to an alarm module in the modes of short messages, mails, alarms and the like.
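
Steps 2.2 and 2.3 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the class name `CategoryBuffers`, the `capacity` parameter and the callbacks `send_to_edge`, `upload_to_cloud`, `notify_alarm` and `encrypt` are hypothetical names introduced here, and the cipher is left to the caller because the text does not name one. For simplicity the sketch forwards vulnerability data immediately instead of also holding it in a vulnerability data cache region.

```python
from typing import Any, Callable, Dict, List

# The nine classes named in Step 2.2 (short labels are hypothetical).
CATEGORIES = (
    "control", "network", "platform", "log", "flow",
    "asset", "tool", "production", "vulnerability",
)


class CategoryBuffers:
    """Per-class cache regions that are flushed to the edge module when full."""

    def __init__(
        self,
        capacity: int,
        send_to_edge: Callable[[str, List[Any]], bool],
        upload_to_cloud: Callable[[Any], None],
        notify_alarm: Callable[[Any], None],
        encrypt: Callable[[Any], Any],
    ) -> None:
        self.capacity = capacity
        self.send_to_edge = send_to_edge
        self.upload_to_cloud = upload_to_cloud
        self.notify_alarm = notify_alarm
        self.encrypt = encrypt
        # One buffer per class; vulnerability data bypasses buffering here.
        self.buffers: Dict[str, List[Any]] = {
            c: [] for c in CATEGORIES if c != "vulnerability"
        }

    def add(self, category: str, record: Any) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if category == "vulnerability":
            # Step 2.3: encrypt, upload directly to the cloud, raise an alert.
            self.upload_to_cloud(self.encrypt(record))
            self.notify_alarm(record)
            return
        buf = self.buffers[category]
        buf.append(record)
        if len(buf) >= self.capacity:
            # Step 2.2: empty the region only after a successful send;
            # on failure the data stay buffered and the next add retries.
            if self.send_to_edge(category, list(buf)):
                buf.clear()
```

Emptying the buffer only after `send_to_edge` reports success mirrors the requirement that a cache region is cleared once its data have been sent successfully.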

Step 3: the edge computing module cleans and screens the data.

Step 3.1: after the edge computing module receives the data sent from a cache region of the acquisition preprocessing terminal, the data cleaning module first cleans the data, filling in missing values by interpolation. The interpolation methods include random interpolation and linear interpolation. Random interpolation replaces a missing value with a value randomly sampled from the historical data of the cache region.

The linear interpolation formula is as follows:

$$y_2 = y_0 + \frac{(x_2 - x_0)(y_1 - y_0)}{x_1 - x_0}$$

where $(x_0, y_0)$ and $(x_1, y_1)$ are known historical data, $(x_2, y_2)$ is the data with a missing value, and $y_2$ is the missing value.
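
The two filling methods of Step 3.1 can be illustrated with the short sketch below. It uses the standard two-point linear interpolation formula; the function names `linear_interpolate` and `random_impute` and the optional `rng` parameter are hypothetical and are not taken from the specification.

```python
import random
from typing import Optional, Sequence


def linear_interpolate(
    x0: float, y0: float, x1: float, y1: float, x2: float
) -> float:
    """Estimate the missing y2 at x2 from the line through two known points."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    return y0 + (x2 - x0) * (y1 - y0) / (x1 - x0)


def random_impute(
    history: Sequence[float], rng: Optional[random.Random] = None
) -> float:
    """Replace a missing value with a random draw from buffered history."""
    if not history:
        raise ValueError("history must not be empty")
    chooser = rng if rng is not None else random
    return chooser.choice(list(history))
```

For instance, with known points (0, 0) and (2, 4), `linear_interpolate(0, 0, 2, 4, 1)` evaluates the line at x = 1.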

Step 3.2: the edge computing module applies a downsampling method based on a distribution metric to the cleaned data, separating it into useful data and redundant data. The specific method is as follows:

The Euclidean distance is used to measure the distance $d(x_i, x_j)$ between any two pieces of data:

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$

where $x_i$ and $x_j$ are any two pieces of data, $n$ is the data dimension, and $x_{ik}$ is the $k$-th component of the $i$-th piece of data.

The hypersphere centered on the sample point $x_i$ with radius $\varepsilon$ is defined as the $\varepsilon$-neighborhood of $x_i$. Let $N_\varepsilon(x_i)$ denote the number of sample points in the intersection of the data set with this neighborhood; a larger $N_\varepsilon(x_i)$ means that more data are distributed near $x_i$. An adjustable radius $\varepsilon$ and a threshold $q$ are set, and $N_\varepsilon$ is calculated for each piece of data.
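
The distance and neighborhood count can be computed as in the following sketch. It rests on two assumptions that the text leaves open: the neighborhood is taken as closed (distance less than or equal to the radius), and the count includes the sample point itself, since that point also lies in the intersection of the data set with its own neighborhood. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two data points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def neighborhood_counts(data: Sequence[Point], eps: float) -> List[int]:
    """For each point, count the samples within distance eps of it.

    Assumption: the point itself is counted and the boundary is included.
    """
    return [
        sum(1 for other in data if euclidean(x, other) <= eps)
        for x in data
    ]
```

This brute-force version compares every pair of points, which is adequate for the contents of a single cache region; a spatial index could replace it for larger batches.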

The distribution metric of each data is defined by the epsilon neighborhood of each data:

where $\rho(x)$ is the distribution metric of data $x$ and $n$ is the number of data points in the $\varepsilon$-neighborhood.

The distribution metric is calculated for all the data. It represents, to a certain extent, how the data are distributed: the larger the value of the distribution metric, the more redundant data there are near that data point.

The data screening unit in the edge computing module arranges all the data in descending order of $\rho(x)$ and screens the data one by one, starting from the data point with the largest $\rho(x)$. If $N_\varepsilon(x_i)$ is greater than the threshold $q$, the data point is retained as useful data. Otherwise, if the data point is not in the $\varepsilon$-neighborhood of any existing useful data point, it is taken as useful data; if it is inside such a hypersphere (that is, in the $\varepsilon$-neighborhood of an existing useful data point), it is considered redundant data.

After all the data have been traversed, the data screening unit has divided all the data into useful data and redundant data.
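
The screening pass can be sketched as below, under one reading of the rule: a point whose neighborhood count exceeds the threshold q is kept outright, and any other point is kept only if it lies outside the neighborhood of every point already kept. Because the formula for the distribution metric is not reproduced in this text, the sketch does not compute it and instead accepts the metric values as the input `rho`. The function name `screen` and the closed-neighborhood convention are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def screen(
    data: Sequence[Point], rho: Sequence[float], eps: float, q: int
) -> Tuple[List[int], List[int]]:
    """Split point indices into (useful, redundant).

    rho holds precomputed distribution-metric values, one per point.
    """
    if len(data) != len(rho):
        raise ValueError("data and rho must have the same length")
    # Neighborhood count of each point (the point itself is included).
    counts = [sum(1 for o in data if _dist(x, o) <= eps) for x in data]
    # Visit points from the largest metric value to the smallest.
    order = sorted(range(len(data)), key=lambda i: rho[i], reverse=True)
    useful: List[int] = []
    redundant: List[int] = []
    for i in order:
        if counts[i] > q:
            useful.append(i)
        elif all(_dist(data[i], data[j]) > eps for j in useful):
            useful.append(i)
        else:
            redundant.append(i)
    return useful, redundant
```

The function returns indices rather than copies of the data, so the caller can move the useful records into the edge cache region without duplicating them.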

Step 4: the useful data are stored in a cache region of the edge computing module; once the cache region is full, the data are encrypted and uploaded to the cloud data center.

Based on the above description, the technical solution provided by the present invention has the following advantages over the prior art:

1. The invention targets multi-source heterogeneous data from the industrial field, and uses the acquisition preprocessing terminal to uniformly encode, classify and store the data in a multi-path parallel mode. After unified encoding, the data are divided into classes such as log data, flow data, asset data, tool data, production data and vulnerability data, which are stored in their corresponding cache regions. Unified encoding and preliminary classification facilitate data management and the subsequent screening of useful data.

2. The method is oriented to multi-source heterogeneous data from the industrial field and realizes the classification and screening of that data. The edge computing module completes part of the computing tasks, which effectively relieves the computing pressure on the cloud data center. The edge computing module screens the uniformly encoded data and compresses it by eliminating redundant data. Using the distribution metric during screening reduces the data volume to a certain extent while retaining most of the information in the original data, so that useful data are extracted effectively and the data storage overhead of the cloud data center is greatly reduced. For the cloud data center, the data screening performed by the edge computing module is also an efficient form of data cleaning. Vulnerability data are uploaded directly to the cloud data center and the abnormality information is sent to the alarm module, which meets the timeliness requirement of abnormality alarms.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help in understanding the method and the core concept of the present invention. A person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A multi-source heterogeneous data processing system based on an industrial system, comprising:

2. The industrial system-based multi-source heterogeneous data processing system according to claim 1, wherein the acquisition preprocessing terminal comprises:

3. The industrial system based multi-source heterogeneous data processing system of claim 2, further comprising:

4. The industrial system-based multi-source heterogeneous data processing system of claim 2, wherein the plurality of buffers comprises: a production data cache region, a control data cache region, a log data cache region, a network data cache region, a traffic data cache region, an asset data cache region, a tool data cache region, a platform data cache region, and a vulnerability data cache region.

5. The industrial system-based multi-source heterogeneous data processing system of claim 1, wherein the edge calculation module comprises:

6. The industrial system-based multi-source heterogeneous data processing system of claim 5, wherein the data screening unit comprises:

7. A multi-source heterogeneous data processing method based on an industrial system is characterized by comprising the following steps:

8. The multi-source heterogeneous data processing method based on the industrial system according to claim 7, wherein the preprocessing of the collected data of each device in the industrial system specifically includes:

9. The multi-source heterogeneous data processing method based on the industrial system according to claim 7, wherein the data cleaning, screening and encrypting the preprocessed data specifically comprises:

carrying out data cleaning on the preprocessed data;

and encrypting the useful data.

10. The multi-source heterogeneous data processing method based on the industrial system according to claim 9, wherein the filtering of the supplementary data by using a downsampling method based on distribution metrics to obtain useful data specifically comprises: