CN111752933A

CN111752933A - Method for fusion processing of clustered space-time big data and construction of quality evaluation system

Info

Publication number: CN111752933A
Application number: CN202010374707.4A
Authority: CN
Inventors: 颜军; 贾泽露; 叶伟立
Original assignee: JIANGSU ZHITU TECHNOLOGY CO LTD; Shenzhen Baoan District Information Center
Current assignee: JIANGSU ZHITU TECHNOLOGY CO LTD; Shenzhen Baoan District Information Center
Priority date: 2020-05-06
Filing date: 2020-05-06
Publication date: 2020-10-09

Abstract

The invention provides a method for clustered space-time big data fusion processing and quality evaluation system construction, and relates to the technical field of communication. The system comprises a zookeeper cluster, a Rule Manager management component and a Data Handle cluster. The zookeeper cluster consists of 3 or more machines, and the Data Handle cluster consists of one or more data transmission service nodes. Each data transmission service node embeds a component that watches a rule path in the zookeeper cluster; the data transmission service nodes are connected to the zookeeper cluster, and the Rule Manager management component is also connected to the zookeeper cluster. The invention addresses three problems of data transmission tools currently on the market: cleaning and filtering rules are managed by a single node, cleaning rules cannot be managed uniformly, and when rules change, the changes are difficult to synchronize strictly to other nodes and put into effect in real time.

Description

Method for fusion processing of clustered space-time big data and construction of quality evaluation system

Technical Field

The invention relates to the field of space-time big data, in particular to a method for fusion processing of clustered space-time big data and construction of a quality evaluation system.

Background

Space-time big data fusion processing is the multi-dimensional fusion analysis of traditional business data on the basis of spatial data. Data quality evaluation is the comprehensive evaluation of data fusion quality against a given set of data rules using general-purpose algorithm rules. A space-time big data fusion processing and quality evaluation system therefore generally comprises a data fusion processing system and a data quality evaluation system.

Space-time big data fusion processing takes data from different sources, organizes it along its temporal and spatial relationships according to data rules through a general-purpose data transmission protocol, and performs matching fusion across the entire process in which data is extracted (extract), transformed (transform) and loaded (load) from a source to a destination. Data fusion processing is an important stage of big data analysis: the user extracts the required data from the data sources, cleans it, and finally loads it into a data warehouse according to a predefined data warehouse model, providing data support and data services for general-purpose algorithm models.
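The extract, transform and load flow described above can be sketched as a small generator pipeline. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`id`, `ts`, `lon`, `lat`), the single cleaning rule, and the dictionary standing in for the data warehouse are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]

def extract(*sources: Iterable[Row]) -> Iterator[Row]:
    """Extract: stream rows from several heterogeneous sources in turn."""
    for src in sources:
        yield from src

def transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Transform: clean rows and normalise them to one space-time schema."""
    for r in rows:
        # Hypothetical cleaning rule: rows without a location are dropped.
        if r.get("lon") is None or r.get("lat") is None:
            continue
        yield {
            "id": str(r["id"]),
            # Epoch seconds -> ISO 8601 in UTC, so all sources share one time axis.
            "t": datetime.fromtimestamp(int(r["ts"]), tz=timezone.utc).isoformat(),
            "lon": float(r["lon"]),
            "lat": float(r["lat"]),
        }

def load(rows: Iterable[Row], warehouse: Dict[Tuple[str, str], Row]) -> int:
    """Load: upsert each row into a warehouse table keyed by (id, time)."""
    count = 0
    for r in rows:
        warehouse[(str(r["id"]), str(r["t"]))] = r
        count += 1
    return count
```

Keying the warehouse on (id, time) makes loading idempotent, so the same record delivered by two sources is stored only once.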

Big data is an important resource for modern enterprises and governments and the basis for scientific management and decision analysis. Statistics indicate that data volume doubles every 2-3 years and that this data holds enormous commercial value, yet the data that enterprises actually focus on generally accounts for only about 2%-4% of the total. As a result, enterprises still fail to make the most of their existing data resources, wasting time and money and missing the best opportunities to make critical business decisions. How to turn data into information and knowledge through technical means has therefore become a major bottleneck for improving the core competitiveness of enterprises. Data fusion processing and quality evaluation are effective means to this end, allowing users to discover the latent value and decision information in massive data quickly and accurately.

Most fusion processing and quality evaluation tools currently on the market run on a single machine. Some distributed data fusion processing tools exist, but their cleaning and filtering rules are managed by a single node, so the cleaning rules cannot be managed uniformly, and when rules change, the changes are difficult to synchronize strictly to other nodes and put into effect in real time.

Disclosure of Invention

The invention aims to provide a method for fusion processing of clustered space-time big data and construction of a quality evaluation system, so as to solve the technical problems.

In order to solve the technical problems, the invention adopts the following technical scheme:

The method for constructing the clustered space-time big data fusion processing and quality evaluation system involves a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster. The GeoDataService cluster consists of 2 or more GeoDataService devices, and the Data Check cluster consists of one or more Data Check service nodes. Each Data Check service node embeds a component that watches a rule path in the DataRule Manager cluster; the Data Check service nodes are connected to the DataRule Manager cluster, and the DataRule Manager management component is connected to the GeoDataService cluster.

The method for fusion processing of clustered spatio-temporal big data and construction of a quality evaluation system comprises the following steps:

(1) the space-time big data is distributed through a GeoDataService node to a DataRule Manager node for rule-based matching fusion; the Data Check cluster checks the quality of the fused data, and data that passes the check is fed back directly to GeoDataService, which provides data services to external users;

(2) when a rule changes, the GeoDataService cluster notifies each node of the Data Check cluster of the change through an event;

(3) the Data Check cluster watches the rules at the specified path and evaluates the quality of the data fusion result.
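Steps (1) and (3) can be illustrated with a toy fusion-and-check pass. The sketch assumes that fusion means attaching spatial attributes to business records through a shared key and that each quality rule is a named predicate over one fused record; `fuse`, `check_fused_data` and `QualityReport` are hypothetical names, and the pass rate is only one possible quality metric.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A quality rule: a name plus a predicate over one fused record (hypothetical form).
Rule = Tuple[str, Callable[[dict], bool]]

def fuse(spatial: Dict[str, dict], business: List[dict], key: str) -> List[dict]:
    """Step (1) sketch: attach spatial attributes to business records via a shared key."""
    fused = []
    for rec in business:
        geo = spatial.get(rec.get(key), {})  # unmatched records get no spatial fields
        fused.append({**rec, **geo})
    return fused

@dataclass
class QualityReport:
    passed: List[dict] = field(default_factory=list)
    failed: List[Tuple[dict, List[str]]] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        total = len(self.passed) + len(self.failed)
        return len(self.passed) / total if total else 1.0

def check_fused_data(records: List[dict], rules: List[Rule]) -> QualityReport:
    """Step (3) sketch: evaluate every fused record against the current rules."""
    report = QualityReport()
    for rec in records:
        violations = [name for name, ok in rules if not ok(rec)]
        if violations:
            report.failed.append((rec, violations))
        else:
            report.passed.append(rec)  # passing data is what GeoDataService would serve
    return report
```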

Preferably, the GeoDataService device is provided with an interface for management and detection.

Preferably, the data transmission service nodes in the GeoDataService cluster share one rule configuration, and rule changes take effect in real time through the watch mechanism.
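Sharing one rule configuration through watches can be sketched with an in-memory store that calls back every registered node when a rule path changes. `SharedRuleStore` and `DataNode` are hypothetical stand-ins for the GeoDataService cluster and its nodes. The abstract names ZooKeeper, whose classic watches are one-time triggers that clients must re-register; this sketch keeps callbacks registered permanently for simplicity.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional, Tuple

# Callback signature: (path, rule text, version).
Watcher = Callable[[str, str, int], None]

class SharedRuleStore:
    """Hypothetical in-memory stand-in for the shared rule configuration."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, int]] = {}
        self._watchers: DefaultDict[str, List[Watcher]] = defaultdict(list)

    def watch(self, path: str, callback: Watcher) -> None:
        """Register a persistent watcher; deliver the current value right away if any."""
        self._watchers[path].append(callback)
        if path in self._data:
            value, version = self._data[path]
            callback(path, value, version)

    def set(self, path: str, value: str) -> int:
        """Change a rule, bump its version and notify every watcher of that path."""
        version = self._data[path][1] + 1 if path in self._data else 0
        self._data[path] = (value, version)
        for callback in list(self._watchers[path]):
            callback(path, value, version)
        return version

class DataNode:
    """A processing node that always runs with the latest rule it has been told about."""

    def __init__(self, name: str, store: SharedRuleStore, path: str) -> None:
        self.name = name
        self.rule: Optional[str] = None
        self.version = -1
        store.watch(path, self._on_change)

    def _on_change(self, path: str, value: str, version: int) -> None:
        if version > self.version:  # ignore stale or duplicate notifications
            self.rule, self.version = value, version
```

The version check lets a node ignore a notification older than the rule it already holds, which matters once notifications can arrive out of order.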

Preferably, in step (1), the rules are stored in the GeoDataService cluster in table structures using the database system provided by GeoDataService, with the directory hierarchy partitioned per data transmission task. GeoDataService persists the data to every GeoDataService node, and keeping multiple copies of each file ensures its safety.
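A per-task directory layout with a copy of each rule file on every node might look like the following. The versioned replicas, and the choice to read the newest copy among live nodes so that a recovering node's stale copy is not preferred, are assumptions added for the sketch; `ReplicatedRuleStore` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

class ReplicatedRuleStore:
    """Hypothetical sketch: each rule file is copied to every live GeoDataService node."""

    def __init__(self, nodes: List[str]) -> None:
        # node -> {path -> (version, content)}
        self.replicas: Dict[str, Dict[str, Tuple[int, str]]] = {n: {} for n in nodes}
        self.versions: Dict[str, int] = {}
        self.down: Set[str] = set()

    @staticmethod
    def task_path(job: str, sink: str, task: str) -> str:
        """Directory hierarchy partitioned by data transmission task."""
        return f"/{job}/{sink}/{task}"

    def write(self, path: str, content: str) -> int:
        """Write a new version to all live replicas; return how many copies were written."""
        version = self.versions.get(path, 0) + 1
        self.versions[path] = version
        written = 0
        for node, files in self.replicas.items():
            if node not in self.down:
                files[path] = (version, content)
                written += 1
        return written

    def read(self, path: str) -> str:
        """Return the newest copy held by any live replica."""
        copies = [files[path] for node, files in self.replicas.items()
                  if node not in self.down and path in files]
        if not copies:
            raise KeyError(path)
        return max(copies)[1]  # highest version wins
```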

Preferably, the DataRule Manager manages the rule configuration and stores the rules in the GeoDataService cluster under the path defined for each task.
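A DataRule Manager that validates rule definitions and stores them under the task definition path could be sketched as follows. The `RuleConfig` fields, the JSON encoding, the three level names and the `/rules` entry appended to the task path are all assumptions; the dictionary stands in for the GeoDataService cluster.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

@dataclass
class RuleConfig:
    """Hypothetical rule schema; the patent does not define one."""
    level: str       # "database", "table" or "field"
    target: str      # e.g. "orders.lat"
    expression: str  # rule body, treated as opaque text by the manager

class DataRuleManager:
    LEVELS = {"database", "table", "field"}

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}  # stand-in for the GeoDataService cluster

    @staticmethod
    def rule_path(job: str, sink: str, task: str) -> str:
        # Assumed layout: the task definition path plus a "rules" entry.
        return f"/{job}/{sink}/{task}/rules"

    def save(self, job: str, sink: str, task: str, rules: List[RuleConfig]) -> str:
        """Validate the rules, then store them as JSON under the task's path."""
        for r in rules:
            if r.level not in self.LEVELS:
                raise ValueError(f"unknown rule level: {r.level}")
        path = self.rule_path(job, sink, task)
        self.store[path] = json.dumps([asdict(r) for r in rules])
        return path

    def load(self, job: str, sink: str, task: str) -> List[RuleConfig]:
        raw = self.store[self.rule_path(job, sink, task)]
        return [RuleConfig(**d) for d in json.loads(raw)]
```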

Preferably, in step (3), the Data Check cluster automatically selects a task and registers it in the GeoDataService cluster, which then monitors the current running state of the task and the node that owns it, ensuring the integrity of task execution.
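Task registration with ownership monitoring can be modelled as claims tied to heartbeats. This sketch assumes a session timeout in the spirit of a ZooKeeper ephemeral node: a claim disappears when its owner stops sending heartbeats, which frees the task for another Data Check node. `TaskRegistry` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional

class TaskRegistry:
    """Hypothetical registry of task claims held in the GeoDataService cluster.

    A claim lives only while its owner keeps sending heartbeats; a lapsed
    owner's tasks become free for other nodes.
    """

    def __init__(self, session_timeout: float) -> None:
        self.timeout = session_timeout
        self.owner: Dict[str, str] = {}        # task path -> owning Data Check node
        self.last_beat: Dict[str, float] = {}  # node -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        self.last_beat[node] = now

    def expire(self, now: float) -> None:
        """Drop nodes whose session timed out, together with all their claims."""
        dead = {n for n, t in self.last_beat.items() if now - t > self.timeout}
        self.owner = {task: n for task, n in self.owner.items() if n not in dead}
        for n in dead:
            del self.last_beat[n]

    def register(self, task: str, node: str, now: float) -> bool:
        """Try to claim a task; True if this node owns it afterwards."""
        self.expire(now)
        current = self.owner.setdefault(task, node)
        if current == node:
            self.heartbeat(node, now)
            return True
        return False

    def owner_of(self, task: str, now: float) -> Optional[str]:
        """Monitoring view: which node currently runs the task, if any."""
        self.expire(now)
        return self.owner.get(task)
```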

The invention has the beneficial effects that:

The invention performs fusion processing and quality evaluation of spatial big data in a clustered manner. When a rule changes, each processing node of the GeoDataService cluster and the DataRule Manager cluster is notified according to the change event, so that the latest rule takes effect in real time. When the data migration service is deployed on multiple machines, the integrity of all running tasks is ensured: if a data transmission service node fails and stops, other nodes take over and execute the remaining work. The invention thus solves the problems that the cleaning and filtering rules of data transmission tools currently on the market are managed by a single node, that the cleaning rules cannot be managed uniformly, and that rule changes are difficult to synchronize strictly to other nodes and put into effect in real time.

Drawings

FIG. 1 is a diagram of the setup of the method of the present invention;

Detailed Description

To make the technical means, creative features, objectives and functions of the present invention easy to understand, the invention is further described below with reference to specific embodiments. The following embodiments are only preferred embodiments of the present invention and are not exhaustive. Other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention. Unless otherwise specified, the experimental methods in the following examples are conventional methods, and the materials, reagents and the like used are commercially available.

The method involves a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster. The GeoDataService cluster consists of 2 or more GeoDataService devices, and the Data Check cluster consists of one or more Data Check service nodes. Each Data Check service node embeds a component that watches a rule path in the DataRule Manager cluster; the Data Check service nodes are connected to the DataRule Manager cluster, and the DataRule Manager management component is connected to the GeoDataService cluster.

The clustered data rule processing method comprises the following steps:

In addition to the cleaning and filtering rules, the GeoDataService cluster also records task ownership. When a data migration task is started, it is configured with the number of tasks to execute, and the configuration file then contains the following:

the file directories recorded in the GeoDataService database system are as follows:

/jobname/sink/task
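The `/jobname/sink/task` layout can be handled with a pair of helpers that build and parse such paths. The sketch treats all three segments as variable identifiers (job, sink and task); whether `sink` and `task` are meant as fixed directory names is not specified, and `TaskKey`, `task_path`, `parse_task_path` and `tasks_of_job` are hypothetical.

```python
from typing import Dict, List, NamedTuple

class TaskKey(NamedTuple):
    job: str
    sink: str
    task: str

def task_path(key: TaskKey) -> str:
    """Build a /jobname/sink/task path."""
    return f"/{key.job}/{key.sink}/{key.task}"

def parse_task_path(path: str) -> TaskKey:
    """Split a /jobname/sink/task path back into its three segments."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a /jobname/sink/task path: {path!r}")
    return TaskKey(*parts)

def tasks_of_job(paths: List[str], job: str) -> Dict[str, List[str]]:
    """Group one job's task names by sink, e.g. to see how many tasks it was configured with."""
    grouped: Dict[str, List[str]] = {}
    for p in paths:
        key = parse_task_path(p)
        if key.job == job:
            grouped.setdefault(key.sink, []).append(key.task)
    return grouped
```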

the task file contents are as follows:

where flag records the point the task has reached, such as the binlog position, and active and down indicate the current task execution state.

The Data Check cluster watches the rules at the specified path and loads them so that they take effect. To spread the data migration load of multi-task execution, the Data Check cluster usually runs several tasks in parallel, so it includes a mechanism by which nodes compete for tasks. Each Data Check node automatically searches the started jobs and registers itself in task files that are not yet bound; if a job has no task left to bind, the node moves on to the next job. While searching, the node balances against the tasks it has already bound, using the following strategy. Let n be the number of tasks already bound to the node, N the number of unbound tasks, DN the number of Data Check nodes, and TN the total number of tasks; the number of tasks to bind is

MIN(TN/DN, N) < TN/DN ? (TN/DN - n) : 0

When a data transmission service node disconnects, its tasks can be acquired again by the other task nodes.
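The competitive binding and balancing can be sketched as follows. The sketch implements a fair-share reading of the balancing rule, which is our interpretation rather than a transcription of the expression: each node targets ceil(TN/DN) tasks, so it claims min(ceil(TN/DN) - n, N) more tasks, or none if it already holds its share. `TaskFile` mirrors the flag and active/down state described above; its `owner` field and all function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class TaskFile:
    """One task file: flag is the execution point (e.g. a binlog position),
    status is "active" or "down"; owner is an assumed field naming the bound node."""
    flag: int = 0
    status: str = "down"
    owner: Optional[str] = None

def tasks_to_claim(n: int, N: int, TN: int, DN: int) -> int:
    """Fair-share reading of the balancing rule (an interpretation).

    n: tasks already bound to this node, N: unbound tasks available,
    TN: total tasks, DN: number of Data Check nodes.
    """
    share = (TN + DN - 1) // DN  # ceil(TN / DN) so no task is left over
    return max(0, min(share - n, N))

def balance_once(node: str, tasks: Dict[str, TaskFile], DN: int) -> List[str]:
    """Bind this node to up to its fair share of unbound tasks; return the new paths."""
    unbound = sorted(p for p, t in tasks.items() if t.owner is None)
    n = sum(1 for t in tasks.values() if t.owner == node)
    claimed = unbound[:tasks_to_claim(n, len(unbound), len(tasks), DN)]
    for p in claimed:
        tasks[p].owner, tasks[p].status = node, "active"
    return claimed

def release(node: str, tasks: Dict[str, TaskFile]) -> None:
    """On disconnect, unbind the node's tasks but keep each flag so work can resume."""
    for t in tasks.values():
        if t.owner == node:
            t.owner, t.status = None, "down"
```

Keeping `flag` when a node is released lets the next owner resume from the recorded position, for example a binlog offset, instead of restarting the task.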

The following rules are used for data cleaning and filtering during data migration: field-level rule R, table-level rule R and database-level rule TR.

When a lower-level rule changes, only the corresponding rule R needs to be changed, while R and TR remain unchanged; if a higher-level rule is deleted, the lower-level rules under it are deleted as well. Every change is reported directly to the GeoDataService cluster, where the rule files under the relevant directories are changed directly and synchronized to the other nodes, and the GeoDataService cluster notifies each node of the Data Check cluster of the change through an event.
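The hierarchical rules with cascading deletion and change notification can be sketched as a path-keyed tree. The sketch assumes that database, table and field rules live at paths such as `/db`, `/db/table` and `/db/table/field`, and that a listener callback stands in for the GeoDataService event notification; `RuleTree` is a hypothetical name.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]  # (event, path)

class RuleTree:
    """Hypothetical path-keyed hierarchy of database, table and field rules."""

    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}
        self.listeners: List[Listener] = []

    def _notify(self, event: str, path: str) -> None:
        # Stand-in for reporting the change so every Data Check node hears about it.
        for listener in self.listeners:
            listener(event, path)

    def put(self, path: str, rule: str) -> None:
        """Add or change one rule; rules above and beside it are left untouched."""
        self.rules[path] = rule
        self._notify("changed", path)

    def delete(self, path: str) -> List[str]:
        """Delete a rule and, in cascade, every lower-level rule beneath it."""
        doomed = [p for p in self.rules if p == path or p.startswith(path + "/")]
        doomed.sort(key=len, reverse=True)  # deepest rules first
        for p in doomed:
            del self.rules[p]
            self._notify("deleted", p)
        return doomed
```

Matching on `path + "/"` rather than on a bare prefix keeps a sibling such as `/sales/orders_archive` from being deleted along with `/sales/orders`.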

The foregoing shows and describes the general principles, essential features and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate preferred embodiments of the present invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A method for constructing a clustered space-time big data fusion processing and quality evaluation system, the system comprising a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster, characterized in that:

the GeoDataService cluster is composed of 2 or more GeoDataService devices, the Data Check cluster is composed of one or more Data Check service nodes, each Data Check service node embeds a component for watching a rule path in a DataRule Manager cluster, the Data Check service nodes are connected to the DataRule Manager cluster, and the DataRule Manager management component is connected to the GeoDataService cluster.

2. The method for fusion processing of clustered spatiotemporal big data and construction of a quality evaluation system according to claim 1, characterized in that the method comprises the following steps:

step one: a change to a data check rule is reported directly to one of the GeoDataService nodes, and the rule file in the relevant directory is changed directly and synchronized to the other GeoDataService nodes;

step two: the GeoDataService cluster notifies each node of the Data Check cluster of the change through an event;

step three: the Data Check cluster performs rule checking and data fusion processing according to the rules at the specified path.

3. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that the GeoDataService device is provided with an interface for management and detection.

4. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that the data transmission service nodes in the GeoDataService cluster share one rule configuration, and rule changes take effect in real time through the watch mechanism.

5. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that the DataRule Manager management component performs configuration management of the rules and stores the rules in the GeoDataService cluster according to the task definition path.

6. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that in step one, the rules are stored in the GeoDataService cluster as structured and unstructured large spatial data files, using the file system, the relational database system and the non-relational database system provided by GeoDataService; the directory hierarchy is partitioned per data transmission task, and the GeoDataService file system persists the files to every GeoDataService node.

7. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that in step three, the Data Check cluster automatically selects a task and registers it in the GeoDataService cluster to monitor the current running state of the task and the node to which it belongs.