CN111752933A - Method for fusion processing of clustered space-time big data and construction of quality evaluation system - Google Patents

Info

Publication number
CN111752933A
Authority
CN
China
Prior art keywords
cluster
geodataservice
data
fusion processing
rule
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010374707.4A
Other languages
Chinese (zh)
Inventor
颜军
贾泽露
叶伟立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU ZHITU TECHNOLOGY CO LTD
Shenzhen Baoan District Information Center
Original Assignee
JIANGSU ZHITU TECHNOLOGY CO LTD
Shenzhen Baoan District Information Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-05-06
Filing date
2020-05-06
Publication date
2020-10-09
Application filed by JIANGSU ZHITU TECHNOLOGY CO LTD, Shenzhen Baoan District Information Center
Priority to CN202010374707.4A
Publication of CN111752933A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention provides a method for clustered space-time big data fusion processing and construction of a quality evaluation system, relating to the technical field of communication. The system comprises a zookeeper cluster, a Rule Manager management component and a Data Handle cluster. The zookeeper cluster consists of 3 or more machines, and the Data Handle cluster consists of more than 1 data transmission service node. Each data transmission service node embeds a component that monitors a rule path in the zookeeper cluster; the data transmission service nodes are connected with the zookeeper cluster, and the Rule Manager management component is connected with the zookeeper cluster. The invention addresses the problems that, in data transmission tools currently on the market, the cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed uniformly, and rules are difficult to synchronize strictly to other nodes and bring into effect in real time when a rule changes.

Description

Method for fusion processing of clustered space-time big data and construction of quality evaluation system
Technical Field
The invention relates to the field of space-time big data, in particular to a method for fusion processing of clustered space-time big data and construction of a quality evaluation system.
Background
Space-time big data fusion processing is the process of performing multi-dimensional fusion analysis on traditional business data on the basis of spatial data. Data quality evaluation is the process of comprehensively evaluating the quality of the fused data against a set of data rules under a general algorithm rule. A space-time big data fusion processing and quality evaluation system therefore generally comprises a data fusion processing system and a data quality evaluation system.
Space-time big data fusion processing constructs data from its relationships in the time and space dimensions, applying the data rules of the different data sources over a general data transmission protocol, and performs matching and fusion across the whole flow in which data is extracted (extract), transformed (transform) and loaded (load) from a source end to a destination end. Data fusion processing is an important link in big data analysis: a user extracts the required data from a data source, cleans it, and finally loads it into a data warehouse according to a predefined data warehouse model, providing data support and data services for a general algorithm model.
Big data is an important resource for modern enterprises and governments and is the basis for scientific management and decision analysis. According to statistics, data volume multiplies every 2-3 years and contains huge commercial value, yet the data that enterprises actually pay attention to generally accounts for only about 2%-4% of the total. As a result, enterprises do not make full use of their existing data resources, waste time and money, and miss the best opportunity to make critical business decisions. How to convert data into information and knowledge through technical means has therefore become a major bottleneck in improving the core competitiveness of enterprises. Data fusion processing and quality evaluation are effective means of doing so, allowing a user to discover the potential value and decision information in massive data quickly and accurately.
Most fusion processing and quality evaluation tools currently on the market run on a single machine. Some distributed data fusion processing tools exist, but their cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed uniformly, and rules are difficult to synchronize strictly to other nodes and bring into effect in real time when a rule changes.
Disclosure of Invention
The invention aims to provide a method for fusion processing of clustered space-time big data and construction of a quality evaluation system, so as to solve the technical problems.
In order to solve the technical problems, the invention adopts the following technical scheme:
The method for clustered space-time big data fusion processing and construction of a quality evaluation system comprises a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster. The GeoDataService cluster consists of 2 or more GeoDataService devices, and the Data Check cluster consists of more than 1 Data Check service node. Each Data Check service node embeds a component that monitors a rule path in the DataRule Manager cluster; the Data Check service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
The method for fusion processing of clustered spatio-temporal big data and construction of a quality evaluation system comprises the following steps:
(1) space-time big data is distributed through a GeoDataService node to a DataRule Manager node for rule-based matching and fusion; the quality of the fused data is checked by Data Check, and data that passes the check is fed back directly to GeoDataService to provide data services externally;
(2) the GeoDataService cluster notifies each node of the Data Check cluster of a rule change according to the event action;
(3) the Data Check cluster monitors the rules under the specified path and evaluates the quality of the data fusion result.
Preferably, the GeoDataService device is provided with an interface for management and detection.
Preferably, the data transmission service nodes in the GeoDataService cluster share a set of configuration rules, and rule changes take effect in real time through the watch mechanism.
Preferably, in step (1), the rules are stored in the GeoDataService cluster in a table structure using the database system provided by GeoDataService; the directory layout is divided per data transmission task, and GeoDataService can persist the data to every GeoDataService node, so that multiple copies keep the files safe.
Preferably, the DataRule Manager manages the configuration of the rules and stores them in the GeoDataService cluster under the path defined by the task.
Preferably, in step (3), the Data Check cluster automatically selects a task and registers it in the GeoDataService cluster, so that the running state of the current task and the node to which it belongs can be monitored, ensuring the integrity of task execution.
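The task registration and takeover behaviour described above can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the `TaskRegistry` class, its method names, and the task and node identifiers are all hypothetical, and a plain dictionary stands in for the records that the text says are kept in the GeoDataService cluster.

```python
from typing import Dict, List, Optional


class TaskRegistry:
    """Hypothetical in-memory stand-in for task records held in the GeoDataService cluster."""

    def __init__(self, task_ids: List[str]) -> None:
        # None means the task is not yet bound to any Data Check node.
        self.owner: Dict[str, Optional[str]] = {t: None for t in task_ids}

    def claim(self, node_id: str) -> Optional[str]:
        """Bind the first unbound task to node_id and return its id, or None if none is free."""
        for task_id in sorted(self.owner):
            if self.owner[task_id] is None:
                self.owner[task_id] = node_id
                return task_id
        return None

    def release_node(self, node_id: str) -> List[str]:
        """Unbind every task of a stopped node so that other nodes can claim it."""
        released = sorted(t for t, o in self.owner.items() if o == node_id)
        for task_id in released:
            self.owner[task_id] = None
        return released


registry = TaskRegistry(["task-0", "task-1", "task-2"])
first = registry.claim("node-a")             # node-a binds task-0
second = registry.claim("node-b")            # node-b binds task-1
released = registry.release_node("node-a")   # node-a stops; task-0 becomes unbound
taken_over = registry.claim("node-b")        # node-b picks up the released task
```

Because ownership is recorded centrally, a stopped node's tasks become visible as unbound and can be picked up by any surviving node, which is the property the text relies on for task integrity.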
The invention has the beneficial effects that:
The invention can perform fusion processing and quality evaluation on spatial big data in a clustered manner. When a rule changes, each processing node of the GeoDataService cluster and the DataRule Manager cluster is notified according to the change event, so that the latest rule takes effect in real time. When the data migration service is deployed on multiple machines, the integrity of all running tasks is ensured: if a data transmission service node fails and stops, other nodes take over and carry out the remaining work. The invention thus solves the problems that, in data transmission tools currently on the market, the cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed uniformly, and rules are difficult to synchronize strictly to other nodes and bring into effect in real time when a rule changes.
Drawings
FIG. 1 is a diagram of the set-up of the method of the present invention.
Detailed Description
The present invention is further described below with reference to specific embodiments, to make its technical means, inventive features, objectives and effects easier to understand. The following embodiments are only preferred embodiments of the present invention and are not intended to be exhaustive. Other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention. The experimental methods in the following examples are conventional methods unless otherwise specified, and the materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
The method comprises a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster. The GeoDataService cluster consists of 2 or more GeoDataService devices, and the Data Check cluster consists of more than 1 Data Check service node. Each Data Check service node embeds a component that monitors a rule path in the DataRule Manager cluster; the Data Check service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
The clustered data rule processing method comprises the following steps:
(1) space-time big data is distributed through a GeoDataService node to a DataRule Manager node for rule-based matching and fusion; the quality of the fused data is checked by Data Check, and data that passes the check is fed back directly to GeoDataService to provide data services externally;
(2) the GeoDataService cluster notifies each node of the Data Check cluster of a rule change according to the event action;
(3) the Data Check cluster monitors the rules under the specified path and evaluates the quality of the data fusion result.
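Steps (2) and (3) follow a watch-and-notify pattern: nodes register interest in a rule path and are called back when the rule stored there changes. The sketch below shows that pattern in plain Python under stated assumptions. `RuleStore`, `CheckNode`, the path `/job1/sink/rule` and the rule contents are hypothetical names chosen for illustration; a real deployment would rely on the coordination service's own watch facility rather than this in-process stand-in.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Callback = Callable[[str, str, Any], None]


class RuleStore:
    """Hypothetical in-process stand-in for the cluster that stores rules by path."""

    def __init__(self) -> None:
        self._rules: Dict[str, Any] = {}
        self._watchers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def watch(self, path: str, callback: Callback) -> None:
        """Register a callback that fires on every change under this exact path."""
        self._watchers[path].append(callback)

    def put(self, path: str, rule: Any) -> None:
        """Store a rule and notify every watcher of the path with the event type."""
        event = "changed" if path in self._rules else "created"
        self._rules[path] = rule
        for callback in list(self._watchers[path]):
            callback(event, path, rule)


class CheckNode:
    """Hypothetical Data Check node that keeps the latest rule for one path."""

    def __init__(self, name: str, store: RuleStore, path: str) -> None:
        self.name = name
        self.rule: Any = None
        self.events: List[str] = []
        store.watch(path, self._on_rule_event)

    def _on_rule_event(self, event: str, path: str, rule: Any) -> None:
        # Load the new rule immediately so it takes effect on the next check.
        self.events.append(event)
        self.rule = rule


RULE_PATH = "/job1/sink/rule"  # hypothetical path, not taken from the source
store = RuleStore()
nodes = [CheckNode(f"check-{i}", store, RULE_PATH) for i in range(3)]
store.put(RULE_PATH, {"required": ["id", "geom"]})
store.put(RULE_PATH, {"required": ["id", "geom", "timestamp"]})
```

After the second `put`, every node holds the same updated rule without having polled for it, which is the behaviour the steps describe as the rule taking effect in real time.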
Besides the cleaning and filtering rules, the GeoDataService cluster also holds the record of task ownership. When a data migration task is started, the number of tasks to be executed is configured, and the configuration is recorded as follows.
The file directory recorded in the GeoDataService database system is:
/jobname/sink/task
The task file contents are as follows:
[Figures BDA0002479561990000051 and BDA0002479561990000061: task file contents]
Here, flag represents the point the task has currently reached (for example, the position value of the binlog), and active and down represent the current execution state of the task. The Data Check cluster monitors the rules under the specified path and loads them so that they take effect. To relieve the data migration pressure caused by multitask execution, multiple tasks are generally deployed so that execution is split across them, and the Data Check cluster therefore contains a function by which nodes compete for tasks. A Data Check node automatically searches for started jobs and registers itself in an unbound task file; if a job has no task left to bind, the node moves on to search the next job. While searching, the node balances its load according to the tasks it has already bound. The balancing strategy is as follows: let n be the number of tasks already bound, N the number of tasks that are unbound or whose state is down, DN the number of nodes in the Data Check cluster, and TN the total number of tasks; then: MIN(TN/DN, N) < TN/DN ? (TN/DN - n) : 0. When a data transmission service node is disconnected, its tasks can be claimed again by other nodes.
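The balancing expression is terse, so the sketch below commits to one plausible reading rather than a definitive one: a node that already holds its fair share TN/DN of tasks claims nothing, and a node below its fair share claims the shortfall, capped by the number N of tasks actually available. The function name `tasks_to_claim` and the choice to round the fair share up (so that leftover tasks are still picked up) are assumptions, not details given in the text.

```python
def tasks_to_claim(n: int, N: int, DN: int, TN: int) -> int:
    """Return how many more tasks a node may claim under the assumed reading.

    n  -- tasks this node has already bound
    N  -- tasks currently available (unbound, or in the down state)
    DN -- number of nodes in the Data Check cluster
    TN -- total number of tasks
    """
    if DN <= 0:
        raise ValueError("DN must be positive")
    # Assumption: round the fair share up so that TN tasks are always covered.
    fair_share = (TN + DN - 1) // DN
    if n >= fair_share:
        return 0  # the node is already at or above its share
    return min(fair_share - n, N)


below_share = tasks_to_claim(n=1, N=5, DN=3, TN=10)   # share is 4, so 3 more
at_share = tasks_to_claim(n=4, N=5, DN=3, TN=10)      # already at share
few_available = tasks_to_claim(n=0, N=1, DN=3, TN=10) # capped by availability
```

With 10 tasks over 3 nodes the assumed fair share is 4, so a node holding 1 task may claim 3 more, a node holding 4 claims none, and a node is never told to claim more tasks than are available.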
The rules for data cleaning and filtering in data migration are organized at three levels: field-level rules r, table-level rules R, and library-level rules TR:
[Figure BDA0002479561990000062]
When a sub-item rule changes, only the corresponding rule r needs to be changed, while R and TR remain unchanged. If an upper-level rule is deleted, its lower-level rules are deleted with it. Any change is reported directly to the GeoDataService cluster, the rule files under the relevant directories are changed directly and synchronized to the other nodes, and the GeoDataService cluster notifies each node of the Data Check cluster of the change according to the event action.
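The three-level rule hierarchy and its cascade-on-delete behaviour can be sketched as a path-keyed store. This is an illustrative assumption about how such a hierarchy might be represented, not the layout used by the invention: `RuleTree`, the tuple-shaped paths, and the example rule texts are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical path shape: (library,), (library, table) or (library, table, field).
RulePath = Tuple[str, ...]


class RuleTree:
    """Hypothetical store for library-level (TR), table-level (R) and field-level (r) rules."""

    def __init__(self) -> None:
        self.rules: Dict[RulePath, str] = {}
        self.events: List[Tuple[str, RulePath]] = []  # change log to be broadcast

    def put(self, path: RulePath, rule: str) -> None:
        """Create or change one rule; rules at other levels are left untouched."""
        self.rules[path] = rule
        self.events.append(("changed", path))

    def delete(self, path: RulePath) -> List[RulePath]:
        """Delete a rule together with every lower-level rule beneath it."""
        doomed = sorted(p for p in self.rules if p[:len(path)] == path)
        for p in doomed:
            del self.rules[p]
            self.events.append(("deleted", p))
        return doomed


tree = RuleTree()
tree.put(("db1",), "TR: skip rows flagged as test data")
tree.put(("db1", "orders"), "R: drop duplicate rows")
tree.put(("db1", "orders", "amount"), "r: must not be null")
tree.put(("db1", "users"), "R: drop rows without id")

# Changing a field-level rule leaves the table-level and library-level rules as they were.
tree.put(("db1", "orders", "amount"), "r: must be >= 0")

# Deleting a table-level rule also removes the field-level rules under it.
removed = tree.delete(("db1", "orders"))
```

Keying rules by their full path keeps a field-level change local to one entry, while prefix matching gives the cascade on deletion; the `events` list stands in for the change reports that would be sent to the cluster.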
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. A method for clustered space-time big data fusion processing and construction of a quality evaluation system, comprising a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster, characterized in that:
the GeoDataService cluster is composed of 2 or more GeoDataService devices, the Data Check cluster is composed of more than 1 Data Check service node, each Data Check service node is embedded with a component for monitoring a rule path in a DataRule Manager cluster, the Data Check service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
2. The method for fusion processing of clustered spatiotemporal big data and construction of a quality evaluation system according to claim 1, characterized in that the method comprises the following steps:
step one: a change to a data check rule is reported directly to one of the GeoDataService nodes, and the rule file in the relevant directory is changed directly and synchronized to the other GeoDataService nodes;
step two: the GeoDataService cluster notifies each node of the Data Check cluster of the change according to the event action;
step three: the Data Check cluster performs rule checking and data fusion processing according to the rules under the specified path.
3. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the GeoDataService device is provided with an interface for management and detection.
4. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the data transmission service nodes in the GeoDataService cluster share a set of configuration rules, and rule changes take effect in real time through the watch mechanism.
5. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the DataRule Manager management component performs configuration management of the rules and stores the rules in the GeoDataService cluster under the path defined by the task.
6. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that: in step one, the rules are stored in the GeoDataService cluster as structured and unstructured spatial big data files, using the file system, the relational database system and the non-relational database system provided by GeoDataService; the directory layout is divided per data transmission task, and the GeoDataService file system can persist the files to each GeoDataService node.
7. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that: in step three, the Data Check cluster automatically selects a task and registers it in the GeoDataService cluster, so as to monitor the running state of the current task and the node to which the task belongs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010374707.4A CN111752933A (en) 2020-05-06 2020-05-06 Method for fusion processing of clustered space-time big data and construction of quality evaluation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010374707.4A CN111752933A (en) 2020-05-06 2020-05-06 Method for fusion processing of clustered space-time big data and construction of quality evaluation system

Publications (1)

Publication Number Publication Date
CN111752933A (en) 2020-10-09

Family

ID=72673818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010374707.4A Pending CN111752933A (en) 2020-05-06 2020-05-06 Method for fusion processing of clustered space-time big data and construction of quality evaluation system

Country Status (1)

Country Link
CN (1) CN111752933A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857827A (en) * 2019-01-31 2019-06-07 山东省国土测绘院 A kind of geography information archives integrated management approach and system
CN109885613A (en) * 2019-01-03 2019-06-14 江苏智途科技股份有限公司 Clustering data rule processing method
CN110334164A (en) * 2019-06-12 2019-10-15 重庆工商大学融智学院 A kind of fusion method of ecological space data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201009