CN111752933A - Method for fusion processing of clustered space-time big data and construction of quality evaluation system - Google Patents
- Publication number
- CN111752933A
- Authority
- CN
- China
- Prior art keywords
- cluster
- geodataservice
- data
- fusion processing
- rule
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention provides a method for clustered space-time big data fusion processing and construction of a quality evaluation system, relating to the technical field of communication. It comprises a zookeeper cluster, a Rule Manager management component and a Data Handle cluster. The zookeeper cluster consists of 3 or more machines, and the Data Handle cluster consists of more than 1 data transmission service node. A component for monitoring a rule path in the zookeeper cluster is embedded in each data transmission service node; the data transmission service nodes are connected with the zookeeper cluster, and the Rule Manager management component is connected with the zookeeper cluster. The invention solves the problems that, in the data transmission tools currently on the market, the cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed uniformly, and during a rule change the updated rules are difficult to synchronize strictly to the other nodes in real time and put into effect.
Description
Technical Field
The invention relates to the field of space-time big data, in particular to a method for fusion processing of clustered space-time big data and construction of a quality evaluation system.
Background
Space-time big data fusion processing is the process of carrying out multi-dimensional fusion analysis on traditional service data on the basis of spatial data. Data quality evaluation is the process of comprehensively evaluating the quality of the data fusion against a set data rule under a universal algorithm rule. A space-time big data fusion processing and quality evaluation system therefore generally comprises a data fusion processing system and a data quality evaluation system.
Space-time big data fusion processing uses the data rules of different data sources, over a universal data transmission protocol, to organize data by its relations in the time dimension and the space dimension, and matches and fuses the data along the whole flow in which it is extracted (extract), transformed (transform) and loaded (load) from a source end to a destination end. Data fusion processing is an important link in big data analysis: the user extracts the required data from a data source and, after data cleaning, loads it into a data warehouse according to a predefined data warehouse model, providing data support and data services for a general algorithm model.
Big data is an important resource of modern enterprises and governments and is the basis for applying scientific management and decision analysis. According to statistics, the data volume can be multiplied every 2-3 years, the data contain huge commercial values, and the data concerned by enterprises generally only account for about 2% -4% of the total data volume. As a result, businesses still do not maximize the use of existing data resources, wasting more time and money, and losing the best opportunity to make critical business decisions. Therefore, how to convert data into information and knowledge through various technical means has become a major bottleneck for improving the core competitiveness of enterprises. Data fusion processing and quality evaluation are very beneficial means, so that a user can rapidly and accurately discover potential value and decision information of data in mass data.
Most fusion processing and quality evaluation tools currently on the market run on a single machine. Some distributed data fusion processing tools are also available, but their cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed in a unified manner, and during a rule change the updated rules are difficult to synchronize strictly to the other nodes in real time and put into effect.
Disclosure of Invention
The invention aims to provide a method for fusion processing of clustered space-time big data and construction of a quality evaluation system, so as to solve the technical problems.
In order to solve the technical problems, the invention adopts the following technical scheme:
The method for constructing the clustered space-time big data fusion processing and quality evaluation system comprises a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster, wherein the GeoDataService cluster consists of 2 or more GeoDataService devices, the Data Check cluster consists of more than 1 data inspection service node, a component for monitoring a rule path in the DataRule Manager cluster is embedded in each data inspection service node, the data inspection service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
The method for fusion processing of clustered spatio-temporal big data and construction of a quality evaluation system comprises the following steps:
(1) the space-time big data is distributed through a GeoDataService node to a DataRule Manager node for rule-based matching and fusion, the quality of the fused data is checked by Data Check, and the data that passes the check is fed back directly to GeoDataService to provide data service support externally;
(2) the GeoDataService cluster informs each node of the Data Check cluster of the change according to the event action;
(3) the Data Check cluster monitors the rules on the specified path and evaluates the quality of the data fusion result.
Preferably, the GeoDataService device is provided with an interface for management and detection.
Preferably, the data transmission service nodes in the GeoDataService cluster share one rule configuration, and rule changes take effect in real time through the watch mechanism.
Preferably, in step (1), the rules are stored in the GeoDataService cluster in a table structure using the database system provided by GeoDataService; the directory layout is divided per data transmission task, and GeoDataService can persist the data to each GeoDataService node. Keeping multiple copies of each file ensures its safety.
Preferably, the DataRule Manager manages the configuration of the rule, and stores the rule into the GeoDataService cluster according to the task definition path.
Preferably, the Data Check cluster in step (3) automatically selects a task and registers it in the GeoDataService cluster, so that the running state of the current task and the node to which the task belongs can be monitored. This ensures the integrity of running tasks.
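The shared rule configuration and the watch-based update described in the preferred features can be illustrated with a minimal in-memory sketch. Everything here is a hypothetical stand-in: `RuleStore` plays the role of the rule storage in the GeoDataService cluster, `CheckNode` plays the role of a Data Check node, and the path and rule text are invented for illustration. The source does not specify the real interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

WatchCallback = Callable[[str, str], None]


class RuleStore:
    """Hypothetical in-memory stand-in for rule storage in the GeoDataService cluster."""

    def __init__(self) -> None:
        self._rules: Dict[str, str] = {}
        self._watchers: Dict[str, List[WatchCallback]] = defaultdict(list)

    def watch(self, path: str, callback: WatchCallback) -> None:
        # Register interest in changes to the rule stored under `path`.
        self._watchers[path].append(callback)

    def put(self, path: str, rule: str) -> None:
        # Store the rule, then notify every watcher of that path.
        self._rules[path] = rule
        for callback in self._watchers[path]:
            callback(path, rule)


class CheckNode:
    """Hypothetical Data Check node that reloads its rule when notified."""

    def __init__(self, name: str, store: RuleStore, path: str) -> None:
        self.name = name
        self.active_rule: Optional[str] = None
        store.watch(path, self._on_change)

    def _on_change(self, path: str, rule: str) -> None:
        self.active_rule = rule


store = RuleStore()
nodes = [CheckNode(f"check-{i}", store, "/job1/sink/rules") for i in range(3)]
store.put("/job1/sink/rules", "drop rows where geometry is null")
```

After the single `put`, every node holds the same rule, which is the property the text calls real-time validation. A real deployment would replace the direct callback with a network notification.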
The invention has the beneficial effects that:
The invention can carry out fusion processing and quality evaluation on spatial big data in a clustered manner. When a rule changes, each processing node of the GeoDataService cluster and the DataRule Manager cluster can be informed according to the change event, so that the latest rule takes effect in real time. When the data migration service is deployed on several machines, the integrity of all running tasks can be ensured: when a data transmission service node fails and stops, the remaining work can be taken over and executed by other nodes. The invention solves the problems that, in the data transmission tools currently on the market, the cleaning and filtering rules are managed by a single node, the cleaning rules cannot be managed uniformly, and during a rule change the updated rules are difficult to synchronize strictly to the other nodes in real time and put into effect.
Drawings
FIG. 1 is a diagram of the setup of the method of the present invention.
Detailed Description
The present invention will be further described with reference to specific embodiments for the purpose of facilitating an understanding of technical means, characteristics of creation, objectives and functions realized by the present invention, but the following embodiments are only preferred embodiments of the present invention, and are not intended to be exhaustive. Based on the embodiments in the implementation, other embodiments obtained by those skilled in the art without any creative efforts belong to the protection scope of the present invention. The experimental methods in the following examples are conventional methods unless otherwise specified, and materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
The method comprises a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster, wherein the GeoDataService cluster consists of 2 or more GeoDataService devices, the Data Check cluster consists of more than 1 data inspection service node, a component for monitoring a rule path in the DataRule Manager cluster is embedded in each data inspection service node, the data inspection service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
The clustered data rule processing method comprises the following steps:
(1) the space-time big data is distributed through a GeoDataService node to a DataRule Manager node for rule-based matching and fusion, the quality of the fused data is checked by Data Check, and the data that passes the check is fed back directly to GeoDataService to provide data service support externally;
(2) the GeoDataService cluster informs each node of the Data Check cluster of the change according to the event action;
(3) the Data Check cluster monitors the rules on the specified path and evaluates the quality of the data fusion result.
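The flow of step (1) can be sketched as follows: records are fused by applying rules, the fused records are quality-checked, and only passing records are served. This is an illustrative sketch only; the record layout, the rounding rule and the coordinate check are assumptions chosen for the example and do not come from the source.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, float]
FusionRule = Callable[[Record], Record]
QualityCheck = Callable[[Record], bool]


def fuse(record: Record, rules: Iterable[FusionRule]) -> Record:
    # Apply every fusion rule in order (the role given to DataRule Manager).
    for rule in rules:
        record = rule(record)
    return record


def passes(record: Record, checks: Iterable[QualityCheck]) -> bool:
    # A record passes only if every quality check accepts it (the role of Data Check).
    return all(check(record) for check in checks)


def process(records: Iterable[Record],
            rules: List[FusionRule],
            checks: List[QualityCheck]) -> List[Record]:
    # Return only the fused records that pass the checks; these would be
    # fed back to GeoDataService to be served externally.
    served = []
    for record in records:
        fused = fuse(dict(record), rules)
        if passes(fused, checks):
            served.append(fused)
    return served


def round_coordinates(record: Record) -> Record:
    # Hypothetical fusion rule: normalize coordinate precision.
    return {**record, "lon": round(record["lon"], 4), "lat": round(record["lat"], 4)}


def has_valid_coordinates(record: Record) -> bool:
    # Hypothetical quality check: coordinates must lie in the valid range.
    return -180.0 <= record["lon"] <= 180.0 and -90.0 <= record["lat"] <= 90.0


raw = [
    {"lon": 118.796877, "lat": 32.060255, "t": 1588723200.0},
    {"lon": 500.0, "lat": 32.0, "t": 1588723260.0},  # invalid longitude
]
served = process(raw, [round_coordinates], [has_valid_coordinates])
```

Here the second record is rejected by the quality check, so only the first one reaches the served list.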
In addition to the cleaning and filtering rules, the GeoDataService cluster also holds the record of task attribution. When a data migration task is started, the number of tasks to be executed is configured, and the configuration file then contains the following:
The file directories recorded in the GeoDataService database system are as follows:
/jobname/sink/task
The task file contents are as follows:
where flag represents the point that the task has currently reached, such as the position value of the binlog, and active and down represent the current execution state of the task.
The Data Check cluster monitors the rules on a specified path and loads them so that they take effect. To relieve the data migration pressure caused by executing many tasks, several Data Check nodes are generally deployed so that the tasks are split among them. The Data Check nodes therefore contain a function for competing for tasks: a Data Check node automatically searches for started jobs and registers itself in an unbound task file; if no task remains to be bound in the task files of a job, it goes on to search the next job. While searching, the node balances its load against the tasks it has already bound. The balancing strategy is as follows: let n be the number of bound tasks, N the number of tasks that are unbound or in the down state, DN the number of Data Check nodes, and TN the total number of tasks; then the strategy evaluates MIN(TN/DN, N) < TN/DN ? (TN/DN - n) : 0. When a data transmission service node goes offline, its tasks can be acquired again by other task nodes.
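The balancing expression above can be transcribed into a short function. This is a literal sketch under stated assumptions: n is read as the number of tasks the node has already bound, N as the number of tasks that are unbound or down, the result is read as the number of further tasks the node may claim, and TN/DN is taken as integer division. None of these readings is confirmed by the source, and the function name and parameter names are hypothetical.

```python
def tasks_to_claim(n_bound: int, n_available: int,
                   data_check_nodes: int, total_tasks: int) -> int:
    """Literal transcription of MIN(TN/DN, N) < TN/DN ? (TN/DN - n) : 0.

    n_bound          -> n  (tasks this node has already bound)
    n_available      -> N  (tasks that are unbound or in the down state)
    data_check_nodes -> DN
    total_tasks      -> TN
    """
    # Assumption: TN/DN is integer division (the per-node share).
    share = total_tasks // data_check_nodes
    if min(share, n_available) < share:
        # The source does not say what happens if n exceeds the share,
        # so the value is returned as written, without clamping.
        return share - n_bound
    return 0


# 8 tasks over 2 nodes gives a share of 4; with 2 available tasks and
# 1 already bound, the expression yields 4 - 1.
example = tasks_to_claim(n_bound=1, n_available=2, data_check_nodes=2, total_tasks=8)
```

Note that, read literally, the expression yields 0 whenever N is at least TN/DN; whether that matches the intended behaviour cannot be determined from the text.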
The following rules are adopted for data cleaning and filtering in data migration: field-level rule r, table-level rule R, and library-level rule TR. When a sub-item rule changes, only the corresponding rule r needs to be changed, while R and TR remain unchanged; if an upper-level rule is deleted, its lower-level rules are deleted with it. Any change is reported directly to the GeoDataService cluster, the rule files under the related directories are changed directly and synchronized to the other nodes, and the GeoDataService cluster informs each node of the Data Check cluster of the change according to the event action.
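The three rule levels and the cascading delete can be sketched with a small hierarchical store. This is an assumption-laden illustration: `RuleTree`, the tuple-shaped paths (library, table, field) and the sample rule texts are all hypothetical, and the `events` list merely stands in for the change reports sent to the GeoDataService cluster.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # (library,), (library, table) or (library, table, field)


class RuleTree:
    """Hypothetical store for library-level (TR), table-level (R) and field-level (r) rules."""

    def __init__(self) -> None:
        self.rules: Dict[Path, str] = {}
        self.events: List[Tuple[str, Path]] = []  # stand-in for change reports

    def set_rule(self, path: Path, rule: str) -> None:
        # Changing one rule touches only that entry; parents stay unchanged.
        self.rules[path] = rule
        self.events.append(("changed", path))

    def delete_rule(self, path: Path) -> None:
        # Deleting an upper-level rule also deletes every rule below it.
        doomed = [p for p in self.rules if p[:len(path)] == path]
        for p in doomed:
            del self.rules[p]
            self.events.append(("deleted", p))


tree = RuleTree()
tree.set_rule(("gis",), "TR: skip tables without geometry")
tree.set_rule(("gis", "roads"), "R: drop rows older than 2015")
tree.set_rule(("gis", "roads", "name"), "r: trim whitespace")

# Changing the field-level rule leaves R and TR untouched.
tree.set_rule(("gis", "roads", "name"), "r: trim whitespace and upper-case")

# Deleting the table-level rule removes its field-level rule as well.
tree.delete_rule(("gis", "roads"))
remaining = sorted(tree.rules)
```

Keying rules by path prefix keeps the cascade to a single prefix comparison, which mirrors the directory-per-task layout the text describes.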
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (7)
1. The method for constructing the clustering space-time big Data fusion processing and quality evaluation system comprises a GeoDataService cluster, a DataRule Manager management component and a Data Check cluster, and is characterized in that:
the GeoDataService cluster is composed of 2 or more GeoDataService devices, the Data Check cluster is composed of more than 1 Data Check service node, each Data Check service node is embedded with a component for monitoring a rule path in a DataRule Manager cluster, the Data Check service nodes are connected with the DataRule Manager cluster, and the DataRule Manager management component is connected with the GeoDataService cluster.
2. The method for fusion processing of clustered spatiotemporal big data and construction of a quality evaluation system according to claim 1, characterized in that the method comprises the following steps:
step one: a change to the data check rule is reported directly to one of the GeoDataService nodes, and the rule file in the relevant directory is changed directly and synchronized to the other GeoDataService nodes;
step two: the GeoDataService cluster informs each node of the Data Check cluster of the change according to the event action;
step three: the Data Check cluster performs rule checking and data fusion processing according to the rules on the specified path.
3. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the GeoDataService device is provided with an interface for management and detection.
4. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the data transmission service nodes in the GeoDataService cluster share one rule configuration, and rule changes take effect in real time through the watch mechanism.
5. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 1, characterized in that: the DataRule Manager management component is used for carrying out configuration management operations on the rules and storing the rules into the GeoDataService cluster according to the task definition path.
6. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that: in step one, the rules are stored in the GeoDataService cluster as structured and unstructured large spatial data files, using the file system, the relational database system and the non-relational database system provided by GeoDataService; the directory layout is divided per data transmission task, and the GeoDataService file system can persist the files to each GeoDataService node.
7. The method for fusion processing and quality evaluation system construction of clustered spatiotemporal big data according to claim 2, characterized in that: in step three, the Data Check cluster automatically selects a task and registers it in the GeoDataService cluster to monitor the running state of the current task and the node to which the task belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010374707.4A CN111752933A (en) | 2020-05-06 | 2020-05-06 | Method for fusion processing of clustered space-time big data and construction of quality evaluation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111752933A true CN111752933A (en) | 2020-10-09 |
Family
ID=72673818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010374707.4A Pending CN111752933A (en) | 2020-05-06 | 2020-05-06 | Method for fusion processing of clustered space-time big data and construction of quality evaluation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111752933A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857827A (en) * | 2019-01-31 | 2019-06-07 | 山东省国土测绘院 | A kind of geography information archives integrated management approach and system |
CN109885613A (en) * | 2019-01-03 | 2019-06-14 | 江苏智途科技股份有限公司 | Clustering data rule processing method |
CN110334164A (en) * | 2019-06-12 | 2019-10-15 | 重庆工商大学融智学院 | A kind of fusion method of ecological space data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20201009 |