CN111737242A - Method for monitoring mass data processing process - Google Patents
- Publication number
- CN111737242A (application number CN202010564008.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- monitoring
- information
- quality
- mass data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Quality & Reliability (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention relates to a method for monitoring a mass data processing process, comprising the following steps. Step S1: monitoring the type of the imported data, and caching the collection task queue and the process data. Step S2: controlling collectors to execute collection tasks according to a preset collection frequency. Step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the check to the Kafka middleware by calling a service interface. Step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries. Step S5: monitoring the collection running status of the data sources and monitoring the data quality. The invention can improve the effectiveness and accuracy of data analysis and data mining.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a method for monitoring a mass data processing process.
Background
With the development of computer technology, many government departments and related enterprises have replaced manual handling with electronic handling, and this shift to intelligent handling must be supported by a powerful data processing framework. As electronic certificate services become more widespread, higher requirements are placed on the stability and efficiency of that framework.
In the prior art, data quality problems at each stage are often ignored when massive amounts of certificate handling information are processed. Data that does not meet quality requirements therefore keeps flowing downstream, the system frequently stalls or produces inaccurate certificate issuing information, manual investigation and error correction become necessary, and the efficiency of electronic certificate handling drops considerably.
Disclosure of Invention
In view of this, the present invention provides a method for monitoring a mass data processing process, which can improve the effectiveness and accuracy of data analysis and data mining.
The invention is realized by the following scheme: a method for monitoring a mass data processing process, comprising the following steps:
step S1: monitoring the type of the imported data, and caching the collection task queue and the process data;
step S2: controlling collectors to execute collection tasks according to a preset collection frequency;
step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the check to the Kafka middleware by calling a service interface;
step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
step S5: monitoring the collection running status of the data sources and monitoring the data quality.
Further, in step S1, the types of imported data include interface data and database-type data.
Further, step S2 includes ensuring that a given task can be processed by only one collector at a time.
Further, in step S3, the quality check is a rule check: the collected certificate handling information is checked against preset rules.
Further, in step S3, data that fails the quality check is saved so that it can be repaired later.
Further, in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
Further, in step S5, monitoring the collection running status of the data sources specifically includes: monitoring the state of each collection task and checking for conditions such as abnormal tasks and missed data collection; and, at the same time, monitoring the CPU, memory and hard disk utilization of the server and checking whether the server is down.
Further, in step S5, monitoring the data quality specifically includes monitoring the integrity, accuracy, consistency and timeliness of the data.
Further, the monitoring of integrity comprises: whether the basic information, the accepted information, the link information and the transaction information of the same office are all complete, and whether any field data in the basic declaration information is missing;
the monitoring of accuracy comprises: whether the table field data is accurate;
the monitoring of consistency comprises: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness comprises: whether the speed of data synchronization and of statistical calculation reaches a preset speed threshold.
Further, the method includes step S6: feeding the results of the data quality monitoring back to the user as a quality report, so that data problems can be repaired in time.
Compared with the prior art, the invention has the following beneficial effects: it monitors data quality while the information data is being processed, quickly generates a data quality report, and improves the effectiveness and accuracy of data analysis and data mining.
Drawings
Fig. 1 is a schematic diagram of the principle of the embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It should also be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides a method for monitoring a mass data processing process, which specifically includes the following steps:
step S1: monitoring the type of the imported data, and caching the collection task queue and the process data;
step S2: controlling collectors to execute collection tasks according to a preset collection frequency;
step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the check to the Kafka middleware by calling a service interface;
step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
step S5: monitoring the collection running status of the data sources and monitoring the data quality.
In this embodiment, in step S1, the types of imported data include interface data and database-type data (MySQL, PostgreSQL, Hive, Elasticsearch). Whatever the source, the data is ultimately represented as a JSON-formatted string.
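As an illustration of the normalization described for step S1, the following minimal sketch serializes a record from either source type into a JSON string. The function name `to_json_string`, the envelope keys `source_type` and `payload`, and the sample field names are assumptions made for this example; the patent does not specify a message layout.

```python
import json
from datetime import date, datetime


def to_json_string(record: dict, source_type: str) -> str:
    """Serialize one imported record to a JSON string, tagged with its source type."""
    # The two source types named in the text; the string labels are hypothetical.
    if source_type not in {"interface", "database"}:
        raise ValueError(f"unsupported source type: {source_type}")

    def encode(value):
        # Database drivers often return date/datetime objects,
        # which the json module cannot encode without help.
        if isinstance(value, (date, datetime)):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")

    envelope = {"source_type": source_type, "payload": record}
    return json.dumps(envelope, ensure_ascii=False, default=encode, sort_keys=True)
```

Wrapping the record in a small envelope keeps the source type available to later stages without altering the record itself; this is a design choice of the sketch, not something the text requires.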
In this embodiment, step S2 further includes ensuring that a given task can be processed by only one collector at a time.
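The one-collector-per-task constraint can be sketched as a claim and release registry. This is a single-process illustration under stated assumptions: the class `TaskRegistry` and its methods are hypothetical, and a deployment with collectors on several machines would need a shared store for the same bookkeeping, which the patent does not describe.

```python
import threading


class TaskRegistry:
    """Tracks which collection tasks are currently claimed by a collector."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # task_id -> collector_id

    def try_claim(self, task_id, collector_id):
        """Return True if the collector obtained the task, False if it is already held."""
        with self._lock:
            if task_id in self._owners:
                return False
            self._owners[task_id] = collector_id
            return True

    def release(self, task_id, collector_id):
        """Release the task, but only if this collector is its current owner."""
        with self._lock:
            if self._owners.get(task_id) == collector_id:
                del self._owners[task_id]
```

A collector would call `try_claim` before starting a task and `release` when it finishes; a collector that receives `False` simply skips the task for this scheduling round.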
In this embodiment, in step S3, the quality check is a rule check: the collected certificate handling information is checked against preset rules (e.g., whether the ID card number has 18 digits, whether the office code conforms to its rule, whether the recorded times are in a consistent order relative to the receiving time, etc.).
In this embodiment, in step S3, data that fails the quality check is saved so that it can be repaired later.
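The rule check of step S3 and the handling of failed data can be sketched as follows. All field names (`id_number`, `office_code`, `received_at`, `transacted_at`) are hypothetical, the office-code rule is a stand-in because the patent does not state the real one, and the time-ordering rule is assumed to compare the transaction time with the receiving time.

```python
from datetime import datetime


def check_record(record):
    """Return a list of rule violations for one record; an empty list means it passed."""
    errors = []
    # Rule from the text: the ID card number has 18 characters.
    id_number = str(record.get("id_number", ""))
    if len(id_number) != 18:
        errors.append("id_number must have 18 characters")
    # Stand-in office-code rule (assumption): non-empty and alphanumeric.
    office_code = str(record.get("office_code", ""))
    if not office_code.isalnum():
        errors.append("office_code does not match the expected pattern")
    # Assumed ordering rule: the transaction time is not earlier than the receiving time.
    received = record.get("received_at")
    transacted = record.get("transacted_at")
    if received is None or transacted is None or transacted < received:
        errors.append("transacted_at must not be earlier than received_at")
    return errors


def route(records):
    """Split records into those to push downstream and those saved for later repair."""
    passed, failed = [], []
    for record in records:
        errors = check_record(record)
        if errors:
            # Keep the violations with the record so the repair step knows what to fix.
            failed.append({"record": record, "errors": errors})
        else:
            passed.append(record)
    return passed, failed
```

Only the `passed` list would be pushed to the message middleware; the `failed` list corresponds to the data that is saved and repaired later.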
In this embodiment, in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
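The consume, parse and store flow of step S4 can be sketched without a running broker. In this sketch a standard-library `queue.Queue` stands in for the Kafka topic and a plain dictionary stands in for the storage target; both substitutions, the function name `consume_all`, and the `office_code` key are assumptions for illustration only.

```python
import json
import queue


def consume_all(topic, store):
    """Drain the queue, parse each JSON message, and index it by office code.

    Returns the raw messages that could not be parsed or lacked the key.
    """
    unparsable = []
    while True:
        try:
            raw = topic.get_nowait()
        except queue.Empty:
            break
        try:
            message = json.loads(raw)
            store[message["office_code"]] = message
        except (json.JSONDecodeError, KeyError, TypeError):
            # Keep malformed messages for inspection instead of dropping them silently.
            unparsable.append(raw)
    return unparsable
```

Indexing the parsed messages by a key is what makes them available for later service queries; a real deployment would replace the dictionary with one of the storage targets named in the text.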
In this embodiment, in step S5, monitoring the collection running status of the data sources specifically includes: monitoring the state of each collection task and checking for conditions such as abnormal tasks and missed data collection; and, at the same time, monitoring the CPU, memory and hard disk utilization of the server and checking whether the server is down.
In this embodiment, in step S5, monitoring the data quality specifically includes monitoring the integrity, accuracy, consistency and timeliness of the data.
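The collection-status checks can be sketched as two small functions. The task dictionary layout (`id`, `state`, `last_run`, `interval`), the state label `"abnormal"`, and the percentage thresholds are assumptions of this example; how utilization figures are gathered from the server is left out because the patent does not specify it.

```python
from datetime import datetime, timedelta


def find_task_problems(tasks, now):
    """Flag tasks in an abnormal state or whose last run is older than their interval."""
    problems = []
    for task in tasks:
        if task["state"] == "abnormal":
            problems.append((task["id"], "abnormal task"))
        elif now - task["last_run"] > task["interval"]:
            # No run within the expected interval is treated as a missed collection.
            problems.append((task["id"], "missed collection"))
    return problems


def find_resource_alerts(usage, thresholds):
    """Return the sorted names of resources whose utilization (0-100) exceeds its threshold."""
    return sorted(
        name for name, value in usage.items() if value > thresholds.get(name, 100)
    )
```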
In this embodiment, the monitoring of integrity includes: whether the basic information, the accepted information, the link information and the transaction information of the same office are all complete, and whether any field data in the basic declaration information is missing;
the monitoring of accuracy comprises: whether the table field data is accurate, for example whether the transaction time is later than the receiving time;
the monitoring of consistency comprises: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness comprises: whether the speed of data synchronization and of statistical calculation reaches a preset speed threshold, which is reflected in whether the calculation of the monitoring result data is completed before the appointed time point.
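The four quality dimensions can be sketched as one function that evaluates a single record. The section names (`basic`, `accepted`, `link`, `transaction`, `closing`) and field names (`received_at`, `transacted_at`, `closed_at`) are hypothetical labels chosen for this example, and timeliness is reduced to a comparison against a deadline, following the "completed before the appointed time point" wording.

```python
from datetime import datetime

# Hypothetical labels for the four information types named in the text.
REQUIRED_PARTS = ("basic", "accepted", "link", "transaction")


def check_quality(case, computed_at, deadline):
    """Evaluate one record on the four dimensions; True means no problem was found."""
    basic = case.get("basic") or {}
    accepted = case.get("accepted") or {}
    transaction = case.get("transaction") or {}
    closing = case.get("closing") or {}

    # Integrity: all four parts exist and no basic-information field is missing.
    integrity = all(case.get(part) for part in REQUIRED_PARTS) and all(
        value is not None for value in basic.values()
    )
    # Accuracy: the transaction time must not be earlier than the receiving time.
    received = accepted.get("received_at")
    transacted = transaction.get("transacted_at")
    accuracy = received is not None and transacted is not None and transacted >= received
    # Consistency: the closing table and the basic information agree on the closing time.
    closed = closing.get("closed_at")
    consistency = closed is not None and closed == basic.get("closed_at")
    # Timeliness: the monitoring result was computed by the appointed time point.
    timeliness = computed_at <= deadline

    return {
        "integrity": integrity,
        "accuracy": accuracy,
        "consistency": consistency,
        "timeliness": timeliness,
    }
```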
In the present embodiment, the method further includes step S6: feeding the results of the data quality monitoring back to the user as a quality report, so that data problems can be repaired in time.
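A quality report of the kind described in step S6 could be as simple as a per-dimension failure count. The sketch below assumes that each monitored record yields a dictionary mapping the four dimension names to `True` (no problem) or `False` (problem); that input shape, the function name, and the report layout are assumptions, since the patent does not define the report format.

```python
from collections import Counter

DIMENSIONS = ("integrity", "accuracy", "consistency", "timeliness")


def build_quality_report(results):
    """Summarize per-record check results into a plain-text quality report."""
    failures = Counter()
    for result in results:
        for dimension, ok in result.items():
            if not ok:
                failures[dimension] += 1
    lines = [f"Records checked: {len(results)}"]
    for dimension in DIMENSIONS:
        # Counter returns 0 for dimensions that never failed.
        lines.append(f"{dimension}: {failures[dimension]} failed")
    return "\n".join(lines)
```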
The foregoing describes preferred embodiments of the present invention. Other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification or equivalent change made to the above embodiments in accordance with the technical essence of the present invention falls within the protection scope of its technical solution.
Claims (10)
1. A method for monitoring a mass data processing process, characterized by comprising the following steps:
step S1: monitoring the type of the imported data, and caching the collection task queue and the process data;
step S2: controlling collectors to execute collection tasks according to a preset collection frequency;
step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the check to the Kafka middleware by calling a service interface;
step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
step S5: monitoring the collection running status of the data sources and monitoring the data quality.
2. The method for monitoring a mass data processing process according to claim 1, wherein in step S1, the types of imported data include interface data and database-type data.
3. The method for monitoring a mass data processing process according to claim 1, wherein step S2 further includes ensuring that a given task can be processed by only one collector at a time.
4. The method for monitoring a mass data processing process according to claim 1, wherein in step S3, the quality check is a rule check, and the collected certificate handling information is checked against preset rules.
5. The method for monitoring a mass data processing process according to claim 1, wherein in step S3, data that fails the quality check is saved so that it can be repaired later.
6. The method for monitoring a mass data processing process according to claim 1, wherein in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
7. The method for monitoring a mass data processing process according to claim 1, wherein in step S5, monitoring the collection running status of the data sources specifically includes: monitoring the state of each collection task and checking for conditions such as abnormal tasks and missed data collection; and, at the same time, monitoring the CPU, memory and hard disk utilization of the server and checking whether the server is down.
8. The method for monitoring a mass data processing process according to claim 1, wherein in step S5, monitoring the data quality specifically includes monitoring the integrity, accuracy, consistency and timeliness of the data.
9. The method for monitoring a mass data processing process according to claim 8, wherein:
the monitoring of integrity comprises: whether the basic information, the accepted information, the link information and the transaction information of the same office are all complete, and whether any field data in the basic declaration information is missing;
the monitoring of accuracy comprises: whether the table field data is accurate;
the monitoring of consistency comprises: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness comprises: whether the speed of data synchronization and of statistical calculation reaches a preset speed threshold.
10. The method for monitoring a mass data processing process according to claim 8, further comprising step S6: feeding the results of the data quality monitoring back to the user as a quality report, so that data problems can be repaired in time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010564008.6A CN111737242A (en) | 2020-06-19 | 2020-06-19 | Method for monitoring mass data processing process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010564008.6A CN111737242A (en) | 2020-06-19 | 2020-06-19 | Method for monitoring mass data processing process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111737242A (en) | 2020-10-02 |
Family
ID=72650262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010564008.6A Pending CN111737242A (en) | 2020-06-19 | 2020-06-19 | Method for monitoring mass data processing process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737242A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107103025A (en) * | 2017-01-05 | 2017-08-29 | 北京亚信智慧数据科技有限公司 | A kind of data processing method and data processing platform (DPP) |
CN108846076A (en) * | 2018-06-08 | 2018-11-20 | 山大地纬软件股份有限公司 | The massive multi-source ETL process method and system of supporting interface adaptation |
CN110611576A (en) * | 2018-06-14 | 2019-12-24 | 亿阳信通股份有限公司 | Data quality monitoring method, device, equipment and storage medium |
CN108959616A (en) * | 2018-07-18 | 2018-12-07 | 广州供电局有限公司 | Production numeric field data quality based on big data technology quasi real time monitoring system and method |
US20200160230A1 (en) * | 2018-11-19 | 2020-05-21 | International Business Machines Corporation | Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs |
CN110008201A (en) * | 2019-04-09 | 2019-07-12 | 浩鲸云计算科技股份有限公司 | A kind of quality of data towards big data checks monitoring method |
CN111124679A (en) * | 2019-12-19 | 2020-05-08 | 南京莱斯信息技术股份有限公司 | Time-limited automatic processing method for multi-source heterogeneous mass data |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113641667A (en) * | 2021-08-12 | 2021-11-12 | 深圳市润迅通投资有限公司 | Data abnormity monitoring system and method of distributed big data acquisition platform |
CN113641667B (en) * | 2021-08-12 | 2022-05-20 | 深圳市润迅通投资有限公司 | Data abnormity monitoring system and method of distributed big data acquisition platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201002 |