CN111737242A - Method for monitoring mass data processing process - Google Patents

Info

Publication number
CN111737242A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010564008.6A
Other languages
Chinese (zh)
Inventor
吴志雄
陈辉挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linewell Software Co Ltd
Original Assignee
Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a method for monitoring a mass data processing process, comprising the following steps. Step S1: monitoring the type of the imported data, and caching the acquisition task queue and the process data. Step S2: controlling the collectors to execute acquisition tasks at a preset collection frequency. Step S3: the collector performs a quality check on the collected certificate handling information and pushes the data that passes the check to the Kafka middleware by calling a service interface. Step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries. Step S5: monitoring the acquisition running status of the data sources and monitoring the data quality. The invention can improve the effectiveness and accuracy of data analysis and data mining.

Description

Method for monitoring mass data processing process
Technical Field
The invention relates to the technical field of data processing, in particular to a method for monitoring a mass data processing process.
Background
With the development of computer technology, many government departments and related enterprises have replaced manual handling with electronic handling, and this shift toward intelligent handling must be supported by a powerful data processing framework. As electronic certificate services become more widespread, higher demands are placed on the stability and efficiency of that framework.
In the prior art, data quality problems at each link are often ignored when massive certificate handling information is processed. Data that does not meet quality requirements is therefore passed downstream continuously, which frequently causes system congestion or inaccurate certificate issuing information, requires manual investigation and error correction, and greatly reduces the efficiency of electronic certificate handling.
Disclosure of Invention
In view of this, the present invention provides a method for monitoring a mass data processing process, which can improve effectiveness and accuracy of data analysis and data mining.
The invention is realized by adopting the following scheme: a method for monitoring a mass data processing process specifically comprises the following steps:
Step S1: monitoring the type of the imported data, and caching the acquisition task queue and the process data;
Step S2: controlling the collectors to execute acquisition tasks at a preset collection frequency;
Step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the quality check to the Kafka middleware by calling a service interface;
Step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
Step S5: monitoring the acquisition running status of the data sources and monitoring the data quality.
Further, in step S1, the types of the imported data include interface data and database-type data.
Further, step S2 also includes ensuring that the same task is processed by only one collector at a time.
Further, in step S3, the quality check is a rule check, in which the collected certificate handling information is checked against preset rules.
Further, in step S3, the data that fails the quality check is saved for later repair.
Further, in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
Further, in step S5, monitoring the acquisition running status of the data sources specifically includes: monitoring the state of each acquisition task and checking for conditions such as abnormal tasks and missed data acquisition; and, at the same time, monitoring the CPU, memory, and hard disk utilization of the server and checking whether the server is down.
Further, in step S5, monitoring the data quality specifically includes: monitoring the integrity, accuracy, consistency, and timeliness of the data.
Further, the monitoring of integrity includes: whether the basic information, acceptance information, link information, and transaction information of the same office are all present; and whether any field data in the basic declaration information is missing;
the monitoring of accuracy includes: whether the table field data is accurate;
the monitoring of consistency includes: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness includes: whether the speed of data synchronization and of data statistics calculation reaches a preset speed threshold.
Further, the method includes step S6: feeding back the data quality monitoring results to the user as a quality report, so that data problems can be repaired in time.
Compared with the prior art, the invention has the following beneficial effects: the invention can monitor the quality of the information data in the process of processing the information data, quickly generate a data quality report and improve the effectiveness and accuracy of data analysis and data mining.
Drawings
Fig. 1 is a schematic diagram of the principle of the embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1, the present embodiment provides a method for monitoring a mass data processing process, which specifically includes the following steps:
Step S1: monitoring the type of the imported data, and caching the acquisition task queue and the process data;
Step S2: controlling the collectors to execute acquisition tasks at a preset collection frequency;
Step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the quality check to the Kafka middleware by calling a service interface;
Step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
Step S5: monitoring the acquisition running status of the data sources and monitoring the data quality.
In this embodiment, in step S1, the types of the imported data include interface data and database-type data (MySQL, PostgreSQL, Hive, Elasticsearch). The final data format is a JSON-formatted string.
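As a rough illustration of step S1, the following Python sketch converts an imported record into a JSON string and keeps pending acquisition tasks in a queue. The function name `to_json_string`, the `source_type` labels, the field names, and the in-memory `deque` are all assumptions made for this example; the text does not say how the conversion or the cache is implemented.

```python
import json
from collections import deque
from datetime import date, datetime


def to_json_string(record: dict, source_type: str) -> str:
    """Serialize one imported record to a JSON string, tagged with its source type."""
    if source_type not in {"interface", "database"}:
        raise ValueError(f"unsupported source type: {source_type}")

    def encode(value):
        # Database drivers often return date/datetime objects,
        # which the json module cannot serialize on its own.
        if isinstance(value, (date, datetime)):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")

    payload = {"source_type": source_type, "data": record}
    return json.dumps(payload, ensure_ascii=False, default=encode, sort_keys=True)


# Hypothetical in-memory stand-in for the cached acquisition task queue.
task_queue = deque()
task_queue.append({"task_id": "t1", "source_type": "database"})

row = {"office_code": "A001", "accepted_at": datetime(2020, 6, 19, 9, 30)}
print(to_json_string(row, "database"))
```

Tagging each payload with its source type is one simple way to keep the origin of a record visible to later steps; a deployed system would more likely keep the queue in a shared cache than in process memory.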
In this embodiment, step S2 further includes ensuring that the same task is processed by only one collector at a time.
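The two constraints of step S2 (a preset collection frequency, and at most one collector per task at a time) can be sketched as follows. `TaskScheduler`, `try_claim`, and `release` are hypothetical names, and the sketch assumes all collectors run in a single process; collectors spread over several machines would need a shared lock instead, a mechanism the text does not specify.

```python
import threading


class TaskScheduler:
    """Hands out due tasks so that each task has at most one active collector."""

    def __init__(self, interval_seconds: float):
        self.interval = interval_seconds  # preset collection frequency
        self.last_run = {}                # task_id -> time the last run finished
        self.owners = {}                  # task_id -> collector currently holding it
        self._lock = threading.Lock()

    def try_claim(self, task_id: str, collector_id: str, now: float) -> bool:
        """Return True if the task is due and no other collector holds it."""
        with self._lock:
            if task_id in self.owners:
                return False              # already being processed
            last = self.last_run.get(task_id)
            if last is not None and now - last < self.interval:
                return False              # not due yet
            self.owners[task_id] = collector_id
            return True

    def release(self, task_id: str, collector_id: str, now: float) -> None:
        """Mark the run as finished; only the owning collector may release."""
        with self._lock:
            if self.owners.get(task_id) == collector_id:
                del self.owners[task_id]
                self.last_run[task_id] = now
```

A collector would call `try_claim` before starting a task and `release` when it finishes, passing the current time (for example from `time.time()`).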
In this embodiment, in step S3, the quality check is a rule check: the collected certificate handling information is checked against preset rules (e.g., whether the ID card number has 18 characters, whether the office code conforms to its rule, whether the transaction time is later than the receiving time, etc.).
In this embodiment, in step S3, the data that fails the quality check is saved for later repair.
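A minimal sketch of the rule check in step S3 might look like the following. The field names, the office code pattern, and the reading of the time rule (transaction time strictly later than receiving time) are assumptions for illustration. `publish` stands in for the service interface that pushes data to Kafka, and `repair_store` for wherever failing data is saved; neither is specified in the text.

```python
import json
import re
from datetime import datetime

ID_PATTERN = re.compile(r"^\d{17}[\dXx]$")         # 18 characters; checksum not verified
OFFICE_CODE_PATTERN = re.compile(r"^[A-Z]\d{3}$")  # hypothetical office code format


def check_record(record: dict) -> list:
    """Return the names of the violated rules (an empty list means the record passes)."""
    failures = []
    if not ID_PATTERN.match(str(record.get("id_card", ""))):
        failures.append("id_card_format")
    if not OFFICE_CODE_PATTERN.match(str(record.get("office_code", ""))):
        failures.append("office_code_format")
    try:
        received = datetime.fromisoformat(record["received_at"])
        transacted = datetime.fromisoformat(record["transacted_at"])
        if transacted <= received:
            failures.append("time_order")
    except (KeyError, TypeError, ValueError):
        # Missing or unparseable timestamps also count as a failed time rule.
        failures.append("time_order")
    return failures


def process(records, publish, repair_store):
    """Publish passing records; keep failing ones, with reasons, for later repair."""
    for record in records:
        failures = check_record(record)
        if failures:
            repair_store.append({"record": record, "failures": failures})
        else:
            publish(json.dumps(record, ensure_ascii=False))
```

Storing the violated rule names next to each rejected record is a design choice of this sketch; it makes the later repair step easier because the reason for rejection does not have to be recomputed.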
In this embodiment, in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
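The consume, parse, and store flow of step S4 can be sketched as below. To keep the example self-contained, a plain iterable of JSON strings stands in for the Kafka consumer and an SQLite database for the target store; the table layout, the field names, and both function names are assumptions rather than anything the text prescribes.

```python
import json
import sqlite3


def store_messages(messages, conn: sqlite3.Connection) -> int:
    """Parse each JSON message and upsert it into office_info; return the number stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS office_info ("
        "office_id TEXT PRIMARY KEY, office_code TEXT, payload TEXT)"
    )
    stored = 0
    for raw in messages:
        try:
            data = json.loads(raw)
            office_id = data["office_id"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # skip malformed messages; a real consumer would log them
        conn.execute(
            "INSERT OR REPLACE INTO office_info VALUES (?, ?, ?)",
            (office_id, data.get("office_code"), raw),
        )
        stored += 1
    conn.commit()
    return stored


def query_by_office_code(conn: sqlite3.Connection, office_code: str) -> list:
    """Service query: ids of all stored offices that carry the given code."""
    rows = conn.execute(
        "SELECT office_id FROM office_info WHERE office_code = ? ORDER BY office_id",
        (office_code,),
    )
    return [row[0] for row in rows]
```

Using the office id as the primary key means a message that is delivered twice overwrites its earlier copy instead of creating a duplicate row.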
In this embodiment, in step S5, monitoring the acquisition running status of the data sources specifically includes: monitoring the state of each acquisition task and checking for conditions such as abnormal tasks and missed data acquisition; and, at the same time, monitoring the CPU, memory, and hard disk utilization of the server and checking whether the server is down.
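The acquisition monitoring described above could be approximated as in the following sketch. The task record layout, the state label `"failed"`, and the rule that a task counts as missed after two intervals without a successful run are assumptions of this example. Only disk usage is shown for the server side, because the Python standard library has no portable call for CPU and memory utilization; those would come from the operating system or a third-party library.

```python
import shutil


def find_task_problems(tasks: dict, now: float, interval: float) -> dict:
    """Split problem tasks into abnormal (failed) and missed (no recent successful run)."""
    problems = {"abnormal": [], "missed": []}
    for task_id, info in sorted(tasks.items()):
        if info["state"] == "failed":
            problems["abnormal"].append(task_id)
        elif now - info["last_success"] > 2 * interval:
            # Assumed threshold: two full intervals without a successful collection.
            problems["missed"].append(task_id)
    return problems


def disk_usage_ratio(path: str = "/") -> float:
    """Fraction of the disk holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total
```

A monitoring loop would call both functions periodically and raise an alert when the problem lists are non-empty or the ratio exceeds a chosen limit.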
In this embodiment, in step S5, the monitoring of the data quality specifically includes: the integrity, accuracy, consistency and timeliness of the data are monitored.
In this embodiment, the monitoring of integrity includes: whether the basic information, acceptance information, link information, and transaction information of the same office are all present; and whether any field data in the basic declaration information is missing;
the monitoring of accuracy includes: whether the table field data is accurate, for example whether the transaction time is later than the receiving time;
the monitoring of consistency includes: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness includes: whether the speed of data synchronization and of data statistics calculation reaches a preset speed threshold, which is reflected in whether the calculation of the monitoring result data is completed before the appointed time point.
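The four quality dimensions listed above can be expressed as one check per office, as in this sketch. The record layout and every field name are hypothetical, and the sketch assumes that the closing time of the closing table is carried in the `transaction` record, which the text does not state.

```python
from datetime import datetime

REQUIRED_PARTS = ("basic", "accepted", "link", "transaction")
REQUIRED_BASIC_FIELDS = ("office_code", "applicant", "closed_at")  # hypothetical fields


def check_office(office: dict, computed_at: datetime, deadline: datetime) -> dict:
    """Return one pass/fail flag per quality dimension for a single office."""
    basic = office.get("basic") or {}
    accepted = office.get("accepted") or {}
    transaction = office.get("transaction") or {}

    # Integrity: all four record types exist and no required basic field is empty.
    integrity = all(office.get(part) for part in REQUIRED_PARTS) and all(
        basic.get(field) not in (None, "") for field in REQUIRED_BASIC_FIELDS
    )
    # Accuracy: the transaction time must be later than the receiving time.
    accuracy = (
        "transacted_at" in transaction
        and "received_at" in accepted
        and transaction["transacted_at"] > accepted["received_at"]
    )
    # Consistency: both places must report the same closing time.
    consistency = (
        "closed_at" in transaction
        and transaction["closed_at"] == basic.get("closed_at")
    )
    # Timeliness: the statistics were computed before the appointed time point.
    timeliness = computed_at <= deadline

    return {
        "integrity": bool(integrity),
        "accuracy": bool(accuracy),
        "consistency": bool(consistency),
        "timeliness": bool(timeliness),
    }
```

Returning a flag per dimension, rather than a single pass/fail value, keeps enough detail to show which kind of problem each office has.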
In this embodiment, the method further includes step S6: feeding back the data quality monitoring results to the user as a quality report, so that data problems can be repaired in time.
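One possible shape for the quality report of step S6 is a plain-text summary per quality dimension, sketched below. The input format (office id mapped to pass/fail flags per dimension) and the report wording are assumptions; the text does not describe the report layout.

```python
DIMENSIONS = ("integrity", "accuracy", "consistency", "timeliness")


def build_quality_report(results: dict) -> str:
    """Summarize per-office results (office_id -> {dimension: passed}) as plain text."""
    failing = {dim: [] for dim in DIMENSIONS}
    for office_id, flags in sorted(results.items()):
        for dim in DIMENSIONS:
            if not flags.get(dim, False):
                failing[dim].append(office_id)

    lines = [f"Data quality report: {len(results)} offices checked"]
    for dim in DIMENSIONS:
        line = f"{dim}: {len(failing[dim])} failed"
        if failing[dim]:
            # List the affected offices so the user knows what to repair.
            line += " (" + ", ".join(failing[dim]) + ")"
        lines.append(line)
    return "\n".join(lines)
```

Listing the affected office ids next to each count points the user directly at the records that need repair.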
The foregoing describes preferred embodiments of the present invention; other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change, or adaptation of the above embodiments made in accordance with the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method for monitoring a mass data processing process, characterized by comprising the following steps:
step S1: monitoring the type of the imported data, and caching the acquisition task queue and the process data;
step S2: controlling the collectors to execute acquisition tasks at a preset collection frequency;
step S3: the collector performs a quality check on the collected certificate handling information, and pushes the data that passes the quality check to the Kafka middleware by calling a service interface;
step S4: consuming the messages produced to Kafka, parsing the message data, and storing the parsed office information for service queries;
step S5: monitoring the acquisition running status of the data sources and monitoring the data quality.
2. The method for monitoring a mass data processing process according to claim 1, wherein in step S1, the types of the imported data include interface data and database-type data.
3. The method for monitoring a mass data processing process according to claim 1, wherein step S2 further includes ensuring that the same task is processed by only one collector at a time.
4. The method for monitoring a mass data processing process according to claim 1, wherein in step S3, the quality check is a rule check, in which the collected certificate handling information is checked against preset rules.
5. The method for monitoring a mass data processing process according to claim 1, wherein in step S3, the data that fails the quality check is saved for later repair.
6. The method for monitoring a mass data processing process according to claim 1, wherein in step S4, the parsed office information is stored in middleware, a database, or a statistical database.
7. The method for monitoring a mass data processing process according to claim 1, wherein in step S5, monitoring the acquisition running status of the data sources specifically includes: monitoring the state of each acquisition task and checking for conditions such as abnormal tasks and missed data acquisition; and, at the same time, monitoring the CPU, memory, and hard disk utilization of the server and checking whether the server is down.
8. The method for monitoring a mass data processing process according to claim 1, wherein in step S5, monitoring the data quality specifically includes: monitoring the integrity, accuracy, consistency, and timeliness of the data.
9. The method for monitoring a mass data processing process according to claim 8, wherein:
the monitoring of integrity includes: whether the basic information, acceptance information, link information, and transaction information of the same office are all present; and whether any field data in the basic declaration information is missing;
the monitoring of accuracy includes: whether the table field data is accurate;
the monitoring of consistency includes: whether the closing time in the closing table is consistent with the closing time in the basic declaration information;
the monitoring of timeliness includes: whether the speed of data synchronization and of data statistics calculation reaches a preset speed threshold.
10. The method for monitoring a mass data processing process according to claim 8, further comprising step S6: feeding back the data quality monitoring results to the user as a quality report, so that data problems can be repaired in time.
CN202010564008.6A 2020-06-19 2020-06-19 Method for monitoring mass data processing process Pending CN111737242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010564008.6A CN111737242A (en) 2020-06-19 2020-06-19 Method for monitoring mass data processing process

Publications (1)

Publication Number Publication Date
CN111737242A (en) 2020-10-02

Family

ID=72650262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010564008.6A Pending CN111737242A (en) 2020-06-19 2020-06-19 Method for monitoring mass data processing process

Country Status (1)

Country Link
CN (1) CN111737242A (en)

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2020-10-02)