CN110569174A - Distributed monitoring system and method for NIFI task - Google Patents

Distributed monitoring system and method for NIFI task

Info

Publication number
CN110569174A
Authority
CN
China
Prior art keywords
task
processor
log
nifi
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910874660.5A
Other languages
Chinese (zh)
Other versions
CN110569174B (en)
Inventor
陈绪申
程林
杨培强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Shandong Inspur Business System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shandong Inspur Business System Co Ltd
Priority to CN201910874660.5A
Publication of CN110569174A
Application granted
Publication of CN110569174B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/323 Visualisation of programs or trace data
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a distributed monitoring system and method for NIFI tasks, belongs to the field of task monitoring, and addresses the technical problem of how to perform log statistics and monitoring on NIFI at both the task level and the step level. The system comprises: a Kafka distributed cluster; a relational database in which log tables are configured; a log processor, which is a component and comprises a task log processor and a step log processor; and a monitoring platform, which reads the log tables and displays them through a monitoring interface. The method comprises: constructing the distributed monitoring system for NIFI tasks and configuring the log processors based on the NIFI task chain; performing log statistics on the NIFI task at both the task level and the step level through the log processors; and displaying the task log table and the step log table through the monitoring interface to monitor tasks and steps.

Description

Distributed monitoring system and method for NIFI task
Technical Field
The invention relates to the field of task monitoring, and in particular to a distributed monitoring system and method for NIFI tasks.
Background
In the tax industry, the development and use of informatization systems such as Golden Tax Phase III, the value-added tax invoice management system and the personal tax management system, together with the deep integration of everyday consumption and internet technology, have caused explosive growth in the internal data of the tax system, third-party data from other government departments and tax-related internet data. To store, manage and apply this tax big data effectively and to improve the level of tax administration, tax systems across the country have introduced various big data components such as Hive, HBase and Kudu.
Because the systems, platforms and fields involved are so varied, tax-related data is large in volume, multi-source and heterogeneous, which makes data collection and integration difficult. The NIFI framework provides strong technical support for big data integration. It bridges different systems in a big data environment, such as SQL, HDFS, Kafka, HBase, Elasticsearch and FTP, supports conversion among data formats such as json, xml and avro, and provides a development framework for custom processors so that users can build functions to their own requirements. This makes it convenient to standardize streaming data through acquisition, cleaning, conversion and matching, and to integrate and use multi-source heterogeneous data in real time.
An NIFI task uses the processor as its basic module. Processors are connected in series to form a chained processing flow, and different flow branches can be set along the way according to conditions. After acquisition, data usually passes through various cleaning, conversion and matching modules and is finally written to the target end. In complex tasks the processing links are numerous and the branches and rules varied, so a problem at any point in the flow can be hard to locate and trace. Detailed monitoring of every task link is therefore necessary.
The log monitoring that the NIFI framework itself provides is very limited, mainly in the following respects:
The first problem: the NIFI canvas displays error information only at the processor level. Unified monitoring with the task as the dimension is not possible, the processor concerned has to be located manually, and operational monitoring is inconvenient when many tasks are deployed on one NIFI node.
The second problem: NIFI processes data as a stream, the data is not persisted during processing, and problem data cannot be inspected afterwards.
The third problem: the error information shown on the canvas is retained only for a limited time, and the log cannot be stored durably.
An NIFI task needs to be monitored in both the task dimension and the step dimension, to obtain the success and failure counts of the task as a whole and of each step. However, because NIFI processes data as a stream, the same batch of data flows through every processor on a branch and obtains a success or failure state at each node, so the overall execution result of the task cannot be obtained by simply adding up the statistics of the individual processors.
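A small worked example may make the last point concrete. The three-step chain, the processor names and all counts below are made up for illustration and do not come from the patent.

```python
# Hypothetical chain of three processors. Each entry is (success, failure)
# as counted at that processor for one batch of 100 records.
step_counts = {
    "clean":   (100, 0),  # all 100 records pass cleaning
    "convert": (95, 5),   # 5 records fail conversion
    "store":   (90, 5),   # 5 of the remaining 95 fail at the last processor
}

# Naive approach: add up the success counts of every processor.
naive_success = sum(ok for ok, _ in step_counts.values())

# Task-level view: a record counts as successful only if it leaves the
# last processor through its success relationship.
task_success = step_counts["store"][0]

print(naive_success)  # 285, far more than the 100 records that entered
print(task_success)   # 90
```

Each record is counted once per processor it passes, so the naive sum grows with the length of the chain rather than with the number of records.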
Based on the above analysis, how to perform log statistics and monitoring on NIFI at both the task level and the step level is the technical problem to be solved.
Disclosure of Invention
The following terms relating to NIFI are used:
NIFI: a big data framework for automating the flow of data between systems.
Canvas: the visual interface of the NIFI framework. NIFI processor instances and task links can be created and configured on the canvas, and the status information of processors and connections can be viewed there.
Processor: a component responsible for executing specific operations in NIFI and the basic module from which an NIFI data flow is built. It mainly creates and acquires stream files, reads and writes stream file content, reads and changes stream file attributes, and routes and distributes stream files.
Stream file: each data object flowing through the system, representing one piece of data in an NIFI task. It consists of attributes and content; the content is the data the stream file represents, and the attributes are a set of key-value pairs providing metadata associated with the content.
Connection: the actual link between processors. It acts as a queue for transferring and buffering stream files between processors, and each connection must specify a relationship.
Relationship: each component may have several relationships, which indicate the result of a processor's handling of a stream file, such as "success" or "failure". One or more relationships must be specified when a connection between processors is created, and stream files are sent to the corresponding connection according to the processor's result.
ProcessContext: provides a bridge between the processor and the canvas, through which the configuration information set for the processor on the canvas can be obtained.
Kafka: a unified, high-throughput, low-latency open-source platform for processing real-time data, which can be used as a large-scale publish/subscribe message queue.
The technical task of the invention is to address the above shortcomings by providing a distributed monitoring system and method for NIFI tasks, so as to solve the problem of how to perform log statistics and monitoring on NIFI at both the task level and the step level.
In a first aspect, the invention provides a distributed monitoring system for NIFI tasks, used to perform log statistics and monitoring on NIFI tasks at both the task level and the step level, the system comprising:
a Kafka distributed cluster;
a relational database in which log tables are configured, the log tables comprising a task log table and a step log table;
a log processor, which is a component and comprises a task log processor and a step log processor; the task log processor counts task data in real time, periodically stores the task data and the task metadata corresponding to the task in the task log table, and stores the content of stream files that failed processing in the Kafka distributed cluster; the step log processor counts, in real time, the step data produced as each intermediate processor in the NIFI task chain executes the task, and stores the step data and the task metadata corresponding to the task in the step log table;
a monitoring platform, which reads the log tables and displays them through a monitoring interface;
the task data comprises a total task data volume, a total task success data volume and a task failure data volume, where the total task data volume is the data volume of all stream files corresponding to the task, the total task success data volume is the data volume processed successfully by all processors in the NIFI task chain as a whole, and the task failure data volume is the data volume that failed processing at the tail node processor of the NIFI task chain;
the step data comprises a step success data volume and a step failure data volume, where the step success data volume is the data volume an intermediate processor processed successfully, and the step failure data volume is the data volume an intermediate processor failed to process;
the intermediate processors are all processors in the NIFI task chain other than the head node processor and the tail node processor.
An NIFI task chain contains a plurality of processors. Usually the head node processor is a data acquisition processor that acquires the data, the tail node processor is a storage processor that stores the processed data, and the intermediate processors sit between the head node processor and the tail node processor and perform data cleaning, conversion, matching and the like.
In the above scheme, each processor of the NIFI task link is connected to a log processor, and the log processors are divided into task log processors and step log processors. The task log processor updates the task data (including the total task data volume, the total task success data volume and so on) in real time and periodically inserts or updates it, together with the task's metadata, in the task log table of the database. At the same time, the content of stream files that failed processing is sent to the Kafka distributed cluster; the monitoring platform obtains this information from the Kafka distributed cluster through a listener and generates a local log file. The monitoring platform reads and displays data from the database and the local log file. The displayed data includes task data and step data, so the user can conveniently monitor the execution of each task and inspect problem data, and can then follow up on the problem data in the stream files of failed tasks.
Preferably, the task log table further stores a total task failure data volume, which is the total data volume for which processing by the processors in the NIFI task chain failed overall;
total task failure data volume = total task data volume - total task success data volume.
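As a minimal sketch of how such a row might be modeled, the following computes the total failure volume from the two stored totals. The class name, field names and numbers are assumptions made for illustration; the patent does not give a table schema.

```python
from dataclasses import dataclass


@dataclass
class TaskLogRow:
    """Hypothetical shape of one task log table row."""
    task_id: str
    total: int          # counted at the head node processor
    success: int        # counted on the tail node's success relationship
    tail_failure: int   # counted on the tail node's failure relationship

    @property
    def total_failure(self) -> int:
        # total task failure data volume = total - total success
        return self.total - self.success


row = TaskLogRow("task-1", total=1000, success=940, tail_failure=25)
print(row.total_failure)                      # 60
print(row.total_failure - row.tail_failure)   # 35 failed before the tail node
```

The derived total covers failures anywhere in the chain, while the tail-node failure count covers only the last processor, which is why the two numbers differ.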
Preferably, the task log table further stores a present task data volume, a present success data volume and a present failure data volume, where the present task data volume is the total data volume of all tasks within the present time period, the present success data volume is the data volume of all tasks processed successfully within the present time period, and the present failure data volume is the data volume of all tasks that failed within the present time period.
Preferably, there are three task log processors:
a first task log processor, connected to the head node processor of the NIFI task chain through the total-count relationship, which counts the task data volume in real time and periodically stores the task data volume and the task metadata corresponding to the task in the task log table;
a second task log processor, connected to the tail node processor of the NIFI task chain through the success relationship, which counts in real time the data volume processed successfully by the tail node processor and periodically stores it in the task log table, this volume being the data volume processed successfully by all processors in the NIFI task chain;
and a third task log processor, connected to the tail node processor of the NIFI task chain through the failure relationship, which counts in real time the data volume that failed processing at the tail node processor.
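The placement of the three counters can be sketched with a toy simulation. This is not NIFI code: the `run_chain` function, the counter names and the sample steps are all hypothetical, and each step is reduced to a function that returns True for success and False for failure.

```python
from collections import Counter


def run_chain(records, steps):
    """Push records through `steps` and return task-level counters.

    The last entry of `steps` plays the role of the tail node processor;
    the others are intermediate processors.
    """
    *middle, tail = steps
    counters = Counter()
    for record in records:
        counters["total"] += 1             # first task log processor
        if not all(step(record) for step in middle):
            continue                       # failed at an intermediate processor
        if tail(record):
            counters["tail_success"] += 1  # second task log processor
        else:
            counters["tail_failure"] += 1  # third task log processor
    return counters


# Records 0 and 5 fail the intermediate step; record 7 fails at the tail.
steps = [lambda r: r % 5 != 0, lambda r: r != 7]
counts = run_chain(range(10), steps)
overall_failure = counts["total"] - counts["tail_success"]
print(counts["total"], counts["tail_success"], counts["tail_failure"], overall_failure)
# 10 7 1 3
```

Only the success counter at the tail node reflects end-to-end success, so the overall failure count has to be derived from the head-node total rather than read from the tail-node failure counter.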
Preferably, the step log processor is connected to each intermediate processor in the NIFI task chain through both the success relationship and the failure relationship, and counts in real time the data volume each intermediate processor processed successfully and the data volume it failed to process.
Preferably, a stateManager and a timer are configured in the log processor;
the log processor takes a stream file from the queue of its connection, counts the data volume of the stream file according to the relationship of that connection, and saves the count to the stateManager as the processor's state information;
for each subsequent stream file, the log processor counts the data volume of the current stream file according to the relationship of the connection, reads the historical total from the stateManager, adds the two to obtain the new total, and writes the new total back to the stateManager;
and the log processor, driven by the timer, periodically sends the total data volume in the stateManager and the task metadata corresponding to the task to the corresponding log table in the database.
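The accumulate-then-flush behavior described above can be sketched as follows. This is a plain Python stand-in, not the NIFI API: a dictionary takes the place of the stateManager, the `write_row` callback takes the place of the database write, and the timer is simplified to an elapsed-time check on each incoming stream file. All names are hypothetical.

```python
import time


class LogProcessorSketch:
    def __init__(self, flush_interval_s, write_row, clock=time.monotonic):
        self.state = {}                  # stand-in for stateManager key-value pairs
        self.flush_interval_s = flush_interval_s
        self.write_row = write_row       # callable that writes (key, total) to a log table
        self.clock = clock
        self.last_flush = clock()

    def on_stream_file(self, key, record_count):
        # Read the historical total, add the current file's volume, write it back.
        history = self.state.get(key, 0)
        self.state[key] = history + record_count
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        # Periodic write of the running totals to the log table.
        for key, total in self.state.items():
            self.write_row(key, total)
        self.last_flush = self.clock()


now = [0.0]   # fake clock so the example does not have to sleep
rows = []
proc = LogProcessorSketch(60, lambda k, t: rows.append((k, t)), clock=lambda: now[0])
proc.on_stream_file("total", 100)
proc.on_stream_file("total", 50)   # state holds 150, nothing written yet
now[0] = 61.0
proc.on_stream_file("total", 25)   # interval elapsed, ("total", 175) is written
print(rows)
```

Between flushes the log table lags behind the in-memory total, which is the gap the stop-time flush described next is meant to close.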
The stateManager lets the processor store and retrieve state information, which is a set of custom key-value pairs, and is obtained through the ProcessContext.
Preferably, the log processor uses the @OnUnscheduled annotation mechanism of the processors in the NIFI task chain: when a processor in the NIFI task chain receives a stop command, the log processor stores the total data volume held in the stateManager in the corresponding log table in the database.
In both the task log processor and the step log processor, the statistics are held in the stateManager and written to the log table periodically, so when a task is stopped normally or interrupted abnormally, the latest data volume in the stateManager may not yet have been written to the log table in the database. To address this, in the above embodiment the log processor uses the @OnUnscheduled annotation mechanism of the NIFI processor and, when the processor receives a stop command, stores the latest result in the stateManager in the database, so as to ensure the accuracy of the statistics.
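A minimal sketch of the stop-time flush, again as a Python stand-in rather than NIFI code. The `on_unscheduled` method only mimics the role of a Java method carrying the @OnUnscheduled annotation, and the class and callback names are hypothetical.

```python
class StopAwareCounter:
    def __init__(self, write_row):
        self.total = 0               # stand-in for the value held in the stateManager
        self.write_row = write_row   # callable that writes the total to the log table

    def on_stream_file(self, record_count):
        self.total += record_count

    def on_timer(self):
        # Periodic write driven by the timer.
        self.write_row(self.total)

    def on_unscheduled(self):
        # Stand-in for the stop hook: persist whatever the timer has not yet written.
        self.write_row(self.total)


rows = []
counter = StopAwareCounter(rows.append)
counter.on_stream_file(100)
counter.on_timer()            # log table now shows 100
counter.on_stream_file(40)    # 140 in memory, table still shows 100
counter.on_unscheduled()      # stop command: table is brought up to 140
print(rows)                   # [100, 140]
```

Without the stop hook, the last value in the table would stay at 100 even though 140 records had been counted.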
Preferably, the step log processor reads the attribute information of the stream file to obtain metadata, uses the upstream processor ID in the metadata together with the connection relationship with the upstream processor as the key, and, driven by the timer, periodically sends the total data volume in the stateManager and the key to the step log table;
the upstream processor is the processor in the NIFI task link to which the step log processor is connected;
the connection relationship with the upstream processor is the relationship of the connection between the step log processor and its upstream processor.
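The keyed counting can be sketched as below. The attribute names `upstream.processor.id` and `upstream.relationship` are invented for this example; the patent does not name the attributes that carry this metadata.

```python
from collections import defaultdict


def count_step_data(stream_files):
    """Accumulate record counts per (upstream processor ID, relationship).

    `stream_files` is an iterable of (attributes, record_count) pairs,
    where `attributes` is a dict of stream file attributes.
    """
    totals = defaultdict(int)
    for attrs, record_count in stream_files:
        key = (attrs["upstream.processor.id"], attrs["upstream.relationship"])
        totals[key] += record_count
    return dict(totals)


files = [
    ({"upstream.processor.id": "clean", "upstream.relationship": "success"}, 95),
    ({"upstream.processor.id": "clean", "upstream.relationship": "failure"}, 5),
    ({"upstream.processor.id": "convert", "upstream.relationship": "success"}, 90),
    ({"upstream.processor.id": "clean", "upstream.relationship": "success"}, 40),
]
step_totals = count_step_data(files)
print(step_totals[("clean", "success")])    # 135
print(step_totals[("clean", "failure")])    # 5
print(step_totals[("convert", "success")])  # 90
```

Because the key includes both the processor ID and the relationship, one step log processor can keep separate success and failure totals for every intermediate processor it is connected to.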
Preferably, the monitoring interface is a two-level interface comprising:
a task monitoring interface, which reads and displays the task log table and supports viewing logs, deleting logs and resetting the failure state, where resetting the failure state means cancelling the failed state of a task and resuming monitoring once the problem data in the stream files of the failed task has been fully handled;
and a step monitoring interface, which reads and displays the step log table, the task metadata comprising the processing field, processing rule and processing time of each intermediate processor in the NIFI task chain.
In a second aspect, the invention provides a distributed monitoring method for NIFI tasks, comprising the following steps:
constructing a distributed monitoring system for NIFI tasks according to any implementation of the first aspect, and configuring the log processors based on the NIFI task chain;
performing log statistics on the NIFI task at both the task level and the step level through the log processors, storing the task log statistics in the task log table and the step log statistics in the step log table;
displaying the task log table and the step log table through the monitoring interface, so as to monitor tasks and steps;
where the task log statistics comprise task data and task metadata, and the step log statistics comprise step data and task metadata.
The distributed monitoring system and method for NIFI tasks have the following advantages:
1. The task log processors work together with the step log processors to achieve distributed monitoring and data summarization of multi-task logs in an NIFI cluster, providing two-level monitoring of tasks and steps so that problems can be located quickly;
2. The log statistics are displayed visually through two levels of monitoring pages, for tasks and for steps, which facilitates accurate monitoring and maintenance;
3. The log processor uses the @OnUnscheduled annotation mechanism of the NIFI processor: when the processor receives a stop command, the log processor stores the latest result in the stateManager in the database, ensuring the accuracy of the statistics.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of the distributed monitoring system for NIFI tasks in embodiment 1;
FIG. 2 is a schematic diagram of the task log processor configuration in the distributed monitoring system for NIFI tasks in embodiment 1;
FIG. 3 is a schematic diagram of the step log processor configuration in the distributed monitoring system for NIFI tasks in embodiment 1.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments so that those skilled in the art can better understand and implement it. The embodiments are not to be construed as limiting the invention, and the embodiments and their technical features can be combined with one another where they do not conflict.
It is to be understood that the terms first, second, and the like in the description of the embodiments of the invention are used for distinguishing between the descriptions and not necessarily for describing a sequential or chronological order. The "plurality" in the embodiment of the present invention means two or more.
The embodiment of the invention provides a distributed monitoring system and a distributed monitoring method for NIFI tasks, which are used for solving the technical problem of how to perform log statistics and monitoring on NIFI from two aspects of tasks and steps.
Embodiment 1:
The invention provides a distributed monitoring system for NIFI tasks, which performs log statistics and monitoring on NIFI tasks at both the task level and the step level.
As shown in FIG. 1, log tables, specifically a task log table and a step log table, are configured in the relational database. In this embodiment the relational database is Oracle.
The log processor is a component connected to the processors in the NIFI task chain. It is configured on the canvas, where the relationship of each connection between the log processor and a processor in the NIFI task chain is specified. A stateManager and a timer are built into the log processor. The stateManager lets the processor store and retrieve state information, which is a set of custom key-value pairs, and is obtained through the ProcessContext. The log processor works as follows: it takes a stream file from the queue of its connection, counts the data volume of the stream file according to the relationship of that connection, and saves the count to the stateManager as the processor's state information; for each subsequent stream file it counts the data volume of the current stream file, reads the historical total from the stateManager, adds the two to obtain the new total, and writes the new total back to the stateManager; driven by the timer, it periodically sends the total data volume in the stateManager and the task metadata corresponding to the task to the corresponding log table in the database.
The log processors in this embodiment comprise task log processors and step log processors. There are three task log processors: a first, a second and a third task log processor.
The head node processor is a data acquisition processor and the source of all stream files and data. The first task log processor is connected to the head node processor of the NIFI task chain through the total-count relationship, and every stream file generated is sent to the first task log processor through that relationship. The first task log processor takes each stream file in turn from the queue of its connection, counts the data volume in the stream file and saves it to the stateManager as the processor's state information. It then takes the next stream file, reads the historical total from the stateManager, adds it to the data volume of the current stream file and writes the latest total back to the stateManager. In this way the first task log processor obtains the total data volume of the task, counting the task data volume in real time and periodically storing it, with the task metadata corresponding to the task, in the task log table.
The tail node processor is the end point of the NIFI task chain. It is a storage processor and holds the final result of the stream files after processing by the intermediate processors. The second task log processor is connected to the tail node processor through the success relationship, and the third task log processor is connected to the tail node processor through the failure relationship. Both count in the same way as the first task log processor. The data counted by the second task log processor represents the data volume processed successfully by all processors in the NIFI task chain, whereas the data counted by the third task log processor represents only the data volume that failed processing at the tail node processor (that is, the volume of problem data).
The data collected by the first, second and third task log processors, together with the task metadata, is all stored in the task log table.
In addition, to make task monitoring more convenient, the task log table also stores the total task failure data volume, which is the total data volume for which processing by the processors in the NIFI task chain failed overall. It is calculated as:
total task failure data volume = total task data volume - total task success data volume.
To better monitor current tasks, the task log table further stores a present task data volume, a present success data volume and a present failure data volume, where the present task data volume is the total data volume of all tasks within the present time period, and the present failure data volume is the data volume of all tasks that failed within the present time period.
A step log processor is connected to each intermediate processor in the NIFI task chain through both a success relationship and a failure relationship. The step log processor reads the stream file attribute information and obtains the metadata. Using the upstream processor ID in the metadata together with the connection relationship with the upstream processor as the key, it stores and updates in the stateManager the counted stream file data volume received over the corresponding relationship of each processor. In this way the successfully processed and failed data volumes of the intermediate processors in the NIFI task chain are decoupled from one another, and the accumulated count is the true success and failure data volume of each processor. A timer then periodically inserts or updates the total data volumes and keys held in the stateManager into the step log table.
Here, the upstream processor is the processor in the NIFI task chain to which the step log processor is connected, and the connection relationship with the upstream processor is the relationship through which the step log processor is connected to that upstream processor.
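The keyed accumulation described above can be sketched in a few lines of Python. This is a simplified stand-in rather than NIFI code: the class `StepCounter`, its in-memory dictionary (standing in for the stateManager) and the processor IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Key: (upstream processor ID, relationship with the upstream processor).
Key = Tuple[str, str]


class StepCounter:
    """In-memory stand-in for the per-key totals kept in the stateManager."""

    def __init__(self) -> None:
        self.state: Dict[Key, int] = defaultdict(int)

    def record(self, upstream_id: str, relationship: str, volume: int) -> int:
        """Add one stream file's data volume to the running total for its key."""
        if relationship not in ("success", "failure"):
            raise ValueError(f"unexpected relationship: {relationship!r}")
        key = (upstream_id, relationship)
        # New total = historical total + data volume of the current stream file.
        self.state[key] += volume
        return self.state[key]


counter = StepCounter()
counter.record("AddStaticValue-1", "success", 100)
counter.record("AddStaticValue-1", "failure", 3)
counter.record("AddStaticValue-2", "success", 97)
counter.record("AddStaticValue-1", "success", 50)
```

Because the key combines the upstream processor and the relationship, a single step log processor can keep separate success and failure totals for every intermediate processor it is connected to.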
In both the task log processor and the step log processor, the statistical results are held in the stateManager and written to the database table at regular intervals. When a task is stopped normally or interrupted abnormally, the latest result in the stateManager may therefore not yet have been written to the database table. To address this, the log processor designed by the invention uses the @OnUnscheduled annotation mechanism of the NIFI processor to write the latest result in the stateManager to the database table when the processor receives a stop command, which ensures the accuracy of the statistics.
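The gap between timed writes and a stop command can be made concrete with a toy Python model. It does not use the NIFI API: the class `LogProcessorSketch`, the `write_row` callback and the explicit `now` argument (used in place of a real background timer) are assumptions for illustration, and `on_unscheduled` merely plays the role that a method annotated with @OnUnscheduled would play in a real processor.

```python
from typing import Callable, Dict


class LogProcessorSketch:
    """Toy log processor: accumulate in state, flush on a timer and on stop."""

    def __init__(self, interval_s: float, write_row: Callable[[str, int], None]) -> None:
        self.interval_s = interval_s      # period between timed writes
        self.write_row = write_row        # hypothetical callback that upserts one row
        self.state: Dict[str, int] = {}   # stand-in for the stateManager
        self.last_flush = 0.0

    def on_trigger(self, key: str, volume: int, now: float) -> None:
        """Count one stream file; write to the table only once the interval has elapsed."""
        self.state[key] = self.state.get(key, 0) + volume
        if now - self.last_flush >= self.interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Write every running total to the log table."""
        for key, total in self.state.items():
            self.write_row(key, total)
        self.last_flush = now

    def on_unscheduled(self, now: float) -> None:
        """Called on a stop command: flush what the timer has not yet written."""
        self.flush(now)


table: Dict[str, int] = {}
proc = LogProcessorSketch(interval_s=60.0, write_row=table.__setitem__)
proc.on_trigger("task-1/total", 100, now=60.0)   # interval elapsed, table updated
proc.on_trigger("task-1/total", 25, now=70.0)    # counted, but not yet written
stale = dict(table)                              # table still holds the older total
proc.on_unscheduled(now=75.0)                    # stop command: final flush
```

Before the stop command the table lags behind the in-memory state; the final flush closes that gap, which is the accuracy guarantee the description refers to.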
The monitoring platform reads the log tables and displays them through the monitoring interface. In this embodiment, the monitoring interface is a two-level interface comprising a task monitoring interface and a step monitoring interface.
The task monitoring interface reads and displays the task log table and supports log viewing, log deletion and failure state reset. Failure state reset means cancelling the failure state of a task and resuming monitoring once the problem data in the stream files of the failed task has been fully handled.
The step monitoring interface reads and displays the step log table. The task metadata comprises the processing fields, processing rules and processing time of each intermediate processor in the NIFI task chain.
The monitoring interface is built using conventional methods, which are not described in detail here.
The distributed monitoring system for NIFI tasks can thus perform log statistics and monitoring of NIFI tasks at both the task level and the step level.
The NIFI task extracts data from one Oracle table, adds certain attributes to the stream file through a two-stage AddStaticValue processor, has the Chinese Convert processor convert the data content between traditional and simplified Chinese, and then synchronizes the processed data to another table. The ExtendedExecuteSQL processor serves as the source node and is responsible for extracting data to generate stream files; the iPudJDBC processor is the end point of the chain and is connected to the task log processors; the intermediate processors are connected to the step log processor.
In this embodiment, the NIFI task chain includes a data acquisition processor, a cleaning processor, a conversion processor, a matching processor and a storage processor, and the distributed monitoring system described above is applied to this NIFI task. However many processors make up the NIFI task chain, only four log processors are needed for comprehensive monitoring statistics. The parameter configuration of the task log processor and the step log processor is shown in fig. 2 and fig. 3. Among the configuration parameters, task ID represents the task name; schema name represents the user name; table name represents the name of the log table; interval time represents the period of the timed write to the database; relationship selects the processing relationship of the task log processor, which is one of "total", "success" and "failure"; bootstrap servers gives the IP addresses of the kafka distributed cluster; and krb5.path, keytab.path and principal work together to authenticate against and configure the kafka distributed cluster.
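As a rough illustration of these parameters, the following Python sketch groups them into one validated configuration object. The class name, the field names, the use of seconds for the interval and every value shown are assumptions for the example; figs. 2 and 3 of the patent remain the authoritative description of the actual configuration.

```python
from dataclasses import dataclass

ALLOWED_RELATIONSHIPS = ("total", "success", "failure")


@dataclass(frozen=True)
class TaskLogProcessorConfig:
    """Hypothetical container for the parameters named in the description."""

    task_id: str              # task name
    schema_name: str          # database user name
    table_name: str           # name of the log table
    interval_time_s: int      # period of the timed write (seconds assumed)
    relationship: str         # "total", "success" or "failure"
    bootstrap_servers: str    # address(es) of the kafka distributed cluster
    krb5_path: str            # krb5.path
    keytab_path: str          # keytab.path
    principal: str

    def __post_init__(self) -> None:
        if self.relationship not in ALLOWED_RELATIONSHIPS:
            raise ValueError(f"relationship must be one of {ALLOWED_RELATIONSHIPS}")
        if self.interval_time_s <= 0:
            raise ValueError("interval_time_s must be positive")


# Placeholder values only; none of them come from the patent.
config = TaskLogProcessorConfig(
    task_id="oracle_sync_demo",
    schema_name="etl_user",
    table_name="task_log",
    interval_time_s=60,
    relationship="failure",
    bootstrap_servers="192.0.2.10:9092",
    krb5_path="/etc/krb5.conf",
    keytab_path="/etc/security/demo.keytab",
    principal="demo@EXAMPLE.COM",
)
```

Validating the relationship up front reflects the fact that each task log processor counts exactly one of the three relationships.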
After configuration is complete, the log processors perform log statistics and store the related data in the log tables of the database.
The user can then inspect both levels, tasks and steps, through the monitoring interface.
Through the task monitoring interface, the user can view the monitoring statistics of all tasks and the detailed statistics of each task. The interface displays the total data volume of the task, the total data volume of task success, the total data volume of task failure, today's task data volume and today's failure data volume. The user can monitor the execution of each task through the task monitoring interface and open the stream files of a failed task to inspect the problem data, so that the problem data of the task can be handled afterwards.
Through the step monitoring interface, the user can view the processing name, processing field and processing rule of each intermediate processor in the task, together with its successfully processed data volume and failed data volume, and can therefore clearly see the steps that make up the task chain, the processing action of each step and the statistics of its execution.
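The two-level view can be sketched as two queries against the log tables. The example below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names and sample rows are hypothetical, and the total failure is derived in the query here even though the description stores it in the task log table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_log (task_id TEXT PRIMARY KEY, total INTEGER, success INTEGER);
    CREATE TABLE step_log (
        task_id TEXT, processor_name TEXT, success INTEGER, failure INTEGER
    );
    INSERT INTO task_log VALUES ('demo_task', 1000, 990);
    INSERT INTO step_log VALUES ('demo_task', 'AddStaticValue', 1000, 0);
    INSERT INTO step_log VALUES ('demo_task', 'Chinese Convert', 995, 5);
    """
)


def task_view(conn: sqlite3.Connection) -> list:
    """First level: one row per task, with total failure derived as total - success."""
    return conn.execute(
        "SELECT task_id, total, success, total - success AS total_failure "
        "FROM task_log ORDER BY task_id"
    ).fetchall()


def step_view(conn: sqlite3.Connection, task_id: str) -> list:
    """Second level: per-processor success and failure volumes for one task."""
    return conn.execute(
        "SELECT processor_name, success, failure FROM step_log "
        "WHERE task_id = ? ORDER BY rowid",
        (task_id,),
    ).fetchall()


tasks = task_view(conn)
steps = step_view(conn, "demo_task")
```

A user would start from the task-level rows and drill down into the step-level rows of a single task to see which processor produced the failures.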
Example 2:
The invention provides a distributed monitoring method for NIFI tasks, comprising the following steps:
S100, constructing the distributed monitoring system for NIFI tasks disclosed in embodiment 1, and configuring the log processors based on the NIFI task chain;
S200, performing log statistics on the NIFI task at both the task level and the step level through the log processors, storing the task log statistics in the task log table and the step log statistics in the step log table;
S300, displaying the task log table and the step log table through the monitoring interface to monitor the task and its steps.
The task log statistics comprise task data and task metadata, and the step log statistics comprise step data and task metadata.
The embodiments described above are merely preferred embodiments intended to fully illustrate the present invention, and the scope of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention. The protection scope of the invention is defined by the claims.

Claims (10)

1. A distributed monitoring system for NIFI tasks, for performing log statistics and monitoring on NIFI tasks at both the task level and the step level, the distributed monitoring system comprising:
a kafka distributed cluster;
a relational database in which log tables are configured, the log tables comprising a task log table and a step log table;
a log processor, which is a log component and comprises a task log processor and a step log processor; the task log processor is used for counting task data in real time, storing the task data and the task metadata corresponding to the task in the task log table at regular intervals, and storing the contents of the stream files of failed tasks in the kafka distributed cluster; the step log processor is used for counting, in real time, the step data corresponding to each intermediate processor executing the task in the NIFI task chain, and storing the step data and the task metadata corresponding to the task in the step log table;
a monitoring platform for reading the log tables and displaying them through a monitoring interface;
wherein the task data comprises a total task data volume, a total task success data volume and a task failure data volume, the total task data volume being the data volume of all stream files corresponding to the task, the total task success data volume being the data volume successfully processed as a whole by all processors in the NIFI task chain, and the task failure data volume being the data volume that the tail node processor in the NIFI task chain failed to process;
the step data comprises a step success data volume and a step failure data volume, the step success data volume being the data volume successfully processed by the intermediate processor, and the step failure data volume being the data volume that the intermediate processor failed to process;
the intermediate processors are all processors in the NIFI task chain other than the first node processor and the tail node processor.
2. The distributed monitoring system for NIFI tasks according to claim 1, wherein the task log table further stores a total data volume of task failure, the total data volume of task failure being the total data volume that failed overall processing across all processors in the NIFI task chain;
total data volume of task failure = total data volume of the task - total data volume of task success.
3. The distributed monitoring system for NIFI tasks of claim 1, wherein the task log table further stores today's task data volume, today's success data volume and today's failure data volume, wherein today's task data volume is the total task data volume of all tasks within the current time limit, today's success data volume is the total task success data volume of all tasks within the current time limit, and today's failure data volume is the total task failure data volume of all tasks within the current time limit.
4. The distributed monitoring system for NIFI tasks according to claim 1, 2 or 3, characterized in that there are three task log processors, namely:
a first task log processor, connected to the first node processor in the NIFI task chain through a total relationship, and used for counting the data volume of the task in real time and storing the data volume of the task and the task metadata corresponding to the task in the task log table at regular intervals;
a second task log processor, connected to the tail node processor in the NIFI task chain through a success relationship, and used for counting in real time the data volume successfully processed by the tail node processor and storing it in the task log table at regular intervals, wherein the data volume successfully processed by the tail node processor is the data volume successfully processed by all processors in the NIFI task chain;
a third task log processor, connected to the tail node processor in the NIFI task chain through a failure relationship, and used for counting in real time the data volume that the tail node processor failed to process.
5. The distributed monitoring system for NIFI tasks as claimed in claim 4, wherein the log processor is connected to each intermediate processor in the NIFI task chain through a success relationship and a failure relationship respectively, and is used for counting in real time the data volume successfully processed and the data volume that failed processing for each intermediate processor.
6. The distributed monitoring system for NIFI tasks of claim 5, wherein a stateManager and a timer are configured in the log processor;
the log processor acquires the stream file from the connected queue, counts the data volume of the stream file according to the relationship corresponding to the connection, and stores the data volume in the stateManager as state information of the processor;
the log processor acquires the current stream file, counts the data volume of the current stream file according to the corresponding connection relationship, obtains the historical total data volume from the stateManager, adds the historical total data volume to the data volume of the current stream file to obtain the new total data volume, and updates the new total data volume in the stateManager;
the log processor periodically sends, by means of the timer, the total data volume in the stateManager and the task metadata corresponding to the task to the corresponding log table in the database.
7. The distributed monitoring system for NIFI tasks as claimed in claim 6, wherein the log processor uses the @OnUnscheduled annotation mechanism of the processors in the NIFI task chain, so that when a processor in the NIFI task chain receives a stop command, the log processor stores the total data volume in the stateManager in the corresponding log table in the database.
8. The distributed monitoring system for NIFI tasks as claimed in claim 6, wherein the log processor reads the stream file attribute information and obtains the metadata, and, using the upstream processor ID in the metadata together with the connection relationship with the upstream processor as the key, periodically sends the total data volume in the stateManager and the key to the task log table by means of the timer;
the upstream processor is the processor in the NIFI task chain to which the step log processor is connected;
the connection relationship with the upstream processor is the relationship through which the step log processor is connected to its upstream processor.
9. The distributed monitoring system for NIFI tasks according to claim 1, 2 or 3, characterized in that the monitoring interface is a two-level interface comprising:
a task monitoring interface for reading and displaying the task log table and supporting log viewing, log deletion and failure state reset, wherein failure state reset means cancelling the failure state of a task and resuming monitoring once the problem data in the stream files of the failed task has been fully handled;
a step monitoring interface for reading and displaying the step log table, wherein the task metadata comprises the processing field, processing rule and processing time of each intermediate processor in the NIFI task chain.
10. A distributed monitoring method for NIFI tasks, characterized by comprising the following steps:
constructing a distributed monitoring system for NIFI tasks as claimed in any one of claims 1-9 and configuring the log processor based on the NIFI task chain;
performing log statistics on the NIFI task at both the task level and the step level through the log processor, storing the task log statistics in the task log table and the step log statistics in the step log table;
displaying the task log table and the step log table through the monitoring interface to monitor tasks and steps;
wherein the task log statistics include task data and task metadata, and the step log statistics include step data and task metadata.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910874660.5A CN110569174B (en) 2019-09-17 2019-09-17 Distributed monitoring system and method for NIFI task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910874660.5A CN110569174B (en) 2019-09-17 2019-09-17 Distributed monitoring system and method for NIFI task

Publications (2)

Publication Number Publication Date
CN110569174A true CN110569174A (en) 2019-12-13
CN110569174B CN110569174B (en) 2023-05-12

Family

ID=68780432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910874660.5A Active CN110569174B (en) 2019-09-17 2019-09-17 Distributed monitoring system and method for NIFI task

Country Status (1)

Country Link
CN (1) CN110569174B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103058A (en) * 2017-04-13 2017-08-29 华南理工大学 Big data service combining method and composite service combined method based on Artifact
US20170276573A1 (en) * 2016-03-25 2017-09-28 Uptake Technologies, Inc. Computer Systems and Methods for Providing a Visualization of Asset Event and Signal Data
US20180069925A1 (en) * 2016-09-08 2018-03-08 BigStream Solutions, Inc. Systems and methods for automatic transferring of big data computations from centralized systems to at least one of messaging systems and data collection systems
CN109634652A (en) * 2018-11-28 2019-04-16 郑州云海信息技术有限公司 A kind of method, apparatus of data processing, computer storage medium and terminal
CN109753502A (en) * 2018-12-29 2019-05-14 山东浪潮商用系统有限公司 A kind of collecting method based on NiFi

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈天乐等: "一种基于层次分割和聚合的大数据流水线任务处理方法", 《科研信息化技术与应用》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112187579A (en) * 2020-09-28 2021-01-05 中国建设银行股份有限公司 Auxiliary processing method, device and equipment for data transmission exception and readable storage medium
CN112187579B (en) * 2020-09-28 2021-11-23 中国建设银行股份有限公司 Auxiliary processing method, device and equipment for data transmission exception and readable storage medium
CN112380218A (en) * 2020-11-18 2021-02-19 浪潮天元通信信息系统有限公司 ETL-based automatic triggering method for summarizing data tables of data warehouse layers
CN112380218B (en) * 2020-11-18 2023-03-28 浪潮通信信息系统有限公司 ETL-based automatic triggering method for summarizing data tables of data warehouse layers
CN114679487A (en) * 2022-03-25 2022-06-28 度小满科技(北京)有限公司 Link processing method, device, storage medium and processor
CN114679487B (en) * 2022-03-25 2023-12-22 度小满科技(北京)有限公司 Link processing method, device, storage medium and processor

Also Published As

Publication number Publication date
CN110569174B (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230413

Address after: 250000 Langchao Science Park, No. 1036, Langchao Road, high tech Zone, Jinan, Shandong

Applicant after: Inspur Software Technology Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: SHANDONG INSPUR BUSINESS SYSTEM Co.,Ltd.

GR01 Patent grant