CN113342608A - Method and device for monitoring streaming computing engine task - Google Patents


Info

Publication number
CN113342608A
Authority
CN
China
Prior art keywords
delay
task
record
flink
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110639027.5A
Other languages
Chinese (zh)
Other versions
CN113342608B (en
Inventor
刘伟
金磐石
杨晓勤
李世宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202110639027.5A
Publication of CN113342608A
Application granted
Publication of CN113342608B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3017 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 Status alarms

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for monitoring a streaming computing engine task. The method comprises the following steps: periodically acquiring an application identifier of a Flink application according to a first time step; acquiring a task identifier list that corresponds to the application identifier and contains at least one task identifier; determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on the Flink tasks; generating a delay record corresponding to the overall delay time, marking the state of the delay record as unread, and writing the marked delay record into a record file; and, according to a second time step, periodically calling a monitoring module to read the delay records whose state is unread in the record file, judging whether any of the read delay records indicates abnormal delay, and if so, sending a generated alarm instruction to an alarm module so that the alarm module gives an alarm. The delay records are thus monitored so that staff are alerted promptly when a delay anomaly exists.

Description

Method and device for monitoring streaming computing engine task
Technical Field
The invention relates to the technical field of computers, and in particular to a method and a device for monitoring a streaming computing engine task.
Background
With the rapid development of big data, various popular open-source community technologies, such as Hadoop, Storm, Spark, and Flink, are being applied in the computer industry. Flink is a distributed processing engine for stream data and batch data. It is regarded here as the only distributed stream data processing framework in the current open-source community that combines high throughput, low latency, and high performance, and it has therefore become the mainstream choice of users in the field of real-time computing.
An application built on Flink can be called a Flink application. When running, a Flink application executes corresponding Flink tasks to realize its functions; a Flink task can also be called a streaming computing engine task. Although the Flink framework has various advantages, its mechanism for raising alarms on task execution delay is incomplete. As a result, an application built on the Flink framework is prone to tasks being delayed for a long time at run time, and such delays cannot be resolved promptly.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for monitoring a streaming computing engine task, which monitor the task and raise an alarm promptly when a delay anomaly occurs, so that staff learn of the anomaly in time.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a method for monitoring tasks of a streaming computing engine comprises the following steps:
periodically acquiring an application identifier of the Flink application according to a preset first time step;
acquiring a task identifier list corresponding to the application identifier, wherein the task identifier list comprises at least one task identifier;
determining a Flink task corresponding to each task identifier, and determining the whole delay time of the Flink application based on each Flink task;
generating a delay record corresponding to the whole delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
and according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record of which the state is an unread state in the record file, updating the state of each obtained delay record into a read state, analyzing each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, if the delay records with abnormal delay exist, generating an alarm instruction, and sending the alarm instruction to a preset alarm module so that the alarm module performs delay alarm.
The above method, optionally, further includes:
the monitoring module stores each acquired delay record to a preset data storage platform;
and calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
Optionally, the determining the overall delay time of the Flink application based on each Flink task includes:
determining a key task in each Flink task;
determining the task horizontal line time of the key task and determining the current time of the node executing the key task;
and determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
Optionally, the determining the overall delay time of the Flink application based on each Flink task includes:
for each Flink task, determining a task horizontal line time of the Flink task and the current time of a node executing the Flink task, and calculating a second delay time of the Flink task based on the task horizontal line time and the current time;
and carrying out a weighted average operation on the second delay times to obtain an average delay time, and taking the average delay time as the overall delay time of the Flink application.
The above method, optionally, the generating a delay record corresponding to the overall delay time includes:
collecting node information of a node executing the key task;
and filling the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the whole delay time.
The above method, optionally, the generating a delay record corresponding to the overall delay time includes:
for each Flink task, determining node information of a node executing the Flink task;
and writing the average delay time, the node information of each Flink task and the second delay time into a preset second record template to generate a delay record corresponding to the whole delay time.
A device for monitoring tasks of a streaming computing engine, comprising:
the first obtaining unit is used for periodically obtaining an application identifier of the Flink application according to a preset first time step;
a second obtaining unit, configured to obtain a task identifier list corresponding to the application identifier, where the task identifier list includes at least one task identifier;
the determining unit is used for determining the Flink task corresponding to each task identifier and determining the whole delay time of the Flink application based on each Flink task;
the generating unit is used for generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state and writing the marked delay record into a preset record file;
and the alarm unit is used for periodically calling the monitoring module to read the record file according to a preset second time step so that the monitoring module acquires each delay record of which the state is an unread state in the record file, updates the state of each acquired delay record into a read state, analyzes each acquired delay record to judge whether a delay record with abnormal delay exists in each delay record, generates an alarm instruction if the delay record with abnormal delay exists, and sends the alarm instruction to a preset alarm module so that the alarm module carries out delay alarm.
The above apparatus, optionally, further comprises:
the storage unit is used for storing each acquired delay record to a preset data storage platform by the monitoring module;
and the calling unit is used for calling a preset visual component to process each delay record in the data storage platform, so that the visual component displays each delay record in the data storage platform.
The above apparatus, optionally, the determining unit includes:
the first determining subunit is used for determining a key task in each Flink task;
the second determining subunit is used for determining the task horizontal line time of the key task and determining the current time of the node executing the key task;
and the third determining subunit is configured to determine, based on the current time and the task horizontal line time, a first delay time of the key task, and use the first delay time as an overall delay time of the Flink application.
The above apparatus, optionally, the determining unit includes:
a fourth determining subunit, configured to determine, for each of the Flink tasks, a task horizontal line time of the Flink task and a current time of a node that executes the Flink task, and calculate, based on the task horizontal line time and the current time, a second delay time of the Flink task;
and the operation subunit is configured to perform a weighted average operation on the second delay times to obtain an average delay time, and use the average delay time as the overall delay time of the Flink application.
The above apparatus, optionally, the generating unit includes:
the collecting subunit is used for collecting the node information of the node executing the key task;
and the obtaining subunit is configured to fill the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the overall delay time.
The above apparatus, optionally, the generating unit includes:
a fifth determining subunit, configured to determine, for each of the Flink tasks, node information of a node that executes the Flink task;
and the generating subunit is configured to write the average delay time and the node information and the second delay time of each Flink task into a preset second record template, and generate a delay record corresponding to the overall delay time.
Compared with the prior art, the invention has the following advantages:
the invention provides a method and a device for monitoring a streaming computing engine task, wherein the method comprises the following steps: periodically acquiring an application identifier of the Flink application according to the first time step; acquiring a task identifier list corresponding to the application identifier, wherein the task identifier list comprises at least one task identifier; determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on the Flink tasks; generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a record file; and, according to the second time step, periodically calling the monitoring module to read the delay records whose state is unread in the record file, judging whether any of the read delay records indicates abnormal delay, and if so, sending a generated alarm instruction to the alarm module so that the alarm module carries out a delay alarm. A delay record of the overall delay time of the Flink application is generated and monitored for abnormal delay; if an abnormal delay record exists, staff are alerted promptly so that they can resolve the delay anomaly in time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for monitoring a task of a streaming computing engine according to an embodiment of the present invention;
fig. 2 is a flowchart of another method for monitoring a task of a streaming computing engine according to an embodiment of the present invention;
fig. 3 is a flowchart of another method of monitoring a task of a streaming computing engine according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a monitoring apparatus for a task of a streaming computing engine according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
To address the problem that tasks of an application built on the Flink framework are prone to long delays at run time that cannot be resolved promptly, the invention provides a method for monitoring a streaming computing engine task. The method can be applied to a monitoring platform, and its execution subject can be a processor or a server in the monitoring platform. Fig. 1 shows a flowchart of the method for monitoring a streaming computing engine task provided by an embodiment of the invention, described as follows:
S101, periodically obtaining an application identifier of the Flink application according to a preset first time step.
In the method provided by the embodiment of the invention, the monitoring platform periodically obtains the application identifier of the Flink application according to a preset first time step. The first time step can be set based on actual requirements; for example, if the first time step is 2 minutes, the application identifier of the Flink application is obtained every 2 minutes. The application identifier is the unique identity of the Flink application, and the Flink application is a Flink computing application program running on a Yarn cluster.
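The periodic acquisition in S101 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: `poll_application_ids`, `fake_fetch`, and the identifier `application_0001` are hypothetical names, and the real query against the cluster is replaced by a stand-in function.

```python
import time
from typing import Callable, List


def poll_application_ids(fetch_ids: Callable[[], List[str]],
                         step_seconds: float,
                         cycles: int,
                         sleep: Callable[[float], None] = time.sleep) -> List[List[str]]:
    """Call fetch_ids once per first time step, for a fixed number of cycles."""
    snapshots = []
    for i in range(cycles):
        snapshots.append(fetch_ids())
        if i < cycles - 1:
            sleep(step_seconds)  # wait one first time step between polls
    return snapshots


def fake_fetch() -> List[str]:
    # Hypothetical stand-in for a query that returns application identifiers.
    return ["application_0001"]


# A no-op sleep keeps the demonstration instant; the step is 2 minutes as in the text.
snapshots = poll_application_ids(fake_fetch, step_seconds=120, cycles=3,
                                 sleep=lambda seconds: None)
```

Passing the sleep function as a parameter is only a convenience for testing; a production scheduler would more likely be a cron job or a timer service.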
S102, a task identification list corresponding to the application identification is obtained, and the task identification list comprises at least one task identification.
Based on the application identifier, the task identifier list of the Flink application is obtained. Each task identifier list has a list identifier that identifies it, and the list identifier is associated with the application identifier of the Flink application; the task identifier list whose list identifier is consistent with the application identifier can be used as the task identifier list of the Flink application.
The task identification list comprises at least one task identification, wherein the task identification is the unique identity identification of the Flink task of the Flink application.
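The lookup in S102 can be sketched as a mapping from list identifier to task identifiers. The registry `TASK_LISTS`, the function `task_ids_for`, and all identifiers below are hypothetical; the source does not say where the lists are stored.

```python
from typing import Dict, List

# Hypothetical registry: list identifier -> task identifiers of that application.
TASK_LISTS: Dict[str, List[str]] = {
    "app-1": ["task-a", "task-b"],
    "app-2": ["task-c"],
}


def task_ids_for(application_id: str, task_lists: Dict[str, List[str]]) -> List[str]:
    """Return the task identifier list whose list identifier matches the application identifier."""
    task_ids = task_lists.get(application_id, [])
    if not task_ids:
        # The list is required to contain at least one task identifier.
        raise LookupError(f"no task identifiers found for {application_id}")
    return task_ids


found = task_ids_for("app-1", TASK_LISTS)
```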
S103, determining a Flink task corresponding to each task identifier, and determining the whole delay time of the Flink application based on each Flink task.
The Flink task corresponding to each task identifier is determined, where a Flink task is a task or a process that needs to be executed when the Flink application runs. Based on the Flink tasks, the overall delay time of the Flink application is determined. The overall delay time is a time length, for example 5 minutes or 30 seconds. It characterizes the delay of the Flink application during execution and is used subsequently to determine whether that delay is abnormal.
S104, generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file.
A corresponding delay record is generated for the overall delay time, where the delay record records the running condition of the Flink application in the current first time step. The delay record is marked as unread, which indicates that it has not yet been read, and the marked delay record is written into a preset record file.
The record file stores the delay records that record the running condition of the Flink application. The record file may contain only delay records in the unread state, that is, no delay record in any other state; alternatively, it may store delay records in multiple states, such as the unread state and the read state.
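A minimal sketch of S104, assuming the record file is a JSON Lines file with one delay record per line. The file format, the field names `overall_delay_seconds` and `state`, and the function `append_delay_record` are assumptions for illustration; the source does not specify them.

```python
import json
import os
import tempfile


def append_delay_record(path: str, overall_delay_seconds: float) -> dict:
    """Create a delay record marked as unread and append it to the record file."""
    record = {"overall_delay_seconds": overall_delay_seconds, "state": "unread"}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Use a temporary directory so the example leaves no file behind in the working directory.
record_file = os.path.join(tempfile.mkdtemp(), "delay_records.jsonl")
append_delay_record(record_file, 330.0)  # an overall delay of 5 minutes 30 seconds

with open(record_file, encoding="utf-8") as f:
    stored = [json.loads(line) for line in f]
```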
S105, periodically calling a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record of which the state is an unread state in the record file, and updating the state of each obtained delay record into a read state.
In the method provided by the invention, the monitoring platform also periodically calls the monitoring module to read the record file according to a preset second time step, so that the monitoring module regularly acquires the delay records whose state is unread in the record file. The second time step can be set according to actual requirements; preferably, it is greater than or equal to the first time step. After the monitoring module acquires the unread delay records, the state of each acquired delay record in the record file is updated to the read state.
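The read-and-mark step in S105 might look like the sketch below. It assumes a JSON Lines record file with a `state` field per record, which is an illustrative format and not one given in the source; `read_unread` is a hypothetical name.

```python
import json
import os
import tempfile
from typing import List


def read_unread(path: str) -> List[dict]:
    """Return the unread delay records, then mark every record in the file as read."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    # Copy the unread records before their state is changed.
    unread = [dict(r) for r in records if r["state"] == "unread"]
    for r in records:
        r["state"] = "read"
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    return unread


record_file = os.path.join(tempfile.mkdtemp(), "delay_records.jsonl")
with open(record_file, "w", encoding="utf-8") as f:
    f.write(json.dumps({"overall_delay_seconds": 20.0, "state": "read"}) + "\n")
    f.write(json.dumps({"overall_delay_seconds": 45.0, "state": "unread"}) + "\n")

first_pass = read_unread(record_file)   # picks up the one unread record
second_pass = read_unread(record_file)  # nothing is left unread
```

Rewriting the whole file is not safe if another process appends records at the same moment; a real deployment would need locking or an atomic rename, which this sketch leaves out.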
S106, judging whether delay records with delay abnormality exist in the acquired delay records, and executing S107 if the delay records with delay abnormality exist; if there is no delay record of the delay exception, S108 is executed.
Each acquired delay record is analyzed to determine the overall delay time it contains. Whether any of the acquired delay records indicates abnormal delay is judged as follows:
comparing the overall delay time in each delay record with a preset delay time, and judging whether the overall delay time is greater than or equal to the preset delay time;
if any overall delay time is greater than or equal to the preset delay time, determining that a delay record with abnormal delay exists, and determining each delay record whose overall delay time is greater than or equal to the preset delay time as a delay record with abnormal delay; in this case, it can be determined that the Flink application has a delay anomaly;
if no overall delay time is greater than or equal to the preset delay time, determining that no delay record with abnormal delay exists, and determining all the delay records as delay records with normal delay; in this case, it can be determined that the Flink application has no delay anomaly.
The preset delay time in the present invention is a time length, for example, 2 minutes or 1 minute, and the preset delay time can be set according to actual requirements.
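The comparison in S106 reduces to a threshold filter. The sketch below assumes each delay record is a dictionary with an `overall_delay_seconds` field, which is a hypothetical representation; the threshold of 2 minutes is one of the example values from the text.

```python
from typing import List


def abnormal_records(records: List[dict], preset_delay_seconds: float) -> List[dict]:
    """Return the delay records whose overall delay time is at or above the preset delay time."""
    return [r for r in records if r["overall_delay_seconds"] >= preset_delay_seconds]


records = [
    {"overall_delay_seconds": 30.0},
    {"overall_delay_seconds": 120.0},  # equal to the threshold counts as abnormal
    {"overall_delay_seconds": 300.0},
]
abnormal = abnormal_records(records, preset_delay_seconds=120.0)
has_delay_anomaly = bool(abnormal)
```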
S107, generating an alarm instruction, and sending the alarm instruction to a preset alarm module, so that the alarm module carries out a delay alarm.
When a delay record with abnormal delay exists among the delay records, an alarm instruction corresponding to that delay record is generated. The alarm instruction contains the information of the delay record with abnormal delay and is sent to the alarm module, so that the alarm module carries out a delay alarm. The alarm module can raise the alarm in various ways, such as sending an alarm short message, sending an alarm mail, or emitting an alarm sound. The delay alarm is in effect a delay anomaly alarm: it promptly notifies staff that the Flink application has a delay anomaly, so that they can resolve it in time.
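One way to picture S107 is an alarm instruction object that is handed to a set of notification channels. Everything named here (`AlarmInstruction`, `dispatch`, the channel names, the message wording) is hypothetical; real channels would call an SMS gateway or a mail server rather than append to a list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlarmInstruction:
    application_id: str
    abnormal_records: List[dict]  # information of the delay records with abnormal delay


def dispatch(instruction: AlarmInstruction,
             channels: Dict[str, Callable[[str], None]]) -> str:
    """Build one alarm message and send it through every configured channel."""
    message = (f"Flink application {instruction.application_id}: "
               f"{len(instruction.abnormal_records)} abnormal delay record(s)")
    for send in channels.values():
        send(message)
    return message


sent: List[str] = []  # stands in for real short message and mail delivery
instruction = AlarmInstruction("app-1", [{"overall_delay_seconds": 300.0}])
message = dispatch(instruction, {"sms": sent.append, "mail": sent.append})
```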
S108, generating a monitoring normal record, and storing the monitoring normal record.
A monitoring normal record is generated based on the record information of each delay record with normal delay, and is stored for subsequent inspection.
In the method provided by the embodiment of the present invention, the monitoring module may store each acquired delay record in a preset data storage platform, where the data storage platform includes, but is not limited to, a database or ES (Elasticsearch, an open-source distributed search and data analysis engine), which is used to store, retrieve, and analyze data.
The monitoring module can call a preset visual component to process each delay record in the data storage platform after the delay records are stored in the data storage platform, so that the visual component can display each delay record in the data storage platform, wherein when the visual component displays the delay records, the delay time in each delay record can be drawn according to a time dimension, and then the whole delay condition is displayed.
In the method provided by the embodiment of the invention, the application identifier of the Flink application is periodically acquired according to the first time step; acquiring a task identifier list corresponding to the application identifier, wherein the task identifier list comprises at least one task identifier; determining a Flink task corresponding to each task identifier, and determining the whole delay time of the Flink application based on each Flink task; generating a delay record corresponding to the whole delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file; and according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record of which the state is an unread state in the record file, updating the state of each obtained delay record into a read state, analyzing each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, if the delay records with abnormal delay exist, generating an alarm instruction, and sending the alarm instruction to a preset alarm module so that the alarm module performs delay alarm. The method comprises the steps of determining the whole delay time of the Flink application based on each Flink task of the Flink application, monitoring delay records corresponding to the whole delay time by using a monitoring module, and triggering an alarm module to give an alarm in time when the delay records with abnormal delay exist so that a worker can find problems in time.
Referring to fig. 2, a flowchart of a method for determining an overall delay time of a Flink application based on each Flink task according to an embodiment of the present invention is specifically described as follows:
S201, determining key tasks in the Flink tasks.
In the method provided by the embodiment of the invention, a key task is determined among the Flink tasks, where the key task is the Flink task that is executed last. To determine the key task, the execution time of each Flink task is obtained first, where the execution time is the time at which the Flink task is executed by the corresponding Yarn cluster node; the latest execution time is then determined, and the Flink task corresponding to the latest execution time is determined as the key task.
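Selecting the key task in S201 amounts to taking the task with the latest execution time. In this sketch the execution times are held in a dictionary keyed by task identifier; that layout, the function name `key_task`, and the sample values are assumptions.

```python
from datetime import datetime
from typing import Dict


def key_task(execution_times: Dict[str, datetime]) -> str:
    """Return the identifier of the Flink task with the latest execution time."""
    return max(execution_times, key=execution_times.get)


execution_times = {
    "task-a": datetime(2021, 6, 8, 11, 58, 0),
    "task-b": datetime(2021, 6, 8, 11, 59, 30),  # executed last
    "task-c": datetime(2021, 6, 8, 11, 57, 15),
}
selected = key_task(execution_times)
```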
S202, determining the task horizontal line time of the key task, and determining the current time of a node executing the key task.
Acquiring task horizontal line time of a key task, wherein the task horizontal line time is a timestamp; and determining the current time of the node executing the key task, wherein the current time of the node is the system time of the node.
S203, determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the whole delay time of the Flink application.
The first delay time of the key task is calculated based on the current time and the task horizontal line time, and is taken as the overall delay time of the Flink application. Specifically, the task horizontal line time is subtracted from the current time to obtain the first delay time, which is a time length, such as 1 minute, 2 minutes, or 52 seconds. The first delay time may be negative: a negative value indicates that the key task is not delayed when executed by the node, and a positive value indicates that it is delayed.
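The subtraction in S203 can be illustrated with two timestamps. The function name `delay_seconds` and the sample times are hypothetical. The task horizontal line time may correspond to what Flink calls a watermark, but that reading is an assumption and is not confirmed by the text.

```python
from datetime import datetime


def delay_seconds(current_time: datetime, horizontal_line_time: datetime) -> float:
    """First delay time: current node time minus task horizontal line time, in seconds."""
    return (current_time - horizontal_line_time).total_seconds()


current = datetime(2021, 6, 8, 12, 0, 0)  # system time of the node

# Horizontal line time 52 seconds in the past: positive value, the task is delayed.
late = delay_seconds(current, datetime(2021, 6, 8, 11, 59, 8))

# Horizontal line time 10 seconds in the future: negative value, the task is not delayed.
early = delay_seconds(current, datetime(2021, 6, 8, 12, 0, 10))
```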
In the method provided by the embodiment of the invention, the overall delay time of the Flink application is determined based on the task horizontal line time of the key task and the current time of the node executing the key task, so that the delay state of the running Flink application can be obtained accurately, and whether the Flink application has abnormal delay can then be determined from the overall delay time.
After the overall delay time of the Flink application is determined, a delay record corresponding to the overall delay time needs to be generated, as follows: collecting node information of the node executing the key task; and filling the node information and the first delay time into a preset first record template to obtain the delay record corresponding to the overall delay time. The first record template in the invention is specifically: <sequence number> <node information> <first delay time>. The sequence number in the record template is allocated to the delay record when the delay record is generated, and it is unique. The first record template allows the delay record corresponding to the overall delay time to be generated quickly, which improves the efficiency of monitoring the Flink application and simplifies the monitoring process.
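Filling the first record template might be sketched as below. The angle-bracket layout follows the template given in the text, while the counter used for sequence numbers, the function name `first_record`, and the node names are assumptions; the source only requires that the sequence number be unique.

```python
import itertools

# Hypothetical source of unique sequence numbers within one process.
_sequence = itertools.count(1)


def first_record(node_info: str, first_delay_seconds: float) -> str:
    """Fill <sequence number> <node information> <first delay time>."""
    return f"<{next(_sequence)}> <{node_info}> <{first_delay_seconds}>"


record_1 = first_record("node-07", 52.0)
record_2 = first_record("node-09", -10.0)
```

A counter held in memory restarts with the process, so a persistent identifier would be needed if records must stay unique across restarts.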
Referring to fig. 3, a flowchart of another method for determining an overall delay time of a Flink application based on each Flink task according to an embodiment of the present invention is specifically described as follows:
S301, for each Flink task, determining the task horizontal line time of the Flink task and the current time of a node executing the Flink task, and calculating the second delay time of the Flink task based on the task horizontal line time and the current time.
The description of the second delay time in the present invention can refer to the related description of the first delay time in fig. 2, the calculation process of the second delay time is the same as the calculation process of the first delay time, and the description of the second delay time is not repeated here.
S302, performing a weighted average operation on the second delay times to obtain an average delay time, and taking the average delay time as the overall delay time of the Flink application.
One way of performing the weighted average operation on the second delay times is to sum the second delay times and divide the sum by the number of second delay times, which corresponds to giving every Flink task the same weight. The resulting average delay time is taken as the overall delay time of the Flink application. The average delay time is a duration, for example 1 minute, 2 minutes or 40 seconds.
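The averaging in step S302 can be sketched as follows. With no weights supplied, the function reduces to the plain sum-and-divide variant described above; the optional `weights` parameter and the function name `average_delay` are assumptions added to show how unequal weights would fit in.

```python
from datetime import timedelta
from typing import Optional, Sequence


def average_delay(
    delays: Sequence[timedelta],
    weights: Optional[Sequence[float]] = None,
) -> timedelta:
    """Weighted average of the second delay times.

    With weights=None every task gets weight 1, i.e. sum / count.
    """
    if not delays:
        raise ValueError("at least one second delay time is required")
    if weights is None:
        weights = [1.0] * len(delays)
    if len(weights) != len(delays):
        raise ValueError("weights and delays must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    weighted_sum = sum(w * d.total_seconds() for w, d in zip(weights, delays))
    return timedelta(seconds=weighted_sum / total_weight)


if __name__ == "__main__":
    sample = [timedelta(seconds=60), timedelta(seconds=120), timedelta(seconds=30)]
    print(average_delay(sample))             # equal weights
    print(average_delay(sample, [2, 1, 1]))  # first task counted twice
```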
Because the weighted average operation is performed on the second delay time of every Flink task and the resulting average delay time is used as the overall delay time of the Flink application, the overall delay time is more representative of the application as a whole.
After the overall delay time of the Flink application is determined, a delay record corresponding to the overall delay time needs to be generated. The invention provides a way of generating this delay record based on each Flink task, and the specific process is as follows: for each Flink task, determining node information of the node executing the Flink task; and writing the average delay time, together with the node information and the second delay time of each Flink task, into a preset second record template to generate the delay record corresponding to the overall delay time.
The second record template may specifically be:
< delay record number > < average delay time >;
< task number > < node information > < second delay time >;
Further, the < delay record number > in the second record template records the number of the delay record, which is assigned to the delay record when it is generated and is unique. Each < task number > < node information > < second delay time > entry records one Flink task, where the task number may be the execution sequence number of the Flink task or the task sequence number of the Flink task, and is unique to that Flink task. A delay record generated with the second record template contains a plurality of < task number > < node information > < second delay time > entries.
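A delay record built from the second record template might be assembled as in the sketch below. The `TaskDelay` class, the function name, and the exact text layout (one header line followed by one line per Flink task) are assumptions; the equal-weight average is used for the header field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskDelay:
    """One per-task entry of the second record template (hypothetical type)."""
    task_number: int
    node_info: str
    second_delay_seconds: float


def make_second_delay_record(record_number: int, tasks: List[TaskDelay]) -> str:
    """Header line with the average delay time, then one line per Flink task."""
    if not tasks:
        raise ValueError("at least one Flink task is required")
    average = sum(t.second_delay_seconds for t in tasks) / len(tasks)
    lines = [f"<{record_number}> <{average:g}s>;"]
    lines += [
        f"<{t.task_number}> <{t.node_info}> <{t.second_delay_seconds:g}s>;"
        for t in tasks
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_second_delay_record(
        7,
        [TaskDelay(1, "node-a", 60.0), TaskDelay(2, "node-b", 120.0)],
    ))
```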
In the method provided by the embodiment of the invention, a delay record generated with the second record template contains more detailed data, which provides reliable and accurate data support for subsequent monitoring of the Flink application and makes that monitoring more accurate.
Corresponding to fig. 1, an embodiment of the present invention further provides an apparatus for monitoring tasks of a streaming computing engine. The apparatus is applied to a monitoring platform and supports the practical application of the method for monitoring tasks of a streaming computing engine provided in the embodiment of the present invention. A schematic structural diagram of the apparatus is shown in fig. 4, and the apparatus is specifically described as follows:
a first obtaining unit 401, configured to periodically obtain an application identifier of the Flink application according to a preset first time step;
a second obtaining unit 402, configured to obtain a task identifier list corresponding to the application identifier, where the task identifier list includes at least one task identifier;
a determining unit 403, configured to determine a Flink task corresponding to each task identifier, and determine an overall delay time of the Flink application based on each Flink task;
a generating unit 404, configured to generate a delay record corresponding to the overall delay time, mark a state of the delay record as an unread state, and write the marked delay record into a preset record file;
and an alarm unit 405, configured to periodically call a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record in the record file whose state is the unread state, updates the state of each obtained delay record to a read state, and analyzes each obtained delay record to determine whether a delay record with an abnormal delay exists among them; and, if a delay record with an abnormal delay exists, to generate an alarm instruction and send the alarm instruction to a preset alarm module, so that the alarm module performs a delay alarm.
In the apparatus provided by the embodiment of the invention, the application identifier of the Flink application is periodically acquired according to the first time step; a task identifier list corresponding to the application identifier is acquired, where the task identifier list includes at least one task identifier; the Flink task corresponding to each task identifier is determined, and the overall delay time of the Flink application is determined based on each Flink task; a delay record corresponding to the overall delay time is generated, its state is marked as unread, and the marked delay record is written into a preset record file; and, according to a preset second time step, a monitoring module is periodically called to read the record file, so that the monitoring module obtains each delay record in the record file whose state is unread, updates the state of each obtained delay record to read, and analyzes each obtained delay record to judge whether any of them indicates an abnormal delay; if so, an alarm instruction is generated and sent to a preset alarm module, so that the alarm module raises a delay alarm. Because the overall delay time of the Flink application is determined from the Flink tasks of the application, the corresponding delay records are monitored by the monitoring module, and the alarm module is triggered promptly when a delay record with an abnormal delay exists, staff can discover and resolve abnormal delays in the Flink application in time.
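One pass of the monitoring module over the record file can be sketched as follows. The JSON-lines file layout, the field names `state` and `delay_seconds`, the fixed threshold used to decide that a delay is abnormal, and the `alarm` callback standing in for the alarm module are all assumptions made for illustration; the text here does not specify how an abnormal delay is recognized.

```python
import json
from pathlib import Path
from typing import Callable, List


def monitor_pass(
    record_file: Path,
    threshold_seconds: float,
    alarm: Callable[[dict], None],
) -> List[dict]:
    """Read unread delay records, mark them read, and alarm on abnormal delays."""
    # Load every delay record; one JSON object per line (assumed layout).
    records = [
        json.loads(line)
        for line in record_file.read_text().splitlines()
        if line.strip()
    ]
    # Only records still in the unread state are handed to the analysis.
    unread = [r for r in records if r["state"] == "unread"]
    for record in unread:
        record["state"] = "read"
    # Persist the state change so the next pass skips these records.
    record_file.write_text("".join(json.dumps(r) + "\n" for r in records))
    # Assumed criterion: a delay above the threshold counts as abnormal.
    for record in unread:
        if record["delay_seconds"] > threshold_seconds:
            alarm(record)
    return unread
```

Rewriting the whole file on every pass keeps the sketch short; a production monitor would more likely track a read offset or use a store that supports in-place updates.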
In the apparatus provided in the embodiment of the present invention, the apparatus further includes:
a storage unit, configured to cause the monitoring module to store each acquired delay record in a preset data storage platform;
and a calling unit, configured to call a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
In the apparatus provided in the embodiment of the present invention, the determining unit 403 may comprise:
the first determining subunit is used for determining a key task in each Flink task;
the second determining subunit is used for determining the task horizontal line time of the key task and determining the current time of the node executing the key task;
and the third determining subunit is configured to determine, based on the current time and the task horizontal line time, a first delay time of the key task, and use the first delay time as an overall delay time of the Flink application.
In the apparatus provided in the embodiment of the present invention, the determining unit 403 may alternatively comprise:
a fourth determining subunit, configured to determine, for each of the Flink tasks, a task horizontal line time of the Flink task and a current time of a node that executes the Flink task, and calculate, based on the task horizontal line time and the current time, a second delay time of the Flink task;
and the operation subunit is configured to perform a weighted average operation on the second delay times to obtain an average delay time, and use the average delay time as the overall delay time of the Flink application.
In the apparatus provided in the embodiment of the present invention, the generating unit 404 may comprise:
the collecting subunit is used for collecting the node information of the node executing the key task;
and the obtaining subunit is configured to fill the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the overall delay time.
In the apparatus provided in the embodiment of the present invention, the generating unit 404 may alternatively comprise:
a fifth determining subunit, configured to determine, for each of the Flink tasks, node information of a node that executes the Flink task;
and the generating subunit is configured to write the average delay time and the node information and the second delay time of each Flink task into a preset second record template, and generate a delay record corresponding to the overall delay time.
The embodiment of the invention also provides a storage medium, which comprises a stored instruction, wherein when the instruction runs, the device where the storage medium is located is controlled to execute the monitoring method of the streaming computing engine task.
An electronic device is provided in an embodiment of the present invention, and the structural diagram of the electronic device is shown in fig. 5, which specifically includes a memory 501 and one or more instructions 502, where the one or more instructions 502 are stored in the memory 501, and are configured to be executed by one or more processors 503 to perform the following operations according to the one or more instructions 502:
periodically acquiring an application identifier of the Flink application according to a preset first time step;
acquiring a task identifier list corresponding to the application identifier, wherein the task identifier list comprises at least one task identifier;
determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task;
generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
and according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record of which the state is an unread state in the record file, updating the state of each obtained delay record into a read state, analyzing each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, if the delay records with abnormal delay exist, generating an alarm instruction, and sending the alarm instruction to a preset alarm module so that the alarm module performs delay alarm.
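The interplay of the two time steps in the operations above can be illustrated with a simulated clock. The function `run_schedule`, the one-second tick, and the `collect` and `monitor` callbacks (standing in for "generate and write a delay record" and "call the monitoring module") are hypothetical; a real implementation would use timers or a scheduler rather than a loop over seconds.

```python
from typing import Callable


def run_schedule(
    duration: int,
    first_step: int,
    second_step: int,
    collect: Callable[[int], None],
    monitor: Callable[[int], None],
) -> None:
    """Fire `collect` every first_step seconds and `monitor` every second_step seconds.

    Time is simulated in whole seconds from 1 to `duration` inclusive.
    """
    if first_step <= 0 or second_step <= 0:
        raise ValueError("time steps must be positive")
    for now in range(1, duration + 1):
        if now % first_step == 0:
            collect(now)   # acquire identifiers, compute delay, write unread record
        if now % second_step == 0:
            monitor(now)   # read unread records, mark them read, alarm if abnormal


if __name__ == "__main__":
    run_schedule(
        6, 2, 3,
        collect=lambda t: print(f"t={t}s collect"),
        monitor=lambda t: print(f"t={t}s monitor"),
    )
```

Because the two steps are independent, a monitoring pass may find zero, one, or several unread records, which is why the records carry a read/unread state.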
The specific implementation procedures and derivatives thereof of the above embodiments are within the scope of the present invention.
The embodiments in the present specification are described in a progressive manner; for the parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus or system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The above-described apparatus and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for monitoring tasks of a streaming computing engine is characterized by comprising the following steps:
periodically acquiring an application identifier of the Flink application according to a preset first time step;
acquiring a task identifier list corresponding to the application identifier, wherein the task identifier list comprises at least one task identifier;
determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task;
generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
and according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record of which the state is an unread state in the record file, updating the state of each obtained delay record into a read state, analyzing each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, if the delay records with abnormal delay exist, generating an alarm instruction, and sending the alarm instruction to a preset alarm module so that the alarm module performs delay alarm.
2. The method of claim 1, further comprising:
the monitoring module stores each acquired delay record to a preset data storage platform;
and calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
3. The method according to claim 1, wherein said determining an overall delay time of said Flink application based on each of said Flink tasks comprises:
determining a key task in each Flink task;
determining a task horizontal line time of the key task and determining a current time of a node executing the key task;
and determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
4. The method according to claim 1, wherein said determining an overall delay time of said Flink application based on each of said Flink tasks comprises:
for each Flink task, determining a task horizontal line time of the Flink task and the current time of a node executing the Flink task, and calculating a second delay time of the Flink task based on the task horizontal line time and the current time;
and performing a weighted average operation on the second delay times to obtain an average delay time, and taking the average delay time as the overall delay time of the Flink application.
5. The method of claim 3, wherein generating the delay record corresponding to the overall delay time comprises:
collecting node information of a node executing the key task;
and filling the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the overall delay time.
6. The method of claim 4, wherein generating the delay record corresponding to the overall delay time comprises:
for each Flink task, determining node information of a node executing the Flink task;
and writing the average delay time, together with the node information and the second delay time of each Flink task, into a preset second record template to generate a delay record corresponding to the overall delay time.
7. An apparatus for monitoring tasks of a streaming computing engine, comprising:
the first obtaining unit is used for periodically obtaining an application identifier of the Flink application according to a preset first time step;
a second obtaining unit, configured to obtain a task identifier list corresponding to the application identifier, where the task identifier list includes at least one task identifier;
the determining unit is used for determining the Flink task corresponding to each task identifier and determining the overall delay time of the Flink application based on each Flink task;
the generating unit is used for generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state and writing the marked delay record into a preset record file;
and the alarm unit is used for periodically calling the monitoring module to read the record file according to a preset second time step so that the monitoring module acquires each delay record of which the state is an unread state in the record file, updates the state of each acquired delay record into a read state, analyzes each acquired delay record to judge whether a delay record with abnormal delay exists in each delay record, generates an alarm instruction if the delay record with abnormal delay exists, and sends the alarm instruction to a preset alarm module so that the alarm module carries out delay alarm.
8. The apparatus of claim 7, further comprising:
the storage unit is used for causing the monitoring module to store each acquired delay record in a preset data storage platform;
and the calling unit is used for calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
9. The apparatus of claim 7, wherein the determining unit comprises:
the first determining subunit is used for determining a key task in each Flink task;
the second determining subunit is used for determining the task horizontal line time of the key task and determining the current time of the node executing the key task;
and the third determining subunit is configured to determine, based on the current time and the task horizontal line time, a first delay time of the key task, and use the first delay time as an overall delay time of the Flink application.
10. The apparatus of claim 7, wherein the determining unit comprises:
a fourth determining subunit, configured to determine, for each of the Flink tasks, a task horizontal line time of the Flink task and a current time of a node that executes the Flink task, and calculate, based on the task horizontal line time and the current time, a second delay time of the Flink task;
and the operation subunit is configured to perform a weighted average operation on the second delay times to obtain an average delay time, and use the average delay time as the overall delay time of the Flink application.
CN202110639027.5A 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine Active CN113342608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639027.5A CN113342608B (en) 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110639027.5A CN113342608B (en) 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine

Publications (2)

Publication Number Publication Date
CN113342608A true CN113342608A (en) 2021-09-03
CN113342608B CN113342608B (en) 2024-06-21

Family

ID=77475406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639027.5A Active CN113342608B (en) 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine

Country Status (1)

Country Link
CN (1) CN113342608B (en)

Also Published As

Publication number Publication date
CN113342608B (en) 2024-06-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant