CN113986706A

CN113986706A - Automatic data service re-running method based on data service monitoring

Info

Publication number: CN113986706A
Application number: CN202111280366.5A
Authority: CN
Inventors: 何晓庆
Original assignee: Yamu Technology Co ltd
Current assignee: Yamu Technology Co ltd
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-28

Abstract

The invention provides an automatic data service rerun method based on data service monitoring. The invention can realize standard development by including the monitoring file in the standardized module, thereby reducing the deployment threshold. In addition, during monitoring and rerun, all operations do not need manual intervention, an additional system is not needed, automatic service monitoring in the field of big data analysis can be achieved based on the monitoring file, and automatic service rerun is achieved based on the result of the service monitoring.

Description

Automatic data service re-running method based on data service monitoring

Technical Field

The invention relates to the technical field of big data operation and maintenance, in particular to an automatic data service rerun method based on data service monitoring.

Background

Existing big data operation and maintenance schemes do not effectively combine the service program, the monitoring program, and the management program, and they lack an automatic processing mechanism. Current big data service rerun mechanisms are mostly implemented in the scheduling program, which judges whether a run succeeded essentially from the program's running state alone. They do not respond to data service anomalies by combining service monitoring with program running-state monitoring.

Therefore, a solution is needed that can automatically rerun data services based on data service monitoring.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

According to one embodiment of the invention, a method for automatic service rerun based on service monitoring is disclosed, comprising: acquiring an abnormal monitoring task of a business module that needs to be rerun; generating and executing a rerun task for the abnormal monitoring task; verifying the result of the rerun task to determine whether the rerun task was successful; if the rerun task was not successful, comparing the number of reruns of the rerun task with the rerun redundancy count defined in the monitoring file; if the number of reruns is less than the rerun redundancy count, generating and executing a new rerun task for the abnormal monitoring task; and if the number of reruns is greater than or equal to the rerun redundancy count, sending an alarm.

According to another embodiment of the present invention, a system for automatic service rerun based on service monitoring is disclosed, comprising one or more business modules and a rerun module. Each of the one or more business modules is configured to: execute a monitoring task based on a monitoring file defined in the business module; identify the monitoring task as an abnormal monitoring task if its monitoring result is abnormal; and add the abnormal monitoring task to an abnormal monitoring task list. The monitoring file is used for monitoring the running health of the business module and defines one or more monitoring tasks for the business module. The rerun module is configured to acquire an abnormal monitoring task of the one or more business modules that needs to be rerun, generate and execute a rerun task for the abnormal monitoring task, detect the result of the rerun task, and, when the rerun task is not successful, judge whether to execute a new rerun task for the abnormal monitoring task.

According to yet another embodiment of the present invention, a computing device for automatic business re-run based on business monitoring is disclosed, comprising: a processor; a memory storing instructions that, when executed by the processor, are capable of performing the method as described above.

These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.

Drawings

So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.

FIG. 1 is a diagram 100 illustrating an abnormal operation of a data program in the prior art of big data field;

FIG. 2 illustrates a general schematic diagram of an automated business re-run system 200 for business-based monitoring according to one embodiment of the invention;

FIG. 3 shows a schematic diagram of a system 300 for generating a business module 201 and a rerun module 202 through code structure normalization according to one embodiment of the invention;

FIG. 4 shows a schematic diagram of a more detailed description of an automated business re-run system 200 for business-based monitoring, according to one embodiment of the invention;

FIG. 5 illustrates a flow diagram of a method 500 for automatic business re-run based on business monitoring according to one embodiment of the invention; and

FIG. 6 illustrates a block diagram of a computing device 600 that may be used as a hardware device for aspects of the invention, according to an embodiment of the invention.

Detailed Description

The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.

The definitions of some terms appearing in the present invention are briefly described below.

OLAP (Online Analytical Processing): a software technology that enables analysts to view information quickly, consistently, and interactively from multiple perspectives in order to understand the data in depth.

Data service rerun: running the data program again according to the monitoring feedback result and the time at which the abnormality occurred.

Fig. 1 shows a schematic diagram 100 of a data program with an abnormal operation in the prior art of big data field. Referring to fig. 1, in the process of online running of a data program, when any link of source data (i), scheduling (ii) and data program service (iii) is abnormal, target data cannot be output or a correct result cannot be output.

When the source data of link (i) is delayed or exceeds the retry range of the service, no target data is output. When the scheduling service of link (ii) is overloaded and scheduling times out, scheduling tasks are lost and the service cannot run normally. When link (iii) runs short of resources and the run times out, the business process is killed and the data cannot be output correctly.

In the field of data management, service running states are monitored and data services are rerun, but both are mostly configured manually, which is clearly not economical. The invention provides an automatic data service rerun technique that reruns services automatically based on service monitoring results.

FIG. 2 shows a general schematic of an automated business re-run system 200 for business-based monitoring according to one embodiment of the invention. The system 200 may include one or more business modules 201-1 through 201-N (hereinafter collectively referred to as business modules 201) and a rerun module 202. The business module 201 is configured to automatically monitor the execution of the service based on the monitoring file defined in the business module 201 and to generate an abnormal monitoring task list. The rerun module 202 is configured to obtain the abnormal monitoring task of the business module 201, generate and execute the rerun task, detect the result of the rerun task, and determine whether to execute a new rerun task for the abnormal monitoring task if the rerun task is not successful.

In general, the business module 201 and the rerun module 202 may be implemented at a client of a customer performing a big data service. According to one embodiment of the invention, the business module 201 and the rerun module 202 are distributed to the clients by a provider providing big data OLAP services. The business module 201 and the rerun module 202 are further described below.

Fig. 3 shows a schematic diagram of a system 300 for generating a business module 201 and a rerun module 202 through code structure normalization according to one embodiment of the invention. The system 300 is configured to develop big data OLAP services. The system 300 may include a definition module 301 and an initialization module 302. Generally, the system 300 is implemented at a provider that provides big data OLAP services.

According to one embodiment of the invention, the definition module 301 is configured to define a module code directory structure that standardizes the organization of module code, thereby facilitating standardized management during code development. According to one embodiment of the invention, the definition module 301 is configured to define a generic module code directory structure that accommodates all business functions. According to another embodiment of the invention, the definition module 301 may be configured to define different module code directory structures for specific business functions, so that a developer can select an appropriate module code directory structure when developing a business function requested by a customer. For example, each module may be directed to one type of business function: one module may be directed to a business function for detecting whether a malicious attack is under way, and another module may be directed to a business function for detecting whether the data flow is correct. The code directory structures of the two modules may therefore be defined differently depending on the specific requirements of the different business functions.

According to an embodiment of the present invention, the module code directory structure for the business module 201 and the rerun module 202 may be the same or different.

In the big data OLAP business development process, the following aspects are generally involved: resource organization methods (i.e., code catalogs), front-end modular definition methods, back-end modular definition methods (e.g., data/schedule/configuration/code/environment/public services, etc.), automated operation and maintenance resource organization (e.g., monitoring/alarming, etc.), automated deployment management resource management (e.g., description for modules, etc.), and knowledge management (e.g., code dependencies, etc.).

According to one embodiment of the present invention, the present invention generalizes several aspects involved in the above big data OLAP business development process, such that each item in the module code directory structure for the business module 201 mainly contains one or more of the following: the description of the service module, the description of the configuration file, the description of the scheduling task, the description of the module initialization operation, the description of the monitoring task, the description of the module management operation after verification, the description of the service task and the description of the dependency library.

According to another embodiment of the invention, the entries in the module code directory structure for the rerun module 202 may primarily contain one or more of the following: a description of the rerun module, a description of the configuration file, a description of the scheduling task, a description of the module initialization operation, a description of the rerun task, a description of how to call a business module, a description of the verification performed after a module management operation, and a description of the dependency library.

Thus, the definition module 301 provides a standardized module directory structure so that developers can place corresponding code files according to the standardized module directory structure in subsequent development processes.

According to one embodiment of the invention, the initialization module 302 is configured to initialize a module code directory structure based on a desired business function to generate a business module (e.g., business module 201) containing initialized module code for the desired business function. Specifically, the initialization module 302 uses the module code directory structure defined by the definition module 301 as a standardized template structure, and generates corresponding initialization files for each item in the module code directory structure according to the needs of the business functions. By means of the initialization module 302, the standardized module directory structure is initialized for specific business requirements, so that a specific code organization manner conforms to the standardized module directory structure.
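
As a minimal sketch of what the initialization step might look like, the following Python snippet turns a template of directory entries into placeholder files for one business function. The template keys follow the items listed above for the business module 201; the file names, the `BUSINESS_MODULE_TEMPLATE` mapping, and the `init_module` function are all hypothetical, since the source does not specify concrete file names or an API.

```python
from pathlib import Path

# Hypothetical template: one entry per item in the module code directory
# structure for a business module. All file names are illustrative assumptions.
BUSINESS_MODULE_TEMPLATE = {
    "module_description": "README.md",
    "configuration": "config/settings.conf",
    "scheduling_tasks": "schedule/tasks.conf",
    "initialization": "init/init.sh",
    "monitoring_tasks": "monitoring/tasks.conf",
    "verification": "verify/verify.sh",
    "business_tasks": "tasks/main.py",
    "dependencies": "requirements.txt",
}


def init_module(root: Path, business_function: str) -> list[Path]:
    """Create a placeholder file for every template entry under `root`."""
    created = []
    for item, relative_path in BUSINESS_MODULE_TEMPLATE.items():
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # Each placeholder records which item it is and which function it serves.
        target.write_text(f"# {item} for business function: {business_function}\n")
        created.append(target)
    return created
```

Keeping the template as data rather than code means a different directory structure for a different business function can be swapped in without changing the initialization logic.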

For example, to generate the business module 201, the initialization module 302 may be configured to generate a monitoring file (e.g., monitor. The monitoring file is used to monitor the running health of the business module for the desired business function. The monitoring file specifies the scheduling frequency of the monitoring task, the specific content executed by the monitoring task, how to handle an abnormal result after the monitoring task is executed, and the like. In particular, the monitoring file may define one or more monitoring tasks for the business module. For each monitoring task, the monitoring file may include one or more of: the task name, the monitoring type, information about the module targeted by the task, a loop identifier indicating whether the task needs to be executed periodically, the monitoring period, the execution instruction of the task, the service monitoring rule, the rerun redundancy count, and the like. For example, when the monitoring type is data, the execution instruction of the monitoring task may include "data source information" and "sql of the monitoring task", as described below. A person skilled in the art can specify the execution instructions of the monitoring tasks according to different monitoring types.

The example monitoring file described below defines an sql-based monitoring task. The example uses the data monitoring type and monitors the number of business entries in the business data cache.

Wherein:

sql identifies the sql to be executed when the type is data, and the monitoring result is obtained through this sql. Braces { } denote a parameter, so the sql can be defined as a template. The sql in the example counts the number of data rows that match the condition;

rules defines the service monitoring rules. A monitoring result that meets a rule is considered abnormal and requires a rerun or manual intervention. Although only one rule is shown in this example, there may be more than one rule, forming a monitoring rule set;

alert_times is the rerun redundancy count at which an alert is actually triggered. In this example, an alert is triggered once the count reaches 2.

In addition to the data types shown in this example, there may be the following monitoring types: file (file monitoring), directory (directory monitoring), process (process monitoring), port (port monitoring), api (api monitoring), and the like. For example, a file (file monitor) may monitor whether a file exists, and a process (process monitor) may monitor whether a process exists.
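
To make the fields discussed above concrete, the following is a hypothetical in-memory form of one monitoring task of type data, written in Python. It is not the example from the source: only the `sql`, `rules`, and `alert_times` fields, the `data` type, the `eq:0` rule, and the `{execution_date}` parameter appear in the text. Every other key, the table and column names, and the `render_sql` helper are assumptions added for illustration.

```python
# Hypothetical monitoring task. Only "sql", "rules", "alert_times", the
# "data" type and the "eq:0" rule come from the text; every other key name
# and value is an assumption made for illustration.
monitor_task = {
    "name": "row_count_check",             # assumed task name
    "type": "data",                        # monitoring type
    "module": "example_business_module",   # assumed module information
    "loop": True,                          # assumed loop identifier
    "period": 5,                           # assumed monitoring period (minutes)
    "datasource": "example_datasource",    # assumed data source information
    "sql": "select count(*) from example_table where event_time = '{execution_date}';",
    "rules": ["eq:0"],                     # abnormal when the result equals 0
    "alert_times": 2,                      # rerun redundancy count
}


def render_sql(task: dict, **params: str) -> str:
    """Fill the {placeholders} of the templated sql with concrete values."""
    return task["sql"].format(**params)
```

Calling `render_sql(monitor_task, execution_date="2021-10-29 00:00:00")` substitutes the timestamp into the statement. A production version would pass the value as a bound query parameter rather than formatting it into the string.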

Table 1 below provides full names for matching symbols commonly used in codes, corresponding symbols used in codes, and corresponding meanings:

TABLE 1

After the provider delivers the service module 201 to the customer, the customer may deploy the service module 201, so that the customer can automatically perform monitoring of the operation state of the service module 201 based on the monitoring file in the service module 201. Those skilled in the art will appreciate that the client may be a computing device or a plurality of computing devices existing in a cluster manner, so as to at least implement the execution of the business function and the monitoring of the running state of the business function.

According to one embodiment of the invention, similar to business module 201, as described above, rerun module 202 may also be developed in a modular standard fashion and may be installed and/or deployed at the client independently of business module 201.

FIG. 4 shows a schematic diagram of a more detailed description of an automated business re-run system 200 for business-based monitoring, according to one embodiment of the invention. The system 200 may be implemented at a client of a customer performing a big data service. Of course, depending on the specific service module deployment, the modules in the system 200 may also be implemented in different or the same computing devices, servers, or cloud terminals. Although only one service module 201 is shown in fig. 4, it is fully understood by those skilled in the art that the technical solution of the present invention can be applied to the case of a plurality of service modules 201-1 to 201-N (as shown in fig. 2).

As shown in fig. 4, according to the logic function division, the service module 201 may include a service-related code file 203 (e.g., execution code of a service task, etc.), a monitoring file 204, a monitoring task scheduling module 205, a monitoring task execution module 206, and an exception monitoring task generation module 207. The rerun module 202 may include an exception monitoring task acquisition module 208 (optional), a rerun task generation module 209, a rerun task execution module 210, and a rerun task result detection module 211. Where any of the modules described above may communicate with any other module, but not all connections are shown for ease of illustration. Also, it is fully understood by those skilled in the art that the various modules described above are illustrated herein for illustrative purposes only, and that the functionality of one or more of the modules described above may be combined into a single module or split into multiple modules. Also, one or more of the above modules may be implemented in software, hardware, or a combination thereof.

According to an embodiment of the present invention, the monitoring task scheduling module 205 may be configured to read the monitoring file 204 in the service module 201 after the service module 201 is deployed at the client, and generate the monitoring task scheduling instruction according to the monitoring period specified in the monitoring file 204. For example, the monitoring task scheduling module 205 may be configured to obtain the corresponding unit number of the current time, perform a modulo operation on the monitoring period specified in the monitoring file 204, and when the modulo result is 0, represent that the execution condition is met, indicating that the monitoring task scheduling instruction may be generated.
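
The modulo check can be sketched in a few lines of Python. This assumes the "unit number" is the count of whole minutes since the Unix epoch and that the monitoring period is expressed in minutes; the source fixes neither, and the function name `should_schedule` is hypothetical.

```python
from datetime import datetime


def should_schedule(now: datetime, period_minutes: int) -> bool:
    """Return True when the current minute count is a multiple of the period."""
    # Assumption: the "unit number" is whole minutes since the Unix epoch.
    minutes = int(now.timestamp()) // 60
    # A remainder of 0 means the execution condition is met.
    return minutes % period_minutes == 0
```

A scheduler that evaluates this check once per minute would emit a monitoring task scheduling instruction every `period_minutes` minutes without keeping any per-task state.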

According to an embodiment of the invention, the generated monitoring task scheduling instruction may be transmitted to the monitoring task execution module 206 to trigger execution of the monitoring task.

According to an embodiment of the present invention, the monitoring task execution module 206 may be configured to execute the monitoring task based on the monitoring file 204 upon receiving the monitoring task scheduling instruction, and match the monitoring result with the traffic monitoring rule specified in the monitoring file 204. For example, the monitoring task execution module 206 may be configured to extract execution instructions of the monitoring task, traffic monitoring rules, etc. from the monitoring file 204 to execute the monitoring task.

For example, the monitoring task executing module 206 is configured to execute the monitoring task based on an execution instruction of the monitoring task to obtain a monitoring result, compare the monitoring result with the service monitoring rule to determine whether the monitoring result is abnormal, and if so, send an abnormal monitoring task generating instruction to the abnormal monitoring task generating module 207.

Continuing with the exemplary monitoring file described above, the sql to be executed by the monitoring task is "sql": "select count(*) from rpt.rpt_resource_partner_domain_name_pv_1min where _distance_time = {execution_date};". The monitoring result obtained by executing this sql is the number of entries in rpt.rpt_resource_partner_domain_name_pv_1min at a certain time. If the obtained monitoring result is 0 and the service monitoring rule is eq:0, where eq means equal, the monitoring result matches the service monitoring rule and is therefore abnormal. Herein, a monitoring task for which an abnormal monitoring result occurs is referred to as an abnormal monitoring task. In this case, the monitoring task execution module 206 transmits the abnormal monitoring task generation instruction to the abnormal monitoring task generation module 207.
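
A rule check of this kind might be sketched as follows in Python. Only the `eq` symbol is confirmed by the text. The other symbols in `OPERATORS`, the `op:value` parsing, and the choice to treat a result as abnormal when any one rule matches are assumptions, and the function name `is_abnormal` is hypothetical.

```python
import operator

# Only "eq" (equal) is confirmed by the text; the remaining symbols are
# assumptions standing in for operators a rule set might support.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,   # assumed
    "gt": operator.gt,   # assumed
    "lt": operator.lt,   # assumed
}


def is_abnormal(result: float, rules: list[str]) -> bool:
    """Return True if the result satisfies any rule of the form 'op:value'."""
    for rule in rules:
        symbol, _, threshold = rule.partition(":")
        if OPERATORS[symbol](result, float(threshold)):
            return True
    return False
```

With the rule set `["eq:0"]`, a monitoring result of 0 is flagged as abnormal, while a result of 12 is not.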

According to an embodiment of the present invention, the exception-monitoring task generating module 207 is configured to add the exception-monitoring task to the exception-monitoring task list based on the exception-monitoring task generating instruction sent by the monitoring task executing module 206. The exception monitoring task list is used for recording the monitoring tasks with exceptions in the business module 201, and may have the following fields, for example: { service module name, name of exception monitoring task, execution timestamp of exception monitoring task }, so that each exception monitoring task entry can represent an exception monitoring task with an exception monitoring result. Of course, one skilled in the art will fully appreciate that the exception monitoring task list may have other forms and/or fields. For example, a { rerun status } field may also be included in the exception monitoring task list to indicate the rerun status of the exception monitoring task (such as a rerun success, no rerun, a rerun failure, etc.).

According to an embodiment of the present invention, the exception monitoring task obtaining module 208 in the rerun module 202 is configured to obtain an exception monitoring task list in the business module 201, and select an exception monitoring task entry that needs to be rerun processed based on the obtained exception monitoring task list, so as to generate a corresponding rerun instruction to be transmitted to the rerun task generating module 209. For example, the exception-monitoring task fetch module 208 may generate the corresponding one or more rerun instructions based on one or more exception-monitoring task entries in the fetched exception-monitoring task list. In one embodiment, the exception monitoring task obtaining module 208 may select an exception monitoring task that requires a rerun process based on a "rerun status" of an exception monitoring task entry.
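
The abnormal monitoring task list and the selection by rerun status can be sketched in Python as below. The field names mirror the fields named in the text, but the class `AbnormalTaskEntry`, the function `select_for_rerun`, and the concrete status strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AbnormalTaskEntry:
    """One entry of the abnormal monitoring task list."""
    module_name: str      # business module name
    task_name: str        # name of the abnormal monitoring task
    executed_at: datetime  # execution timestamp of the abnormal monitoring task
    # Assumed status values: "not_rerun", "rerun_success", "rerun_failed".
    rerun_status: str = "not_rerun"


def select_for_rerun(entries: list[AbnormalTaskEntry]) -> list[AbnormalTaskEntry]:
    """Pick the entries that have not been rerun yet."""
    return [entry for entry in entries if entry.rerun_status == "not_rerun"]
```

Each selected entry carries the module name and task name that a rerun instruction would need.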

In one embodiment, the rerun instruction may include one or more fields of information in the exception-monitoring task entry, such as a module name of the business module 201, a task name of the exception-monitoring task, and the like. For example, in the case where there are a plurality of business modules 201 in the client, the rerun instruction includes a business module name to inform the rerun task generation module 209 which business task in the business module 201 needs to be rerun. In addition, the task name of the abnormal monitoring task in the rerun instruction may inform the rerun task generation module 209 of which monitoring task in the monitoring file 204 of the business module 201 has the abnormal monitoring result.

In another embodiment, the rerun module 202 may not include the exception monitoring task obtaining module 208, but obtains the monitoring file 204 of the service module 201 to perform the monitoring task, and when an exception monitoring result is found, identifies the monitoring task as an exception monitoring task and generates a corresponding rerun instruction to transmit to the rerun task generating module 209.

According to an embodiment of the present invention, the obtaining of the exception monitoring task by the rerun module 202 may be performed periodically or may be performed in response to receiving a trigger instruction transmitted by the exception monitoring task generation module 207 from the business module 201.

According to one embodiment of the invention, the rerun task generation module 209 is configured to generate a rerun task based on the received rerun instruction and to transfer the rerun task to the rerun task execution module 210, which performs the rerun. In one example, the generated rerun task may specify one or more of the following: the business task to be rerun, the rerun scheduling time of the business task, the verification mode of the rerun result, an identifier of the rerun task, and the like.

According to an embodiment of the present invention, the rerun task generation module 209 may determine, according to the business module 201 (e.g., any one of the business modules 201-1 to 201-N) indicated in the rerun instruction and the abnormal monitoring task indicated in the rerun instruction, one or more business tasks related to the abnormal monitoring task (i.e., the business task that causes the monitoring task to be abnormal and thus needs to be rerun) based on the business-related code file 203 in the indicated business module 201, and specify the rerun scheduling time of the one or more business tasks based on the dependency relationship of the one or more business tasks and other business tasks. According to another embodiment of the present invention, the rerun task generation module 209 may also directly specify a rerun scheduled time for the rerun task, for example, to perform the rerun task at a certain point in time after the current time.
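
One way to derive a rerun order from task dependencies is a topological sort, sketched here with Python's standard `graphlib` module (Python 3.9+). The source does not say how scheduling times are derived from the dependency relationship, so this technique is an assumption, and the function name `rerun_order` and the task names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def rerun_order(dependencies: dict[str, set[str]], to_rerun: set[str]) -> list[str]:
    """Order the tasks to rerun so that each runs after the tasks it depends on."""
    # `dependencies` maps a task to the set of tasks it depends on.
    ordered = TopologicalSorter(dependencies).static_order()
    # Keep only the tasks that actually need a rerun, preserving the order.
    return [task for task in ordered if task in to_rerun]
```

For a chain in which `report` depends on `aggregate` and `aggregate` depends on `load`, rerunning `report` and `aggregate` yields `aggregate` first. Rerun scheduling times could then be assigned in that order. Tasks that do not appear in the dependency mapping at all are not returned by this sketch.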

According to an embodiment of the present invention, the rerun task generation module 209 may designate a verification manner of the rerun result as reacquiring the abnormal monitoring task list in the business module 201, and determine whether the rerun is successful based on the abnormal monitoring task list. According to another embodiment of the present invention, the rerun task generation module 209 may designate the verification mode of the rerun result as that the monitoring task in which the abnormality occurs is executed again at a designated time or immediately to determine whether the rerun is successful. Of course, the rerun task generation module 209 may also specify other verification methods to determine success.

According to one embodiment of the invention, the rerun task execution module 210 is configured to execute a rerun based on a received rerun task. In one embodiment, the rerun task execution module 210 executes the business tasks associated with the abnormal monitoring task by calling the business-related code files 203 in the business module 201 at the rerun scheduling time. In one embodiment, the manner in which the rerun task execution module 210 calls the business-related code file 203 in the business module 201 may be defined at the rerun module 202 according to the standardized method described above with reference to fig. 3. This calling manner is outside the scope of the present invention, and those skilled in the art may implement the call in various programming manners. After the rerun is completed, the rerun task execution module 210 sends a rerun completion instruction to the rerun task result detection module 211. The rerun completion instruction may include an identifier (e.g., name, ID) of the rerun task, and the like. For example, where multiple rerun tasks need to be executed, the identifier of the rerun task informs the rerun task result detection module 211 which rerun task has completed.

According to an embodiment of the present invention, the rerun task result detection module 211 is configured to verify the result of the rerun task based on the received rerun completion instruction to determine whether the rerun task was successful.

In the case where the rerun task specifies the verification mode of the rerun result as reacquiring the abnormal monitoring task list in the business module 201, the rerun task result detection module 211 acquires the abnormal monitoring task list of the business module 201 and determines whether a new abnormal monitoring task has occurred that is identical to the abnormal monitoring task already rerun. For example, if the timestamp of an abnormal monitoring task entry indicating the same monitoring task is later than the scheduling time of the rerun task, a new abnormal monitoring task entry has occurred and the previous rerun task failed. In one embodiment, the rerun task result detection module 211 may determine when to reacquire the abnormal monitoring task list based on the scheduling time of the abnormal monitoring task in the business module 201, so that the list is not acquired before the monitoring task has been executed again.
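
The timestamp comparison can be sketched in Python as follows. The list is represented as (task name, execution timestamp) pairs for brevity, and the function name `rerun_failed` is hypothetical.

```python
from datetime import datetime


def rerun_failed(entries: list[tuple[str, datetime]], task_name: str,
                 rerun_scheduled_at: datetime) -> bool:
    """Return True if the same monitoring task was flagged again after the rerun."""
    # An entry for the same task with a later timestamp is a new abnormal entry.
    return any(name == task_name and stamp > rerun_scheduled_at
               for name, stamp in entries)
```

Entries recorded before the rerun scheduling time, including the one that triggered the rerun, do not count as a failure under this check.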

In the case where the rerun task specifies the verification manner of the rerun result as that the monitoring task in which the abnormality occurs is executed again at a specified time or immediately, the rerun task result detection module 211 calls the monitoring file 204 in the service module 201 to execute the monitoring task again, and determines whether the monitoring result is abnormal based on the service monitoring rule in the monitoring file 204. And if the monitoring result is not abnormal, indicating that the rerun task is successful.

According to one embodiment of the invention, the rerun task result detection module 211 is further configured to maintain a number of reruns for the rerun task. For example, the re-run task result detection module 211 may add 1 to the number of re-runs for the re-run task upon receiving the re-run completion instruction. In one embodiment, the number of reruns is initialized to 0.

According to an embodiment of the present invention, the rerun task result detection module 211 is further configured to compare the number of rerun times for the abnormal monitoring task with the number of rerun redundancy times specified in the monitoring file 204 in the business module 201 in case of a rerun task failure. If the number of re-running times is less than the number of re-running redundancy times, the re-running task result detection module 211 sends a re-running instruction to the re-running task generation module 209 to generate and execute a new re-running task for the abnormality monitoring task. If the number of rerun times is greater than or equal to the number of rerun redundancy times, an alert is sent to an administrator or customer for manual intervention.
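
The rerun, verify, and compare cycle can be sketched as a small Python loop. The callables `run_rerun`, `verify`, and `send_alert` stand in for the rerun task execution, the result verification, and the alerting mechanism; they and the function name `rerun_until_ok` are hypothetical, and `alert_times` plays the role of the rerun redundancy count from the monitoring file.

```python
from typing import Callable


def rerun_until_ok(run_rerun: Callable[[], None],
                   verify: Callable[[], bool],
                   alert_times: int,
                   send_alert: Callable[[int], None]) -> bool:
    """Rerun until verification passes or the redundancy count is reached."""
    rerun_count = 0  # initialized to 0, as in the text
    while True:
        run_rerun()
        rerun_count += 1  # incremented when a rerun completes
        if verify():
            return True  # rerun task succeeded
        if rerun_count >= alert_times:
            send_alert(rerun_count)  # hand over to manual intervention
            return False
        # rerun_count < alert_times: loop to generate and execute a new rerun
```

With `alert_times` set to 2, a task that keeps failing is rerun twice and then raises a single alert.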

According to an embodiment of the present invention, the rerun task result detection module 211 may return the final rerun result status to the service module 201 to update the "rerun status" of the entry for the abnormal monitoring task in the abnormal monitoring task list. For example, the "re-run status" is updated to "re-run successful" or "re-run failed".

FIG. 5 illustrates a flow diagram of a method 500 for automatic business re-run based on business monitoring according to one embodiment of the invention. The method 500 may be implemented at a client. Of course, depending on the particular business module deployment, the steps of method 500 may be implemented on different or the same computing device, server, or cloud.

At 501, an abnormal monitoring task of a business module, which needs to be rerun, is obtained.

According to an embodiment of the present invention, acquiring the abnormal monitoring task of the business module that needs to be rerun may include acquiring an abnormal monitoring task list of the business module and identifying from the list the abnormal monitoring task that needs to be rerun.

According to another embodiment of the present invention, acquiring the abnormal monitoring task of the business module that needs to be rerun may include scheduling a monitoring file in the business module to execute the monitoring task. In one embodiment, performing the monitoring task may include: (a) executing a monitoring task based on a monitoring file in the business module and obtaining a monitoring result; and (b) judging, based on the business monitoring rule defined in the monitoring file, whether the monitoring result conforms to that rule. According to one embodiment of the invention, the business module is generated by: (1) defining a module code directory structure to standardize the organization of the module code; and (2) initializing the module code directory structure based on the desired business function, such that one or more of the module code directory structures are initialized according to the business function requested by the customer, to generate initialized module code for the desired business function, the business module comprising the initialized module code. Further, the business module includes a monitoring file generated during the initialization process. The monitoring file defines one or more monitoring tasks for the business module.

According to one embodiment of the invention, if the monitoring result conforms to the service monitoring rule, it indicates that the monitoring result is abnormal. In one embodiment, the monitoring task in which the abnormal monitoring result occurs may be added to the abnormal monitoring task list as an abnormal monitoring task.
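The monitoring step can be sketched as follows, assuming a monitoring file loaded into memory as a dictionary. The layout of `MONITORING_FILE`, the task names, the entry fields, and the initial "pending" status are all hypothetical. Note that, as stated above, a result that conforms to the rule is treated as abnormal.

```python
from datetime import datetime

# Hypothetical in-memory form of a monitoring file: each monitoring task has
# a query that produces a result and a rule that describes the abnormal case.
MONITORING_FILE = {
    "rerun_redundancy": 3,
    "tasks": [
        {"id": "daily_row_count", "query": lambda: 0, "rule": lambda v: v == 0},
        {"id": "null_ratio", "query": lambda: 0.01, "rule": lambda v: v > 0.05},
    ],
}


def run_monitoring(monitoring_file, abnormal_list, now=datetime.now):
    """Execute every monitoring task and record the abnormal ones."""
    for task in monitoring_file["tasks"]:
        result = task["query"]()
        if task["rule"](result):  # result conforms to the rule: abnormal
            abnormal_list.append({
                "monitor_task_id": task["id"],
                "result": result,
                "detected_at": now(),
                "rerun_status": "pending",  # assumed initial status
            })
    return abnormal_list
```

In this sketch only `daily_row_count` is recorded, because its result of 0 matches its rule, while the null ratio of 0.01 does not exceed 0.05.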

At 502, a rerun task for the abnormal monitoring task is generated and executed. According to an embodiment of the invention, the rerun module may obtain the abnormal monitoring task list of the business module, or execute the monitoring task, to obtain an abnormal monitoring task that needs to be rerun. The rerun module then generates a rerun task for the abnormal monitoring task and calls the relevant business module to execute the rerun task. In one embodiment, the rerun task may specify one or more of the following: the business task to be rerun, the rerun scheduling time of the business task, the rerun result verification mode, a rerun task identifier, and the like. According to one embodiment of the invention, after the rerun task is finished, the number of rerun times for the rerun task is increased by 1.
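The items a rerun task may specify can be represented as a small record, sketched below. The class and field names are hypothetical, and generating the identifier with `uuid4` is an assumption; the invention does not prescribe an identifier scheme.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class VerificationMode(Enum):
    # The two verification modes described for the rerun result.
    REACQUIRE_ABNORMAL_LIST = "reacquire_abnormal_list"
    REEXECUTE_MONITOR_TASK = "reexecute_monitor_task"


@dataclass
class RerunTask:
    business_task: str                   # business task to be rerun
    scheduled_at: datetime               # rerun scheduling time
    verification_mode: VerificationMode  # rerun result verification mode
    # Rerun task identifier (assumed here to be a random hex string).
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


task = RerunTask(
    business_task="load_orders",
    scheduled_at=datetime(2021, 10, 29, 2, 0),
    verification_mode=VerificationMode.REEXECUTE_MONITOR_TASK,
)
```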

At 503, the result of the rerun task is verified to determine whether the rerun task was successful. If the rerun task was successful, step 507 is entered or the method ends; if not, step 504 is entered. According to an embodiment of the present invention, verifying the result of the rerun task may include reacquiring the abnormal monitoring task list in the business module 201 and determining whether the rerun task is successful based on the abnormal monitoring task list. According to another embodiment of the invention, verifying the result of the rerun task may include re-executing the abnormal monitoring task at a specified time or immediately.

At 504, the number of rerun times for the rerun task is compared with the number of rerun redundancy times. If the number of rerun times is less than the number of rerun redundancy times, step 505 is entered; if the number of rerun times is greater than or equal to the number of rerun redundancy times, step 506 is entered.

At 505, a new rerun task for the abnormal monitoring task is generated and executed. After the new rerun task is executed, the process returns to step 503 to determine whether the new rerun task is successful.

At 506, an alert is sent to the administrator or customer for human intervention. According to one embodiment of the invention, the alert may be delivered to the client in a variety of ways, such as through a user interface, voice, text, and the like. According to another embodiment of the present invention, if a plurality of service modules are installed at a client, alarm information for the plurality of service modules may be simultaneously displayed in a user interface.

At 507, the final re-run result status is returned to the business module. Step 507 may be optional according to an embodiment of the present invention.
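Steps 502 to 507 can be summarized as a bounded retry loop, sketched below under stated assumptions. The callables `execute_rerun`, `verify`, `send_alert`, and `report_status` are hypothetical stand-ins for the modules described above, and reporting a "re-run failed" status after the alert is an assumption about the ordering of steps 506 and 507.

```python
from typing import Callable, Optional, Tuple


def auto_rerun(execute_rerun: Callable[[], None],
               verify: Callable[[], bool],
               rerun_redundancy: int,
               send_alert: Callable[[], None],
               report_status: Optional[Callable[[str], None]] = None
               ) -> Tuple[str, int]:
    """Run rerun tasks until one succeeds or the redundancy count is reached."""
    rerun_count = 0  # the number of rerun times is initialized to 0
    while True:
        execute_rerun()          # step 502 (first pass) or 505 (later passes)
        rerun_count += 1         # incremented after each rerun task finishes
        if verify():             # step 503
            status = "re-run successful"
            break
        if rerun_count >= rerun_redundancy:  # step 504
            send_alert()         # step 506
            status = "re-run failed"
            break
    if report_status is not None:  # step 507 is optional
        report_status(status)
    return status, rerun_count
```

Because the rerun is always executed at least once before the comparison, a number of rerun redundancy times of 1 means a single attempt followed by an alert on failure.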

Therefore, by modularizing and standardizing the code structure and including the monitoring file in the standardized module, standard development can be achieved, and the deployment threshold is reduced. In addition, during monitoring and rerun, all operations do not need manual intervention, and an additional system is not needed, so that automatic service monitoring and complement rerun in the field of big data analysis can be realized.

FIG. 6 illustrates a block diagram of a computing device 600 that may be used as a hardware device for aspects of the invention, according to an embodiment of the invention. For example, a big data service provider and/or a client in the present invention may be implemented as computing device 600 or as a cluster of computing devices 600.

Referring to fig. 6, computing device 600 may be any machine that may be configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smartphone, an in-vehicle computer, a home camera, a video conference device, a road camera, or any combination thereof. The various methods/apparatus/servers/client devices described above may be implemented in whole or at least in part by a computing device 600 or similar device or system.

Computing device 600 may include components that may be connected or communicate via one or more interfaces and a bus 602. For example, computing device 600 may include a bus 602, one or more processors 604, one or more input devices 606, and one or more output devices 608. The one or more processors 604 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., dedicated processing chips). Input device 606 can be any type of device capable of inputting information to a computing device and can include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, a camera, and/or a remote control. Output device 608 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 600 may also include or be connected to non-transitory storage device 610, which may be any storage device that is non-transitory and that enables data storage, and which may include, but is not limited to, a disk drive, an optical storage device, a solid-state memory, a floppy disk, a flexible disk, a hard disk, a tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. Non-transitory storage device 610 may be detached from the interface. The non-transitory storage device 610 may have data/instructions/code for implementing the above-described methods and steps. Computing device 600 may also include a communication device 612. 
The communication device 612 may be any type of device or system capable of communicating with internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.

The bus 602 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.

Computing device 600 may also include a working memory 614, which working memory 614 may be any type of working memory capable of storing instructions and/or data that facilitate the operation of processor 604 and may include, but is not limited to, random access memory and/or read only memory devices.

Software components may be located in the working memory 614, including, but not limited to, an operating system 616, one or more application programs 618, drivers, and/or other data and code. Instructions for implementing the above-described methods and steps of the invention may be included in the one or more applications 618, and the above-described method 500 of the invention may be implemented by the processor 604 reading and executing the instructions of the one or more applications 618.

It should also be appreciated that variations may be made according to particular needs. For example, customized hardware might also be used, and/or particular components might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices and the like, may be employed. For example, some or all of the disclosed methods and apparatus can be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in assembly language or a hardware programming language (e.g., VERILOG, VHDL, C++) using logic and algorithms in accordance with the present invention.

Although the various aspects of the present invention have been described thus far with reference to the accompanying drawings, the above-described methods, systems, and apparatuses are merely examples, and the scope of the present invention is not limited to these aspects but only by the appended claims and equivalents thereof. Various components may be omitted or may be replaced with equivalent components. In addition, the steps may also be performed in a different order than described in the present invention. Further, the various components may be combined in various ways. Importantly, as technology develops, many of the described components may be replaced by equivalent components appearing later.

Claims

1. A method for automatic business re-running based on business monitoring, comprising:

(a) acquiring an abnormal monitoring task of a service module, which needs to be rerun;

(b) generating and executing a rerun task for the abnormal monitoring task;

(c) verifying a result of the rerun task to determine whether the rerun task was successful;

(d) if the re-running task is not successful, comparing the re-running times of the re-running task with the re-running redundant times defined in the monitoring file;

(e-1) if the number of re-running is less than the number of re-running redundancy, re-generating and executing a new re-running task for the abnormality monitoring task;

(e-2) if the number of re-runs is greater than or equal to the number of re-runs redundancy, transmitting an alarm.

2. The method of claim 1, wherein obtaining an abnormal monitoring task requiring a rerun further comprises: acquiring an abnormal monitoring task list in the service module, wherein the abnormal monitoring task list records the abnormal monitoring tasks in the service module.

3. The method of claim 1, wherein obtaining an abnormal monitoring task requiring a rerun further comprises: scheduling the monitoring file in the service module to execute a monitoring task, including:

executing the monitoring task based on the monitoring file in the business module and obtaining a monitoring result, wherein the monitoring file is used for monitoring the operational health of the business module and defines one or more monitoring tasks for the business module;

judging whether the monitoring result conforms to the business monitoring rule or not based on the business monitoring rule defined in the monitoring file;

and if the monitoring result conforms to the service monitoring rule, identifying the monitoring task as an abnormal monitoring task.

4. The method of claim 1, wherein the rerun task specifies one or more of: a business task to be rerun, a rerun scheduling time of the business task, a rerun result verification mode, or a rerun task identifier.

5. The method of claim 1, wherein verifying the result of the rerun task to determine whether the rerun task was successful further comprises: reacquiring the abnormal monitoring task list of the service module, and determining whether the rerun task is successful based on the abnormal monitoring task list.

6. A system for automatic business re-running based on business monitoring, comprising:

one or more business modules, each of the one or more business modules configured to: execute a monitoring task based on a monitoring file defined in the business module, identify the monitoring task as an abnormal monitoring task if the monitoring result of the monitoring task is abnormal, and add the abnormal monitoring task to an abnormal monitoring task list, wherein the monitoring file is used for monitoring the operational health of the business module and defines one or more monitoring tasks for the business module; and

a rerun module configured to acquire an abnormal monitoring task of the one or more business modules that needs to be rerun, generate and execute a rerun task for the abnormal monitoring task, perform rerun task result detection, and determine, when the rerun task is not successful, whether to execute a new rerun task for the abnormal monitoring task.

7. The system of claim 6, wherein the rerun module further comprises:

a rerun task generation module configured to generate a rerun task for the anomaly monitoring task;

a rerun task execution module configured to execute the rerun task; and

a rerun task result detection module configured to: verify a result of the rerun task to determine whether the rerun task was successful; if the rerun task is not successful, compare the number of rerun times of the rerun task with the number of rerun redundancy times defined in the monitoring file; if the number of rerun times is less than the number of rerun redundancy times, instruct the rerun task generation module to generate a new rerun task for the abnormal monitoring task; and if the number of rerun times is greater than or equal to the number of rerun redundancy times, send an alarm.

8. The system of claim 6, wherein the rerun module further comprises an abnormal monitoring task obtaining module configured to obtain a list of abnormal monitoring tasks in the one or more business modules and select an abnormal monitoring task that requires rerun processing based on the obtained list of abnormal monitoring tasks.

9. The system of claim 6, wherein the rerun task specifies one or more of: a business task to be rerun, a rerun scheduling time of the business task, a rerun result verification mode, or a rerun task identifier.

10. A computing device for automatic business re-running based on business monitoring, comprising:

a processor;

a memory storing instructions that, when executed by the processor, cause the processor to perform the method of any one of claims 1-5.