CN112181763A - Intelligent detection alarm method and device in intelligent scheduling - Google Patents

Intelligent detection alarm method and device in intelligent scheduling

Info

Publication number
CN112181763A
Authority
CN
China
Prior art keywords
alarm
information
node
detection
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011005560.8A
Other languages
Chinese (zh)
Inventor
高伟钦
陈守当
翁世清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202011005560.8A
Publication of CN112181763A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 Status alarms
    • G08B21/182 Level alarms, e.g. alarms responsive to variables exceeding a threshold
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 Status alarms
    • G08B21/24 Reminder alarms, e.g. anti-loss alarms

Abstract

The invention discloses an intelligent detection alarm method and device in intelligent scheduling, and relates to the technical field of computers. One embodiment of the method comprises: configuring a detection alarm task as required and storing the configuration information in a database; periodically acquiring detection alarm task information from the database; and performing the corresponding detection and alarm according to the detection alarm task information. This embodiment finds abnormalities in advance, reduces the job failure rate, enables operation and maintenance personnel to locate problems quickly and accurately, and ensures high availability and high reliability of the scheduling system.

Description

Intelligent detection alarm method and device in intelligent scheduling
Technical Field
The invention relates to the technical field of computers, in particular to an intelligent detection alarm method and device in intelligent scheduling.
Background
At present, jobs sometimes fail to run during job scheduling. A traditional job scheduling system generally raises an alarm only after a job has failed, so users do not learn of an abnormality in their own jobs in time. A job may fail for many reasons: a problem with the job itself, insufficient resources on the execution machine, an abnormal service component, or even an abnormal MQ or Redis. Users find such failures difficult to locate, and developers usually spend a lot of time troubleshooting them. For a job scheduling system to be a high-performance one, job failures should be avoided as far as possible.
Disclosure of Invention
In view of this, embodiments of the present invention provide an intelligent detection alarm method and device in intelligent scheduling, which find abnormalities in advance, reduce the job failure rate, enable operation and maintenance personnel to locate problems quickly and accurately, and ensure high availability and high reliability of the scheduling system.
To achieve the above object, according to one aspect of the embodiments of the present invention, an intelligent detection alarm method in intelligent scheduling is provided.
The intelligent detection alarm method in intelligent scheduling of the embodiment of the invention comprises the following steps:
configuring a detection alarm task as required, and storing the configuration information in a database;
periodically acquiring detection alarm task information from the database; and
performing the corresponding detection and alarm according to the detection alarm task information.
Optionally, configuring the detection alarm task includes configuring an alarm object, an alarm detection item, and an alarm indicator.
Optionally, performing the corresponding detection and alarm according to the detection alarm task information includes:
acquiring alarm detection items from the database;
detecting the acquired alarm detection items; and
generating alarm information according to the detection result.
Optionally, performing the corresponding detection and alarm according to the detection alarm task information further includes:
storing the alarm information in an alarm information table while sending the alarm information to a message queue (MQ); and
consuming the alarm information in the MQ queue, and alarming according to the alarm mode and the alarm information template configured by the user.
Optionally, if sending to the MQ queue fails, a short message (SMS) alarm is issued directly.
Optionally, the alarm object includes: a job flow, a job, an MQ cluster, a Redis cluster, a service node, and a service component.
Optionally, the job flow detection alarm method includes:
acquiring job flow alarm task information;
detecting whether instantiation of the job flow has timed out;
if instantiation of the job flow has timed out, sending alarm information to the alarm MQ queue; and
if instantiation of the job flow fails or rollover of the job flow fails, automatically sending the corresponding alarm information in each case.
Optionally, the job detection alarm method includes:
acquiring job alarm task information;
detecting whether the job run has timed out;
if the job run has timed out, sending alarm information to the alarm MQ queue; and
if job execution fails or job dispatch fails, automatically sending the corresponding alarm information in each case.
Optionally, the MQ cluster detection alarm method includes:
acquiring MQ alarm task information;
detecting whether an MQ node is offline, the MQ disk space utilization, and whether MQ messages have piled up; and
if an MQ node is offline, the MQ disk space utilization exceeds a threshold, or MQ messages have piled up, sending the corresponding alarm information to the alarm MQ queue in each case.
Optionally, the Redis cluster detection alarm method includes:
acquiring Redis alarm task information;
detecting whether a Redis node is offline and whether Redis reads and writes are abnormal; and
if a Redis node is offline or Redis reads and writes are abnormal, sending the corresponding alarm information to the alarm MQ queue in each case.
Optionally, the service node detection alarm method includes:
acquiring service node alarm task information;
detecting whether the node is offline, the node CPU utilization, the node memory occupancy, the number of node processes, and the node disk space utilization; and
if the node is offline, the node CPU utilization exceeds a threshold, the node memory occupancy exceeds a threshold, the number of node processes exceeds a threshold, or the node disk space utilization exceeds a threshold, sending the corresponding alarm information to the alarm MQ queue in each case.
Optionally, the service component detection alarm method includes:
acquiring service component alarm task information;
detecting whether the service component process is abnormal; and
if the service component process is abnormal, sending the corresponding alarm information to the alarm MQ queue.
To achieve the above object, according to still another aspect of the embodiments of the present invention, an apparatus for intelligently detecting an alarm in intelligent scheduling is provided.
The intelligent detection alarm device in intelligent scheduling of the embodiment of the invention comprises: an alarm task configuration module, an alarm task acquisition module, and an alarm task execution module; wherein
the alarm task configuration module is used for configuring a detection alarm task as required and storing the configuration information in a database;
the alarm task acquisition module is used for periodically acquiring detection alarm task information from the database; and
the alarm task execution module is used for performing the corresponding detection and alarm according to the detection alarm task information.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for intelligently detecting an alarm in intelligent scheduling.
The electronic device for intelligent detection and alarm in intelligent scheduling of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the intelligent detection alarm method in intelligent scheduling of the embodiment of the invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention stores thereon a computer program, which, when executed by a processor, implements a method of intelligently detecting alarms in intelligent scheduling of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: first, a detection alarm task is configured as required and the configuration information is stored in a database; then, detection alarm task information is periodically acquired from the database; finally, the corresponding detection and alarm is performed according to the detection alarm task information. As a result, abnormalities are found in advance, the job failure rate is reduced, operation and maintenance personnel can locate problems quickly and accurately, and high availability and high reliability of the scheduling system are ensured.
Further effects of the above optional implementations will be described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a diagram illustrating the main steps of an intelligent detection alarm method in intelligent scheduling according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main steps of a job flow detection alarm method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main steps of a job detection alarm method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main steps of the MQ cluster detection alarm method according to the embodiment of the present invention;
FIG. 5 is a diagram illustrating the main steps of a Redis cluster detection alarm method according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the main steps of a service node detection alarm method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the main steps of a service component detection alarm method according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating the major modules of an intelligent detection alarm device in intelligent scheduling according to an embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 10 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of main steps of an intelligent detection alarm method in intelligent scheduling according to an embodiment of the present invention.
As shown in fig. 1, an intelligent detection and alarm method in intelligent scheduling according to an embodiment of the present invention mainly includes the following steps:
step S101: configuring a detection alarm task according to the requirement, and storing configuration information into a database;
step S102: acquiring detection alarm task information from a database at regular time;
step S103: performing the corresponding detection and alarm according to the detection alarm task information.
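Steps S101 to S103 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described by the patent: the table name `alarm_task`, its columns, and the functions `open_db`, `configure_task`, `fetch_tasks`, and `run_once` are all hypothetical, and an in-memory SQLite database stands in for whatever database the scheduling system actually uses.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: one row per configured detection alarm task.
SCHEMA = (
    "CREATE TABLE alarm_task ("
    "id INTEGER PRIMARY KEY, "
    "alarm_object TEXT NOT NULL, "
    "detection_item TEXT NOT NULL, "
    "indicator REAL NOT NULL)"
)

Detector = Callable[[float], bool]  # returns True when an alarm is needed


def open_db() -> sqlite3.Connection:
    """Create the in-memory database that stands in for the real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def configure_task(conn: sqlite3.Connection, alarm_object: str,
                   detection_item: str, indicator: float) -> None:
    """Step S101: store the task configuration in the database."""
    conn.execute(
        "INSERT INTO alarm_task (alarm_object, detection_item, indicator) "
        "VALUES (?, ?, ?)",
        (alarm_object, detection_item, indicator),
    )
    conn.commit()


def fetch_tasks(conn: sqlite3.Connection) -> List[Tuple[str, str, float]]:
    """Step S102: read the configured tasks; meant to be called on a timer."""
    cursor = conn.execute(
        "SELECT alarm_object, detection_item, indicator "
        "FROM alarm_task ORDER BY id"
    )
    return cursor.fetchall()


def run_once(conn: sqlite3.Connection,
             detectors: Dict[Tuple[str, str], Detector]) -> List[str]:
    """Step S103: run the matching detector for each task, collect alarms."""
    alarms = []
    for alarm_object, item, indicator in fetch_tasks(conn):
        detector = detectors.get((alarm_object, item))
        if detector is not None and detector(indicator):
            alarms.append(f"{alarm_object}:{item}")
    return alarms
```

In a deployment, `run_once` would be invoked by some periodic scheduler; the polling interval is not specified in the text and would be a configuration choice.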
Optionally, performing the corresponding detection and alarm according to the detection alarm task information includes:
acquiring alarm detection items from the database;
detecting the acquired alarm detection items; and
generating alarm information according to the detection result.
Optionally, performing the corresponding detection and alarm according to the detection alarm task information further includes:
storing the alarm information in an alarm information table while sending the alarm information to a message queue (MQ); and
consuming the alarm information in the MQ queue, and alarming according to the alarm mode and the alarm information template configured by the user.
If sending to the MQ queue fails, a short message (SMS) alarm is issued directly. If the same alarm is detected to have been triggered multiple times within a time period, it is not raised again.
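The dispatch behaviour described above (record the alarm, push it to the MQ, fall back to SMS when the push fails, and suppress repeats within a time period) might look like the sketch below. The class `AlarmDispatcher`, the callables `send_to_mq` and `send_sms`, the alarm key format, and the 300-second default window are assumptions made for illustration; the text does not specify the length of the suppression period or how two alarms are judged to be "the same".

```python
import time
from typing import Callable, Dict, List, Tuple


class AlarmDispatcher:
    """Stores each alarm, pushes it to the MQ, falls back to SMS on failure."""

    def __init__(self,
                 send_to_mq: Callable[[str], None],
                 send_sms: Callable[[str], None],
                 suppress_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_to_mq = send_to_mq  # hypothetical producer; raises on failure
        self.send_sms = send_sms      # hypothetical SMS gateway call
        self.suppress_seconds = suppress_seconds
        self.clock = clock
        # Stands in for the alarm information table.
        self.alarm_table: List[Tuple[str, str]] = []
        self._last_sent: Dict[str, float] = {}

    def dispatch(self, alarm_key: str, message: str) -> str:
        """Return 'suppressed', 'queued', or 'sms' depending on the path taken."""
        now = self.clock()
        last = self._last_sent.get(alarm_key)
        if last is not None and now - last < self.suppress_seconds:
            return "suppressed"  # same alarm already raised in this period
        self._last_sent[alarm_key] = now
        self.alarm_table.append((alarm_key, message))
        try:
            self.send_to_mq(message)
            return "queued"
        except Exception:
            # MQ send failed: alarm directly by short message instead.
            self.send_sms(message)
            return "sms"
```

Injecting the clock keeps the suppression window testable without sleeping; a consumer on the other side of the queue would then apply the user-configured alarm mode and template.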
Optionally, configuring the detection alarm task includes configuring an alarm object, an alarm detection item, and an alarm indicator. Here, the alarm objects include job flows, jobs, MQ clusters, Redis clusters, service nodes, and service components.
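One possible way to represent such a configuration in code is shown below. The names `AlarmObject` and `AlarmTaskConfig`, the string values, and the optional numeric `threshold` field are illustrative assumptions; only the six alarm objects and the three configured elements (object, detection item, indicator) come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlarmObject(Enum):
    """The six kinds of alarm object named in the description."""
    JOB_FLOW = "job_flow"
    JOB = "job"
    MQ_CLUSTER = "mq_cluster"
    REDIS_CLUSTER = "redis_cluster"
    SERVICE_NODE = "service_node"
    SERVICE_COMPONENT = "service_component"


@dataclass(frozen=True)
class AlarmTaskConfig:
    """One configured detection alarm task (field names are hypothetical)."""
    alarm_object: AlarmObject
    detection_item: str                # e.g. "disk_usage"
    indicator: str                     # e.g. "disk_usage_threshold"
    threshold: Optional[float] = None  # only for threshold-type indicators
```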
The detection items of a job flow are job flow instantiation delay, job flow instantiation failure, and job flow rollover failure; the corresponding alarm indicators are baseline time, running state, and copying state, respectively. Based on these detection items and alarm indicators, the system raises an alarm in the following three cases: the first batch of the job flow has not been instantiated by the specified point in time, instantiation of the job flow has failed, or rollover of the job flow has failed.
The detection items of a job are job run delay, job execution timeout, job execution failure, and job dispatch failure; the corresponding alarm indicators are baseline time, duration, and running state, where job execution failure and job dispatch failure both use the running state as their alarm indicator. Based on these detection items and alarm indicators, the system raises an alarm in the following four cases: the job has not been executed by the specified point in time, the execution time of the job (i.e., the current system time minus the dispatch time) is greater than the set duration, job execution has failed, or job dispatch has failed.
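The four job alarm cases can be expressed as a small check function. This is a sketch under assumptions: the `JobSnapshot` fields, the state strings, and the alarm names are hypothetical; the only rule taken directly from the text is that execution time is the current system time minus the dispatch time and is compared against the set duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobSnapshot:
    baseline_time: datetime            # point in time by which the job should run
    max_duration: timedelta            # configured duration limit
    dispatch_time: Optional[datetime]  # None if the job has not been dispatched
    state: str  # 'pending', 'running', 'exec_failed', 'dispatch_failed', 'done'


def check_job(job: JobSnapshot, now: datetime) -> List[str]:
    """Return the alarms that apply to one job at time `now`."""
    alarms = []
    # Run delay: still not executed although the baseline time has passed.
    if job.state == "pending" and job.dispatch_time is None and now > job.baseline_time:
        alarms.append("job_delayed")
    # Execution timeout: (current system time - dispatch time) > set duration.
    if job.state == "running" and job.dispatch_time is not None:
        if now - job.dispatch_time > job.max_duration:
            alarms.append("job_run_timeout")
    # Both failure items use the running state as their indicator.
    if job.state == "exec_failed":
        alarms.append("job_exec_failed")
    if job.state == "dispatch_failed":
        alarms.append("job_dispatch_failed")
    return alarms
```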
The detection items of an MQ cluster are whether a node is offline, whether messages have piled up, and disk space utilization; the corresponding alarm indicators are node state, message backlog count, and disk space utilization threshold, respectively. Based on these detection items and alarm indicators, the system raises an alarm in the following three cases: an MQ node is offline, the number of piled-up messages exceeds a threshold, or the disk space utilization of the MQ service exceeds a threshold.
The detection items of a Redis cluster are whether a node is offline and whether the read-write function is normal; the alarm indicator of both detection items is the node state. Based on these detection items and alarm indicators, the system raises an alarm in the following two cases: a Redis node is offline, or the Redis read-write function is abnormal.
The detection items of a service node are whether the service node is offline, service node CPU utilization, service node memory occupancy, the number of service node processes, and service node disk space utilization; the corresponding alarm indicators are node state, CPU utilization threshold, memory occupancy threshold, process count threshold, and disk space utilization threshold, respectively. Based on these detection items and alarm indicators, the system raises an alarm in the following five cases: the service node is offline, the CPU utilization exceeds a threshold, the memory occupancy exceeds a threshold, the number of processes exceeds a threshold, or the disk space utilization exceeds a threshold.
The detection item of a service component is the instance state, and the alarm indicator is the process state; an alarm is raised when the process of the service component is abnormal.
FIGS. 2 to 7 below describe the detection alarm method for each alarm object in detail.
FIG. 2 is a schematic diagram of the main steps of a job flow detection alarm method according to an embodiment of the invention.
As shown in FIG. 2, the job flow detection alarm method according to the embodiment of the present invention includes the following steps:
step S201: acquiring job flow alarm task information;
step S202: detecting whether instantiation of the job flow has timed out;
step S203: determining whether instantiation of the job flow has timed out, and if so, executing step S204: sending alarm information to the alarm MQ queue; and
step S205: if instantiation of the job flow fails or rollover of the job flow fails, automatically sending the corresponding alarm information in each case.
FIG. 3 is a schematic diagram of the main steps of a job detection alarm method according to an embodiment of the present invention.
As shown in FIG. 3, the job detection alarm method according to the embodiment of the present invention includes the following steps:
step S301: acquiring job alarm task information;
step S302: detecting whether the job run has timed out;
step S303: determining whether the job run has timed out, and if so, executing step S304: sending alarm information to the alarm MQ queue; and
step S305: if job execution fails or job dispatch fails, automatically sending the corresponding alarm information in each case.
Fig. 4 is a schematic diagram of the main steps of the MQ cluster detection alarm method according to the embodiment of the present invention.
As shown in FIG. 4, the MQ cluster detection alarm method according to the embodiment of the present invention includes the following steps:
step S401: acquiring MQ alarm task information;
step S402: detecting whether an MQ node is offline, the MQ disk space utilization, and whether MQ messages have piled up;
step S403: determining the detection result, and if an MQ node is offline, the MQ disk space utilization exceeds a threshold, or MQ messages have piled up, executing step S404: sending the corresponding alarm information to the alarm MQ queue in each case.
The MQ nodes are the master and slave nodes of the MQ; as soon as a node is found to be down, an alarm is raised immediately. If too many messages pile up in the queue, that is, many messages in the MQ queue remain unconsumed, the service component may have a problem, and an alarm is required. If the MQ disk space utilization exceeds the threshold, MQ disk space is insufficient, and an alarm is also required.
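The three MQ checks can be sketched as a pure function over a status snapshot. How the snapshot is collected depends on the MQ product in use, which the text does not name, so `MqStatus`, its fields, and the alarm names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MqStatus:
    node_online: Dict[str, bool]  # master and slave nodes, keyed by name
    backlog: int                  # messages in the queue not yet consumed
    disk_usage: float             # fraction of disk space used, 0.0 to 1.0


def check_mq(status: MqStatus, backlog_threshold: int,
             disk_threshold: float) -> List[str]:
    """Return one alarm per offline node, plus backlog and disk alarms."""
    alarms = [
        f"mq_node_offline:{name}"
        for name, online in sorted(status.node_online.items())
        if not online
    ]
    if status.backlog > backlog_threshold:
        alarms.append("mq_backlog")     # consumers may be unhealthy
    if status.disk_usage > disk_threshold:
        alarms.append("mq_disk_usage")  # disk space is running short
    return alarms
```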
Fig. 5 is a schematic diagram of main steps of a Redis cluster detection alarm method according to an embodiment of the present invention.
As shown in FIG. 5, the Redis cluster detection alarm method according to the embodiment of the present invention includes the following steps:
step S501: acquiring Redis alarm task information;
step S502: detecting whether a Redis node is offline and whether Redis reads and writes are abnormal;
step S503: determining the detection result, and if a Redis node is offline or Redis reads and writes are abnormal, executing step S504: sending the corresponding alarm information to the alarm MQ queue in each case.
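A read-write probe of the kind described can be sketched as writing a known value and reading it back. To keep the example self-contained, it does not use any real Redis client library: `KeyValueClient` is a hypothetical minimal interface, `InMemoryClient` is a stand-in used only for demonstration, and representing an offline node as `None` is likewise an assumption.

```python
from typing import Dict, List, Optional, Protocol


class KeyValueClient(Protocol):
    """Hypothetical minimal interface; not the API of any real Redis client."""
    def set(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryClient:
    """Stand-in client for demonstration; can simulate failing writes."""
    def __init__(self, fail_writes: bool = False) -> None:
        self._data: Dict[str, str] = {}
        self._fail_writes = fail_writes

    def set(self, key: str, value: str) -> None:
        if self._fail_writes:
            raise IOError("write rejected")
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def probe_read_write(client: KeyValueClient, key: str = "alarm:probe",
                     value: str = "ok") -> bool:
    """Write a probe value and read it back; any error counts as abnormal."""
    try:
        client.set(key, value)
        return client.get(key) == value
    except Exception:
        return False


def check_redis(nodes: Dict[str, Optional[KeyValueClient]]) -> List[str]:
    """`nodes` maps node name to a client, or None if the node is unreachable."""
    alarms = []
    for name, client in sorted(nodes.items()):
        if client is None:
            alarms.append(f"redis_node_offline:{name}")
        elif not probe_read_write(client):
            alarms.append(f"redis_read_write_abnormal:{name}")
    return alarms
```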
Fig. 6 is a schematic diagram of main steps of a service node detection alarm method according to an embodiment of the present invention.
As shown in the figure, the service node detection alarm method according to the embodiment of the present invention includes the following steps:
step S601: acquiring alarm task information of a service node;
step S602: detecting whether the node is off-line, the utilization rate of a node CPU, the occupancy rate of a node memory, the number of node processes and the utilization rate of a node disk space;
step S603: determining the detection result, and if the detection result exceeds the alarm indicator, executing step S604: sending the corresponding alarm information to the alarm MQ queue in each case.
Service node detection alarm means periodically checking the execution machine and raising alarms.
The execution machine reports its information periodically, and the reporting time is recorded. The detection alarm items are explained as follows:
a) Node offline: if the execution machine has reported no information for longer than a certain time since its last report, the execution machine is judged to be offline and an alarm is required;
b) Node CPU utilization: if the node CPU utilization values collected before the current moment, as read from the node information table and the node monitoring information table, exceed the threshold, an alarm is required;
c) Node memory occupancy: if the remaining node memory values collected before the current moment, as read from the node information table and the node monitoring information table, are all smaller than the threshold, an alarm is required;
d) Node process count: if the node process counts collected before the current moment, as read from the node information table and the node monitoring information table, are greater than the threshold, an alarm is required;
e) Node disk space utilization: if the node disk space utilization values read from the node information table and the node monitoring information table exceed the threshold, an alarm is required.
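Items a) to e) can be sketched as a check over the most recent report from an execution machine. The names `NodeReport`, `NodeThresholds`, and `check_node` are hypothetical, a single report stands in for the values read from the node information and monitoring tables, and returning only the offline alarm for a stale report is a design choice of this sketch rather than something the text states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeReport:
    last_report_ts: float  # time of the last report, in seconds
    cpu_usage: float       # fraction, 0.0 to 1.0
    free_memory_mb: float  # remaining memory
    process_count: int
    disk_usage: float      # fraction, 0.0 to 1.0


@dataclass
class NodeThresholds:
    offline_after_s: float     # silence longer than this means offline
    cpu_usage: float
    min_free_memory_mb: float  # alarm when remaining memory is below this
    max_processes: int
    disk_usage: float


def check_node(report: NodeReport, limits: NodeThresholds, now: float) -> List[str]:
    """Return the alarms for one execution machine at time `now`."""
    # a) No report for too long: the machine is judged offline.
    if now - report.last_report_ts > limits.offline_after_s:
        return ["node_offline"]  # stale metrics are not evaluated further
    alarms = []
    if report.cpu_usage > limits.cpu_usage:                 # b)
        alarms.append("node_cpu_usage")
    if report.free_memory_mb < limits.min_free_memory_mb:   # c)
        alarms.append("node_memory")
    if report.process_count > limits.max_processes:         # d)
        alarms.append("node_process_count")
    if report.disk_usage > limits.disk_usage:               # e)
        alarms.append("node_disk_usage")
    return alarms
```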
Fig. 7 is a schematic diagram of main steps of a service component detection alarm method according to an embodiment of the present invention.
As shown in the figure, the service component detection alarm method according to the embodiment of the present invention includes the following steps:
step S701: acquiring alarm task information of a service component;
step S702: detecting whether the service component process is abnormal;
step S703: determining whether the service component process is abnormal, and if so, executing step S704: sending the corresponding alarm information to the alarm MQ queue.
The service components comprise an accept (event receiving) component, a deal component, a timer (time management) component, and an intervention component. Whether the instance state of each service component is abnormal is detected according to the alarm task information; when the actual number of instances is less than the preset number of instances, the process of the service component is abnormal and an alarm is required.
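The instance-count rule reduces to comparing the actual number of running instances with the preset number for each component. In this sketch the function name `check_components`, the dictionary layout, and the alarm strings are assumptions; a component missing from the running map is treated as having zero instances.

```python
from typing import Dict, List


def check_components(expected: Dict[str, int], running: Dict[str, int]) -> List[str]:
    """Alarm for every component with fewer running instances than preset."""
    return [
        f"component_abnormal:{name}"
        for name, preset in sorted(expected.items())
        if running.get(name, 0) < preset
    ]
```

For example, with presets of two `accept`, two `deal`, one `timer`, and one `intervention` instance, a snapshot showing one `deal` instance and no `intervention` instance would produce alarms for those two components only.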
Fig. 8 is a schematic diagram of main modules of an intelligent detection alarm device in intelligent scheduling according to an embodiment of the present invention.
As shown in fig. 8, an intelligent detection alarm apparatus 800 in intelligent scheduling according to an embodiment of the present invention includes: an alarm task configuration module 801, an alarm task acquisition module 802, and an alarm task execution module 803; wherein
the alarm task configuration module 801 is configured to configure a detection alarm task as required and store the configuration information in a database;
the alarm task acquisition module 802 is configured to periodically acquire detection alarm task information from the database; and
the alarm task execution module 803 is configured to perform the corresponding detection and alarm according to the detection alarm task information.
Optionally, performing the corresponding detection and alarm according to the detection alarm task information includes:
acquiring alarm detection items from the database;
detecting the acquired alarm detection items; and
generating alarm information according to the detection result.
Optionally, performing the corresponding detection and alarm according to the detection alarm task information further includes:
storing the alarm information in an alarm information table while sending the alarm information to a message queue (MQ); and
consuming the alarm information in the MQ queue, and alarming according to the alarm mode and the alarm information template configured by the user.
If sending to the MQ queue fails, a short message (SMS) alarm is issued directly. If the same alarm is detected to have been triggered multiple times within a time period, it is not raised again.
Optionally, configuring the detection alarm task includes configuring an alarm object, an alarm detection item, and an alarm indicator. Here, the alarm objects include job flows, jobs, MQ clusters, Redis clusters, service nodes, and service components.
From the above description, it can be seen that the intelligent detection alarm device in intelligent scheduling according to the embodiment of the present invention realizes early detection of an abnormality, reduces a failure rate of an operation, enables an operation and maintenance worker to quickly and accurately locate a problem, and ensures high availability and high reliability of a scheduling system.
Fig. 9 illustrates an exemplary system architecture 900 of a method for intelligently detecting alarms in intelligent scheduling or an apparatus for intelligently detecting alarms in intelligent scheduling, to which embodiments of the present invention may be applied.
As shown in fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905. Network 904 is the medium used to provide communication links between terminal devices 901, 902, 903 and server 905. Network 904 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 901, 902, 903 to interact with a server 905 over a network 904 to receive or send messages and the like. The terminal devices 901, 902, 903 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 905 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 901, 902, and 903. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the intelligent detection alarm method in intelligent scheduling provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the intelligent detection alarm device is generally disposed in the server 905.
It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 10, a block diagram of a computer system 1000 suitable for implementing a terminal device of an embodiment of the invention is shown. The terminal device shown in fig. 10 is only an example and should not impose any limitation on the functions or the scope of use of the embodiments of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read from it can be installed into the storage section 1008 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described as follows: a processor comprises a configuration alarm task module, an acquisition alarm task module, and an execution alarm task module. In some cases the names of these modules do not limit the modules themselves; for example, the acquisition alarm task module may also be described as a "module for acquiring detection alarm task information from a database at regular time".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the steps of: configuring a detection alarm task according to the requirement, and storing configuration information into a database; acquiring detection alarm task information from the database at regular time; and carrying out corresponding detection alarm according to the detection alarm task information.
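The three steps listed above can be sketched roughly as follows. This is only an illustrative outline, not the implementation of the embodiment: an in-memory SQLite database stands in for the database, and the table name `alarm_task`, its columns, the `probe` callback, and all function names are assumptions introduced for the example.

```python
import sqlite3
import time


def open_task_db():
    """Create an in-memory database with a hypothetical alarm_task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alarm_task ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "alarm_object TEXT, detection_item TEXT, threshold REAL)"
    )
    return conn


def configure_task(conn, alarm_object, detection_item, threshold):
    """Step 1: configure a detection alarm task and store it in the database."""
    conn.execute(
        "INSERT INTO alarm_task (alarm_object, detection_item, threshold) "
        "VALUES (?, ?, ?)",
        (alarm_object, detection_item, threshold),
    )
    conn.commit()


def fetch_tasks(conn):
    """Step 2: acquire the detection alarm task information from the database."""
    rows = conn.execute(
        "SELECT alarm_object, detection_item, threshold "
        "FROM alarm_task ORDER BY id"
    ).fetchall()
    keys = ("alarm_object", "detection_item", "threshold")
    return [dict(zip(keys, row)) for row in rows]


def run_detection(tasks, probe):
    """Step 3: detect each item; probe(object, item) returns a measured value."""
    alarms = []
    for task in tasks:
        value = probe(task["alarm_object"], task["detection_item"])
        if value > task["threshold"]:
            alarms.append({**task, "value": value})
    return alarms


def poll_once(conn, probe):
    """One timed round: fetch the task information, then detect."""
    return run_detection(fetch_tasks(conn), probe)


def run_periodically(conn, probe, interval_seconds, rounds):
    """Repeat poll_once at a fixed interval for a bounded number of rounds."""
    results = []
    for _ in range(rounds):
        results.append(poll_once(conn, probe))
        time.sleep(interval_seconds)
    return results
```

Keeping the task configuration in the database, rather than in the polling code, means that a newly configured task is picked up on the next timed round without restarting the detector.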
According to the technical scheme of the embodiment of the invention, firstly, a detection alarm task is configured according to requirements, and configuration information is stored in a database; then, detection alarm task information is acquired from the database at regular time; and then, corresponding detection alarm is carried out according to the detection alarm task information. As a result, anomalies are found in advance, the failure rate of jobs is reduced, operation and maintenance personnel can locate problems quickly and accurately, and the high availability and high reliability of the scheduling system are ensured.
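As one illustration of the detection step, the service-node indicators recited in claim 12 (offline state, CPU usage, memory occupancy, process count, and disk-space usage) can be compared against configured thresholds, with one alarm message produced per violated indicator. The sketch below is written under stated assumptions: the `NodeStatus` fields, the default threshold values, and the message wording are hypothetical, and returning only the offline alarm for an offline node is a design choice of this example rather than something the source specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeStatus:
    """Snapshot of one service node (field names are assumptions)."""
    name: str
    online: bool
    cpu_usage: float = 0.0      # fraction between 0.0 and 1.0
    memory_usage: float = 0.0   # fraction between 0.0 and 1.0
    process_count: int = 0
    disk_usage: float = 0.0     # fraction between 0.0 and 1.0


# Hypothetical thresholds; in the described scheme these would come from
# the configured alarm indicators stored in the database.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "cpu_usage": 0.90,
    "memory_usage": 0.85,
    "process_count": 500,
    "disk_usage": 0.80,
}


def check_node(status: NodeStatus,
               thresholds: Optional[Dict[str, float]] = None) -> List[str]:
    """Return one alarm message per violated indicator."""
    limits = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    if not status.online:
        # An offline node cannot report metrics, so only this alarm is raised.
        return [f"{status.name}: node offline"]
    alarms = []
    for indicator, limit in limits.items():
        value = getattr(status, indicator)
        if value > limit:
            alarms.append(
                f"{status.name}: {indicator}={value} exceeds threshold {limit}"
            )
    return alarms
```

Each returned message would then be sent to the alarm MQ queue as a separate alarm, matching the "respectively sending corresponding alarm information" wording of the claim.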
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (16)

1. An intelligent detection alarm method in intelligent scheduling is characterized by comprising the following steps:
configuring a detection alarm task according to the requirement, and storing configuration information into a database;
acquiring detection alarm task information from the database at regular time; and
carrying out corresponding detection alarm according to the detection alarm task information.
2. The method of claim 1, wherein configuring a detection alarm task comprises configuring an alarm object, an alarm detection item, and an alarm indicator.
3. The method of claim 1, wherein performing the corresponding detection alarm according to the alarm task information comprises:
acquiring alarm detection items from the database;
detecting the acquired alarm detection items; and
generating alarm information according to the detection result.
4. The method of claim 1, wherein performing the corresponding detection alarm according to the alarm task information further comprises:
storing the alarm information in an alarm information table while transmitting the alarm information to the MQ; and
consuming the alarm information in the MQ queue, and alarming according to an alarm mode and an alarm information template configured by the user.
5. The method of claim 4, wherein
if sending the alarm information to the MQ queue fails, a short message alarm is carried out directly.
6. The method of claim 4, wherein
if the same alarm is detected to be triggered multiple times within a time period, no repeated alarm is performed.
7. The method of claim 2, wherein the alert object comprises: a job flow, a job, an MQ cluster, a Redis cluster, a service node, and a service component.
8. The method according to claim 1, wherein the detection alarm method for the job flow comprises:
acquiring job flow alarm task information;
detecting whether the instantiation of the job flow is overtime;
if the instantiation of the job flow is overtime, sending alarm information to an alarm MQ queue; and
if the instantiation of the job flow fails or the turnover of the job flow fails, respectively and automatically sending corresponding alarm information.
9. The method of claim 1, wherein the detection alarm method of the job comprises:
acquiring job alarm task information;
detecting whether the running of the job is overtime;
if the running of the job is overtime, sending alarm information to an alarm MQ queue; and
if the job execution fails or the job distribution fails, respectively and automatically sending corresponding alarm information.
10. The method as claimed in claim 1, wherein the MQ cluster detection alarm method comprises:
acquiring MQ alarm task information;
detecting whether the MQ node is offline, the usage rate of the MQ disk space, and whether MQ messages are accumulated; and
if the MQ node is offline, the usage rate of the MQ disk space exceeds a threshold value, or MQ messages are accumulated, respectively sending corresponding alarm information to the alarm MQ queue.
11. The method according to claim 1, wherein the detecting alarm method of Redis cluster comprises:
acquiring Redis alarm task information;
detecting whether the Redis node is offline and whether Redis reading and writing are abnormal; and
if the Redis node is offline or Redis reading and writing are abnormal, respectively sending corresponding alarm information to the alarm MQ queue.
12. The method of claim 1, wherein the method for detecting and alarming the service node comprises:
acquiring alarm task information of a service node;
detecting whether the node is offline, the utilization rate of the node CPU, the occupancy rate of the node memory, the number of node processes, and the utilization rate of the node disk space; and
if the node is offline, the utilization rate of the node CPU exceeds a threshold value, the occupancy rate of the node memory exceeds a threshold value, the number of node processes exceeds a threshold value, or the utilization rate of the node disk space exceeds a threshold value, respectively sending corresponding alarm information to an alarm MQ queue.
13. The method of claim 1, wherein the method for detecting and alerting the service component comprises:
acquiring alarm task information of a service component;
detecting whether the service component process is abnormal; and
if the service component process is abnormal, sending corresponding alarm information to an alarm MQ queue.
14. An intelligent detection alarm device in intelligent scheduling, comprising: a configuration alarm task module, an alarm task acquisition module, and an alarm task execution module; wherein
the configuration alarm task module is used for configuring a detection alarm task according to the requirement and storing configuration information into a database;
the alarm task acquisition module is used for acquiring detection alarm task information from the database at regular time; and
the alarm task execution module is used for carrying out corresponding detection alarm according to the detection alarm task information.
15. An intelligent detection alarm electronic device in intelligent scheduling, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-13.
16. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-13.
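The alarm delivery path recited in claims 4 to 6 (record the alarm, send it to the MQ, fall back to a short message if the send fails, and suppress repeats of the same alarm within a time period) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `AlarmDispatcher` class, its callbacks `mq_send` and `sms_send`, the `alarm_key` used to identify "the same alarm", and the 300-second default window are all hypothetical. The consumer side, which reads the MQ queue and applies the user-configured alarm mode and template, is omitted.

```python
import time
from typing import Callable, Dict, List


class AlarmDispatcher:
    """Producer-side alarm delivery with SMS fallback and repeat suppression."""

    def __init__(self,
                 mq_send: Callable[[str], None],
                 sms_send: Callable[[str], None],
                 suppress_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._mq_send = mq_send          # hypothetical MQ producer callback
        self._sms_send = sms_send        # hypothetical SMS gateway callback
        self._suppress_seconds = suppress_seconds
        self._clock = clock
        self._last_sent: Dict[str, float] = {}
        self.alarm_table: List[str] = []  # stands in for the alarm information table

    def dispatch(self, alarm_key: str, message: str) -> str:
        """Return 'suppressed', 'queued', or 'sms' depending on the path taken."""
        now = self._clock()
        last = self._last_sent.get(alarm_key)
        if last is not None and now - last < self._suppress_seconds:
            # Same alarm triggered again inside the window: do not alarm again.
            return "suppressed"
        self._last_sent[alarm_key] = now
        self.alarm_table.append(message)
        try:
            self._mq_send(message)
            return "queued"
        except Exception:
            # Sending to the MQ failed: alarm directly by short message.
            self._sms_send(message)
            return "sms"
```

Injecting the clock and the two send callbacks keeps the sketch independent of any particular MQ client or SMS gateway and makes the suppression window easy to exercise in tests.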

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011005560.8A CN112181763A (en) 2020-09-22 2020-09-22 Intelligent detection alarm method and device in intelligent scheduling

Publications (1)

Publication Number Publication Date
CN112181763A true CN112181763A (en) 2021-01-05

Family

ID=73955856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011005560.8A Pending CN112181763A (en) 2020-09-22 2020-09-22 Intelligent detection alarm method and device in intelligent scheduling

Country Status (1)

Country Link
CN (1) CN112181763A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257200A (en) * 2017-07-14 2019-01-22 北京京东尚科信息技术有限公司 The method and apparatus of big data platform monitoring
CN109428779A (en) * 2017-08-29 2019-03-05 武汉安天信息技术有限责任公司 A kind of monitoring alarm method and device of distributed service
CN109684180A (en) * 2018-12-20 2019-04-26 北京百度网讯科技有限公司 Method and apparatus for output information
CN110289976A (en) * 2018-03-19 2019-09-27 上海秦苍信息科技有限公司 A kind of scheduler task warning system and method
WO2019233047A1 (en) * 2018-06-07 2019-12-12 国电南瑞科技股份有限公司 Power grid dispatching-based operation and maintenance method
CN110661659A (en) * 2019-09-23 2020-01-07 上海艾融软件股份有限公司 Alarm method, device and system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination