CN112416581A - Distributed calling system for timed tasks - Google Patents

Distributed calling system for timed tasks Download PDF

Info

Publication number
CN112416581A
CN112416581A
Authority
CN
China
Prior art keywords
task
node
client
server
timing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011273479.8A
Other languages
Chinese (zh)
Other versions
CN112416581B (en)
Inventor
肖向徐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wu Ba Tongcheng Information Technology Co ltd
Beijing 58 Information Technology Co Ltd
Original Assignee
Wu Ba Tongcheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wu Ba Tongcheng Information Technology Co ltd filed Critical Wu Ba Tongcheng Information Technology Co ltd
Priority to CN202011273479.8A priority Critical patent/CN112416581B/en
Publication of CN112416581A publication Critical patent/CN112416581A/en
Application granted granted Critical
Publication of CN112416581B publication Critical patent/CN112416581B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/133Protocols for remote procedure calls [RPC]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a distributed calling system for timed tasks, comprising: a client cluster, which comprises a plurality of client nodes and is used for acquiring task registration information, sending the task registration information to a server node, sending heartbeat information, and sending a user's operation instructions to the server node; a server cluster, which comprises a plurality of server nodes and is used for creating timed tasks, determining a task execution node when a timed task's trigger time is reached so that the task execution node executes the timed task, and providing a task execution monitoring interface through which operation instructions from client nodes are received and operations on timed tasks are performed; a ZooKeeper cluster, used for storing the state information of the client nodes and server nodes; and a MongoDB cluster, used for storing the task registration information and the identifiers of the task execution node and server node corresponding to each timed task. The invention avoids single points of failure, saves local service resources, and enables supervision of the task execution process.

Description

Distributed calling system for timed tasks
Technical Field
The invention relates to the technical field of computers, in particular to a distributed calling system for timed tasks.
Background
In the prior art, when configuring a timed task, the task is generally configured with the Spring annotation @Scheduled, for example: the configured timed task is @Scheduled(cron = "0 0 0/1 ...").
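For reference, the prior-art approach described above looks roughly like the following minimal sketch; the class name, method body and cron expression are illustrative, not taken from the original.

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Prior-art style: the task runs inside the application's own JRE on every node
// (scheduling must be switched on elsewhere with @EnableScheduling).
@Component
public class LocalReportJob {

    // Illustrative cron expression: fire at the top of every hour.
    @Scheduled(cron = "0 0 * * * *")
    public void generateHourlyReport() {
        // Business logic executes locally; its progress is invisible to any scheduler,
        // and every node in the cluster runs the same job at the same time.
        System.out.println("generating hourly report...");
    }
}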
Because the task execution process is invisible, the execution progress of a task, and whether its execution has stopped abnormally, cannot be perceived or controlled in advance. Moreover, since the task is executed locally, that is, the task and the application run in the same JRE (Java Runtime Environment), its impact on the performance of the application itself cannot be controlled, and an abnormal timed task is quite likely to cause a sharp drop in application service performance. During timed-task execution, batch data processing cannot be manually intervened in. And because tasks are created locally, executed locally, and executed by all nodes together, cluster resources cannot be fully utilized.
In summary, timed tasks configured with the Spring annotation @Scheduled have the following defects: single points of failure occur easily; local service resources are consumed; the task execution process is difficult to supervise; and cluster resources cannot be fully utilized.
Disclosure of Invention
In view of the above, embodiments of the present invention have been developed to provide a distributed invocation system for timed tasks that overcomes, or at least partially addresses, the above-identified problems.
According to a first aspect of the present invention, there is provided a distributed invocation system for timed tasks, comprising:
the client cluster comprises a plurality of client nodes, wherein the client nodes are used for acquiring task registration information, sending the task registration information corresponding to the client cluster identifier to the server node, sending heartbeat information to the server node, receiving an operation instruction of a user on a timing task, and sending the operation instruction to the server node;
the server cluster comprises a plurality of server nodes and is used for receiving the task registration information, creating a timing task corresponding to the task registration information and storing the task registration information to the MongoDB cluster; determining the state information of the client node according to the heartbeat information, storing the state information and the heartbeat information of the client node to a ZooKeeper cluster, and storing the state information of the server node to the ZooKeeper cluster; when the timing task triggering time is reached, determining a server node or a client node of which the state information is in an available state as a task execution node, sending the timing task to the task execution node, and executing the timing task by the task execution node; providing a task execution monitoring interface, receiving an operation instruction of a client node through the task execution monitoring interface, and executing the operation of a timing task according to the operation instruction;
the ZooKeeper cluster is used for storing the state information and the heartbeat information of a plurality of client nodes in the client cluster and storing the state information of a plurality of server nodes;
and the MongoDB cluster is used for storing the task registration information and storing the identifiers of the task execution node and the server node corresponding to the timing task.
Optionally, the task execution monitoring interface includes a Console visual operation platform and/or a monitoring application programming interface API, and the server node is configured to execute an operation corresponding to the operation instruction according to the operation instruction received through the Console visual operation platform or the monitoring API.
Optionally, the operation corresponding to the operation instruction includes at least one of creating a timing task, triggering execution of the timing task, suspending execution of the timing task, restarting the timing task, deleting the timing task, displaying an analysis report of the timing task, acquiring and displaying an execution progress of the timing task, and acquiring and displaying a performance index of a task execution node where the timing task is located.
Optionally, the Console visual operation platform is further configured to raise an alarm when an abnormal problem is detected during execution of the timed task.
Optionally, the timing task includes a local timing task or a remote timing task;
the step of determining a server node or client node whose state information is in the available state as the task execution node includes:
if the timing task is determined to be a local timing task and executed for the appointed client node according to the task registration information, determining the appointed client node to be a task execution node;
if the timing task is determined to be a local timing task and executed for the server node according to the task registration information, determining one server node with available state information from the plurality of server nodes as a task execution node;
and if the timed task is determined to be the remote timed task according to the task registration information, determining a client cluster identifier corresponding to the timed task, and determining a client node with available state information from a plurality of client nodes corresponding to the client cluster identifier as a task execution node.
Optionally, the ZooKeeper cluster is further configured to: when a server node is in an unavailable state, taking the server node as a first server node, and triggering a redistribution instruction of a timing task in the first server node;
the other server nodes in the server cluster are further configured to: and if a redistribution instruction of the timing task in the first server node triggered by the ZooKeeper cluster is received, determining a remote timing task corresponding to the first server node from the MongoDB cluster, recreating the timing task, and scheduling the recreated remote timing task.
Optionally, the server node is further configured to:
when the appointed client node is used as a task execution node, if the task execution node is determined to be in an unavailable state according to the heartbeat information, marking the execution progress of the timing task, and when the task execution node recovers the available state, sending the execution progress of the timing task to the task execution node, and starting the execution of the timing task by the task execution node from the execution progress.
Optionally, the determining, as a task execution node, a client node whose state information is an available state from among a plurality of client nodes corresponding to the client cluster identifier includes:
determining at least one client node with available state information from a plurality of client nodes corresponding to the client cluster identification;
and determining one client node from the at least one client node as the task execution node according to the load balancing strategy.
Optionally, the load balancing policy includes: the number of times a client node has executed timed tasks, the performance index of the client node, and the machine room where the client node is located, all of which are determined from the heartbeat information of the client nodes stored in the ZooKeeper cluster.
Optionally, the client node and the server node in the server cluster communicate with each other through Netty.
Optionally, when the client node is used as a task execution node, the client node is further configured to: when the task related logs in the local cache reach a preset threshold value, the task related logs are sent to the server node;
the server node is further configured to: saving the task related logs to the MongoDB cluster;
the MongoDB cluster is also used for: and saving the task related log.
Optionally, the acquiring task registration information includes: and acquiring task registration information in a dynamic code, configuration file or annotation mode.
In the distributed calling system for timed tasks, the client cluster obtains task registration information and sends it to a server node; the server cluster creates the timed task, stores the task registration information in the MongoDB cluster, and also stores the heartbeat information and state information of the client nodes in the ZooKeeper cluster; the timed task is scheduled using the node information in the ZooKeeper cluster, and a suitable task execution node is selected to execute it; and a task execution monitoring interface is provided, through which operation instructions from client nodes are received and the corresponding operations are performed on timed tasks. Because the timed tasks are scheduled uniformly by the server cluster, the client cluster does not need to do its own scheduling, and distributed scheduling of timed tasks is achieved; a suitable node can be selected for each task when it is scheduled, so the situation where every node executes the same task at the same time is avoided. This saves local service resources, makes full use of cluster resources, avoids single points of failure, and allows the task execution process to be supervised through the task execution monitoring interface.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
Fig. 1 is a schematic structural diagram of a distributed invocation system for timed tasks according to an embodiment of the present invention;
fig. 2 is a flow chart of server node and client node communication in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a schematic structural diagram of a distributed invocation system of a timing task according to an embodiment of the present invention, and as shown in fig. 1, the distributed invocation system of the timing task includes:
the client cluster 101 comprises a plurality of client nodes, wherein the client nodes are used for acquiring task registration information, sending the task registration information corresponding to the client cluster identifier to the server node, sending heartbeat information to the server node, receiving an operation instruction of a user on a timing task, and sending the operation instruction to the server node;
the server cluster 102 comprises a plurality of server nodes and is used for receiving the task registration information, creating a timing task corresponding to the task registration information and storing the task registration information to the MongoDB cluster; determining the state information of the client node according to the heartbeat information, storing the state information and the heartbeat information of the client node to a ZooKeeper cluster, and storing the state information of the server node to the ZooKeeper cluster; when the timing task triggering time is reached, determining a server node or a client node of which the state information is in an available state as a task execution node, sending the timing task to the task execution node, and executing the timing task by the task execution node; providing a task execution monitoring interface, receiving an operation instruction of a client node through the task execution monitoring interface, and executing the operation of a timing task according to the operation instruction;
the ZooKeeper cluster 103 is used for storing the state information and heartbeat information of a plurality of client nodes in the client cluster and storing the state information of a plurality of server nodes;
and the MongoDB cluster 104 is used for storing the task registration information and storing the identifiers of the task execution node and the server node corresponding to the timing task.
The distributed calling system of the timing task can comprise at least one client cluster and one server cluster, so that the server cluster can simultaneously provide registration and scheduling service of the timing task for a plurality of client clusters.
The server node monitors the heartbeat information of the client nodes and the state information of the current server node through a Monitor, and the Monitor can be implemented with a ScheduledThreadPoolExecutor. The state information of a server node may include the identifier of the server node (e.g., its IP address), performance, time-to-live, execution plan, and the like. The heartbeat information of a client node may include the identifier of the client node (e.g., its IP address), task class path, port number, IP address, client cluster identifier, memory usage, CPU usage, total thread count, and the like.
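As a rough illustration of such a Monitor, a heartbeat check loop built on ScheduledThreadPoolExecutor might look like the following sketch; the class name, fields and one-minute period are assumptions for illustration, not the patent's actual implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat monitor: records the last heartbeat time of each client node
// and flags nodes that have missed a full heartbeat period.
public class HeartbeatMonitor {

    private static final long HEARTBEAT_PERIOD_SECONDS = 60; // e.g. a one-minute heartbeat period

    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    public void recordHeartbeat(String clientNodeId) {
        lastHeartbeat.put(clientNodeId, System.currentTimeMillis());
    }

    public void start() {
        executor.scheduleAtFixedRate(this::checkNodes,
                HEARTBEAT_PERIOD_SECONDS, HEARTBEAT_PERIOD_SECONDS, TimeUnit.SECONDS);
    }

    private void checkNodes() {
        long now = System.currentTimeMillis();
        lastHeartbeat.forEach((nodeId, timestamp) -> {
            if (now - timestamp > HEARTBEAT_PERIOD_SECONDS * 1000) {
                // In the described system this is where the node's state would be
                // marked unavailable in the ZooKeeper cluster and recovery triggered.
                System.out.println("client node " + nodeId + " missed its heartbeat");
            }
        });
    }
}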
When the server node is monitored by the Monitor, the Monitor service needs to be registered first. The Monitor updates information such as the memory usage, CPU usage and total thread count of the server node, can recover timed tasks under client nodes that are in the unavailable state, and, when no server node in the available (alive) state exists in the ZooKeeper cluster, can notify the administrator by e-mail.
The ZooKeeper cluster is mainly used for storing the startup registration information of the server nodes and the client nodes, i.e., storing the state information and heartbeat information of the client nodes in each client cluster and storing the state information of the server nodes. ZooKeeper is a distributed, open-source coordination service for distributed applications; it is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. The ZooKeeper cluster uses Curator as the ZooKeeper client framework, which handles the low-level details of ZooKeeper client development, including connection reconnection, repeated Watcher registration, NodeExistsException handling, and the like. Curator is used to monitor changes to ZooKeeper (ZK) data and cache them in memory, so that the system can quickly and conveniently obtain information about the server nodes and client nodes.
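By way of illustration, a Curator client with automatic retry and an ephemeral registration node could be set up roughly as follows; the connection string, paths and data are placeholders, not values from the patent.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class ZkRegistration {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; the retry policy handles reconnection transparently.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Register this server node's state under an ephemeral node so that the entry
        // disappears automatically if the node's ZooKeeper session is lost.
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath("/scheduler/servers/192.168.0.10", "AVAILABLE".getBytes());
    }
}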
The MongoDB cluster is mainly used for storing relevant information of the timing task, including task registration information and identifications of task execution nodes and server nodes corresponding to the timing task. The task registration information includes an execution plan of a timing task, task snapshot information and the like, and the task snapshot information includes execution time, execution rules, classes which need to be inherited during execution and the like. The server node corresponding to the timing task is a server node for scheduling the timing task.
The server node is also used for scheduling timed tasks: it receives the task registration information sent by client nodes, creates the timed tasks, stores the task registration information in the MongoDB cluster, and monitors the trigger time of each timed task in the MongoDB cluster. When the trigger time of a timed task is reached, it determines whether the execution node of the task is a server node or a client node, and then determines, from the server node and client node information stored in the ZooKeeper cluster, a server node or client node in the available state as the task execution node that executes the timed task. The server node uses Quartz as its timed-task scheduling framework, which schedules the various execution rules of timed tasks well. Quartz is an open-source job scheduling framework written entirely in Java that provides a simple yet powerful mechanism for job scheduling in Java applications.
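A minimal Quartz sketch of the kind of scheduling the server node performs is shown below; the job class, identities and cron expression are illustrative, and the dispatch step is only hinted at in a comment.

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSchedulingSketch {

    // Illustrative job: in the described system this is where the server node would look up
    // an available task execution node and dispatch the timed task to it over Netty.
    public static class DispatchJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("dispatching task " + context.getJobDetail().getKey());
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = JobBuilder.newJob(DispatchJob.class)
                .withIdentity("demo-task", "demo-group")
                .build();

        // Placeholder execution rule: fire every five minutes.
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("demo-trigger", "demo-group")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}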
And the client node and the server nodes in the server cluster communicate through Netty.
Fig. 2 is a flow chart of the communication between the server node and the client node in the embodiment of the present invention, and as shown in fig. 2, the flow of the communication between the server node and the client node is as follows:
First, the server node starts its service: it starts the timed-task scheduler, registers the server node information in the ZooKeeper cluster, starts Netty, and initializes local information.
Then, the server node starts the Monitor service. The Monitor service monitors the basic information and state information of the current server node in each heartbeat period, triggers the timed-task recovery strategy when a timed task encounters an abnormal problem, and starts the early-warning and alarm strategy. The heartbeat period can be set as required, for example to 1 minute.
Then, Netty communication is carried out between the client node and the server node to register the client node information: the server node registers the client node information and, once registration succeeds, returns a registration-success message. When the server node registers the client node information, it first performs token authentication (TOKEN) on the client node, registers the client node information (including the identifier of the client node, the client cluster identifier, memory usage, CPU usage, total thread count, and the like) in the ZooKeeper cluster, starts Monitor monitoring of the client node, registers local information, and starts Netty communication with the client node.
After that, the task mode of the timed task is created. The task mode includes a local mode and a remote mode. The server node creates local-mode tasks itself; for remote mode, the server node sends a remote-mode task editing instruction to the client node, the client node acknowledges receipt, the client node then initiates execution of the remote-mode editing, and the server node acknowledges in turn, thereby completing creation in the remote mode.
And then, the client node dynamically creates a timing task, sends task registration information of the timing task to the server node, and the server node creates the timing task, stores related information and returns the success of task creation after the success of timing task creation.
And finally, the server node schedules the timed task, initiates an execution instruction of the timed task to the client node, executes the timed task by the client node, and returns the task execution success after the execution is successful.
When a client node communicates with a server node, the IP address of the server node to communicate with can be obtained as follows: 1) after each successful communication, the server node returns the IP address of the server node for the next communication, which is stored in the local cache; 2) if no server node IP address exists in the local cache, the IP address of a server node is obtained from the unified configuration center.
Through Netty communication, high-performance, high-reliability network server and client programs can be developed rapidly, and automatic discovery and registration of tasks are supported. During Netty communication between the server node and the client node, the communication messages can be compressed with Snappy, which reduces the network bandwidth consumed by data transmission.
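A minimal sketch of what Snappy compression in the Netty pipeline could look like is given below, assuming Netty's built-in Snappy frame codec; the surrounding application codec and handlers are omitted.

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.compression.SnappyFrameDecoder;
import io.netty.handler.codec.compression.SnappyFrameEncoder;

// Hypothetical pipeline setup: outgoing messages are Snappy-compressed and incoming
// messages decompressed before the application's own message codec runs.
public class CompressedChannelInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline()
          .addLast(new SnappyFrameEncoder())
          .addLast(new SnappyFrameDecoder());
        // ...application-specific codec and business handlers would be added here.
    }
}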
In an embodiment of the present invention, the acquiring task registration information includes: and acquiring task registration information in a dynamic code, configuration file or annotation mode.
When the client node acquires the task registration information, it can do so through dynamic code, a configuration file, or annotations. Through dynamic code, tasks can be registered in real time according to service requirements. Tasks can be registered declaratively through a configuration file; for example, a task class can inherit the SimpleJob class and override its execute method. Tasks can also be registered and marked by way of annotations, for example as Spring tag-style declared tasks or Spring annotation-declared tasks; that is, timed tasks can be registered by redefining a Spring xsd tag or through an Annotation.
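Assuming a SimpleJob base class of the kind mentioned above, a declaratively registered task might look roughly like this; the stand-in base class and the task class are purely illustrative, since the patent does not show the framework's actual code.

// Minimal stand-in for the SimpleJob base class referred to in the description;
// the real framework class is not shown in the patent text.
abstract class SimpleJob {
    public abstract void execute();
}

// Hypothetical task: inherit SimpleJob and override execute, then point the configuration
// file (or an annotation / dynamic code) at this class to register the timed task.
public class DepositRefundJob extends SimpleJob {
    @Override
    public void execute() {
        System.out.println("running registered timed task...");
    }
}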
In an embodiment of the present invention, the task execution monitoring interface includes a Console visual operation platform and/or a monitoring API, and the server node is configured to execute an operation corresponding to the operation instruction according to the operation instruction received through the Console visual operation platform or the monitoring API.
The monitoring API (Application Programming Interface) is mainly used for implementing task intervention operation, that is, according to an operation instruction sent by the client, an operation corresponding to the operation instruction is executed on a corresponding timing task. The server nodes in the server cluster comprise other core APIs besides the monitoring API, and the client nodes access the server nodes through the other core APIs, so that the server nodes can perform service discovery, node registration, task registration and the like through the core APIs.
The Console visual operation platform is mainly used for monitoring tasks and for task intervention operations, and can also raise an alarm when an abnormal problem is detected during the execution of a timed task. The Console visual operation platform can acquire and display the task execution progress as well as the server performance indicators of the server node corresponding to a timed task; if an abnormal problem is detected during execution of the timed task (such as an excessively long execution time or an exception thrown during execution), an alarm can be raised so that the user can manually intervene in task execution, for example by issuing an operation instruction to restart or delete the timed task.
The operations corresponding to the operation instructions include at least one of: creating a timed task, triggering execution of the timed task, suspending execution of the timed task, restarting the timed task, deleting the timed task, displaying a timed-task analysis report, acquiring and displaying the execution progress of the timed task, and acquiring and displaying the performance indicators of the task execution node where the timed task is located. A user's operation instruction can be obtained through the Console visual operation platform and/or the monitoring API, and the corresponding operation can then be performed on the timed task according to that instruction. The operations corresponding to the operation instructions can also include report statistics for timed tasks, and multidimensional task report statistics can be produced; for example, reports on timed tasks can be aggregated by hour, day, month, execution state, client cluster identifier, and the like.
The monitoring of the timing task is realized through the Console visual operation platform and/or the monitoring application programming interface API, and meanwhile, the intervention operation can be carried out on the timing task.
In one embodiment of the invention, the timing task comprises a local timing task or a remote timing task;
the step of determining a server node or client node whose state information is in the available state as the task execution node includes:
if the timing task is determined to be a local timing task and executed for the appointed client node according to the task registration information, determining the appointed client node to be a task execution node;
if the timing task is determined to be a local timing task and executed for the server node according to the task registration information, determining one server node with available state information from the plurality of server nodes as a task execution node;
and if the timed task is determined to be the remote timed task according to the task registration information, determining a client cluster identifier corresponding to the timed task, and determining a client node with available state information from a plurality of client nodes corresponding to the client cluster identifier as a task execution node.
The timed tasks to be registered by a client node may comprise Local timed tasks and/or Remote timed tasks. A local timed task may be local to a client node or local to a server node; when a local timed task is local to a client node, the designated client node is included in the task registration information, i.e., that designated client node acts as the task execution node.
When the timed task is determined from the task registration information to be a local timed task executed by a designated client node, that designated client node can be determined as the task execution node. When the timed task is determined to be a local timed task executed by a server node, the state information of all server nodes in the server cluster is obtained from the ZooKeeper cluster and one server node whose state information is the available state is chosen as the task execution node. When the timed task is determined to be a remote timed task, the client cluster identifier corresponding to the timed task (used to distinguish different users) is determined, the state information of all client nodes corresponding to that client cluster identifier is obtained from the ZooKeeper cluster, and one client node whose state information is the available state is chosen as the task execution node. By mapping timed tasks of different task modes to different scheduling modes, multi-system distributed task scheduling is achieved.
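The three branches described above can be summarized in a small dispatch sketch; the enum, the method signature and the trivial "pick the first available node" choice are assumptions for illustration (the full system uses the load balancing strategy described later).

import java.util.List;

// Hypothetical selection logic mirroring the three cases described above.
public class ExecutionNodeSelector {

    enum TaskMode { LOCAL_CLIENT, LOCAL_SERVER, REMOTE }

    public String selectNode(TaskMode mode,
                             String designatedClientNode,
                             List<String> availableServerNodes,
                             List<String> availableClientNodesOfCluster) {
        switch (mode) {
            case LOCAL_CLIENT:
                // Local timed task bound to a designated client node.
                return designatedClientNode;
            case LOCAL_SERVER:
                // Local timed task executed on the server side: any available server node.
                return availableServerNodes.get(0);
            case REMOTE:
                // Remote timed task: an available client node of the task's client cluster.
                return availableClientNodesOfCluster.get(0);
            default:
                throw new IllegalArgumentException("unknown task mode: " + mode);
        }
    }
}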
On the basis of the above technical solution, the ZooKeeper cluster is further configured to: when a server node is in an unavailable state, taking the server node as a first server node, and triggering a redistribution instruction of a timing task in the first server node;
the other server nodes in the server cluster are further configured to: and if a redistribution instruction of the timing task in the first server node triggered by the ZooKeeper cluster is received, determining a remote timing task corresponding to the first server node from the MongoDB cluster, recreating the timing task, and scheduling the recreated remote timing task.
The ZooKeeper cluster stores the state information of each server node and each client node. When the ZooKeeper cluster has not received heartbeat information from a server node within the preset heartbeat period, it determines that the server node is in the unavailable state, treats it as the first server node, and triggers a reassignment instruction for the timed tasks of the first server node. If the other server nodes in the server cluster receive the reassignment instruction for the timed tasks of the first server node triggered by the ZooKeeper cluster, they determine the remote timed tasks corresponding to the first server node from the MongoDB cluster, re-create those timed tasks, and schedule the re-created remote timed tasks. That is, when one server node is in the unavailable state, the timed tasks on that node can be scheduled by other server nodes, which solves the problem that scheduled timed tasks cannot be executed because they cannot be scheduled while their server node is unavailable.
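A rough sketch of the recovery step, using the MongoDB Java driver, is shown below; the connection string, database, collection and field names are placeholders, not the patent's actual schema.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Hypothetical recovery: when the ZooKeeper cluster signals that a server node is unavailable,
// another server node reads that node's remote timed tasks from MongoDB and re-creates them.
public class TaskReassignment {
    public static void main(String[] args) {
        try (MongoClient mongo = MongoClients.create("mongodb://mongo1:27017")) {
            MongoCollection<Document> tasks =
                    mongo.getDatabase("scheduler").getCollection("timed_tasks");

            String failedServerNodeId = "192.168.0.10"; // identifier of the first server node
            for (Document task : tasks.find(new Document("serverNodeId", failedServerNodeId)
                                                .append("mode", "REMOTE"))) {
                // Re-create the timed task in the local scheduler (e.g. via Quartz)
                // and take over its scheduling.
                System.out.println("re-scheduling task " + task.getString("taskId"));
            }
        }
    }
}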
The MongoDB cluster can also store the execution progress of each timing task, so that the server node which reschedules the timing task can acquire the execution progress of all the timing tasks in the unavailable server nodes from the MongoDB cluster and schedule each timing task from the execution progress.
On the basis of the above technical solution, the server node is further configured to:
when the appointed client node is used as a task execution node, if the task execution node is determined to be in an unavailable state according to the heartbeat information, marking the execution progress of the timing task, and when the task execution node recovers the available state, sending the execution progress of the timing task to the task execution node, and starting the execution of the timing task by the task execution node from the execution progress.
When the timed task is a local timed task and a designated client node (which can be identified by its IP address) acts as the task execution node, the server node determines that the task execution node is in the unavailable state if it has not received heartbeat information from that node after the preset heartbeat period, and marks the execution progress of the timed task. If the task execution node is restarted, the server node receives its heartbeat information again, determines that the node has returned to the available state, and sends the marked execution progress to it, so that the task execution node can continue executing the timed task from that progress. Marking the execution progress when the task execution node is unavailable and resuming from that progress when the node becomes available again allows the timed task to be continued promptly and avoids the resource consumption of executing the timed task repeatedly.
On the basis of the above technical solution, the determining, as a task execution node, a client node whose state information is an available state from among a plurality of client nodes corresponding to the client cluster identifier includes:
determining at least one client node with available state information from a plurality of client nodes corresponding to the client cluster identification;
and determining one client node from the at least one client node as the task execution node according to the load balancing strategy.
The load balancing policy may include: the number of times a client node has executed timed tasks, the performance indicators of the client node, and the machine room where the client node is located, all of which are determined from the heartbeat information of the client nodes stored in the ZooKeeper cluster. The heartbeat information of a client node may include the identifier of the client node (e.g., its IP address), task class path, port number, IP address, client cluster identifier, memory usage, CPU usage, total thread count, and the like. The performance indicators of a client node may include memory usage, CPU usage, total thread count, and the like.
When the timing task is a remote timing task, after a client cluster identifier corresponding to the timing task is determined from task registration information stored by a MongoDB cluster, a plurality of client nodes corresponding to the client cluster identifier are determined, state information of the plurality of client nodes is obtained from a ZooKeeper cluster, at least one client node with available state information is determined, one client node is determined from the at least one client node according to a load balancing strategy and serves as a task execution node, and load balancing during scheduling of the timing task is achieved.
When a client node is selected according to the load balancing policy, the number of times each client node has executed timed tasks, the performance indicators of the client node, and the machine room where it is located may all be considered together: prefer the client node that has executed the fewest timed tasks, has the best performance indicators, and sits in the better-performing machine room (for example, if a client cluster is spread across two machine rooms and client nodes in one room execute tasks faster than those in the other, the client node in the better-performing room is preferred). These factors are weighed together for load balancing.
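One way such a combined choice could be sketched is shown below; the NodeInfo fields and scoring weights are assumptions for illustration, not values prescribed by the patent.

import java.util.Comparator;
import java.util.List;

// Hypothetical load-balancing choice combining the three factors named above.
public class LoadBalancer {

    public static class NodeInfo {
        String nodeId;
        int executionCount;     // how many timed tasks this node has already executed
        double cpuUsage;        // 0.0 - 1.0, taken from the node's heartbeat information
        boolean preferredRoom;  // whether the node sits in the better-performing machine room

        double score() {
            // Lower is better: few executions, low CPU usage, preferred machine room.
            return executionCount + cpuUsage * 10 + (preferredRoom ? 0 : 5);
        }
    }

    public NodeInfo choose(List<NodeInfo> availableNodes) {
        return availableNodes.stream()
                .min(Comparator.comparingDouble(NodeInfo::score))
                .orElseThrow(() -> new IllegalStateException("no available client node"));
    }
}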
In one embodiment of the invention, the client node, when acting as a task execution node, is further configured to: when the task related logs in the local cache reach a preset threshold value, the task related logs are sent to the server node;
the server node is further configured to: saving the task related logs to the MongoDB cluster;
the MongoDB cluster is also used for: and saving the task related log.
When a client node executes a timed task as the task execution node, it generates task-related logs and stores them in a local cache. If the task-related logs in the local cache reach a preset threshold (for example, a threshold of 1000, i.e., the number of task-related log entries reaches 1000), the client node sends them to the server node, and after receiving the task-related logs sent by the client node, the server node stores them in the MongoDB cluster. The MongoDB cluster also stores the task-related logs, so that the execution of each timed task can be examined by querying the task-related logs in the MongoDB cluster.
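A minimal sketch of such a client-side log buffer is given below; the threshold of 1000 comes from the example in the description, while the class structure and the flush call are illustrative.

import java.util.ArrayList;
import java.util.List;

// Hypothetical task-log buffer: task-related logs accumulate in a local cache and are
// flushed to the server node once the preset threshold is reached.
public class TaskLogBuffer {

    private static final int FLUSH_THRESHOLD = 1000;

    private final List<String> cache = new ArrayList<>();

    public synchronized void append(String logLine) {
        cache.add(logLine);
        if (cache.size() >= FLUSH_THRESHOLD) {
            flushToServer(new ArrayList<>(cache));
            cache.clear();
        }
    }

    private void flushToServer(List<String> batch) {
        // In the described system this would be a Netty call to the server node,
        // which in turn persists the batch to the MongoDB cluster.
        System.out.println("flushing " + batch.size() + " task-related log entries");
    }
}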
In the embodiment of the invention, the state information and heartbeat information of the client nodes and the state information of the server nodes are stored in the ZooKeeper cluster, while the task registration information, the identifiers of the task execution node and server node corresponding to each timed task, and the task-related logs are stored in the MongoDB cluster. This provides a dual-backup strategy: when the ZooKeeper cluster is unavailable, the situation of the timed tasks at each task execution node can be obtained from the MongoDB cluster to determine the state information of the client nodes, and available client nodes can be selected according to that state information to schedule the timed tasks. Since timed tasks need to be executed periodically, overlapping execution can be prevented by consulting the state of each timed task in the MongoDB cluster.
In an exemplary embodiment, the distributed calling system for timed tasks may be applied to a house-renting service. In the house-renting service, a housing listing may be marked through a housing task or by paying a security deposit, so that users can select housing with confidence, and sometimes a refund needs to be issued to users who have paid a deposit according to the state of the listing (e.g., when it is no longer displayed online).
The distributed scheduling system for timed tasks provided in this embodiment obtains task registration information through the client cluster and sends it to a server node; the server cluster creates the timed task, stores the task registration information in the MongoDB cluster, and also stores the heartbeat information and state information of the client nodes in the ZooKeeper cluster; the timed task is scheduled using the node information in the ZooKeeper cluster and a suitable task execution node is selected to execute it; and the server node further provides a task execution monitoring interface, through which operation instructions from client nodes are received and the corresponding operations are performed on timed tasks. Because the timed tasks are scheduled uniformly by the server cluster, the client cluster does not need to do its own scheduling, and distributed scheduling of timed tasks is achieved; a suitable node can be selected for each task when it is scheduled, so the situation where every node executes the same task at the same time is avoided. This saves local service resources, makes full use of cluster resources, avoids single points of failure, and allows the task execution process to be supervised through the task execution monitoring interface.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above detailed description is given to the distributed call system for timing task provided by the present invention, and a specific example is applied in this document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. A distributed invocation system for timed tasks, comprising:
the client cluster comprises a plurality of client nodes, wherein the client nodes are used for acquiring task registration information, sending the task registration information corresponding to the client cluster identifier to the server node, sending heartbeat information to the server node, receiving an operation instruction of a user on a timing task, and sending the operation instruction to the server node;
the server cluster comprises a plurality of server nodes and is used for receiving the task registration information, creating a timing task corresponding to the task registration information and storing the task registration information to the MongoDB cluster; determining the state information of the client node according to the heartbeat information, storing the state information and the heartbeat information of the client node to a ZooKeeper cluster, and storing the state information of the server node to the ZooKeeper cluster; when the timing task triggering time is reached, determining a server node or a client node of which the state information is in an available state as a task execution node, sending the timing task to the task execution node, and executing the timing task by the task execution node; providing a task execution monitoring interface, receiving an operation instruction of a client node through the task execution monitoring interface, and executing the operation of a timing task according to the operation instruction;
the ZooKeeper cluster is used for storing the state information and the heartbeat information of a plurality of client nodes in the client cluster and storing the state information of a plurality of server nodes;
and the MongoDB cluster is used for storing the task registration information and storing the identifiers of the task execution node and the server node corresponding to the timing task.
2. The system according to claim 1, wherein the task execution monitoring interface includes a Console visual operation platform and/or a monitoring application programming interface API, and the server node is configured to execute an operation corresponding to the operation instruction according to the operation instruction received through the Console visual operation platform or the monitoring API.
3. The system according to claim 2, wherein the operation corresponding to the operation instruction includes at least one of creating a timing task, triggering execution of the timing task, suspending execution of the timing task, restarting the timing task, deleting the timing task, displaying an analysis report of the timing task, acquiring and displaying an execution progress of the timing task, and acquiring and displaying a performance index of a task execution node where the timing task is located.
4. The system of claim 2, wherein the Console visualization operating platform is further configured to alert when an abnormal problem is detected during the execution of the timed task.
5. The system of claim 1, wherein the timed task comprises a local timed task or a remote timed task;
the step of determining that the state information is the available state, which is taken as a task execution node, includes:
if the timing task is determined to be a local timing task and executed for the appointed client node according to the task registration information, determining the appointed client node to be a task execution node;
if the timing task is determined to be a local timing task and executed for the server node according to the task registration information, determining one server node with available state information from the plurality of server nodes as a task execution node;
and if the timed task is determined to be the remote timed task according to the task registration information, determining a client cluster identifier corresponding to the timed task, and determining a client node with available state information from a plurality of client nodes corresponding to the client cluster identifier as a task execution node.
6. The system of claim 5, wherein the ZooKeeper cluster is further configured to: when a server node is in an unavailable state, taking the server node as a first server node, and triggering a redistribution instruction of a timing task in the first server node;
the other server nodes in the server cluster are further configured to: and if a redistribution instruction of the timing task in the first server node triggered by the ZooKeeper cluster is received, determining a remote timing task corresponding to the first server node from the MongoDB cluster, recreating the timing task, and scheduling the recreated remote timing task.
7. The system of claim 5, wherein the server node is further configured to:
when the appointed client node is used as a task execution node, if the task execution node is determined to be in an unavailable state according to the heartbeat information, marking the execution progress of the timing task, and when the task execution node recovers the available state, sending the execution progress of the timing task to the task execution node, and starting the execution of the timing task by the task execution node from the execution progress.
8. The system according to claim 5, wherein said determining a client node with available state information from a plurality of client nodes corresponding to said client cluster identifier as a task execution node comprises:
determining at least one client node with available state information from a plurality of client nodes corresponding to the client cluster identification;
and determining one client node from the at least one client node as the task execution node according to the load balancing strategy.
9. The system of claim 8, wherein the load balancing policy comprises: the number of times a client node has executed timed tasks, the performance index of the client node, and the machine room where the client node is located, all of which are determined according to the heartbeat information of the client node stored in the ZooKeeper cluster.
10. The system of claim 1, wherein the client nodes and server nodes in the server cluster communicate via Netty.
11. The system of claim 1, wherein the client node, when acting as a task execution node, is further configured to: when the task related logs in the local cache reach a preset threshold value, the task related logs are sent to the server node;
the server node is further configured to: saving the task related logs to the MongoDB cluster;
the MongoDB cluster is also used for: and saving the task related log.
12. The system of claim 1, wherein the obtaining task registration information comprises: and acquiring task registration information in a dynamic code, configuration file or annotation mode.
CN202011273479.8A 2020-11-13 2020-11-13 Distributed calling system for timed tasks Active CN112416581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011273479.8A CN112416581B (en) 2020-11-13 2020-11-13 Distributed calling system for timed tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011273479.8A CN112416581B (en) 2020-11-13 2020-11-13 Distributed calling system for timed tasks

Publications (2)

Publication Number Publication Date
CN112416581A true CN112416581A (en) 2021-02-26
CN112416581B CN112416581B (en) 2022-02-18

Family

ID=74831269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011273479.8A Active CN112416581B (en) 2020-11-13 2020-11-13 Distributed calling system for timed tasks

Country Status (1)

Country Link
CN (1) CN112416581B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197959A1 (en) * 2011-01-28 2012-08-02 Oracle International Corporation Processing pattern framework for dispatching and executing tasks in a distributed computing grid
CN107092521A (en) * 2016-12-30 2017-08-25 北京小度信息科技有限公司 A kind of distributed task dispatching method, apparatus and system
CN110213213A (en) * 2018-05-30 2019-09-06 腾讯科技(深圳)有限公司 The timed task processing method and system of application
CN109831520A (en) * 2019-03-07 2019-05-31 网宿科技股份有限公司 A kind of timed task dispatching method and relevant apparatus
CN111338774A (en) * 2020-02-21 2020-06-26 华云数据有限公司 Distributed timing task scheduling system and computing device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597252A (en) * 2021-03-04 2021-04-02 全时云商务服务股份有限公司 MongoDB server access method and system
CN113190274A (en) * 2021-05-08 2021-07-30 杭州网易云音乐科技有限公司 Node processing method and system, node, medium and computing device
CN113342492A (en) * 2021-06-08 2021-09-03 杭州遥望网络科技有限公司 Task instruction issuing method, device, system, electronic equipment and medium
CN113742063A (en) * 2021-08-06 2021-12-03 天津中新智冠信息技术有限公司 Distributed timing scheduling system and method
CN113687932A (en) * 2021-08-30 2021-11-23 上海商汤科技开发有限公司 Task scheduling method, device and system, electronic equipment and storage medium
CN114244794A (en) * 2021-12-13 2022-03-25 统信软件技术有限公司 Timed task pushing method, computing device and readable storage medium
CN114244794B (en) * 2021-12-13 2024-01-26 统信软件技术有限公司 Timing task pushing method, computing device and readable storage medium

Also Published As

Publication number Publication date
CN112416581B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN112416581B (en) Distributed calling system for timed tasks
US10348809B2 (en) Naming of distributed business transactions
CN109656782A (en) Visual scheduling monitoring method, device and server
CN111552556B (en) GPU cluster service management system and method
US11546233B2 (en) Virtual network function bus-based auto-registration
US20200186594A1 (en) Rule-based action triggering in a provider network
CN107451147A (en) A kind of method and apparatus of kafka clusters switching at runtime
CN111026602A (en) Health inspection scheduling management method and device of cloud platform and electronic equipment
US11221943B2 (en) Creating an intelligent testing queue for improved quality assurance testing of microservices
CN111324423A (en) Method and device for monitoring processes in container, storage medium and computer equipment
CN113422692A (en) Method, device and storage medium for detecting and processing node faults in K8s cluster
CN111190732A (en) Timed task processing system and method, storage medium and electronic device
CN113312153A (en) Cluster deployment method and device, electronic equipment and storage medium
CN112688816B (en) Rule-based action triggering method and system in provider network
US10122602B1 (en) Distributed system infrastructure testing
CN111611057B (en) Distributed retry method, device, electronic equipment and storage medium
US11108638B1 (en) Health monitoring of automatically deployed and managed network pipelines
CN114090198A (en) Distributed task scheduling method and device, electronic equipment and storage medium
CN108243205B (en) Method, equipment and system for controlling resource allocation of cloud platform
CN106550002B (en) paas cloud hosting system and method
EP3721604B1 (en) Automatic subscription management of computing services
CN115373886A (en) Service group container shutdown method, device, computer equipment and storage medium
CN111176959B (en) Early warning method, system and storage medium of cross-domain application server
CN111163117B (en) Zookeeper-based peer-to-peer scheduling method and device
CN109995617A (en) Automated testing method, device, equipment and the storage medium of Host Administration characteristic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant