WO2018099067A1 - Distributed task scheduling method and system - Google Patents

Distributed task scheduling method and system

Info

Publication number
WO2018099067A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
intermediate server
application server
job
application
Prior art date
Application number
PCT/CN2017/091101
Other languages
English (en)
French (fr)
Inventor
熊杰
Original Assignee
上海壹账通金融科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海壹账通金融科技有限公司
Publication of WO2018099067A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/62 Establishing a time schedule for servicing the requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/54 Presence management, e.g. monitoring or registration for receipt of user log-on information, or the connection status of the users
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Definitions

  • The present invention relates to the field of computer processing, and in particular to a distributed task scheduling method and system.
  • Distributed task scheduling means handing the multiple tasks into which an application or service is divided to multiple servers for processing. It addresses problems such as the insufficient resources of a single server and makes large-scale data computation possible.
  • Technologies such as Quartz and ZooKeeper can implement distributed task scheduling.
  • Quartz's clustering relies on database locks: at any given time only one server can acquire the lock and execute the task, so the cluster's capacity for parallel execution goes unused and processing is relatively slow.
  • ZooKeeper is a distributed application coordination service. It shards the data and assigns the shards to multiple application servers for processing, which achieves parallel execution and thereby improves processing speed.
  • According to various embodiments of the present application, a distributed task scheduling method and system are provided.
  • A distributed task scheduling system, comprising:
  • a plurality of application servers, configured to establish a TCP connection with an intermediate server and to register their IP address and Job information with the intermediate server;
  • a plurality of intermediate servers, configured to manage the application servers, obtain the IP address and Job information registered by each application server, divide application servers having the same Job information into one Job group, elect a leader application server from among the application servers in that Job group, and deliver the corresponding task configuration information and the IP addresses of the application servers in the Job group to the leader application server; wherein
  • the leader application server is configured to split the task according to the number of fragments in the task configuration information, and to allocate the resulting subtasks according to the IP address of each application server in the Job group;
  • the application servers are further configured to execute the subtasks assigned to them; and
  • a distributed coordination server, on which ZooKeeper is deployed, configured to establish a connection with the intermediate servers so that ZooKeeper coordinates the intermediate servers centrally.
  • A distributed task scheduling method, comprising:
  • an application server initiates a TCP connection request to an intermediate server, and the intermediate server establishes a TCP connection with the application server according to the TCP connection request;
  • the application server registers its IP address and Job information with the intermediate server;
  • the intermediate server obtains the IP address and Job information registered by the application server, divides application servers having the same Job information into the same Job group, elects a leader application server from among the application servers in the Job group, and delivers the task configuration information corresponding to the Job information, together with the IP address of each application server in the Job group, to the leader application server; and
  • the leader application server splits the task according to the number of fragments in the task configuration information, and allocates the resulting subtasks according to the IP address of each application server in the Job group.
  • FIG. 1 is an architectural diagram of a distributed task scheduling system in one embodiment;
  • FIG. 2 is a directory structure diagram of ZooKeeper in one embodiment;
  • FIG. 3 is an architectural diagram of a distributed task scheduling system in another embodiment;
  • FIG. 4 is a flow chart of a distributed task scheduling method in one embodiment;
  • FIG. 5 is a flow chart of a distributed task scheduling method in another embodiment.
  • As shown in FIG. 1, in one embodiment a distributed task scheduling system is provided, which includes: an application server 102, an intermediate server 104, and a distributed coordination server 106.
  • Application server 102: there are a plurality of application servers, which are configured to establish a TCP connection with the intermediate server and to register their IP address and Job information with the intermediate server.
  • An application server is the server that actually executes tasks. Because it interacts directly with clients, it is also called a "client application server".
  • After establishing a TCP connection with the intermediate server 104, the application server 102 registers its IP address and Job information with the intermediate server 104.
  • The Job information includes the task configuration information, the task identifier, the task execution time, and the like.
  • The task identifier uniquely identifies a task and may be the task's number.
  • As shown in FIG. 1, one of the application servers 102 includes Job1, Job2, and Job3, which represent different tasks.
  • To execute tasks quickly, a task is generally split into multiple subtasks that are distributed to multiple application servers 102 for parallel execution. Each application server 102 only needs to execute the subtasks assigned to it.
  • Intermediate server 104: there are a plurality of intermediate servers, which are configured to manage the application servers, obtain the IP address and Job information registered by each application server, divide application servers having the same Job information into one Job group, elect a leader application server from among the application servers in that Job group, and deliver the corresponding task configuration information and the IP address of each application server in the Job group to the leader application server.
  • The leader application server splits the task according to the number of fragments in the task configuration information, and allocates the resulting subtasks according to the IP addresses of the application servers in the Job group.
  • There are also multiple intermediate servers 104, though fewer than there are application servers 102.
  • The intermediate server is used to manage the application servers.
  • The intermediate server 104 obtains the IP address and Job information registered by the application server 102, and then divides application servers having the same Job information into one Job group. For example, if application server 1, application server 2, and application server 3 all have the same Job1, then the Job1 group includes application servers 1, 2, and 3. A leader application server is then elected from among the application servers in the Job group; generally, the application server that connected to the intermediate server earliest is chosen. The task configuration information corresponding to the Job information and the IP addresses of the application servers in the group are delivered to the leader application server, which splits and allocates the task.
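  • The grouping and election described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_job_groups` and `elect_leader`, the tuple layout of a registration, and the sample IP addresses are all hypothetical, and the "earliest connected" rule is the typical choice mentioned in the text rather than a required one.

```python
from collections import defaultdict


def build_job_groups(registrations):
    """Group application servers by Job.

    `registrations` is a list of (ip, job_ids) tuples, listed in the
    order in which the application servers connected (an assumption).
    """
    groups = defaultdict(list)
    for ip, job_ids in registrations:
        for job_id in job_ids:
            if ip not in groups[job_id]:
                groups[job_id].append(ip)
    return dict(groups)


def elect_leader(members):
    """Pick the earliest-connected member of a Job group as leader."""
    if not members:
        raise ValueError("a Job group needs at least one application server")
    return members[0]


# Hypothetical registrations: three servers share Job1.
registrations = [
    ("10.0.0.1", ["Job1", "Job2"]),
    ("10.0.0.2", ["Job1"]),
    ("10.0.0.3", ["Job1", "Job3"]),
]
groups = build_job_groups(registrations)
leader = elect_leader(groups["Job1"])
```

  • Keeping each group as an ordered list makes the election rule trivial; a different rule would only change `elect_leader`.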
  • The task configuration information includes the number of fragments of the task and the corresponding fragmentation algorithm. Note that although the Job information held by the leader application server itself includes task configuration information, the task configuration information of a Job can be modified dynamically through the back end, and the intermediate server can obtain the latest task configuration information from the back end. After the leader application server is elected, the intermediate server therefore sends it the latest task configuration information, so that the leader application server fragments the task according to the latest configuration.
  • The leader application server splits the task according to the number of fragments in the task configuration information, and then allocates the resulting subtasks according to the IP address of each application server in the Job group. For example, if the number of fragments is six, the leader application server splits the task into six subtasks according to the corresponding fragmentation algorithm and then allocates them to the application servers in the Job group, including the leader application server itself.
  • The IP address of an application server uniquely identifies that application server.
  • Suppose the six subtasks are numbered 0, 1, 2, 3, 4, and 5.
  • The current Job group includes three application servers, counting the leader application server: Server1, Server2, and Server3, where Server1 is the leader application server.
  • Subtasks may be assigned by round robin, by sequential allocation, or by another allocation method.
  • The allocation method is not limited herein.
  • The leader application server generally allocates subtasks as evenly as possible: assigning 6 subtasks to 3 application servers gives each application server 2 subtasks.
  • Taking sequential allocation as an example, subtasks 0 and 1 are assigned to Server1, 2 and 3 to Server2, and 4 and 5 to Server3. Since the IP address is what distinguishes application servers in the system, the allocation result is stored as a correspondence between the group name, the subtask number, and the IP address. For example, subtasks 0 and 1 of the Job1 group are stored with the IP address of Server1, subtasks 2 and 3 of the Job1 group with the IP address of Server2, and subtasks 4 and 5 of the Job1 group with the IP address of Server3.
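  • A minimal sketch of sequential allocation follows. The function name `allocate_sequential` is hypothetical, and the handling of a fragment count that does not divide evenly (extra subtasks go to the servers at the end of the list) is an assumption, since the text only says the split should be as even as possible.

```python
def allocate_sequential(fragment_count, server_ips):
    """Give each server a contiguous run of subtask numbers.

    Assumption: when the count does not divide evenly, the extra
    subtasks go to the servers at the end of the list.
    """
    if not server_ips:
        raise ValueError("no application servers in the Job group")
    base, extra = divmod(fragment_count, len(server_ips))
    first_with_extra = len(server_ips) - extra
    allocation, start = {}, 0
    for index, ip in enumerate(server_ips):
        size = base + (1 if index >= first_with_extra else 0)
        allocation[ip] = list(range(start, start + size))
        start += size
    return allocation


# Six subtasks over three servers, as in the example above.
allocation = allocate_sequential(6, ["Server1", "Server2", "Server3"])
```

  • With six fragments and three servers, each server receives two consecutive subtask numbers, matching the example in the text.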
  • The application server 102 is also used to execute the subtasks assigned to it.
  • The leader application server splits the task according to the number of fragments in the task configuration information and assigns the resulting subtasks to the application servers in the Job group, which determines which application server executes which subtask. Each application server 102 ultimately executes the subtasks assigned to it.
  • The distributed coordination server 106 has ZooKeeper deployed on it and is configured to establish a connection with the intermediate servers so that ZooKeeper coordinates the intermediate servers centrally.
  • ZooKeeper is a distributed application coordination service. Deployed on the distributed coordination server 106, it coordinates and manages the intermediate servers through the connections established with them.
  • ZooKeeper elects a leader intermediate server from among the multiple intermediate servers.
  • The leader intermediate server monitors the other intermediate servers. Once an intermediate server is found to have failed or gone offline, the groups that intermediate server was responsible for managing are reassigned to other intermediate servers to take over.
  • The other intermediate servers monitor the leader intermediate server at the same time. Once the leader intermediate server fails or goes offline, the cluster is triggered to re-elect a leader intermediate server. These monitoring rules effectively ensure that the application servers can continue to execute tasks normally when an intermediate server fails or goes offline.
  • The application servers are managed by introducing multiple intermediate servers.
  • The ZooKeeper deployed on the distributed coordination server only needs to coordinate the intermediate servers. Because the application servers are managed by the intermediate servers, the start and end records of tasks only need to be written to the intermediate server and not to ZooKeeper. One intermediate server can manage multiple application servers, so ZooKeeper only needs to coordinate and manage a small number of intermediate servers, which greatly reduces its burden.
  • In addition, because the application servers are managed by the intermediate servers, adding an application server only requires registering it with an intermediate server; no operation in ZooKeeper is needed. This both reduces the burden on ZooKeeper and allows the application servers to be expanded dynamically.
  • The intermediate server 104 is further configured to find, according to the Job information, the target intermediate server that actually manages that Job information, and to return the address of the target intermediate server to the application server; the application server is further configured to establish a TCP connection with the target intermediate server according to that address.
  • Different intermediate servers manage different Job information.
  • After receiving the Job information registered by the application server, the intermediate server first checks whether the Job information exists in its own list, that is, whether the Job information is managed by this intermediate server. If it is not found, the intermediate server finds the one that actually manages the Job information, namely the target intermediate server, obtains its IP address, and returns that IP address to the application server. After receiving the returned IP address, the application server establishes a TCP connection with the target intermediate server and is then added to the corresponding Job group.
  • In addition, each application server has multiple Jobs. As shown in FIG. 1, one of the application servers includes Job1, Job2, and Job3, and these may be managed by different intermediate servers, in which case one application server has to maintain multiple TCP channels at the same time.
  • To avoid this, Jobs of the same class are preferentially managed by the same intermediate server. As shown in FIG. 1, the Jobs of the same application server are managed by the same intermediate server.
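  • The lookup-and-redirect exchange can be sketched as follows. The text does not say how an intermediate server locates the target, so the shared `directory` mapping (Job to managing server address) is purely an assumption, as are the class and function names, the status strings, and the sample addresses. Network transport is replaced by direct method calls to keep the sketch runnable.

```python
class IntermediateServer:
    """Hypothetical in-memory stand-in for one intermediate server."""

    def __init__(self, address, managed_jobs, directory):
        self.address = address
        self.managed_jobs = set(managed_jobs)
        # Assumption: job_id -> address of the intermediate server managing it.
        self.directory = directory
        self.groups = {}

    def register(self, app_ip, job_id):
        """Accept the registration, or point the caller at the real manager."""
        if job_id in self.managed_jobs:
            members = self.groups.setdefault(job_id, [])
            if app_ip not in members:
                members.append(app_ip)
            return ("accepted", self.address)
        return ("redirect", self.directory[job_id])


def register_with_redirect(servers, first_address, app_ip, job_id):
    """Application server side: follow one redirect to the target server."""
    status, address = servers[first_address].register(app_ip, job_id)
    if status == "redirect":
        status, address = servers[address].register(app_ip, job_id)
    return status, address


directory = {"Job1": "192.168.0.1", "Job2": "192.168.0.2"}
servers = {
    "192.168.0.1": IntermediateServer("192.168.0.1", ["Job1"], directory),
    "192.168.0.2": IntermediateServer("192.168.0.2", ["Job2"], directory),
}
result = register_with_redirect(servers, "192.168.0.1", "10.0.0.5", "Job2")
```

  • Here the application server first contacts an intermediate server that does not manage Job2, is handed the address of the one that does, and joins the Job2 group there.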
  • The intermediate server 104 is further configured to monitor application servers in the Job group coming online or going offline. When an application server in the Job group comes online or goes offline, the intermediate server instructs the leader application server in the Job group to reallocate the corresponding task, and receives the new allocation result returned by the leader application server.
  • Specifically, the intermediate server 104 monitors the application servers in the Job groups it maintains. When a new application server joins a Job group, the intermediate server 104 instructs the leader application server in that Job group to reallocate the corresponding task, so that subtasks are also allocated to the newly added application server. When an application server in the Job group drops offline because of a fault, a network problem, or the like, the intermediate server 104 likewise instructs the leader application server to reallocate the corresponding task. For example, suppose the Job group initially has 3 application servers and the task is divided into 10 fragments.
  • The initial allocation is: {Server1: [0,1,2], Server2: [3,4,5], Server3: [6,7,8,9]}. If one application server goes down, the task is reallocated as: {Server1: [0,1,2,3,4], Server2: [5,6,7,8,9]}. If an application server is added instead, the task is reallocated as: {Server1: [0,1], Server2: [2,3], Server3: [4,5,6], Server4: [7,8,9]}.
  • The leader application server then reports the latest allocation result to the intermediate server.
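  • The reallocation example can be reproduced with a short sketch. The class `JobGroup` and helper `split_evenly` are hypothetical names, and placing the remainder on the last servers is an assumption chosen because it reproduces the ten-fragment allocations quoted in the text; the text itself does not prescribe the rule.

```python
def split_evenly(fragment_count, server_ips):
    """Contiguous split; leftover fragments go to the last servers (assumed)."""
    if not server_ips:
        return {}
    base, extra = divmod(fragment_count, len(server_ips))
    first_with_extra = len(server_ips) - extra
    allocation, start = {}, 0
    for index, ip in enumerate(server_ips):
        size = base + (1 if index >= first_with_extra else 0)
        allocation[ip] = list(range(start, start + size))
        start += size
    return allocation


class JobGroup:
    """Hypothetical leader-side view of one Job group."""

    def __init__(self, fragment_count, server_ips):
        self.fragment_count = fragment_count
        self.online = list(server_ips)
        self.allocation = split_evenly(fragment_count, self.online)

    def server_offline(self, ip):
        """Recompute the allocation after a server drops out."""
        self.online.remove(ip)
        self.allocation = split_evenly(self.fragment_count, self.online)
        return self.allocation

    def server_online(self, ip):
        """Recompute the allocation after a server joins."""
        if ip not in self.online:
            self.online.append(ip)
        self.allocation = split_evenly(self.fragment_count, self.online)
        return self.allocation


group = JobGroup(10, ["Server1", "Server2", "Server3"])
initial = dict(group.allocation)
after_crash = group.server_offline("Server3")
```

  • In this sketch every membership change triggers a full recomputation, and the returned dictionary is what the leader would report back to the intermediate server.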
  • The distributed coordination server 106 is further configured to elect, through ZooKeeper, a leader intermediate server from among the plurality of intermediate servers.
  • The leader intermediate server monitors the other intermediate server nodes in the cluster in real time. If it finds that an intermediate server has gone offline, it reassigns the groups managed by that intermediate server to another intermediate server to take over and sets a migrate (migration) node under each such group. The migrate node marks the migration status of the Job group and is deleted once the migration is complete. Each intermediate server also monitors the migrate nodes under the groups in real time; if it finds that the IP address in a migrate node is the same as its own, it takes over the group in which that migrate node is located.
  • Specifically, the distributed coordination server 106 on which ZooKeeper is deployed elects a leader intermediate server from among the plurality of intermediate servers through ZooKeeper. FIG. 2 is a directory structure diagram of ZooKeeper in one embodiment.
  • On the left of FIG. 2 is the intermediate server root node, under which are the intermediate server nodes (including the leader intermediate server node).
  • On the right is the root node of the Job groups, under which are the Job group nodes. The child nodes of each Job group node include the owner node, the migrate node, and the modified node.
  • The leader intermediate server acts as the leader node of the distributed cluster and monitors the other intermediate server nodes in real time (such as the intermediate server 1 node and the intermediate server 2 node in FIG. 2). If an intermediate server goes offline, the leader reassigns an intermediate server to take over each Job group managed by the offline server and sets a migrate node under that Job group. The migrate node marks the migration status of the Job group and is deleted when the migration is complete.
  • The other intermediate servers monitor the migrate nodes under the Job groups in real time. If the IP address in a migrate node is the same as an intermediate server's own, that server takes over the group where the migrate node is located. The IP address in the migrate node is the IP address of the intermediate server newly assigned to the group.
  • As shown in FIG. 2, each Job group has an owner node, which identifies the intermediate server that manages the Job group. The leader intermediate server listens to this node; when the intermediate server managing the Job group goes offline, the leader reassigns an intermediate server to take over and sets a migrate node under the Job group to be taken over, where the migrate node marks the migration status of the Job group.
  • A modified node is set under the Job group. The intermediate server managing the Job group monitors this node in real time; when the configuration information changes, it notifies the leader application server in the group and then deletes the node.
  • The leader intermediate server is also used, when it detects that an intermediate server has gone offline, to determine whether the offline server was itself in the middle of taking over a group, and if so, to reassign a takeover intermediate server for that group.
  • In other words, when the leader intermediate server detects that an intermediate server is offline, besides setting a migrate node under each Job group that server currently manages, it must also check whether the offline server was taking over other groups, and if so, reassign a takeover intermediate server for those groups. Specifically, referring to FIG. 2, after detecting that an intermediate server is offline, the leader intermediate server traverses the migrate nodes under the groups. If the intermediate server IP address recorded in a migrate node is the same as the IP address of the offline server, the leader assigns a new takeover intermediate server to the group where that migrate node is located.
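  • The migrate-node protocol can be illustrated with an in-memory model. This sketch deliberately does not call any ZooKeeper client API: the node tree is a plain dictionary, and the function names, the rotation used to pick a takeover server, and the sample IP addresses are all assumptions made for illustration.

```python
def handle_offline(groups, offline_ip, online_ips):
    """Leader intermediate server: mark affected Job groups for takeover.

    `groups` maps a Job group name to its child nodes, e.g.
    {"owner": ip} with an optional "migrate": ip entry.
    Returns the names of the groups that were (re)assigned.
    """
    candidates = [ip for ip in online_ips if ip != offline_ip]
    if not candidates:
        raise RuntimeError("no intermediate server left to take over")
    reassigned = []
    for name in sorted(groups):
        nodes = groups[name]
        pending = nodes.get("migrate")
        owned_by_offline = nodes.get("owner") == offline_ip and pending is None
        taken_over_by_offline = pending == offline_ip
        if owned_by_offline or taken_over_by_offline:
            # Assumed policy: rotate through the remaining servers.
            nodes["migrate"] = candidates[len(reassigned) % len(candidates)]
            reassigned.append(name)
    return reassigned


def take_over(groups, my_ip):
    """Ordinary intermediate server: claim groups whose migrate node names it."""
    taken = []
    for name in sorted(groups):
        nodes = groups[name]
        if nodes.get("migrate") == my_ip:
            nodes["owner"] = my_ip
            del nodes["migrate"]  # migration complete, so remove the marker
            taken.append(name)
    return taken


groups = {
    "Job1": {"owner": "192.168.0.1"},
    "Job2": {"owner": "192.168.0.2"},
    # Job3 was already being taken over by 192.168.0.1 when it went offline.
    "Job3": {"owner": "192.168.0.3", "migrate": "192.168.0.1"},
}
reassigned = handle_offline(groups, "192.168.0.1", ["192.168.0.1", "192.168.0.2"])
taken = take_over(groups, "192.168.0.2")
```

  • The Job3 entry exercises the second check described above: a group whose pending takeover server is the one that just went offline gets a fresh takeover server.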
  • The application server 102 is further configured to determine, according to the Job information, whether the execution time of a task has been reached; if so, it obtains the corresponding fragmentation information from the intermediate server that manages it, starts executing the corresponding subtasks according to that fragmentation information, and records with the intermediate server that the task has started executing.
  • The fragmentation result, namely how many fragments the task is divided into and which application server executes each fragment, is sent to the intermediate server that manages the group.
  • When the application server determines, according to its own Job information, that a task has reached its execution time, it obtains the corresponding fragmentation information from the intermediate server that manages it. The Job information includes the time setting for executing the task; the fragmentation information is the set of fragment numbers the application server needs to execute, for example fragments 0 and 1.
  • The application server executes the corresponding subtasks according to the fragmentation information and records the task's execution information with the intermediate server.
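  • The execution step can be sketched as a single function. Everything named here is hypothetical: `run_if_due`, the dictionary keys `task_id` and `execute_at`, and the three callables standing in for "ask the intermediate server for my fragments", "run one subtask", and "record a start or end event with the intermediate server".

```python
from datetime import datetime


def run_if_due(job, now, fetch_fragments, run_subtask, record):
    """Run this server's fragments once the task's execution time is reached.

    `fetch_fragments(task_id)` returns the fragment numbers assigned to
    this application server; `record(task_id, event)` reports "start" and
    "end" to the intermediate server. All three callables are injected.
    """
    if now < job["execute_at"]:
        return False
    fragments = fetch_fragments(job["task_id"])
    record(job["task_id"], "start")
    for number in fragments:
        run_subtask(job["task_id"], number)
    record(job["task_id"], "end")
    return True


# Hypothetical wiring with in-memory stand-ins for the intermediate server.
job = {"task_id": "Job1", "execute_at": datetime(2017, 1, 1, 2, 0)}
events, executed = [], []
ran = run_if_due(
    job,
    datetime(2017, 1, 1, 2, 0),
    fetch_fragments=lambda task_id: [0, 1],
    run_subtask=lambda task_id, number: executed.append((task_id, number)),
    record=lambda task_id, event: events.append((task_id, event)),
)
```

  • Injecting the callables keeps the scheduling decision separate from the transport, which the text leaves unspecified beyond the TCP connection.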
  • The distributed task scheduling system further includes a database 108, configured to store the Job information and to receive and store the task start and end records sent by the intermediate server.
  • That is, the database 108 stores the task configuration information corresponding to each task and also records the status of each task at its start and end.
  • Back-end staff can view the status of each task through the management platform, or manually modify the task configuration information of a Job through it.
  • A distributed task scheduling method is provided, comprising:
  • Step 402: The application server initiates a TCP connection request to the intermediate server, and the intermediate server establishes a TCP connection with the application server according to the TCP connection request.
  • The application server sends a request to establish a TCP connection to the intermediate server; after receiving the request, the intermediate server establishes a TCP connection with the application server.
  • Step 404: The application server registers its IP address and Job information with the intermediate server.
  • After establishing a TCP connection with the intermediate server, the application server registers its own IP address and Job information with the intermediate server. The Job information includes the task configuration information, the task identifier, the task execution time, and the like.
  • The IP address of the application server uniquely identifies the application server.
  • Step 406: The intermediate server obtains the IP address and Job information registered by the application server, divides application servers having the same Job information into the same Job group, elects a leader application server from among the application servers in the Job group, and delivers the task configuration information corresponding to the Job information and the IP address of each application server in the Job group to the leader application server.
  • The intermediate server is used to manage the application servers.
  • The intermediate server obtains the IP address and Job information registered by the application server.
  • Application servers having the same Job information are divided into one Job group. For example, if application server 1, application server 2, and application server 3 all have the same Job1, the Job1 group includes application servers 1, 2, and 3.
  • A leader application server is elected from among the application servers in the Job group; generally, the application server that connected to the intermediate server earliest is chosen. The task configuration information corresponding to the Job information and the IP addresses of the application servers in the Job group are delivered to the leader application server.
  • Step 408: The leader application server splits the task according to the number of fragments in the task configuration information, and allocates the resulting subtasks according to the IP address of each application server in the Job group.
  • The task is split and allocated by the leader application server. The task configuration information includes the number of fragments of the task and the corresponding fragmentation algorithm.
  • The leader application server splits the task according to the number of fragments in the task configuration information, and then allocates the resulting subtasks according to the IP address of each application server in the Job group. For example, if the number of fragments is six, the leader application server splits the task into six subtasks according to the corresponding fragmentation algorithm and then allocates them to the application servers in the Job group, including the leader application server itself.
  • The IP address of an application server uniquely identifies that application server.
  • Subtasks can be assigned by round robin or by sequential allocation.
  • The allocation result is stored as a correspondence between the group name, the subtask number, and the IP address. For example, under round-robin allocation, subtasks 0 and 3 of the Job1 group are stored with the IP address of Server1, subtasks 1 and 4 of the Job1 group with the IP address of Server2, and subtasks 2 and 5 of the Job1 group with the IP address of Server3.
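  • A round-robin allocation and the stored (group name, subtask number, IP address) correspondence might look like the sketch below. The function names and the flat tuple layout of a stored record are assumptions; the text only states which three fields are stored together.

```python
def allocate_round_robin(fragment_count, server_ips):
    """Deal subtask numbers to the servers one at a time, in order."""
    if not server_ips:
        raise ValueError("no application servers in the Job group")
    allocation = {ip: [] for ip in server_ips}
    for number in range(fragment_count):
        allocation[server_ips[number % len(server_ips)]].append(number)
    return allocation


def to_records(group_name, allocation):
    """Flatten an allocation into (group name, subtask number, IP) rows."""
    return sorted(
        (group_name, number, ip)
        for ip, numbers in allocation.items()
        for number in numbers
    )


# Hypothetical IP addresses for Server1, Server2, and Server3.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
allocation = allocate_round_robin(6, ips)
records = to_records("Job1", allocation)
```

  • With six subtasks and three servers the first server receives subtasks 0 and 3, the second 1 and 4, and the third 2 and 5, which is the stored result given in the example.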
  • The method further includes: the intermediate server finds, according to the Job information, the target intermediate server that actually manages the Job and returns the address of the target intermediate server to the application server; the application server establishes a TCP connection with the target intermediate server according to that address.
  • Different intermediate servers manage different Job information.
  • After receiving the Job information registered by the application server, the intermediate server first checks whether the Job information exists in its own list, that is, whether the Job information is managed by this intermediate server. If it is not found, the intermediate server finds the one that actually manages the Job information, namely the target intermediate server, obtains its IP address, and returns that IP address to the application server. After receiving the returned IP address, the application server establishes a TCP connection with the target intermediate server and is then added to the corresponding Job group.
  • The foregoing distributed task scheduling method further includes:
  • Step 410: The intermediate server monitors application servers in the Job group coming online or going offline. When an application server in the Job group comes online or goes offline, the intermediate server instructs the leader application server in the Job group to reallocate the corresponding task.
  • Specifically, the intermediate server monitors the application servers in the Job groups it maintains. When a new application server joins a Job group, the intermediate server instructs the leader application server in that Job group to reallocate the corresponding task, so that subtasks are also assigned to the newly added application server. When an application server in the Job group drops offline, the intermediate server likewise instructs the leader application server to reallocate the corresponding task.
  • Step 412: Following that instruction, the leader application server reallocates the task according to the number of application servers currently online in the Job group and returns the allocation result to the intermediate server.
  • That is, the leader application server reallocates the task according to the number of application servers currently online in the Job group, as instructed by the intermediate server, and reports the allocation result to the intermediate server.
  • For example, suppose the task is divided into 10 fragments.
  • The initial allocation is: {Server1: [0,1,2], Server2: [3,4,5], Server3: [6,7,8,9]}. If one application server goes down, the task is reallocated as: {Server1: [0,1,2,3,4], Server2: [5,6,7,8,9]}. If an application server is added instead, the task is reallocated as: {Server1: [0,1], Server2: [2,3], Server3: [4,5,6], Server4: [7,8,9]}.
  • The foregoing storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM).

Abstract

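The present invention provides a distributed task scheduling system, comprising: a plurality of application servers (102), configured to establish a TCP connection with an intermediate server (104), register their IP address and Job information with the intermediate server (104), and execute the subtasks assigned to them; a plurality of intermediate servers (104), configured to manage the application servers (102); and a distributed coordination server (106), on which Zookeeper is deployed, configured to establish a connection with the intermediate servers (104) so that Zookeeper coordinates the intermediate servers (104) centrally.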

Description

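Distributed task scheduling method and system
This application claims priority to Chinese patent application No. 2016110764720, entitled "Distributed task scheduling method and system", filed with the Chinese Patent Office on November 29, 2016, the entire contents of which are incorporated herein by reference.
【Technical Field】
The present invention relates to the field of computer processing, and in particular to a distributed task scheduling method and system.
【Background】
Distributed task scheduling means handing the multiple tasks into which an application or service is divided to multiple servers for processing. It addresses problems such as insufficient resources on a single server and makes large-scale data computation possible. Many traditional distributed frameworks exist; for example, technologies such as Quartz and Zookeeper can both implement distributed task scheduling. Quartz's clustering mode relies on a database lock: only one server can hold the lock and execute the task at any given time, so the cluster is not used for parallel execution and processing is slow. Zookeeper is a distributed application coordination service; it shards the data and distributes the shards to multiple application servers for processing, which achieves parallel execution and improves processing speed. However, Zookeeper often has to manage many application servers, and the start and end records of every task must also be written to Zookeeper, which places a heavy load on it. Moreover, adding a new application server also requires an operation on Zookeeper, which further increases the load and prevents application servers from being added dynamically.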
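【Summary of the Invention】
According to various embodiments of the present application, a distributed task scheduling method and system are provided.
A distributed task scheduling system, comprising:
a plurality of application servers, configured to establish a TCP connection with an intermediate server and register their IP address and Job information with the intermediate server;
a plurality of intermediate servers, configured to manage the application servers, obtain the IP addresses and Job information registered by the application servers, divide application servers having the same Job information into one Job group, elect one leader application server from among the application servers in the Job group, and deliver the corresponding task configuration information and the IP address of each application server in the Job group to the leader application server; wherein
the leader application server is configured to split the task according to the shard count in the task configuration information and to assign the resulting subtasks according to the IP address of each application server in the Job group;
the application servers are further configured to execute the subtasks assigned to them; and
a distributed coordination server, on which Zookeeper is deployed, configured to establish a connection with the intermediate servers so that Zookeeper coordinates the intermediate servers in a unified manner.
A distributed task scheduling method, comprising:
an application server sending a TCP connection request to an intermediate server, and the intermediate server establishing a TCP connection with the application server according to the TCP connection request;
the application server registering its IP address and Job information with the intermediate server;
the intermediate server obtaining the IP address and Job information registered by the application server, dividing application servers having the same Job information into the same Job group, electing one leader application server from among the application servers in the Job group, and delivering the task configuration information corresponding to the Job information and the IP address of each application server in the Job group to the leader application server; and
the leader application server splitting the task according to the shard count in the task configuration information and assigning the resulting subtasks according to the IP address of each application server in the Job group.
Details of one or more embodiments of the present invention are set forth in the drawings and description below. Other features, objectives, and advantages of the present invention will become apparent from the description, the drawings, and the claims.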
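【Brief Description of the Drawings】
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a distributed task scheduling system in one embodiment;
FIG. 2 is a diagram of the Zookeeper directory structure in one embodiment;
FIG. 3 is an architecture diagram of a distributed task scheduling system in another embodiment;
FIG. 4 is a flowchart of a distributed task scheduling method in one embodiment;
FIG. 5 is a flowchart of a distributed task scheduling method in another embodiment.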
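【Detailed Description】
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
As shown in FIG. 1, in one embodiment a distributed task scheduling system is provided. The system includes application servers 102, intermediate servers 104, and a distributed coordination server 106, wherein:
Application servers 102: there are a plurality of application servers, configured to establish a TCP connection with an intermediate server and register their IP address and Job information with the intermediate server.
In this embodiment, there are multiple application servers 102. An application server is the server that actually executes tasks; because it interacts directly with clients, it is also called a "client application server". After an application server 102 establishes a TCP connection with an intermediate server 104, it registers its IP address and Job information with the intermediate server 104. The Job information includes task configuration information, a task identifier, the task execution time, and so on. The task identifier uniquely identifies a task and may be the task's number. As shown in FIG. 1, one of the application servers 102 contains Job1, Job2, and Job3, each of which represents a different task. In this embodiment, to execute a task quickly, the task is generally split into multiple subtasks that are assigned to multiple application servers 102 for parallel execution; each application server 102 only needs to execute the subtasks assigned to it.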
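Intermediate servers 104: there are a plurality of intermediate servers, configured to manage the application servers, obtain the IP addresses and Job information registered by the application servers, divide application servers having the same Job information into one Job group, elect one leader application server from among the application servers in the Job group, and deliver the corresponding task configuration information and the IP address of each application server in the Job group to the leader application server. The leader application server splits the task according to the shard count in the task configuration information and assigns the resulting subtasks according to the IP address of each application server in the Job group.
In this embodiment, there are also multiple intermediate servers 104, but far fewer than application servers 102. The intermediate servers manage the application servers. First, an intermediate server 104 obtains the IP addresses and Job information registered by the application servers 102 and divides application servers having the same Job information into one Job group. For example, if application server 1, application server 2, and application server 3 all have the same Job1, the Job1 group includes application servers 1, 2, and 3. Next, one leader application server is elected from among the application servers in the Job group; generally, the application server that connected to the intermediate server earliest becomes the leader. The task configuration information corresponding to the Job information and the IP addresses of the application servers in the group are delivered to the leader application server, which splits and assigns the task. The task configuration information includes the task's shard count and the corresponding sharding algorithm. Note that although the Job information on the leader application server already contains task configuration information, a Job's task configuration can be modified dynamically from the back end, and the intermediate server can obtain the latest configuration from the back end. Therefore, after the leader application server is elected, the intermediate server delivers the latest Job task configuration information to it so that it can shard the task according to that configuration.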
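The grouping and leader-election rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `build_job_groups` and `elect_leader`, the tuple-based registration format, and the sample IP addresses are assumptions. The only rules taken from the text are that servers registering the same Job form one group and that the earliest server to connect is generally chosen as leader.

```python
from collections import defaultdict


def build_job_groups(registrations):
    """Group application servers by Job.

    `registrations` is a list of (ip, job_ids) pairs in the order in which
    the servers connected to the intermediate server (assumed format).
    """
    groups = defaultdict(list)
    for ip, job_ids in registrations:
        for job_id in job_ids:
            if ip not in groups[job_id]:
                groups[job_id].append(ip)
    return dict(groups)


def elect_leader(group_ips):
    """Return the earliest-connected server, the usual choice in the text."""
    if not group_ips:
        raise ValueError("a Job group needs at least one application server")
    return group_ips[0]


# Hypothetical registrations: three servers, two Jobs.
registrations = [
    ("10.0.0.1", ["Job1", "Job2"]),
    ("10.0.0.2", ["Job1"]),
    ("10.0.0.3", ["Job1", "Job2"]),
]
groups = build_job_groups(registrations)
leader = elect_leader(groups["Job1"])
print(groups["Job1"], leader)
```

Keeping each group as a list in connection order makes the "earliest connected" rule a simple first-element lookup; a different election policy would only need to replace `elect_leader`.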
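Specifically, the leader application server splits the task according to the shard count in the task configuration information and then assigns the resulting subtasks according to the IP address of each application server in the Job group. For example, if the shard count is 6, the leader application server splits the task into 6 subtasks using the corresponding sharding algorithm and assigns them to the application servers in the Job group, including itself. The IP address uniquely identifies an application server. Suppose the 6 subtasks are numbered 0, 1, 2, 3, 4, 5 and the Job group currently has 3 application servers including the leader: Server1, Server2, and Server3, with Server1 as the leader. Subtasks can be assigned round-robin, sequentially, or by another method; the assignment method is not limited here. Whichever method is used, the leader generally tries to assign evenly: 6 subtasks across 3 application servers means 2 subtasks each. With sequential assignment, for example, 0 and 1 go to Server1, 2 and 3 to Server2, and 4 and 5 to Server3. Because the system distinguishes application servers by IP address, the assignment result is stored as a mapping of group name, subtask number, and IP address. For example, subtasks 0 and 1 of the Job1 group are stored with Server1's IP address, subtasks 2 and 3 with Server2's IP address, and subtasks 4 and 5 with Server3's IP address.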
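A sketch of the sequential assignment and of the stored result follows. The names `allocate_sequential` and `to_records` and the sample IP addresses are hypothetical. The handling of an uneven split (the last servers take one extra shard) is an assumption inferred from the 10-shard example elsewhere in this description; the text itself only says the leader tries to assign evenly.

```python
def allocate_sequential(shard_count, server_ips):
    """Assign contiguous shard ranges to servers, as evenly as possible."""
    if not server_ips:
        raise ValueError("no application server is online")
    base, extra = divmod(shard_count, len(server_ips))
    allocation = {}
    start = 0
    for index, ip in enumerate(server_ips):
        # Assumption: when the split is uneven, the last `extra` servers
        # each take one additional shard.
        size = base + (1 if index >= len(server_ips) - extra else 0)
        allocation[ip] = list(range(start, start + size))
        start += size
    return allocation


def to_records(group_name, allocation):
    """Flatten an allocation into (group name, shard number, IP) records."""
    return [(group_name, shard, ip)
            for ip, shards in allocation.items()
            for shard in shards]


# Server1..Server3 from the example, with made-up IP addresses.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
allocation = allocate_sequential(6, servers)
records = to_records("Job1", allocation)
print(allocation)
```

With 6 shards and 3 servers this yields two shards per server, matching the example: 0 and 1, 2 and 3, 4 and 5.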
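The application servers 102 are further configured to execute the subtasks assigned to them.
In this embodiment, the leader application server splits the task according to the shard count in the task configuration information and then assigns the resulting subtasks to the application servers in the Job group, that is, it decides which shard goes to which application server. The application servers 102 ultimately execute the assigned subtasks.
Distributed coordination server 106: Zookeeper is deployed on it, and it establishes a connection with the intermediate servers so that Zookeeper coordinates the intermediate servers in a unified manner.
In this embodiment, Zookeeper is a distributed application coordination service deployed on the distributed coordination server 106. It coordinates and manages the intermediate servers through the connections established with them. There are multiple intermediate servers; Zookeeper selects one of them as the leader intermediate server, which monitors the other intermediate servers. When it finds that an intermediate server has gone offline or failed, it reassigns the groups managed by that server to other intermediate servers. At the same time, the other intermediate servers monitor the leader intermediate server; as soon as the leader fails or goes offline, the cluster re-elects a leader intermediate server. These monitoring rules ensure that the application servers can continue to execute tasks normally when an intermediate server fails or goes offline.
In this embodiment, multiple intermediate servers are introduced to manage the application servers, so the Zookeeper deployed on the distributed coordination server only needs to coordinate the intermediate servers. Because the application servers are managed by the intermediate servers, task start and end records only need to be written to the intermediate servers rather than to Zookeeper, which reduces the load on Zookeeper. One intermediate server can manage multiple application servers, so Zookeeper only has to coordinate a small number of intermediate servers, which greatly reduces its load. Furthermore, because the intermediate servers manage the application servers, a new application server only needs to register with an intermediate server and requires no operation on Zookeeper. This both reduces the load on Zookeeper and allows application servers to be added dynamically.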
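In one embodiment, the intermediate server 104 is further configured to look up, according to the Job information, the target intermediate server that actually manages that Job information and to return the address of the target intermediate server to the application server; the application server is further configured to establish a TCP connection with the target intermediate server according to that address.
In this embodiment, different intermediate servers manage different Job information. After receiving the Job information registered by an application server, the intermediate server first checks whether the Job information exists in its own list, that is, whether it manages that Job information itself. If not, it looks up the intermediate server that actually manages the Job information (the target intermediate server), obtains its IP address, and returns that IP address to the application server. The application server then establishes a TCP connection with the target intermediate server using the returned IP address and applies to join the corresponding Job group. In addition, each application server holds multiple Jobs; as shown in FIG. 1, one application server contains Job1, Job2, and Job3, which may be managed by different intermediate servers, so one application server might have to maintain several TCP channels at the same time. To avoid this, Jobs of the same type are preferentially handed to the same intermediate server; as shown in FIG. 1, the Jobs of one application server are managed by the same intermediate server.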
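The lookup-and-redirect step can be illustrated with a small decision function. Everything here is a hypothetical sketch: the function name `handle_registration`, the `job_owner` lookup table, the reply tuples, and the IP addresses are not from the source, and the `unknown_job` branch is an added assumption because the text does not say what happens when no intermediate server manages the Job.

```python
def handle_registration(local_ip, job_owner, job_id):
    """Decide how an intermediate server answers a registration.

    `job_owner` maps a Job id to the IP of the intermediate server that
    manages it (a hypothetical lookup table).
    """
    owner_ip = job_owner.get(job_id)
    if owner_ip is None:
        # Assumption: the source does not describe this case.
        return ("unknown_job", None)
    if owner_ip == local_ip:
        return ("accepted", local_ip)
    # The Job is managed elsewhere: return the target server's address so
    # the application server can open a TCP connection to it instead.
    return ("redirect", owner_ip)


job_owner = {"Job1": "192.168.1.10", "Job2": "192.168.1.11"}
local_reply = handle_registration("192.168.1.10", job_owner, "Job1")
remote_reply = handle_registration("192.168.1.10", job_owner, "Job2")
print(local_reply, remote_reply)
```

On a `redirect` reply the application server would connect to the returned address and apply to join the Job group there.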
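In one embodiment, the intermediate server 104 is further configured to monitor application servers in the Job group going online or offline; when an application server in the Job group goes online or offline, it instructs the leader application server in the Job group to reallocate the corresponding task and receives the new allocation result returned by the leader application server.
In this embodiment, the intermediate server 104 also monitors the application servers in the Job groups it maintains for going online or offline. When a new application server joins the Job group, the intermediate server 104 instructs the leader application server in the Job group to reallocate the corresponding task, that is, subtasks are also assigned to the newly joined application server. When an application server in the Job group drops offline because of a fault or a network problem, the intermediate server 104 likewise instructs the leader application server to reallocate the corresponding task. For example, suppose the Job group initially has 3 application servers and the task is divided into 10 shards, allocated as {Server1: [0,1,2], Server2: [3,4,5], Server3: [6,7,8,9]}. If one application server crashes, the task is reallocated as {Server1: [0,1,2,3,4], Server2: [5,6,7,8,9]}. If one application server is added, the task is reallocated as {Server1: [0,1], Server2: [2,3], Server3: [4,5,6], Server4: [7,8,9]}. The leader application server then updates the intermediate server with the latest allocation.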
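In one embodiment, the distributed coordination server 106 is further configured to elect, through Zookeeper, one leader intermediate server from among the intermediate servers. The leader intermediate server monitors the other intermediate server nodes in the cluster in real time; if it finds that an intermediate server has gone offline, it assigns another intermediate server to take over the groups managed by the offline server and sets a migrate node in each such group. The migrate node marks the migration status of the Job group and is deleted once the migration completes. The intermediate servers also monitor the migrate nodes under the groups in real time; if an intermediate server finds that the IP address of a migrate node matches its own, it takes over the group containing that migrate node.
In this embodiment, the distributed coordination server 106 on which Zookeeper is deployed elects one leader intermediate server from among the intermediate servers through Zookeeper. FIG. 2 shows the Zookeeper directory structure in one embodiment. On the left of FIG. 2 are the intermediate server root node and, below it, the intermediate server nodes (including the leader intermediate server node). On the right are the Job group root node, the corresponding Job group nodes, and the child nodes under each Job group node: the owner node, the migrate node, and the modified node. In this embodiment, the leader intermediate server acts as the leader node of the distributed cluster. The leader node monitors the other intermediate server nodes in real time (such as the intermediate server 1 node and the intermediate server 2 node in FIG. 2). If it finds that an intermediate server has gone offline, it assigns another intermediate server to take over the Job groups managed by the offline server and sets a migrate node in each such Job group; the migrate node marks the migration status of the Job group and is deleted once the migration completes. The other intermediate servers monitor the migrate nodes under the Job groups in real time; if an intermediate server finds that the IP address of a migrate node matches its own, it takes over the group containing that migrate node. The IP address of the migrate node is the IP address of the intermediate server newly assigned to the group.
In addition, as shown in FIG. 2, each Job group has an owner node that identifies which intermediate server manages the Job group. The leader intermediate server watches this node; when the intermediate server managing the Job group goes offline, the leader assigns another intermediate server to take over and sets a migrate node under the Job group to be taken over, where the migrate node marks the migration status of the Job group. Furthermore, so that the task configuration information of a Job group can be modified dynamically without restarting the intermediate server, a modified node is set in the Job group whenever its task configuration information is changed. The intermediate server managing the Job group monitors the modified node in real time; when it detects a configuration change, it notifies the leader application server in the group and then deletes the node.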
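In one embodiment, the leader intermediate server is further configured, upon detecting that an intermediate server has gone offline, to determine whether the offline intermediate server was in the process of taking over a group and, if so, to assign a new intermediate server to take over that group.
In this embodiment, when the leader intermediate server detects that an intermediate server has gone offline, it not only sets migrate nodes under the Job groups that server currently manages but also checks whether the offline server was in the process of taking over other groups; if so, it assigns a new intermediate server to take over those groups. Specifically, referring to FIG. 2, after detecting that an intermediate server has gone offline, the leader intermediate server traverses the migrate nodes under the groups. If the IP address of the takeover intermediate server recorded for the group containing a migrate node matches the IP address of the offline server, the leader assigns a new intermediate server to take over that group.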
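The owner/migrate handshake, including the case where the offline server was itself in the middle of a takeover, can be modelled with a plain in-memory tree. This sketch does not use a real Zookeeper client: the `CoordinationTree` class, the `/groups/<name>/owner` and `/groups/<name>/migrate` paths, the function names, and the "pick the first online server" policy are all assumptions made for illustration, and polling stands in for the real-time monitoring described in the text.

```python
class CoordinationTree:
    """In-memory stand-in for the Zookeeper directory (not a real client)."""

    def __init__(self):
        self.nodes = {}

    def get(self, path):
        return self.nodes.get(path)

    def set(self, path, value):
        self.nodes[path] = value

    def delete(self, path):
        self.nodes.pop(path, None)


def on_server_offline(tree, group_names, offline_ip, online_ips):
    """Leader side: mark every group that needs a new intermediate server."""
    if not online_ips:
        raise ValueError("no intermediate server left to take over")
    new_owner = online_ips[0]  # assumption: selection policy is unspecified
    for group in group_names:
        owner = tree.get(f"/groups/{group}/owner")
        migrate = tree.get(f"/groups/{group}/migrate")
        # Case 1: the offline server managed the group.
        # Case 2: the offline server was in the middle of taking it over.
        if owner == offline_ip or migrate == offline_ip:
            tree.set(f"/groups/{group}/migrate", new_owner)


def take_over(tree, group_names, my_ip):
    """Worker side: claim groups whose migrate node carries my own IP."""
    claimed = []
    for group in group_names:
        if tree.get(f"/groups/{group}/migrate") == my_ip:
            tree.set(f"/groups/{group}/owner", my_ip)
            tree.delete(f"/groups/{group}/migrate")  # migration finished
            claimed.append(group)
    return claimed


tree = CoordinationTree()
tree.set("/groups/Job1/owner", "192.168.1.10")
tree.set("/groups/Job2/owner", "192.168.1.11")
on_server_offline(tree, ["Job1", "Job2"], "192.168.1.10",
                  ["192.168.1.11", "192.168.1.12"])
claimed = take_over(tree, ["Job1", "Job2"], "192.168.1.11")
print(claimed, tree.get("/groups/Job1/owner"))
```

Writing the new owner's IP into the migrate node lets each intermediate server decide locally whether a pending migration is addressed to it, which is the comparison the text describes.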
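In one embodiment, the application server 102 is further configured to determine, according to the Job information, whether the task's execution time has arrived; if so, it obtains the corresponding shard information from the intermediate server that manages it, starts executing the corresponding subtasks according to the shard information, and records the start of task execution on the intermediate server.
In this embodiment, after the leader application server of each Job group shards the task and assigns the shards to the corresponding application servers, it sends the sharding result (how many shards there are and which application server executes each shard) to the intermediate server that manages the group. When an application server determines from its own Job information that the task's execution time has arrived, it obtains the corresponding shard information from the intermediate server that manages it. The Job information includes the time setting for executing the task; the shard information is the set of shard numbers that the application server must execute, for example shards 0 and 1. The application server then executes the corresponding subtasks according to the shard information and records the start of task execution on the intermediate server.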
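The execution-time check can be sketched as below. The `IntermediateServerStub` class, the `run_if_due` function, the record format, and the sample timestamps are hypothetical; a real application server would talk to the intermediate server over its TCP connection rather than call an in-process object.

```python
from datetime import datetime


class IntermediateServerStub:
    """Holds shard assignments and task records (stand-in for a real server)."""

    def __init__(self, allocation):
        self.allocation = allocation  # {(job_id, ip): [shard numbers]}
        self.records = []

    def shards_for(self, job_id, ip):
        return list(self.allocation.get((job_id, ip), []))

    def record_start(self, job_id, ip, shards):
        self.records.append(("started", job_id, ip, tuple(shards)))


def run_if_due(now, run_at, job_id, ip, server, execute_shard):
    """Execute this server's shards once the Job's execution time is reached."""
    if now < run_at:
        return []
    shards = server.shards_for(job_id, ip)
    server.record_start(job_id, ip, shards)
    for shard in shards:
        execute_shard(shard)
    return shards


server = IntermediateServerStub({("Job1", "10.0.0.1"): [0, 1]})
executed = []
run_at = datetime(2017, 1, 1, 2, 0)
too_early = run_if_due(datetime(2017, 1, 1, 1, 59), run_at,
                       "Job1", "10.0.0.1", server, executed.append)
on_time = run_if_due(run_at, run_at, "Job1", "10.0.0.1",
                     server, executed.append)
print(too_early, on_time, server.records)
```

Fetching the shard numbers at execution time, rather than caching them at registration, means a reallocation that happened in between is picked up automatically.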
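As shown in FIG. 3, in one embodiment the distributed task scheduling system further includes a database 108, configured to store Job information and to receive and store the task start and end records sent by the intermediate servers.
In this embodiment, the distributed task scheduling system also includes a database 108. The database stores Job information, that is, the task configuration information for each task, and records the start and end status of each task. Back-end operators can view the status of each task through a management console and can also use the console to modify a Job's task configuration information manually.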
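As shown in FIG. 4, in one embodiment a distributed task scheduling method is provided. The method includes:
Step 402: An application server sends a TCP connection request to an intermediate server, and the intermediate server establishes a TCP connection with the application server according to the request.
In this embodiment, the application server first sends a request to establish a TCP connection to the intermediate server; after receiving the request, the intermediate server establishes a TCP connection with the application server.
Step 404: The application server registers its IP address and Job information with the intermediate server.
In this embodiment, after the TCP connection is established, the application server registers its own IP address and Job information with the intermediate server. The Job information includes task configuration information, a task identifier, the task execution time, and so on. The IP address of the application server uniquely identifies that application server.
Step 406: The intermediate server obtains the IP address and Job information registered by the application server, divides application servers having the same Job information into the same Job group, elects one leader application server from among the application servers in the Job group, and delivers the task configuration information corresponding to the Job information and the IP address of each application server in the Job group to the leader application server.
In this embodiment, there are also multiple intermediate servers, but far fewer than application servers. The intermediate servers manage the application servers. First, an intermediate server obtains the IP addresses and Job information registered by the application servers and divides application servers having the same Job information into one Job group. For example, if application server 1, application server 2, and application server 3 all have the same Job1, the Job1 group includes application servers 1, 2, and 3. Next, one leader application server is elected from among the application servers in the Job group; generally, the application server that connected to the intermediate server earliest becomes the leader. The task configuration information corresponding to the Job information and the IP addresses of the application servers in the Job group are delivered to the leader application server.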
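Step 408: The leader application server splits the task according to the shard count in the task configuration information and assigns the resulting subtasks according to the IP address of each application server in the Job group.
In this embodiment, the leader application server splits and assigns the task. The task configuration information includes the task's shard count and the corresponding sharding algorithm. Specifically, the leader application server splits the task according to the shard count and then assigns the resulting subtasks according to the IP address of each application server in the Job group. For example, if the shard count is 6, the leader application server splits the task into 6 subtasks using the corresponding sharding algorithm and assigns them to the application servers in the Job group, including itself. The IP address uniquely identifies an application server. Subtasks can be assigned round-robin, sequentially, or by another method such as random assignment. Taking round-robin as an example, suppose the Job group has 3 application servers and the Job is divided into 6 shards numbered 0, 1, 2, 3, 4, 5. Shard 0 goes to the first application server, shard 1 to the second, and shard 2 to the third; shard 3 then goes to the first application server again, and so on. In the end, shards 0 and 3 are assigned to the first application server, shards 1 and 4 to the second, and shards 2 and 5 to the third. Because the system distinguishes application servers by IP address, the assignment result is stored as a mapping of group name, subtask number, and IP address. For example, subtasks 0 and 3 of the Job1 group are stored with Server1's IP address, subtasks 1 and 4 with Server2's IP address, and subtasks 2 and 5 with Server3's IP address.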
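The round-robin variant reduces to a modulo over the server list. The function name `allocate_round_robin` and the sample IP addresses are hypothetical; the rule itself (shard i goes to server i mod n) is the one walked through in the example above.

```python
def allocate_round_robin(shard_count, server_ips):
    """Deal shards out one at a time, cycling through the servers in order."""
    if not server_ips:
        raise ValueError("no application server is online")
    allocation = {ip: [] for ip in server_ips}
    for shard in range(shard_count):
        target = server_ips[shard % len(server_ips)]
        allocation[target].append(shard)
    return allocation


# First, second and third application server, with made-up IP addresses.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
round_robin = allocate_round_robin(6, servers)
print(round_robin)
```

Compared with sequential assignment, round-robin spreads neighbouring shard numbers across servers, which may matter if adjacent shards tend to carry similar load.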
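In one embodiment, after the step in which the intermediate server obtains the IP address and Job information registered by the application server, the method further includes: the intermediate server looks up, according to the Job information, the target intermediate server that actually manages the Job and returns the address of the target intermediate server to the application server; the application server establishes a TCP connection with the target intermediate server according to that address.
In this embodiment, different intermediate servers manage different Job information. After receiving the Job information registered by an application server, the intermediate server first checks whether the Job information exists in its own list, that is, whether it manages that Job information itself. If not, it looks up the intermediate server that actually manages the Job information (the target intermediate server), obtains its IP address, and returns that IP address to the application server. The application server then establishes a TCP connection with the target intermediate server using the returned IP address and applies to join the corresponding Job group.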
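As shown in FIG. 5, in one embodiment the distributed task scheduling method further includes:
Step 410: The intermediate server monitors application servers in the Job group going online or offline; when an application server in the group goes online or offline, it instructs the leader application server in the Job group to reallocate the corresponding task.
In this embodiment, the intermediate server also monitors the application servers in the Job groups it maintains for going online or offline. When a new application server joins the Job group, the intermediate server instructs the leader application server in the Job group to reallocate the corresponding task, that is, subtasks are also assigned to the newly joined application server. When an application server in the Job group drops offline because of a fault or a network problem, the intermediate server likewise instructs the leader application server to reallocate the corresponding task.
Step 412: Following that instruction, the leader application server reallocates the task according to the number of application servers currently online in the Job group and returns the allocation result to the intermediate server.
In this embodiment, the leader application server, as instructed by the intermediate server, reallocates the task according to the number of application servers currently online in the Job group and updates the allocation result on the intermediate server. For example, suppose the Job group initially has 3 application servers and the task is divided into 10 shards, allocated as {Server1: [0,1,2], Server2: [3,4,5], Server3: [6,7,8,9]}. If one application server crashes, the task is reallocated as {Server1: [0,1,2,3,4], Server2: [5,6,7,8,9]}. If one application server is added, the task is reallocated as {Server1: [0,1], Server2: [2,3], Server3: [4,5,6], Server4: [7,8,9]}.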
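Steps 410 and 412 can be sketched as a small class that recomputes the allocation whenever membership changes. The `JobGroup` class and its method names are hypothetical. The split rule, in which the last servers absorb the remainder, is inferred from the three allocations in the example rather than stated in the text, so treat it as one rule that happens to reproduce those numbers.

```python
class JobGroup:
    """Tracks the online servers of one Job group and re-splits on change."""

    def __init__(self, shard_count, server_names):
        self.shard_count = shard_count
        self.online = list(server_names)
        self.allocation = self._split()

    def _split(self):
        count = len(self.online)
        if count == 0:
            return {}
        base, extra = divmod(self.shard_count, count)
        # Sizes such as [3, 3, 4]: the last `extra` servers get one more
        # shard (inferred from the example, not stated in the text).
        sizes = [base + (1 if i >= count - extra else 0) for i in range(count)]
        allocation, start = {}, 0
        for name, size in zip(self.online, sizes):
            allocation[name] = list(range(start, start + size))
            start += size
        return allocation

    def server_online(self, name):
        if name not in self.online:
            self.online.append(name)
        self.allocation = self._split()
        return self.allocation

    def server_offline(self, name):
        if name in self.online:
            self.online.remove(name)
        self.allocation = self._split()
        return self.allocation


initial = JobGroup(10, ["Server1", "Server2", "Server3"]).allocation
after_crash = JobGroup(10, ["Server1", "Server2", "Server3"]).server_offline("Server3")
after_join = JobGroup(10, ["Server1", "Server2", "Server3"]).server_online("Server4")
print(initial, after_crash, after_join, sep="\n")
```

Recomputing the whole allocation from the current member list keeps the logic stateless, at the cost of moving shards between servers that stayed online; the source does not say whether the real system tries to minimise such movement.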
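A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination of these features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make variations and improvements without departing from the concept of the present invention, and these all fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.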

Claims (15)

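  1. A distributed task scheduling system, comprising:
    a plurality of application servers, configured to establish a TCP connection with an intermediate server and register their IP address and Job information with the intermediate server;
    a plurality of intermediate servers, configured to manage the application servers, obtain the IP addresses and Job information registered by the application servers, divide application servers having the same Job information into one Job group, elect one leader application server from among the application servers in the Job group, and deliver the corresponding task configuration information and the IP address of each application server in the Job group to the leader application server; wherein
    the leader application server is configured to split the task according to the shard count in the task configuration information and to assign the resulting subtasks according to the IP address of each application server in the Job group;
    the application servers are further configured to execute the subtasks assigned to them; and
    a distributed coordination server, on which Zookeeper is deployed, configured to establish a connection with the intermediate servers so that the Zookeeper coordinates the intermediate servers in a unified manner.
  2. The system according to claim 1, wherein the intermediate server is further configured to look up, according to the Job information, the target intermediate server that actually manages the Job and to return the address of the target intermediate server to the application server;
    the application server is further configured to establish a TCP connection with the target intermediate server according to the address of the target intermediate server.
  3. The system according to claim 1, wherein the intermediate server is further configured to monitor application servers in the Job group going online or offline and, when an application server in the Job group goes online or offline, to instruct the leader application server in the Job group to reallocate the corresponding task and to receive the new allocation result returned by the leader application server.
  4. The system according to claim 1, wherein the distributed coordination server is further configured to elect, through Zookeeper, one leader intermediate server from among the plurality of intermediate servers;
    the leader intermediate server is further configured to monitor the other intermediate server nodes in the cluster in real time and, if it finds that an intermediate server has gone offline, to assign the Job group managed by the offline intermediate server to an online intermediate server for takeover and to set a migrate node in the Job group, the migrate node marking the migration status of the Job group and being deleted once the migration completes;
    the intermediate server is further configured to monitor the migrate node under the Job group in real time and, if it finds that the IP address of the migrate node matches its own, to take over the Job group containing the migrate node.
  5. The system according to claim 4, wherein the leader intermediate server is further configured, upon detecting that an intermediate server has gone offline, to determine whether the offline intermediate server was in the process of taking over a Job group and, if so, to assign a new intermediate server to take over that Job group.
  6. The system according to claim 1, wherein the application server is further configured to determine, according to the Job information, whether the task's execution time has arrived and, if so, to obtain the corresponding shard information from the intermediate server that manages the application server, start executing the corresponding subtasks according to the shard information, and record the start of task execution on the intermediate server.
  7. The system according to claim 1, wherein the system further comprises:
    a database, configured to store Job information and to receive and store the task start and end records sent by the intermediate servers.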
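  8. A distributed task scheduling method, the method comprising:
    an application server sending a TCP connection request to an intermediate server, and the intermediate server establishing a TCP connection with the application server according to the TCP connection request;
    the application server registering its IP address and Job information with the intermediate server;
    the intermediate server obtaining the IP address and Job information registered by the application server, dividing application servers having the same Job information into the same Job group, electing one leader application server from among the application servers in the Job group, and delivering the task configuration information corresponding to the Job information and the IP address of each application server in the Job group to the leader application server;
    the leader application server splitting the task according to the shard count in the task configuration information and assigning the resulting subtasks according to the IP address of each application server in the Job group.
  9. The method according to claim 8, wherein, after the step in which the intermediate server obtains the IP address and Job information registered by the application server, the method further comprises:
    the intermediate server looking up, according to the Job information, the target intermediate server that actually manages the Job and returning the address of the target intermediate server to the application server;
    the application server establishing a TCP connection with the target intermediate server according to the address of the target intermediate server.
  10. The method according to claim 8, wherein the method further comprises:
    the intermediate server monitoring application servers in the Job group going online or offline and, when an application server in the Job group goes online or offline, instructing the leader application server in the Job group to reallocate the corresponding task;
    the leader application server, following the instruction, reallocating the task according to the number of application servers currently online in the Job group and returning the allocation result to the intermediate server.
  11. The method according to claim 8, wherein the method further comprises:
    the intermediate server establishing a connection with a distributed coordination server;
    the distributed coordination server coordinating the intermediate servers in a unified manner through a deployed Zookeeper.
  12. The method according to claim 11, wherein the distributed coordination server coordinating the intermediate servers in a unified manner through a deployed Zookeeper comprises:
    the distributed coordination server electing, through the Zookeeper, one leader intermediate server from among the plurality of intermediate servers; wherein the leader intermediate server monitors the other intermediate server nodes in the cluster in real time and, if it finds that an intermediate server has gone offline, assigns the Job group managed by the offline intermediate server to an online intermediate server for takeover and sets a migrate node in the Job group, the migrate node marking the migration status of the Job group and being deleted once the migration completes.
  13. The method according to claim 11, wherein, after the leader intermediate server monitors the other intermediate server nodes in the cluster in real time, the method further comprises:
    if an intermediate server is detected to have gone offline, determining whether the offline intermediate server was in the process of taking over a Job group and, if so, assigning a new intermediate server to take over that Job group.
  14. The method according to claim 11, wherein the method further comprises:
    the intermediate server monitoring the migrate node under the Job group in real time and, if it finds that the IP address of the migrate node matches its own, taking over the Job group containing the migrate node.
  15. The method according to claim 8, wherein the method further comprises:
    the application server determining, according to the Job information, whether the task's execution time has arrived;
    if so, obtaining the corresponding shard information from the intermediate server that manages the application server, starting to execute the corresponding subtasks according to the shard information, and recording the start of task execution on the intermediate server.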
PCT/CN2017/091101 2016-11-29 2017-06-30 Distributed task scheduling method and system WO2018099067A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611076472.0 2016-11-29
CN201611076472.0A CN106993019B (zh) 2016-11-29 2016-11-29 Distributed task scheduling method and system

Publications (1)

Publication Number Publication Date
WO2018099067A1 true WO2018099067A1 (zh) 2018-06-07

Family

ID=59414280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091101 WO2018099067A1 (zh) 2016-11-29 2017-06-30 Distributed task scheduling method and system

Country Status (2)

Country Link
CN (1) CN106993019B (zh)
WO (1) WO2018099067A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
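CN109088947A (zh) * 2018-09-29 2018-12-25 北京奇虎科技有限公司 Data distribution system, method and server based on layered transmission
CN110287228A (zh) * 2019-05-20 2019-09-27 广西电网有限责任公司 Implementation method for real-time data acquisition based on equipment monitoring in the power grid dispatching domain
CN110532096A (zh) * 2019-08-28 2019-12-03 广东乐之康医疗技术有限公司 System and method for multi-node grouped parallel deployment
CN110928662A (zh) * 2019-11-28 2020-03-27 国网信息通信产业集团有限公司 Distributed timed task scheduler for microservice architectures
CN112118291A (zh) * 2020-08-13 2020-12-22 北京思特奇信息技术股份有限公司 Load balancing system and method for service traffic
CN112231098A (zh) * 2020-09-29 2021-01-15 北京三快在线科技有限公司 Task processing method, apparatus, device and storage medium
CN113760485A (zh) * 2020-07-16 2021-12-07 北京沃东天骏信息技术有限公司 Scheduling method, apparatus, device and storage medium for timed tasks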

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
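CN107562522A (zh) * 2017-10-12 2018-01-09 国电南瑞科技股份有限公司 Distributed application management method based on ZooKeeper
CN109933422A (zh) * 2017-12-19 2019-06-25 北京京东尚科信息技术有限公司 Method, apparatus, medium and electronic device for processing tasks
CN109995842B (zh) * 2018-01-02 2022-12-02 北京奇虎科技有限公司 Grouping method and apparatus for a distributed server cluster
CN108717379B (zh) * 2018-05-08 2023-07-25 平安证券股份有限公司 Electronic device, distributed task scheduling method and storage medium
CN108829505A (zh) * 2018-06-28 2018-11-16 北京奇虎科技有限公司 Distributed scheduling system and method
CN108958920B (zh) * 2018-07-13 2021-04-06 众安在线财产保险股份有限公司 Distributed task scheduling method and system
CN109032796B (zh) * 2018-07-18 2020-12-22 北京京东金融科技控股有限公司 Data processing method and apparatus
CN111163117B (zh) * 2018-11-07 2023-01-31 北京京东尚科信息技术有限公司 Peer-to-peer scheduling method and apparatus based on Zookeeper
CN111158896A (zh) * 2018-11-08 2020-05-15 中国移动通信集团上海有限公司 Distributed process scheduling method and system
CN110233886B (zh) * 2019-05-30 2021-07-20 华南理工大学 High-availability service governance system for massive microservices and implementation method
CN110673933A (zh) * 2019-08-15 2020-01-10 平安普惠企业管理有限公司 ZooKeeper-based distributed asynchronous queue implementation method, apparatus, device and medium
CN111147291B (zh) * 2019-12-18 2024-02-06 深圳前海微众银行股份有限公司 Service maintenance method and apparatus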

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
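CN105095327A (zh) * 2014-05-23 2015-11-25 深圳市珍爱网信息技术有限公司 Distributed ETL system and scheduling method
CN105338028A (zh) * 2014-07-30 2016-02-17 浙江宇视科技有限公司 Method and apparatus for electing master and slave nodes in a distributed server cluster
CN105447097A (zh) * 2015-11-10 2016-03-30 北京北信源软件股份有限公司 Data acquisition method and system
CN105589756A (zh) * 2014-12-03 2016-05-18 中国银联股份有限公司 Batch processing cluster system and method
CN105893497A (zh) * 2016-03-29 2016-08-24 杭州数梦工场科技有限公司 Task processing method and apparatus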

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8584131B2 (en) * 2007-03-30 2013-11-12 International Business Machines Corporation Method and system for modeling and analyzing computing resource requirements of software applications in a shared and distributed computing environment
US9122535B2 (en) * 2011-11-22 2015-09-01 Netapp, Inc. Optimizing distributed data analytics for shared storage
CN102521044B (zh) * 2011-12-30 2013-12-25 北京拓明科技有限公司 Distributed task scheduling method and system based on message middleware
US9430290B1 (en) * 2015-03-31 2016-08-30 International Business Machines Corporation Determining storage tiers for placement of data sets during execution of tasks in a workflow
CN104869154A (zh) * 2015-04-27 2015-08-26 江务学 Distributed resource scheduling method balancing resource credibility and user satisfaction
CN105187327A (zh) * 2015-08-14 2015-12-23 广东能龙教育股份有限公司 Distributed message queue middleware

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
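CN105095327A (zh) * 2014-05-23 2015-11-25 深圳市珍爱网信息技术有限公司 Distributed ETL system and scheduling method
CN105338028A (zh) * 2014-07-30 2016-02-17 浙江宇视科技有限公司 Method and apparatus for electing master and slave nodes in a distributed server cluster
CN105589756A (zh) * 2014-12-03 2016-05-18 中国银联股份有限公司 Batch processing cluster system and method
CN105447097A (zh) * 2015-11-10 2016-03-30 北京北信源软件股份有限公司 Data acquisition method and system
CN105893497A (zh) * 2016-03-29 2016-08-24 杭州数梦工场科技有限公司 Task processing method and apparatus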

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
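CN109088947A (zh) * 2018-09-29 2018-12-25 北京奇虎科技有限公司 Data distribution system, method and server based on layered transmission
CN110287228A (zh) * 2019-05-20 2019-09-27 广西电网有限责任公司 Implementation method for real-time data acquisition based on equipment monitoring in the power grid dispatching domain
CN110287228B (zh) * 2019-05-20 2022-08-23 广西电网有限责任公司 Implementation method for real-time data acquisition based on equipment monitoring in the power grid dispatching domain
CN110532096A (zh) * 2019-08-28 2019-12-03 广东乐之康医疗技术有限公司 System and method for multi-node grouped parallel deployment
CN110532096B (zh) * 2019-08-28 2022-12-30 深圳市云存宝技术有限公司 System and method for multi-node grouped parallel deployment
CN110928662A (zh) * 2019-11-28 2020-03-27 国网信息通信产业集团有限公司 Distributed timed task scheduler for microservice architectures
CN113760485A (zh) * 2020-07-16 2021-12-07 北京沃东天骏信息技术有限公司 Scheduling method, apparatus, device and storage medium for timed tasks
CN112118291A (zh) * 2020-08-13 2020-12-22 北京思特奇信息技术股份有限公司 Load balancing system and method for service traffic
CN112118291B (zh) * 2020-08-13 2022-11-18 北京思特奇信息技术股份有限公司 Load balancing system and method for service traffic
CN112231098A (zh) * 2020-09-29 2021-01-15 北京三快在线科技有限公司 Task processing method, apparatus, device and storage medium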

Also Published As

Publication number Publication date
CN106993019A (zh) 2017-07-28
CN106993019B (zh) 2019-11-19

Similar Documents

Publication Publication Date Title
WO2018099067A1 (zh) Distributed task scheduling method and system
US20220124049A1 (en) Distributed fair allocation of shared resources to constituents of a cluster
US9999030B2 (en) Resource provisioning method
JP4515314B2 (ja) Method for reproducing the configuration of a computer system
US10778750B2 (en) Server computer management system for supporting highly available virtual desktops of multiple different tenants
US10846185B2 (en) Method for processing acquire lock request and server
CN105939389A (zh) Load balancing method and apparatus
JP2002533809A (ja) Object hashing with incremental changes
US6968359B1 (en) Merge protocol for clustered computer system
US10320905B2 (en) Highly available network filer super cluster
WO2015192584A1 (zh) Virtual routing system and method
CN114070822B (zh) Kubernetes Overlay IP address management method
EP3442201B1 (en) Cloud platform construction method and cloud platform
WO2014205847A1 (zh) Method, apparatus and system for dispatching partition balancing subtasks
US20130205011A1 (en) Service providing system
US10761869B2 (en) Cloud platform construction method and cloud platform storing image files in storage backend cluster according to image file type
WO2012055242A1 (zh) Method and apparatus for implementing load balancing in a distributed hash table network
CN111427670A (zh) Task scheduling method and system
WO2015192583A1 (zh) Internet Protocol (IP) address allocation method, apparatus, server and terminal
CN113127444B (zh) Data migration method, apparatus, server and storage medium
WO2020158968A1 (ko) Method for expanding worker nodes in a hybrid P2P cluster system
CN109005071B (zh) Decision deployment method and scheduling device
CN111404978A (zh) Data storage method and cloud storage system
CN113259426B (zh) Method, system, device and medium for resolving data dependencies in microservices
CN111400110B (zh) Database access management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17876360

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/10/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17876360

Country of ref document: EP

Kind code of ref document: A1