CN114237891A - Resource scheduling method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114237891A
Authority
CN
China
Prior art keywords
task
management component
shuffle
resource
global management
Prior art date
Legal status
Pending
Application number
CN202111555577.5A
Other languages
Chinese (zh)
Inventor
李超
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111555577.5A
Publication of CN114237891A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system

Abstract

The disclosure relates to a resource scheduling method and apparatus, an electronic device, and a storage medium. The method includes: receiving, by a global management component, resource information sent by shuffle processing nodes, where the global management component and the shuffle processing nodes are each independently packaged outside a compute engine; obtaining, by the global management component, a resource request of a task from a first scheduling queue, where the first scheduling queue stores resource requests of tasks, and a resource request is sent to the global management component when the compute engine starts the task and is added to the first scheduling queue by the global management component; and processing the resource request by the global management component and determining, based on the resource information of the shuffle processing nodes, a target shuffle processing node for processing the task. With the disclosed scheme, the global management component provides global management of the shuffle processing nodes and of the resource requests of tasks, traffic is evenly spread across the shuffle processing nodes, and global resource load balancing is achieved.

Description

Resource scheduling method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a resource scheduling method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the development of computer technology, the compute engines used inside enterprises, such as MapReduce (a map-reduce programming model) and Spark (a compute engine), run tasks that rely on a shuffle mechanism. The shuffle mechanism is responsible for transferring data between the map tasks (MapTask) and reduce tasks (ReduceTask) of a job.
Currently, the shuffle mechanism assigns a node to each task in a local-view mode. In this mode, every node reports resource information such as its load, CPU (Central Processing Unit) usage, and memory to a remote state storage service. When a task is started, the task manager in the compute engine obtains from the state storage service a resource snapshot of all shuffle management nodes that have reported resource information, and each task preferentially schedules onto the nodes with the most remaining resources according to that snapshot.
However, when the number of tasks is large, the resource snapshots acquired by different tasks deviate from the actual state, and multiple tasks may be scheduled onto the same node at the same time. The load of that node then rises sharply, which affects the stability of the shuffle processing.
Disclosure of Invention
The present disclosure provides a resource scheduling method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, to at least solve the problem in the related art that multiple tasks are simultaneously scheduled to the same shuffle processing node, which affects the stability of the shuffle service. The technical solution of the present disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a resource scheduling method, the method including:
receiving, by a global management component, resource information sent by a shuffle processing node, where the global management component and the shuffle processing node are each independently packaged outside a compute engine;
acquiring a resource request of a task from a first scheduling queue through the global management component, wherein the first scheduling queue is used for storing the resource request of the task, and the resource request is sent to the global management component when the computing engine starts the task and is added to the first scheduling queue through the global management component;
processing, by the global management component, the resource request, determining a target shuffle processing node for processing the task based on the shuffle processing node's resource information.
In one embodiment, the global management component comprises a task state service;
the method further comprises the following steps:
when the computing engine starts the task, a task management component sends a resource request of the task to the task state service in the global management component, wherein the task management component is a component independently packaged outside the computing engine;
adding, by the task state service, the resource request to the first scheduling queue.
In one embodiment, the target shuffle processing nodes for processing the task comprise target shuffle processing nodes corresponding to respective partitions of the task;
after the determining a target shuffle processing node for processing the task, further comprising:
establishing a mapping relationship between the partitions of the tasks and the target shuffle processing node through the global management component, and sending the mapping relationship to the task management component through the task state service;
obtaining the mapping relation from the task management component through a shuffle write node, where the shuffle write node is a node independently packaged outside the compute engine and used for receiving key value data obtained by processing the task by a mapping task in the compute engine;
and after the shuffle write-in node acquires the key value data obtained by mapping the task, sending the key value data of each partition to a target shuffle processing node corresponding to each partition according to the mapping relation.
In one embodiment, the global management component comprises a node state service; the receiving, by the global management component, the resource information sent by the shuffle processing node includes:
receiving, by a node state service in the global management component, resource information sent by the shuffle processing node.
In one embodiment, the method further comprises:
when the global management component fails to process the resource request, adding the resource request to a second scheduling queue;
and when a preset request scheduling condition is met, acquiring the resource request from the second scheduling queue, and adding the resource request to the first scheduling queue again.
In one embodiment, the method further comprises:
when the global management component determines that the target shuffle processing node is abnormal, acquiring the abnormal type of the target shuffle processing node;
and re-determining a new target shuffle processing node for processing the task by adopting an exception handling mode corresponding to the exception type.
In one embodiment, the re-determining a new target shuffle processing node for processing the task in an exception handling manner corresponding to the exception type includes:
when the exception type is a first type, re-detecting the node state of the target shuffle processing node after waiting for a preset duration, and continuing to use the target shuffle processing node when the detected node state is normal, the first type being an exception type that the system can repair by itself;
and when the exception type is a second type, re-determining a new target shuffle processing node corresponding to the task, the second type being an exception type that the system cannot repair by itself.
According to a second aspect of the embodiments of the present disclosure, there is provided a resource scheduling apparatus, the apparatus including:
a receiving module configured to execute receiving resource information sent by a shuffle processing node through a global management component, the global management component and the shuffle processing node being independently encapsulated outside a compute engine, respectively;
a request obtaining module configured to execute a resource request for obtaining a task from a first scheduling queue through the global management component, where the first scheduling queue is used for storing the resource request for the task, and the resource request is sent to the global management component when the computing engine starts the task and is added to the first scheduling queue through the global management component;
a first resource scheduling module configured to perform processing of the resource request by the global management component, determine a target shuffle processing node for processing the task based on resource information of the shuffle processing node.
In one embodiment, the global management component comprises a task state service;
the device further comprises:
a request sending module configured to execute, when the computing engine starts the task, sending a resource request of the task to the task state service in the global management component through a task management component, where the task management component is a component independently packaged outside the computing engine;
a first queue update module configured to perform adding the resource request to the first scheduling queue through the task state service.
In one embodiment, the target shuffle processing nodes for processing the task comprise target shuffle processing nodes corresponding to respective partitions of the task;
the device further comprises:
a relationship generation module configured to perform establishing, by the global management component, a mapping relationship between partitions of the task and the target shuffle processing node, the mapping relationship being sent to the task management component by the task state service;
a relationship obtaining module configured to perform obtaining the mapping relationship from the task management component through a shuffle write node, where the shuffle write node is a node independently encapsulated outside the compute engine and is used to receive key value data obtained by processing the task by a mapping task in the compute engine;
and the data sending module is configured to send the key value data of each partition to the target shuffle processing node corresponding to each partition according to the mapping relationship after the shuffle write node acquires the key value data obtained by mapping the task.
In one embodiment, the global management component comprises a node state service; the receiving module is configured to execute receiving the resource information sent by the shuffle processing node through a node state service in the global management component.
In one embodiment, the apparatus further comprises:
a second queue update module configured to perform adding the resource request to a second scheduling queue when the global management component fails to process the resource request;
and the third queue updating module is configured to acquire the resource request from the second scheduling queue and add the resource request to the first scheduling queue again when a preset request scheduling condition is met.
In one embodiment, the apparatus further comprises:
a type determination module configured to perform, when the global management component determines that an exception occurs in the target shuffle processing node, obtaining an exception type of the target shuffle processing node;
a second resource scheduling module configured to perform re-determining a new target shuffled processing node for processing the task using an exception handling manner corresponding to the exception type.
In one embodiment, the second resource scheduling module is configured to perform:
when the exception type is a first type, re-detecting the node state of the target shuffle processing node after waiting for a preset duration, and continuing to use the target shuffle processing node when the detected node state is normal, the first type being an exception type that the system can repair by itself;
and when the exception type is a second type, re-determining a new target shuffle processing node corresponding to the task, the second type being an exception type that the system cannot repair by itself.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the resource scheduling method of any of the above first aspects.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the resource scheduling method of any one of the above first aspects.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which includes instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the resource scheduling method of any one of the above first aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
An independently packaged global management component and independently packaged shuffle processing nodes are deployed outside the compute engine. The global management component receives the resource requests of tasks and stores them in a first scheduling queue. It also receives the resource information sent by the shuffle processing nodes, processes the resource requests in the first scheduling queue according to that resource information, and determines the target shuffle processing node for each task. The global management component thus provides global management of the shuffle processing nodes and of the resource requests of tasks, traffic is evenly spread across the shuffle processing nodes, and global resource load balancing is achieved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a diagram illustrating an application environment of a resource scheduling method according to an example embodiment.
Fig. 2 is a flow chart illustrating a method of resource scheduling in accordance with an example embodiment.
FIG. 3 is a flow diagram illustrating the handling of an exception condition in accordance with an exemplary embodiment.
FIG. 4 is a flow diagram illustrating the handling of an exception condition in accordance with an exemplary embodiment.
FIG. 5 is a flow diagram illustrating a shuffle process for tasks in accordance with an exemplary embodiment.
Fig. 6 is a flowchart illustrating a resource scheduling method according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating a resource scheduling apparatus according to an example embodiment.
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should also be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are both information and data that are authorized by the user or sufficiently authorized by various parties.
The present disclosure provides a resource scheduling method, which can be applied to the application environment shown in fig. 1, in which the terminal 110 interacts with the server 120 through a network. At least one compute engine is deployed in the server 120, and a global management component and shuffle processing nodes, each independently packaged, are deployed outside the compute engine. The server 120 receives, through the global management component, the resource information sent by the shuffle processing nodes in real time or periodically. Whenever the server 120 receives a task uploaded by the terminal 110, it sends a resource request for the task to the global management component, which adds the resource request to the first scheduling queue. When the global management component processes the resource request in the first scheduling queue, it determines the target shuffle processing node for the task according to the most recently reported resource information of the shuffle processing nodes.
The terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The portable wearable device can be a smart watch, a smart bracelet, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
Fig. 2 is a flowchart illustrating a resource scheduling method according to an exemplary embodiment, as shown in fig. 2, including the following steps.
In step S210, the resource information transmitted by the shuffle processing node is received by the global management component.
The global management component is a component independently packaged outside the compute engine. It may have its own attributes and methods, and it may be responsible for, but is not limited to, global resource scheduling, global task management, life-cycle management of the shuffle processing nodes, handling heartbeat requests from the shuffle processing nodes, and managing the life cycles of all tasks.
A shuffle processing node is a node independently packaged outside the compute engine. It may have its own attributes and methods, and it may be responsible for, but is not limited to, aggregating and sorting the data of tasks by partition and spilling the data to a remote file system (for example, HDFS). There may be multiple shuffle processing nodes.
Specifically, each shuffle processing node is connected to the global management component and reports its current load information and remaining resource information to the global management component in real time or periodically (for example, every S seconds). The global management component caches the load information and resource information of each shuffle processing node in memory.
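A minimal Java sketch of this reporting and caching path is shown below. The names (WorkerReport, NodeStateCache, WorkerReporter) and the metric-reading placeholders are illustrative, not taken from the patent, and in a real deployment the report would travel over an RPC channel rather than a shared object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Resource report sent by a shuffle processing node (field names are illustrative).
final class WorkerReport {
    final String workerId;
    final double cpuLoad;        // current CPU load
    final long freeMemoryBytes;  // remaining memory
    final long timestampMillis;  // report time, useful for detecting stale entries

    WorkerReport(String workerId, double cpuLoad, long freeMemoryBytes) {
        this.workerId = workerId;
        this.cpuLoad = cpuLoad;
        this.freeMemoryBytes = freeMemoryBytes;
        this.timestampMillis = System.currentTimeMillis();
    }
}

// In-memory cache kept by the global management component's node state service.
final class NodeStateCache {
    private final Map<String, WorkerReport> latestReports = new ConcurrentHashMap<>();

    void onReport(WorkerReport report) {          // called whenever a worker reports
        latestReports.put(report.workerId, report);
    }

    Map<String, WorkerReport> snapshot() {        // read by the scheduling thread
        return Map.copyOf(latestReports);
    }
}

// Worker-side periodic reporting, every S seconds.
final class WorkerReporter {
    static void start(String workerId, NodeStateCache cache, long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> cache.onReport(new WorkerReport(workerId, readCpuLoad(), readFreeMemory())),
                0, intervalSeconds, TimeUnit.SECONDS);
    }

    private static double readCpuLoad() { return 0.3; }        // placeholder metric source
    private static long readFreeMemory() { return 4L << 30; }  // placeholder metric source
}
```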
In step S220, a resource request of a task is obtained from a first scheduling queue through the global management component, where the first scheduling queue is used to store the resource request of the task, and the resource request is sent to the global management component when the computing engine starts the task and is added to the first scheduling queue through the global management component.
Wherein a task may be a collection of work that a user requires to be done by a compute engine in a process of a problem computation or in a transaction. The task can be uploaded to the computing engine by a user through the terminal device, and can also be scheduled to the computing engine through the task scheduler.
In particular, whenever a task is started on the compute engine side, a resource request for the task may be sent to the global management component by the task manager on the compute engine side. The global management component adds the resource request to the first scheduling queue, and it invokes a first scheduling thread that periodically executes the scheduling flow, fetches a resource request awaiting allocation from the first scheduling queue, and starts a new scheduling pass.
In one example, the first scheduling queue may be an ordinary FIFO queue. A FIFO queue follows the first-in-first-out principle, so the resource request stored in the first scheduling queue earliest is processed first; when the first scheduling thread periodically executes the scheduling flow, it fetches resource requests in their storage order. In another example, the first scheduling queue may be a priority queue. A priority queue follows the highest-priority-first principle, so the resource request with the highest priority is processed first. The priority may be determined by the task type of the task and the like, and when the first scheduling thread periodically executes the scheduling flow, it fetches the highest-priority resource request from the first scheduling queue.
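The two queue variants can be sketched in Java as follows; the ResourceRequest fields and the priority convention (larger value scheduled first) are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

// A resource request as stored in the first scheduling queue (fields are illustrative).
final class ResourceRequest {
    final String taskId;
    final int numPartitions;
    final int priority;          // larger value = scheduled earlier when a priority queue is used
    final long enqueueTimeMillis = System.currentTimeMillis();

    ResourceRequest(String taskId, int numPartitions, int priority) {
        this.taskId = taskId;
        this.numPartitions = numPartitions;
        this.priority = priority;
    }
}

final class SchedulingQueues {
    // Variant 1: plain FIFO queue; the request stored first is processed first.
    static BlockingQueue<ResourceRequest> fifoQueue() {
        return new LinkedBlockingQueue<>();
    }

    // Variant 2: priority queue; the highest-priority request is processed first,
    // and ties are broken by arrival time so earlier requests of equal priority win.
    static BlockingQueue<ResourceRequest> priorityQueue() {
        return new PriorityBlockingQueue<>(64,
                Comparator.comparingInt((ResourceRequest r) -> r.priority).reversed()
                          .thenComparingLong(r -> r.enqueueTimeMillis));
    }
}
```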
In step S230, the resource request is processed by the global management component, and a target shuffle processing node for processing the task is determined based on the resource information of the shuffle processing node.
Specifically, the global management component's first scheduling thread obtains the most recently reported resource information of each shuffle processing node from memory, sorts the shuffle processing nodes in descending order of remaining resources, and selects the highest-ranked node (i.e., the one with the most remaining resources) as the target shuffle processing node for the task.
In one embodiment, the resource request may also carry the resource requirement of the task. The first scheduling thread may compare this requirement with the most recently reported resource information of each shuffle processing node and determine a shuffle processing node that satisfies the requirement as the target shuffle processing node.
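A possible selection step, reusing the WorkerReport type from the earlier sketch, could look like this; the demand fields (required memory, acceptable CPU load) are assumed for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Picks a target shuffle processing node for one resource request.
final class TargetNodeSelector {

    static Optional<String> selectTarget(Map<String, WorkerReport> latestReports,
                                         long requiredMemoryBytes,
                                         double maxAcceptableCpuLoad) {
        return latestReports.values().stream()
                // keep only nodes whose reported resources satisfy the task's demand
                .filter(r -> r.freeMemoryBytes >= requiredMemoryBytes
                          && r.cpuLoad <= maxAcceptableCpuLoad)
                // among those, prefer the node with the most remaining memory
                .max(Comparator.comparingLong((WorkerReport r) -> r.freeMemoryBytes))
                .map(r -> r.workerId);
    }
}
```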
In the above resource scheduling method, a global management component and shuffle processing nodes, each independently packaged, are deployed outside the compute engine. The global management component receives the resource requests of tasks and stores them in a first scheduling queue. It also receives the resource information sent by the shuffle processing nodes, processes the resource requests in the first scheduling queue according to that resource information, and determines the target shuffle processing node for each task. The global management component thus provides global management of the shuffle processing nodes and of the resource requests of tasks, traffic is evenly spread across the shuffle processing nodes, and global resource load balancing is achieved.
In an exemplary embodiment, the method further comprises: when the global management component fails to process the resource request, adding the resource request to a second scheduling queue; and when the preset request scheduling condition is met, acquiring the resource request from the second scheduling queue, and adding the resource request to the first scheduling queue again.
Specifically, the global management component determines that processing of a resource request has failed when, for example, there is currently no vacant shuffle processing node or the current resource information of every shuffle processing node fails to satisfy the resource requirement of the task. The global management component may add the failed resource request to the second scheduling queue. When the request scheduling condition is met, a second scheduling thread is invoked to move the resource requests in the second scheduling queue back to the first scheduling queue, so that the global management component can retry the resource requests that previously failed.
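One way this retry path could be wired up is sketched below, reusing the ResourceRequest type from the earlier queue example; modeling the "request scheduling condition" as a fixed delay is an assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Retry path for resource requests that the global management component failed to schedule.
final class RetryScheduler {
    private final BlockingQueue<ResourceRequest> firstQueue;
    private final BlockingQueue<ResourceRequest> secondQueue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService retryThread = Executors.newSingleThreadScheduledExecutor();

    RetryScheduler(BlockingQueue<ResourceRequest> firstQueue) {
        this.firstQueue = firstQueue;
    }

    // Called when no shuffle processing node currently satisfies the request.
    void onScheduleFailure(ResourceRequest request) {
        secondQueue.offer(request);
    }

    // The retry trigger is modeled as a fixed delay here; it could equally be driven
    // by a worker reporting freed resources.
    void start(long retryIntervalSeconds) {
        retryThread.scheduleWithFixedDelay(() -> {
            ResourceRequest request;
            while ((request = secondQueue.poll()) != null) {
                firstQueue.offer(request);   // hand the request back to the first queue for a retry
            }
        }, retryIntervalSeconds, retryIntervalSeconds, TimeUnit.SECONDS);
    }
}
```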
In this embodiment, a retry mechanism for resource requests is deployed to support retrying resource requests whose processing failed, which ensures that the resource request of a task is eventually processed and improves the stability of resource scheduling.
In an exemplary embodiment, the global management component includes an RpcService (remote procedure call service), which may include, but is not limited to, a task state service (SessionStateService). When the task manager (ApplicationMaster) on the compute engine side starts a task, a task management component is started accordingly. The task management component may be deployed alongside the task manager in the compute engine, belong to the same process as the task manager, and share the same JVM (Java Virtual Machine). In one example, the task management component may be embedded as an SDK (Software Development Kit) in the task manager on the compute engine side. The task management component sends the resource request of the task to the task state service in the global management component, so that the task state service adds the resource request to the first scheduling queue.
In this embodiment, the task state service is deployed in the global management component, so that the resource requests of tasks are isolated from other data, mutual interference with other data is avoided, and the accuracy of resource request processing is ensured.
In an exemplary embodiment, the remote procedure call service in the global management component further comprises a node status service (WorkerStatSeServer). In step S210, the global management component receives the resource information sent by the shuffle processing node, which may be implemented in the following manner.
Specifically, each shuffle processing node sends its load information and resource information to the node state service in the global management component in real time or periodically. The node state service caches the received load and resource information in memory for use when the first scheduling thread performs scheduling.
In this embodiment, the node state service is deployed in the global management component, so that the resource information of the shuffle processing nodes is isolated from other data, mutual interference with other data is avoided, and the accuracy of the resource information is ensured.
In an exemplary embodiment, the target shuffle processing nodes for processing the task include target shuffle processing nodes corresponding to respective partitions of the task. As shown in fig. 3, after determining the target shuffle processing node for processing the task at step S230, the method further includes the following steps:
in step S310, a mapping relationship between the partition of the task and the target shuffle processing node is established by the global management component, and the mapping relationship is sent to the task management component by the task state service.
Specifically, the global management component invokes the first scheduling thread to fetch the resource request of the task from the first scheduling queue and obtain the partitions of the task. It then obtains the most recently reported resource information of each shuffle processing node from memory and sorts the shuffle processing nodes in descending order of remaining resources. The first scheduling thread allocates a target shuffle processing node to each partition in that order and thereby establishes a mapping relationship between the partitions and the target shuffle processing nodes. The first scheduling thread sends the mapping relationship to the task state service, which forwards it to the task management component.
In one example, if the task includes A partitions, the global management component may determine, through the first scheduling thread, the A shuffle processing nodes with the most remaining resources, use those A nodes as target shuffle processing nodes, and generate a one-to-one mapping between the partitions and the target shuffle processing nodes.
In another example, when the number of shuffle processing nodes that meet the resource requirement of the partitions is less than the number of partitions, the global management component may merge partitions so that the same shuffle processing node processes the data of multiple partitions, and then generate the mapping between the partitions and the target shuffle processing nodes.
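A simple sketch of building the partition-to-node mapping, including the merge case, under the assumption that the candidate nodes are already ranked best-first:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Builds the partition -> target shuffle processing node mapping. When there are fewer
// qualifying nodes than partitions, partitions wrap around so one node serves several
// of them (the merge case); otherwise each partition gets its own node in ranked order.
final class PartitionMapper {

    static Map<Integer, String> mapPartitions(int numPartitions, List<String> rankedWorkerIds) {
        if (rankedWorkerIds.isEmpty()) {
            throw new IllegalStateException("no available shuffle processing node");
        }
        Map<Integer, String> mapping = new HashMap<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            mapping.put(partition, rankedWorkerIds.get(partition % rankedWorkerIds.size()));
        }
        return mapping;
    }
}
```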
In step S320, a mapping relationship is obtained from the task management component by the shuffle write node.
In one embodiment, the shuffle write node may be a stand-alone packaged node deployed outside of the compute engine. In another embodiment, the shuffle write node may be deployed corresponding to a mapping task on the compute engine side, and both may share the same process. In one example, the shuffled write node may be embedded as an SDK in a mapping task on the compute engine side.
In another embodiment, the shuffle write node and the compute engine side mapping task may have a one-to-one correspondence. In one example, if the calculation engine side includes M mapping tasks, at least M shuffle write nodes may be deployed, and there are M shuffle write nodes in one-to-one correspondence with the mapping tasks, where each shuffle write node is used to further process the key value data obtained by processing the mapping task corresponding to the shuffle write node.
Specifically, when the task manager on the compute engine side starts a task, the task management component starts accordingly. When the task manager on the compute engine side starts a mapping task, the shuffle write node corresponding to that mapping task starts as well. Upon startup, the shuffle write node requests the mapping relationship between the partitions and the target shuffle processing nodes from the task management component.
In step S330, after the shuffle write node obtains the key value data obtained by mapping the task, the key value data of each partition is sent to the target shuffle processing node corresponding to each partition according to the mapping relationship.
Specifically, each mapping task processes the task and produces a series of key-value data. For each key-value record output by a mapping task, the partition of the record can be obtained by computing the hash value of its key and taking that hash value modulo the number of reduce tasks. Each mapping task aggregates the key-value data belonging to the same partition and writes the aggregated data into the local buffer of its corresponding shuffle write node. When the amount of key-value data in the local buffer reaches a threshold, the shuffle write node corresponding to each mapping task sends the key-value data of each partition to the target shuffle processing node of that partition according to the mapping relationship between partitions and target shuffle processing nodes.
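The partition computation described here follows the usual hash-then-modulo rule; a minimal sketch (class and method names are illustrative):

```java
import java.util.Objects;

// Computes the partition of a key-value record on the map side.
final class Partitioner {
    private final int numReduceTasks;

    Partitioner(int numReduceTasks) {
        this.numReduceTasks = numReduceTasks;
    }

    int partitionOf(Object key) {
        // Mask off the sign bit so the result is always in [0, numReduceTasks).
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```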
In one embodiment, after receiving the key-value data of each partition, the target shuffle processing node may also sort the data of each partition according to pre-deployed sorting logic and store the sorted key-value data of each partition to a file system (e.g., the Hadoop Distributed File System).
In this embodiment, by deploying independently packaged shuffle write nodes and shuffle processing nodes and establishing a mapping relationship between partitions and shuffle processing nodes, the output data of multiple mapping tasks can be aggregated along the partition dimension, so the reduce task no longer needs to pull data from a local file of every mapping task. This reduces the number of data I/O operations and improves the input/output efficiency of the shuffle.
In an exemplary embodiment, the method further comprises: when the global management component determines that the target shuffle processing node is abnormal, acquiring the abnormal type of the target shuffle processing node; and re-determining a new target shuffle processing node for processing the task by adopting an exception handling mode corresponding to the exception type.
The exception type may include, but is not limited to, a connection exception (e.g., a connection timeout), a full cache space on the shuffle processing node, a Cyclic Redundancy Check (CRC) exception, node downtime, and the like, and may be identified by an error code or the like. Exception handling manners corresponding to the exception types are deployed in the global management component. When the global management component detects that the target shuffle processing node is abnormal, it handles the abnormal target shuffle processing node according to the exception handling manner corresponding to the exception type and determines a new target shuffle processing node to be used for processing the task.
In the embodiment, high-availability and high-fault-tolerance construction is performed on the shuffle processing nodes, and the exception processing mode corresponding to the exception type is deployed, so that when the shuffle processing nodes are abnormal, a new shuffle processing node can be determined in time, and normal processing of tasks can be guaranteed.
In an exemplary embodiment, exceptions may be classified into a first type and a second type according to whether the system can repair them by itself. The first type is an exception type that the system can repair, for example, a connection exception (such as a connection timeout) or a full cache space on the shuffle processing node. The second type is an exception type that the system cannot repair, for example, a CRC (Cyclic Redundancy Check) exception, an unexpected data chunk, or node downtime. In this embodiment, re-determining a new target shuffle processing node for processing the task in an exception handling manner corresponding to the exception type may be implemented by the following process:
specifically, when the global management component determines that the exception type of the target shuffle processing node is a first type, a preset time duration is waited, so that the target shuffle processing node can execute a restart operation. After the preset time length, the global management component can detect the node state of the target shuffle processing node through a heartbeat request and the like, and when the state of the target shuffle processing node is determined to be normal, the target shuffle processing node is continuously used.
In one embodiment, if the global management component determines that the number of reboots of the target shuffle processing node reaches a preset number, but detects that the state of the target shuffle processing node is still abnormal, the exception type of the target shuffle processing node may be updated to the second type, and an exception handling manner corresponding to the second type may be executed.
When the global management component determines that the exception type of the target shuffle processing node is the second type, it re-determines a new target shuffle processing node corresponding to the task. In one embodiment, a new resource request corresponding to the task may be regenerated and added to the first scheduling queue; the global management component processes the new resource request in the first scheduling queue in the same way as described above for resource requests, which is not repeated here. In one embodiment, the first scheduling queue may be a priority queue, and the priority of the new resource request may be higher than that of other resource requests, so that the global management component processes it preferentially and the processing efficiency of the current task is improved.
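A condensed sketch of this two-branch handling; the probe and rescheduling calls are placeholders, and the fixed wait and retry count are assumptions:

```java
import java.util.concurrent.TimeUnit;

// Dispatches exception handling by type; the probe and rescheduling calls are placeholders.
final class WorkerFailureHandler {

    enum FailureType { RECOVERABLE, UNRECOVERABLE }   // "first type" / "second type"

    void handle(String workerId, FailureType type, int maxRestartChecks) throws InterruptedException {
        if (type == FailureType.RECOVERABLE) {
            for (int attempt = 0; attempt < maxRestartChecks; attempt++) {
                TimeUnit.SECONDS.sleep(30);           // wait a preset duration for the node to restart
                if (isHealthy(workerId)) {
                    return;                           // node recovered: keep using the same target node
                }
            }
            // Still unhealthy after the allowed checks: fall through and treat as unrecoverable.
        }
        rescheduleWithHighPriority(workerId);         // pick a new target shuffle processing node
    }

    private boolean isHealthy(String workerId) { return false; }       // placeholder heartbeat probe
    private void rescheduleWithHighPriority(String workerId) { }       // placeholder: re-enqueue a high-priority request
}
```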
In this embodiment, exception types are divided according to the system's repair capability, and different exception handling manners are adopted for different types, which reduces the system's pressure in handling abnormal nodes and helps optimize the overall performance of the shuffle service.
Fig. 4 illustrates an exception handling manner of the shuffle processing node, which may be specifically implemented by the following steps.
In step S402, the shuffle write node acquires the key-value data produced by its corresponding mapping task and transmits the key-value data to the data block remote call service in the target shuffle processing node.
In step S404, if the target shuffle processing node fails to process the key value data, an exception notification is sent to the shuffle write node. The exception notification carries the exception type.
In step S406, the shuffle write node reports the exception type of the target shuffle processing node to the task management component.
In step S408, the task management component sends a new resource request for the task to the global management component. When the exception type is the first type, an attempt is made to restart the target shuffle processing node; when the exception type is the second type, the global management component determines a new target shuffle processing node, and the flow proceeds to steps S410 to S414.
In step S410, the global management component sends the mapping relationship between the task and the new target shuffle processing node to the task management component.
In step S412, the task management component sends a mapping relationship between the task and the new target shuffle processing node to the shuffle write node.
In step S414, the shuffle write node switches the data stream to a new target shuffle processing node in accordance with the mapping relationship.
FIG. 5 is a flowchart, according to an exemplary embodiment, of task processing in a shuffle service deployed independently outside a compute engine. The shuffle service includes an independently packaged task management component (App Shuffle Master, ASM), global management component (Shuffle Master), shuffle write node (Shuffle Writer), shuffle processing node (Shuffle Worker), and shuffle read node (Shuffle Reader). The function of each component/node is explained below.
A global management component: responsible for global resource scheduling, global task management, life-cycle management of the shuffle processing nodes, heartbeat requests of the shuffle processing nodes, and the like.
A task management component: it can be deployed alongside the task manager (ApplicationMaster) in the compute engine, belongs to the same process as the task manager, and shares the same JVM. In one example, the task management component may be embedded in the task manager as an SDK. The task management component is responsible for the resource management of a single task, handling RPC (Remote Procedure Call) requests from the shuffle write nodes and shuffle read nodes, and managing the life cycles of the shuffle write nodes and shuffle read nodes.
Shuffle write node: it can be embedded as an SDK in a mapping task (Map) on the compute engine side and is responsible for sending the key-value data produced by the mapping task to the corresponding shuffle processing nodes according to the partition dimension. It exits safely after the shuffle processing nodes have fully persisted the sorted key-value data.
A shuffle processing node: it aggregates and sorts the key-value data according to the partition dimension and spills the data to the remote HDFS. After the spill is complete, it notifies the task management component and the shuffle write node of the persisted result.
A shuffle read node: it can be deployed alongside a reduce task (Reduce) in the compute engine, belongs to the same process as the reduce task, and shares the same JVM. In one embodiment, the shuffle read node may be embedded in the reduce task as an SDK. In another embodiment, the shuffle read nodes and the reduce tasks may be in one-to-one correspondence. The shuffle read node is responsible for pulling the set of shuffle files to be processed from the HDFS, performing local deduplication according to consistency metadata, and returning the result to the reduce task on the compute engine side.
As shown in fig. 5, taking the task as Mapreduce Job as an example, the data processing of the task can be implemented by the following steps.
In step S502, a MapReduce Job is started, that is, the task manager in MapReduce is started, whereupon the task management component is started.
In step S504, the task management component applies to the global management component for resources, requests the global management component to perform resource scheduling, and determines the mapping relationship between the partitions of the task and the target shuffle processing nodes. The shuffle service mode is enabled after the application succeeds.
Fig. 6 illustrates a flow diagram of resource scheduling. As shown in fig. 6, this can be achieved by the following steps.
In step S6042, each shuffle processing node transmits current load information and resource information to a node status service in the global management component.
In step S6044, the task management component sends a resource request of the task to the task status service in the global management component.
In step S6046, the resource request is added to the first scheduling queue by the task state service.
In step S6048, the global management component calls the first scheduling thread, and obtains the resource request of the task from the first scheduling queue through the first scheduling thread.
In step S6050, the first scheduling thread determines a target shuffle processing node corresponding to each partition of the task, based on the resource information that is newly reported by each shuffle processing node.
Specifically, the first scheduling thread obtains the shuffle processing nodes whose status is available and adds them to a priority queue in which the nodes are ordered by resource information (e.g., CPU, memory). The node with the most remaining resources is assigned to each partition of the task, and a mapping relationship between the partitions and the target shuffle processing nodes is established.
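One possible shape of such a scheduling pass is sketched below; the per-partition memory reservation used to debit and re-rank a worker after each assignment is an assumption, not something the patent specifies.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// One scheduling pass of the first scheduling thread. Available workers are kept in a
// max-heap by remaining memory; after each assignment the chosen worker is debited an
// assumed per-partition reservation and re-offered, so load spreads across nodes.
final class SchedulingPass {

    static final class Worker {
        final String id;
        long freeMemoryBytes;
        Worker(String id, long freeMemoryBytes) { this.id = id; this.freeMemoryBytes = freeMemoryBytes; }
    }

    static Map<Integer, String> assign(List<Worker> availableWorkers,
                                       int numPartitions,
                                       long perPartitionMemoryBytes) {
        PriorityQueue<Worker> byFreeMemory =
                new PriorityQueue<>(Comparator.comparingLong((Worker w) -> w.freeMemoryBytes).reversed());
        byFreeMemory.addAll(availableWorkers);

        Map<Integer, String> mapping = new HashMap<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            Worker best = byFreeMemory.poll();
            if (best == null || best.freeMemoryBytes < perPartitionMemoryBytes) {
                return Map.of();                      // allocation failed: caller moves the request to the second queue
            }
            mapping.put(partition, best.id);
            best.freeMemoryBytes -= perPartitionMemoryBytes;
            byFreeMemory.offer(best);                 // re-rank the worker with its debited resources
        }
        return mapping;
    }
}
```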
In step S6052, if the allocation succeeds, the first scheduling thread sends the mapping relationship between the partitions and the target shuffle processing nodes to the task state service, which forwards it to the task management component.
In step S6054, if the allocation fails, the first scheduling thread sends the resource request of the task to the second scheduling queue.
In step S6056, a second scheduling thread is invoked to schedule the resource request in the second scheduling queue.
In step S6058, the second scheduling thread is invoked to add the resource request in the second scheduling queue to the first scheduling queue again, and steps S6048 to S6054 are repeated.
In step S506, the task manager starts the mapping tasks. Each shuffle write node starts with its mapping task, and the two share the same process. After the shuffle write node starts, it obtains the mapping relationship between the partitions and the target shuffle processing nodes from the task management component.
In step S508, each mapping task writes the processed key-value data into the local buffer of its corresponding shuffle write node through an interface. The shuffle write node then actively sends the key-value data in the buffer to the target shuffle processing node of each partition according to the mapping relationship between the partitions and the target shuffle processing nodes.
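A sketch of this buffering-and-flush behavior of a shuffle write node; the PushClient interface stands in for the RPC to a shuffle processing node, and the record type and threshold semantics are assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Buffering inside a shuffle write node: records are grouped by partition, and once the
// buffered count reaches a threshold each partition's batch is pushed to the target
// shuffle processing node given by the partition -> node mapping.
final class ShuffleWriteBuffer {

    interface PushClient {                    // stand-in for the RPC to a shuffle processing node
        void push(String workerId, int partition, List<Map.Entry<String, String>> batch);
    }

    private final Map<Integer, List<Map.Entry<String, String>>> buffer = new HashMap<>();
    private final Map<Integer, String> partitionToWorker;
    private final PushClient client;
    private final int flushThreshold;
    private int buffered = 0;

    ShuffleWriteBuffer(Map<Integer, String> partitionToWorker, PushClient client, int flushThreshold) {
        this.partitionToWorker = partitionToWorker;
        this.client = client;
        this.flushThreshold = flushThreshold;
    }

    void add(int partition, String key, String value) {
        buffer.computeIfAbsent(partition, p -> new ArrayList<>()).add(Map.entry(key, value));
        if (++buffered >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        buffer.forEach((partition, batch) ->
                client.push(partitionToWorker.get(partition), partition, batch));
        buffer.clear();
        buffered = 0;
    }
}
```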
In step S510, each target shuffle processing node sorts the key value data according to the partition dimension, and persists the sorted key value data in the HDFS.
In step S512, each target shuffle processing node sends the HDFS storage path of the sorted key-value data to the task management component.
In step S514, after the mapping phase is complete, the task manager starts the reduce tasks, and each shuffle read node starts with its reduce task. The started shuffle read node obtains the storage paths corresponding to the partitions from the task management component.
In step S516, the shuffle read node reads key-value data from the HDFS according to the storage path of each partition, performs local deduplication on the read key-value data, and returns it to the reduce task on the compute engine side.
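The "consistency metadata" is not detailed here; the sketch below assumes it can be modeled as a set of committed block IDs and shows only the local deduplication step.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Local deduplication on the shuffle read node: only blocks whose IDs appear in the
// committed metadata are kept, and each committed block is kept exactly once before
// the data is handed to the reduce task.
final class ShuffleReadDeduplicator {

    static final class ShuffleBlock {
        final String blockId;
        final byte[] payload;
        ShuffleBlock(String blockId, byte[] payload) { this.blockId = blockId; this.payload = payload; }
    }

    static List<ShuffleBlock> dedup(List<ShuffleBlock> blocksFromHdfs, Set<String> committedBlockIds) {
        Set<String> seen = new HashSet<>();
        List<ShuffleBlock> deduplicated = new ArrayList<>();
        for (ShuffleBlock block : blocksFromHdfs) {
            if (committedBlockIds.contains(block.blockId) && seen.add(block.blockId)) {
                deduplicated.add(block);
            }
        }
        return deduplicated;
    }
}
```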
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
It is understood that the same/similar parts between the embodiments of the method described above in this specification can be referred to each other, and each embodiment focuses on the differences from the other embodiments, and it is sufficient that the relevant points are referred to the descriptions of the other method embodiments.
Fig. 7 is a block diagram illustrating a resource scheduler X00 according to an example embodiment. Referring to fig. 7, the apparatus includes a receiving module X02, a request acquiring module X04, and a first resource scheduling module X06.
A receiving module X02, configured to execute receiving the resource information sent by the shuffle processing node through a global management component, wherein the global management component and the shuffle processing node are respectively and independently packaged outside the computing engine; the request acquisition module X04 is configured to execute a resource request for acquiring a task from a first scheduling queue through a global management component, wherein the first scheduling queue is used for storing the resource request of the task, and the resource request is sent to the global management component when a computing engine starts the task and is added to the first scheduling queue through the global management component; a first resource scheduling module X06 configured to perform processing of resource requests by a global management component to determine a target shuffle processing node for a processing task based on the shuffle processing node's resource information.
In an exemplary embodiment, the global management component includes a task state service; the apparatus X00 further includes: the request sending module is configured to execute a resource request for sending the task to a task state service in the global management component through the task management component when the computing engine starts the task, wherein the task management component is a component independently packaged outside the computing engine; a first queue update module configured to perform adding the resource request to the first scheduling queue through the task state service.
In an exemplary embodiment, the target shuffle processing nodes for processing the task include target shuffle processing nodes corresponding to respective partitions of the task; the apparatus X00 further includes: a relationship generation module configured to establish, through the global management component, the mapping relationship between the partitions of the task and the target shuffle processing nodes and send the mapping relationship to the task management component through the task state service; a relationship obtaining module configured to obtain the mapping relationship from the task management component through a shuffle write node, where the shuffle write node is a node independently packaged outside the compute engine and is used to receive the key-value data obtained by a mapping task in the compute engine processing the task; and a data sending module configured to, after the shuffle write node acquires the key-value data obtained by the mapping task, send the key-value data of each partition to the target shuffle processing node corresponding to that partition according to the mapping relationship.
In an exemplary embodiment, the global management component includes a node state service; a receiving module X02 configured to perform receiving the resource information sent by the shuffle processing node through a node state service in the global management component.
In an exemplary embodiment, the apparatus X00 further includes: a second queue update module configured to perform adding the resource request to a second scheduling queue when the global management component fails to process the resource request; and the third queue updating module is configured to acquire the resource request from the second scheduling queue and add the resource request to the first scheduling queue again when the preset request scheduling condition is met.
In an exemplary embodiment, the apparatus X00 further includes: the type determining module is configured to execute the step of obtaining the exception type of the target shuffle processing node when the global management component determines that the target shuffle processing node has an exception; and a second resource scheduling module configured to perform re-determining a new target shuffled processing node for processing the task using an exception handling manner corresponding to the exception type.
In an exemplary embodiment, the second resource scheduling module is configured to perform: when the exception type is a first type, re-detecting the node state of the target shuffle processing node after waiting for a preset duration, and continuing to use the target shuffle processing node when the detected node state is normal, the first type being an exception type that the system can repair by itself; and when the exception type is a second type, re-determining a new target shuffle processing node corresponding to the task, the second type being an exception type that the system cannot repair by itself.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 8 is a block diagram illustrating an electronic device S00 for resource scheduling in accordance with an example embodiment. For example, the electronic device S00 may be a server. Referring to FIG. 8, electronic device S00 includes a processing component S20 that further includes one or more processors and memory resources represented by memory S22 for storing instructions, such as applications, that are executable by processing component S20. The application program stored in the memory S22 may include one or more modules each corresponding to a set of instructions. Further, the processing component S20 is configured to execute instructions to perform the above-described method.
The electronic device S00 may further include: the power supply module S24 is configured to perform power management of the electronic device S00, the wired or wireless network interface S26 is configured to connect the electronic device S00 to a network, and the input/output (I/O) interface S28. The electronic device S00 may operate based on an operating system stored in the memory S22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory S22 comprising instructions, executable by the processor of the electronic device S00 to perform the above method is also provided. The storage medium may be a computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product comprising instructions executable by a processor of the electronic device S00 to perform the above method.
It should be noted that the above descriptions of the apparatus, the electronic device, the computer-readable storage medium, the computer program product, and the like according to the method embodiments may also include other embodiments; for specific implementations, reference may be made to the descriptions of the related method embodiments, which are not detailed herein.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A resource scheduling method, the method comprising:
receiving resource information sent by a shuffle processing node through a global management component, wherein the global management component and the shuffle processing node are each independently packaged outside a computing engine;
acquiring a resource request of a task from a first scheduling queue through the global management component, wherein the first scheduling queue is used for storing the resource request of the task, and the resource request is sent to the global management component when the computing engine starts the task and is added to the first scheduling queue through the global management component;
processing, by the global management component, the resource request, and determining a target shuffle processing node for processing the task based on the resource information of the shuffle processing node.
2. The resource scheduling method according to claim 1, wherein the global management component comprises a task state service;
the method further comprises the following steps:
sending, by a task management component, a resource request of the task to the task state service in the global management component when the computing engine starts the task, wherein the task management component is a component independently packaged outside the computing engine;
adding, by the task state service, the resource request to the first scheduling queue.
3. The resource scheduling method according to claim 2, wherein the target shuffle processing nodes for processing the task comprise target shuffle processing nodes corresponding to respective partitions of the task;
after the determining of the target shuffle processing node for processing the task, the method further comprises:
establishing, by the global management component, a mapping relationship between the partitions of the task and the target shuffle processing nodes, and sending the mapping relationship to the task management component through the task state service;
obtaining the mapping relationship from the task management component through a shuffle write node, wherein the shuffle write node is a node independently packaged outside the computing engine and is used for receiving key value data obtained when a mapping task in the computing engine processes the task;
and after the shuffle write node acquires the key value data obtained by mapping the task, sending the key value data of each partition to the target shuffle processing node corresponding to the partition according to the mapping relationship.
4. The resource scheduling method according to any one of claims 1 to 3, wherein the global management component comprises a node state service; the receiving, by the global management component, the resource information sent by the shuffle processing node includes:
receiving, by a node state service in the global management component, resource information sent by the shuffle processing node.
5. The resource scheduling method according to any one of claims 1 to 3, further comprising:
when the global management component fails to process the resource request, adding the resource request to a second scheduling queue;
and when a preset request scheduling condition is met, acquiring the resource request from the second scheduling queue, and adding the resource request to the first scheduling queue again.
6. The resource scheduling method according to any one of claims 1 to 3, further comprising:
when the global management component determines that the target shuffle processing node is abnormal, acquiring an exception type of the target shuffle processing node;
and re-determining a new target shuffle processing node for processing the task by adopting an exception handling mode corresponding to the exception type.
7. An apparatus for scheduling resources, the apparatus comprising:
a receiving module configured to receive resource information sent by a shuffle processing node through a global management component, the global management component and the shuffle processing node each being independently packaged outside a computing engine;
the request acquisition module is configured to execute a resource request for acquiring a task from a first scheduling queue through the global management component, wherein the first scheduling queue is used for storing the resource request of the task, and the resource request is sent to the global management component when the computing engine starts the task and is added to the first scheduling queue through the global management component;
and a first resource scheduling module configured to process the resource request through the global management component and determine a target shuffle processing node for processing the task based on the resource information of the shuffle processing node.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the resource scheduling method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the resource scheduling method of any one of claims 1 to 6.
10. A computer program product comprising instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the resource scheduling method of any one of claims 1 to 6.
CN202111555577.5A 2021-12-17 2021-12-17 Resource scheduling method and device, electronic equipment and storage medium Pending CN114237891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111555577.5A CN114237891A (en) 2021-12-17 2021-12-17 Resource scheduling method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111555577.5A CN114237891A (en) 2021-12-17 2021-12-17 Resource scheduling method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114237891A true CN114237891A (en) 2022-03-25

Family

ID=80758404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555577.5A Pending CN114237891A (en) 2021-12-17 2021-12-17 Resource scheduling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114237891A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028206A (en) * 2022-05-16 2023-04-28 荣耀终端有限公司 Resource scheduling method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11714675B2 (en) Virtualization-based transaction handling in an on-demand network code execution system
CN103201724B (en) Providing application high availability in highly-available virtual machine environments
CN110417613B (en) Distributed performance testing method, device, equipment and storage medium based on Jmeter
US8108623B2 (en) Poll based cache event notifications in a distributed cache
CN108733509B (en) Method and system for backing up and restoring data in cluster system
CN115328663B (en) Method, device, equipment and storage medium for scheduling resources based on PaaS platform
CN109614227B (en) Task resource allocation method and device, electronic equipment and computer readable medium
WO2018108001A1 (en) System and method to handle events using historical data in serverless systems
US20150089505A1 (en) Systems and methods for fault tolerant batch processing in a virtual environment
US11372871B1 (en) Programmable framework for distributed computation of statistical functions over time-based data
US9886337B2 (en) Quorum based distributed anomaly detection and repair using distributed computing by stateless processes
Liu et al. Optimizing shuffle in wide-area data analytics
CN114237510A (en) Data processing method and device, electronic equipment and storage medium
CN114237892A (en) Key value data processing method and device, electronic equipment and storage medium
CN114237891A (en) Resource scheduling method and device, electronic equipment and storage medium
CN114816709A (en) Task scheduling method, device, server and readable storage medium
CN110692043B (en) System and method for load balancing backup data
US11561824B2 (en) Embedded persistent queue
US11748164B2 (en) FAAS distributed computing method and apparatus
CN116302574B (en) Concurrent processing method based on MapReduce
CN111431951B (en) Data processing method, node equipment, system and storage medium
CN113672665A (en) Data processing method, data acquisition system, electronic device and storage medium
CN115599507A (en) Data processing method, execution workstation, electronic device and storage medium
CN113703982A (en) Data consumption method, apparatus, terminal device and medium using KAFKA
US20240069961A1 (en) Batch functions framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination