CN113407331A - Task processing method and device and storage medium

Info

Publication number: CN113407331A
Application number: CN202010187951.XA
Authority: CN
Inventors: 李军; 许晋晗; 邓文通; 吴超楠; 刘梵; 胡聪
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2021-09-17

Abstract

The embodiments of the application provide a task processing method, a task processing device, and a storage medium for reducing the time delay of task processing. In the application, a driver node executes pre-starting according to a start instruction of a Yarn scheduler, wherein the driver node is selected from a Spark cluster by the Yarn scheduler according to a virtual task request sent by a client; the driver node acquires a target task from a message queue of executable tasks and executes the acquired target task, wherein the executable tasks in the message queue are sent by the client; and the driver node sends the execution result of the target task to the message queue so that the client acquires the execution result from the message queue. Because the driver node is selected from the Spark cluster in advance by the Yarn scheduler according to the virtual task request sent by the client, the driver node executes a target task directly as soon as one exists, which saves the time the Yarn scheduler would otherwise spend selecting a driver node and reduces the time delay of task processing.

Description

Task processing method and device and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a method and an apparatus for task processing, and a storage medium.

Background

When a client has an actual task to run, it starts the task and sends a request for executing it to the Yarn scheduler. After receiving the request, the Yarn scheduler selects idle computing nodes from the Spark cluster it manages, and those nodes execute the actual task. If no idle computing node exists at that time, the actual task cannot be executed and must wait until an idle computing node becomes available. The actual task is therefore not processed in time, and its processing time increases.

Disclosure of Invention

The application provides a method, a device and a storage medium for task processing, which are used for reducing the time delay of task processing.

In a first aspect, an embodiment of the present application provides a method for task processing, which is applied to a server including a Spark on Yarn architecture, and the method includes:

the driving node executes pre-starting according to the starting instruction of the Yarn scheduler, wherein the driving node is selected from the Spark cluster by the Yarn scheduler according to the virtual task request sent by the client;

the driving node acquires a target task from a message queue of the executable task and executes the acquired target task, wherein the executable task in the message queue is sent by a client;

and the driving node sends the execution result of the target task to the message queue so that the client acquires the execution result of the target task from the message queue.

In a second aspect, an embodiment of the present application provides a method for task processing, which is applied to a client, and the method includes:

the client sends the executable task serving as a target task to the message queue so that the driving node acquires the target task from the message queue;

and the client detects the message queue and receives the execution result of the target task fed back by the driving node.

In a third aspect, an embodiment of the present application provides an apparatus for task processing, where the apparatus includes: a pre-starting unit, an execution unit, and a first sending unit; wherein:

the pre-starting unit is used for executing pre-starting according to a starting instruction of a Yarn scheduler, wherein the driving node is selected from a Spark cluster by the Yarn scheduler according to a virtual task request sent by a client;

the execution unit is used for acquiring a target task from a message queue of an executable task and executing the acquired target task, wherein the executable task in the message queue is sent by a client;

and the first sending unit is used for sending the execution result of the target task to the message queue so that the client acquires the execution result of the target task from the message queue.

In one possible implementation, the execution unit is further configured to: detecting a message queue of an executable task in real time; and when the executable task exists in the message queue of the executable task, acquiring the executable task as a target task.

In one possible implementation, the execution unit is further configured to: analyze the target task; and when the data type of the target task is the RDD (Resilient Distributed Dataset) data type, distribute the target task to an execution node, so that the driving node and the execution node execute the target task together;

wherein the execution node is obtained by the driving node sending a resource request to the Yarn scheduler after determining that the data type of the target task is the RDD data type.

In a fourth aspect, an embodiment of the present application provides an apparatus for task processing, where the apparatus includes: a second transmitting unit and a receiving unit, wherein:

the second sending unit is used for sending the executable task serving as a target task to the message queue so that the driving node can acquire the target task from the message queue;

and the receiving unit is used for detecting the message queue and receiving the execution result of the target task fed back by the driving node.

In a possible implementation manner, the second sending unit is further configured to: perform serialization processing on the executable task.

In a possible implementation manner, the second sending unit is further configured to: send a virtual task request to the Yarn scheduler so that the Yarn scheduler selects, from the Spark cluster, a driving node for detecting the message queue.

In a fifth aspect, an embodiment of the present application provides a task processing device, including: a memory and a processor, wherein the memory is configured to store computer instructions; the processor is used for executing the computer instructions to realize the task processing method provided by the embodiment of the application.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored, and when executed by a processor, the computer instructions implement a method for task processing provided by the embodiment of the present application.

The beneficial effect of this application is as follows:

According to the method, device, and storage medium for task processing provided by the application, the driver node is selected from the Spark cluster by the Yarn scheduler according to a virtual task request sent by the client and executes pre-starting. The driver node then detects the message queue of executable tasks in real time; when an executable task sent by the client exists in the message queue, the driver node reads it from the message queue as the target task, executes it, determines its execution result, and sends the execution result to the message queue so that the client can obtain it. Because the driver node reads and executes an arriving executable task directly, the scheduling step of the Yarn scheduler is skipped and the time the Yarn scheduler would spend selecting a driver node is saved, which reduces the time delay of task processing.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram illustrating an architecture of task processing in the related art;

FIG. 2 is a schematic diagram of an application scenario of task processing provided in the present application;

FIG. 3 is a system architecture diagram of a task process provided herein;

FIG. 4 is a schematic diagram of sending a virtual task request and acquiring a driver node provided in the present application;

FIG. 5 is a schematic diagram of a method for sending executable tasks provided herein;

FIG. 6 is a schematic diagram of a driving node feeding back an execution result provided in the present application;

FIG. 7 is a schematic overall flow chart of task processing provided herein;

FIG. 8 is a diagram illustrating the effect of task processing provided by the present application;

FIG. 9 is a flowchart of a method for task processing provided herein;

FIG. 10 is a flow chart of a method of alternative task processing provided herein;

FIG. 11 is a diagram of a task processing device according to the present application;

FIG. 12 is a block diagram of another task processing apparatus provided in the present application;

FIG. 13 is a block diagram of a computing device provided herein.

Detailed Description

The architecture and task scenarios described in the embodiments of the present application are intended to illustrate the technical solutions more clearly and do not limit them. Those skilled in the art will appreciate that, as new task scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art:

Computing node: a physical machine can be split into a plurality of nodes, and each node is also called a computing node.

Driver node: also called the Spark Driver, it is selected from the computing nodes in the Spark cluster when a task is started through the Spark computing framework, and it performs global work such as resource application and task allocation.

Execution node: also called the Spark Executor, it is the node that actually processes data in a task and is responsible for running the task.

Yarn scheduler: responsible for resource scheduling, mainly for allocating computing nodes to tasks.

Serialization: the conversion of the state information of an executable task into a form that can be stored or transmitted.

The design concept of the embodiments of the present application will be briefly described below.

FIG. 1 is a schematic diagram of a task processing architecture in the related art. As FIG. 1 shows, once an executable task exists at the client, the client sends a jar file containing the executable task to the Yarn scheduler; after receiving the jar file, the Yarn scheduler selects an idle computing node from the Spark cluster and sends the jar file to it; the computing node then parses the jar file, obtains the executable task, and executes its logic.

However, there may be cases in which no idle computing node exists even though an executable task must be executed, so the task has to wait for an idle computing node. In the related art, therefore, the Yarn scheduler must schedule an idle computing node for every executable task, which takes a certain scheduling time, and when no idle computing node exists the executable task must additionally wait, so the total time for processing the task increases.

In view of the foregoing, embodiments of the present application provide a method, an apparatus, and a storage medium for task processing.

In the method, a driver node is pre-started and detects a message queue of executable tasks. Once an executable task sent by a client exists in the message queue, the driver node reads it from the message queue as a target task and executes the logic of the target task. The driver node is selected from the Spark cluster in advance by the Yarn scheduler according to a virtual task request sent by the client.

Therefore, when the client sends an executable task, the driver node reads it as the target task and executes it directly. This skips the scheduling process of the Yarn scheduler and reduces the probability that no idle driver node is available. Because the time the Yarn scheduler would spend selecting a driver node is saved, the time delay of task processing is reduced.

FIG. 2 is a diagram of an application scenario of task processing according to an embodiment of the present application. The server 20 is communicatively connected, via a network, to a plurality of terminal devices 21 on which clients are installed; the network may be, but is not limited to, a local area network, a metropolitan area network, or a wide area network. The terminal device 21 may be a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a notebook, a mobile phone, or the like, or a computer with a mobile terminal, including various portable, pocket-sized, hand-held, computer-embedded, or vehicle-mounted mobile devices that can provide voice and/or data connectivity to a user and exchange voice and/or data with a radio access network. The server 20 may be any background device capable of providing internet services and managing stored data.

In the application scenario, the terminal device 21 installs and runs a client. The client receives an executable task instruction triggered by a user, determines the executable task, and sends it to the server 20; the server 20 executes the logic of the received executable task and feeds back the corresponding execution result to the client of the terminal device 21.

In the present application, the client of the terminal device 21 receives a virtual task instruction and sends the virtual task instruction to the server 20. The server 20 selects, in advance, a computing node in the Spark cluster for executing target tasks, and the preselected computing node detects in real time whether an executable task has arrived. When an executable task arrives, it is taken as the target task, and the computing node executes it and feeds back its execution result to the client of the terminal device 21.

Alternatively, when the client of the terminal device 21 determines that a preset condition is met, the client actively sends a virtual task request to the server 20, so that the server 20 selects, in advance, a computing node in the Spark cluster for executing target tasks. The preselected computing node monitors in real time whether a target task has arrived; when one arrives, the computing node executes it and returns the task execution result to the client of the terminal device 21.

The application scenario mainly applies to a client's data platform, which uses the Spark distributed computing framework and integrates a Spark cluster and its interfaces. The data platform receives a task triggered by a user and completes the task processing by means of the Spark cluster. It can be widely applied in fields such as data analysis, machine learning model training, user credit rating, abnormal behavior detection, and speech masking.

Take user abnormal behavior detection as an example. Each user may perform a large number of behaviors each day, such as logging in at a certain time and location, purchasing props, gifting props, adding friends, and modifying the user profile, and each behavior corresponds to a target task. Based on the behavior data, it is detected whether the user account has been stolen, whether the user uses a plug-in, and so on. Because the number of users and the number of behaviors are both large, a single server can hardly complete the analysis quickly, so multiple servers are used for collaborative analysis to speed it up, and these servers form a Spark cluster. How the Spark cluster divides the work to complete user abnormal behavior detection is managed by the Spark distributed computing framework.

Take user speech masking, that is, masking sensitive words, abusive words, and the like, as another example. Users can speak and communicate in a game, but sensitive words, uncivil words, and the like must not be published. Determining which statements should be masked requires machine learning training: a large number of user utterances are collected, each is manually labeled as to whether it should be masked, and machine learning training is performed on the labeled data. Because the data volume is large and training on a single machine is slow, joint training on multiple machines is adopted to speed it up; these machines form a Spark cluster, which provides the machine learning framework.

In the application, a client sends a virtual task request to the Yarn scheduler in the server. According to the virtual task request, the Yarn scheduler selects an idle computing node in the Spark cluster in advance, and the selected computing node serves as the driver node. The driver node is started in advance and detects in real time whether an executable task sent by the client exists in the message queue. When such a task exists, the driver node reads it from the message queue as the target task, executes the logic of the target task, and sends the execution result to the message queue, so that the client obtains the execution result of the target task from the message queue. With the task processing mode provided by the application, when a target task arrives, its logic is executed directly by the driver node, so the scheduling time of the Yarn scheduler is saved and the time delay of task processing is reduced.

Based on the application scenario discussed in fig. 2, a task processing method provided in the embodiment of the present application is described below.

FIG. 3 is a system architecture diagram for task processing according to an embodiment of the present application. The system includes a client 30 and a server 31, where the server 31 includes a Yarn scheduler 310, a Spark cluster 311, and a message queue 312.

In the task processing embodiment provided by the application, to reduce the time delay of task processing, the logic of a target task is executed directly by a driver node when the target task arrives. This differs from the approach in which the Yarn scheduler begins selecting a computing node only once the target task arrives, and the target task is executed only after that node has been selected. Because the scheduling time of the Yarn scheduler is saved, the task processing time delay is reduced.

The client sends a virtual task request so that the Yarn scheduler performs its selection in advance: the Yarn scheduler selects idle computing nodes in the Spark cluster ahead of time, the selected computing nodes serve as driver nodes for executing target tasks, and the driver nodes detect whether executable tasks sent by the client exist in the message queue. In this way, the time at which the task is started and the Yarn scheduler selects a computing node is moved forward. The client then sends the executable task to the message queue; after a pre-selected driver node detects that the executable task exists in the message queue, it reads the executable task as the target task and directly executes its logic to complete the processing. The scheduling time of the Yarn scheduler is thus saved and the task processing time delay is reduced.

Thus, the present application primarily includes two parts:

First, the client submits a virtual task to the Yarn scheduler, the Yarn scheduler selects idle computing nodes in the Spark cluster in advance, and the selected computing nodes serve as driver nodes;

Second, the client sends the executable task to the message queue so that the driver node reads the executable task, executes it, and feeds back its execution result.

The present application is illustrated by the following embodiments.

Embodiment one: the client submits virtual tasks to the Yarn scheduler, and the Yarn scheduler selects computing nodes in the Spark cluster in advance.

FIG. 4 is a schematic diagram of sending a virtual task request and acquiring a driver node provided by the present application, which includes the following steps:

Step 400, the client sends a virtual task request to the Yarn scheduler.

The client may send the virtual task request to the Yarn scheduler after receiving an instruction to do so; alternatively, the client may send the virtual task request automatically after determining that a preset condition is met, where the preset condition may be that the number of driver nodes detecting the message queue is less than a preset number.
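The preset condition above can be sketched as a simple top-up rule. The following Python sketch is illustrative only: the function name `virtual_requests_needed` and the policy of sending exactly enough virtual task requests to reach the preset number are assumptions, not details given in the application.

```python
def virtual_requests_needed(active_drivers: int, preset_count: int) -> int:
    """Return how many virtual task requests the client would send.

    Assumption: the client tops the pool of pre-started driver nodes up to
    the preset number; the application only states the triggering condition
    (fewer driver nodes than the preset number), not how many to request.
    """
    if active_drivers < 0 or preset_count < 0:
        raise ValueError("counts must be non-negative")
    # The condition is met only when fewer drivers are listening than preset.
    return max(0, preset_count - active_drivers)


if __name__ == "__main__":
    print(virtual_requests_needed(active_drivers=1, preset_count=3))
```

Under this assumed policy, no request is sent when the number of listening driver nodes already equals or exceeds the preset number.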

Step 401, after receiving the virtual task request sent by the client, the Yarn scheduler allocates to the virtual task a driver node for executing it, and sends the virtual task to the driver node.

In the application, the Yarn scheduler sending the virtual task to the driver node amounts to the Yarn scheduler sending a start instruction to the driver node, which instructs the driver node to execute pre-starting.

After receiving the virtual task request sent by the client, the Yarn scheduler needs to allocate resources to the virtual task so that its logic can be executed. The Yarn scheduler therefore selects an idle computing node from the Spark cluster as the driver node for executing the virtual task, and the driver node executes pre-starting after receiving the request sent by the Yarn scheduler. When an executable task later arrives, the driver node executes it directly as the target task.

In the application, the client may send a plurality of virtual task requests to the Yarn scheduler at a time, in which case the Yarn scheduler selects a plurality of idle computing nodes in the Spark cluster for subsequent use.

After the driver node is selected, waiting for an executable task mainly consists of detecting whether an executable task sent by the client exists in the message queue of executable tasks.

Embodiment two: the client sends the executable task, and the driver node executes the logic of the executable task.

FIG. 5 is a schematic diagram of sending executable tasks provided by the present application, which includes the following steps:

Step 500, the client sends an executable task to the message queue.

Since the executable task is transmitted to the driver node through the message queue, the executable task needs to be serialized before being sent into the message queue.

The serialized executable task is described in the form of a JSON string. The following is a sample of a serialized executable task provided by the embodiment of the application:

It should be noted that the types of executable tasks in the present application include, but are not limited to:

Spark SQL statements and functions that have already been compiled.
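A serialized executable task of this kind might look like the JSON produced by the following Python sketch. The field names `task_id`, `type`, and `payload`, the type label `spark_sql`, and the table name in the usage line are hypothetical; they are not taken from the application's own sample.

```python
import json
import uuid


def serialize_task(task_type: str, payload: str) -> str:
    """Serialize an executable task into a JSON string (hypothetical schema)."""
    task = {
        "task_id": str(uuid.uuid4()),  # assumed identifier field
        "type": task_type,             # e.g. a Spark SQL statement or a compiled function
        "payload": payload,            # the statement text or the function reference
    }
    return json.dumps(task, ensure_ascii=False)


if __name__ == "__main__":
    print(serialize_task("spark_sql", "SELECT COUNT(*) FROM user_behavior"))
```

A text format such as JSON keeps the task description independent of the process that produced it, which is what allows it to travel through a message queue.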

Step 501, the driver node detects the message queue of executable tasks in real time; when an executable task is detected in the message queue, the driver node acquires it from the message queue as the target task and executes the logic of the target task.

Because the executable task in the message queue is described in the form of a JSON string, the driver node parses the executable task after acquiring it from the message queue and executes the logic of the executable task as described by the JSON.
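The parse-then-execute step can be sketched as a small dispatcher. This is a minimal illustration, assuming a hypothetical JSON schema with `type` and `payload` fields; the handler functions are placeholders that only return strings and do not call Spark.

```python
import json
from typing import Callable, Dict


def run_spark_sql(payload: str) -> str:
    # Placeholder: a real driver node would hand the statement to Spark here.
    return f"sql:{payload}"


def run_compiled_function(payload: str) -> str:
    # Placeholder: a real driver node would invoke the named compiled function.
    return f"function:{payload}"


# Hypothetical mapping from task type to the logic that executes it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "spark_sql": run_spark_sql,
    "compiled_function": run_compiled_function,
}


def execute_serialized_task(raw: str) -> str:
    """Parse a JSON task description and run the matching handler."""
    task = json.loads(raw)
    handler = HANDLERS.get(task["type"])
    if handler is None:
        raise ValueError(f"unsupported task type: {task['type']!r}")
    return handler(task["payload"])
```

Keeping the handlers in a table means a further task type can be supported by adding one entry, without touching the parsing code.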

When the driver node executes the logic of the target task, it determines, according to the data volume of the target task, whether other execution nodes are needed to assist in the execution so as to speed up task processing.

Therefore, after acquiring the target task, the driver node parses it and obtains its data volume. When the data volume of the target task is larger than a preset data volume, the driver node sends a resource request to the Yarn scheduler to request the allocation of other execution nodes. The Yarn scheduler selects idle computing nodes from the Spark cluster and returns them to the driver node as execution nodes.

The driver node distributes the target task to the execution nodes returned by the Yarn scheduler, and the driver node and the execution nodes execute the logic of the target task together; executing in parallel speeds up task processing.
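The data-volume check can be sketched as follows. The application states only that a resource request is sent when the data volume exceeds a preset data volume; the sizing rule used here (one execution node per `bytes_per_executor` of data) and all names are assumptions made for illustration.

```python
import math


def executors_to_request(data_volume_bytes: int,
                         preset_volume_bytes: int,
                         bytes_per_executor: int) -> int:
    """Return how many execution nodes the driver node would ask Yarn for.

    Returns 0 when the data volume does not exceed the preset volume, in
    which case the driver node executes the target task by itself.
    """
    if bytes_per_executor <= 0:
        raise ValueError("bytes_per_executor must be positive")
    if data_volume_bytes <= preset_volume_bytes:
        return 0
    # Assumed sizing rule: enough execution nodes to cover the whole volume.
    return math.ceil(data_volume_bytes / bytes_per_executor)
```

For instance, with a preset volume of 500 bytes and 200 bytes per execution node, a 1000-byte task would lead to a request for five execution nodes under this assumed rule, while a 100-byte task would lead to none.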

In the application, after the driver node executes the target task, the execution result corresponding to the target task is fed back to the client. FIG. 6 is a schematic diagram of a driver node feeding back an execution result according to an embodiment of the present application, which includes the following steps:

Step 600, the driver node sends the execution result of the target task to the message queue.

The task execution result includes, but is not limited to:

successful task processing, failed task processing, and task exception.

Step 601, the client detects the message queue in real time, and when detecting the execution result in the message queue, obtains the execution result from the message queue.

In the application, when the client and the driver node exchange the executable task and the execution result through the message queue, the executable task and its corresponding execution result are obtained by means of a UUID (Universally Unique Identifier).

Specifically, when an executable task arises at the client, the client generates a UUID, adds the UUID to a UUID list in the message queue, and then adds the executable task to a container indexed by that UUID. The driver node acquires the executable task according to the UUID, executes its logic, determines the execution result, and feeds the execution result back to the message queue, storing it in the container under the UUID of the executable task; the client then reads the execution result according to the UUID. Because the same UUID is used both for sending the executable task and for reading the execution result, confusion is avoided when a plurality of clients need tasks processed.
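The UUID list and the UUID-indexed containers described above can be sketched with an in-memory stand-in for the message queue. The class and method names are hypothetical, the structure is not thread-safe, and a real deployment would use an actual message queue service rather than Python dictionaries.

```python
import uuid
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class UuidMessageQueue:
    """In-memory sketch of a message queue whose entries are indexed by UUID."""

    def __init__(self) -> None:
        self.pending_ids: Deque[str] = deque()  # UUID list of waiting tasks
        self.tasks: Dict[str, str] = {}          # task container indexed by UUID
        self.results: Dict[str, str] = {}        # result container indexed by UUID

    def submit_task(self, task: str) -> str:
        """Client side: generate a UUID, record it, and store the task under it."""
        task_id = str(uuid.uuid4())
        self.pending_ids.append(task_id)
        self.tasks[task_id] = task
        return task_id

    def take_task(self) -> Optional[Tuple[str, str]]:
        """Driver side: take the oldest pending task, or None if there is none."""
        if not self.pending_ids:
            return None
        task_id = self.pending_ids.popleft()
        return task_id, self.tasks.pop(task_id)

    def put_result(self, task_id: str, result: str) -> None:
        """Driver side: store the execution result under the task's UUID."""
        self.results[task_id] = result

    def take_result(self, task_id: str) -> Optional[str]:
        """Client side: read the result for this UUID, or None if not ready."""
        return self.results.pop(task_id, None)
```

Because each client reads only the entry stored under its own UUID, results for tasks submitted by different clients cannot be mixed up even when they share one queue.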

FIG. 7 is a schematic overall flow chart of task processing according to an embodiment of the present application. As can be seen from FIG. 7, the client first sends a virtual task request to the Yarn scheduler so that the Yarn scheduler performs its scheduling in advance; the Yarn scheduler sends the virtual task request to the Spark cluster and selects an idle computing node as the driver node; the client then sends the executable task to the message queue, and the driver node reads the executable task from the message queue as the target task and executes its logic; finally, the driver node sends the execution result of the target task to the message queue, and the client acquires the execution result from the message queue. In the related art, by contrast, the target task is sent directly to the Yarn scheduler, which selects an idle computing node as the driver node only after receiving the target task, and the driver node then executes the logic of the target task.
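The overall flow can be imitated on a single machine with a thread standing in for the pre-started driver node. This is only an analogy under stated assumptions: `queue.Queue` replaces the message queue, starting the thread replaces the virtual task request and Yarn scheduling, and upper-casing a string replaces the real task logic; none of these names come from the application.

```python
import queue
import threading
from typing import Tuple


def driver_loop(tasks: queue.Queue, results: queue.Queue,
                stop: threading.Event) -> None:
    """Pre-started driver: keep checking the task queue until told to stop."""
    while not stop.is_set():
        try:
            task_id, payload = tasks.get(timeout=0.05)
        except queue.Empty:
            continue  # no executable task yet, keep listening
        # Stand-in for executing the logic of the target task.
        results.put((task_id, payload.upper()))


def run_demo() -> Tuple[str, str]:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    stop = threading.Event()

    # Pre-start: the driver is already listening before any real task exists.
    driver = threading.Thread(target=driver_loop,
                              args=(tasks, results, stop), daemon=True)
    driver.start()

    # Later, the client sends an executable task and waits for its result.
    tasks.put(("task-1", "select 1"))
    result = results.get(timeout=2.0)

    stop.set()
    driver.join(timeout=1.0)
    return result


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the ordering: the listener exists before the task does, so the only delay the client observes is the queue hand-off and the execution itself.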

FIG. 8 is an effect diagram comparing the time taken for task processing by the technical solution provided by the embodiment of the present application with that taken by the related technical solution. As can be seen from FIG. 8, the task processing time in the present application is shorter than that in the related art, so the technical solution provided by the embodiment of the present application reduces the time delay of task processing.

Based on the same inventive concept, an embodiment of the present application provides a method for task processing, and as shown in fig. 9, a flowchart of the method for task processing provided by the embodiment of the present application includes the following steps:

Step 900, the driving node executes pre-starting according to the starting instruction of the Yarn scheduler, wherein the driving node is selected from the Spark cluster by the Yarn scheduler according to the virtual task request sent by the client;

Step 901, the driving node acquires a target task from a message queue of executable tasks and executes the acquired target task, wherein the executable tasks in the message queue are sent by a client;

Step 902, the driving node sends the execution result of the target task to the message queue, so that the client acquires the execution result of the target task from the message queue.

In one possible implementation, executing the acquired target task by the driver node includes:

the driver node parses the target task;

when the data type of the target task is the Resilient Distributed Dataset (RDD) data type, the driver node distributes the target task to an execution node, and the driver node and the execution node execute the target task together;

wherein the execution node is obtained by the driver node sending a resource request to the Yarn scheduler after determining that the data type of the target task is the RDD data type.
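Joint execution by the driver node and execution nodes can be pictured with a local analogy in which a thread pool plays the role of the nodes and a list plays the role of the distributed data set. This sketch does not use Spark; the function names and the choice of summation as the task logic are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_partitions(items: Sequence[int], n: int) -> List[Sequence[int]]:
    """Split items into n contiguous partitions of nearly equal size."""
    if n <= 0:
        raise ValueError("n must be positive")
    size, extra = divmod(len(items), n)
    parts: List[Sequence[int]] = []
    start = 0
    for i in range(n):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def parallel_sum(items: Sequence[int], executor_count: int) -> int:
    """Sum items using one worker for the driver plus one per execution node."""
    workers = executor_count + 1  # the driver node also takes a share
    parts = split_into_partitions(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its own partition; the partial sums are then combined.
        return sum(pool.map(sum, parts))
```

Each worker handles only its own partition, and the partial results are combined at the end, which mirrors the idea of the driver node and the execution nodes sharing one target task.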

In one possible implementation manner, the driving node acquiring the target task from the message queue of executable tasks includes:

the driving node detects the message queue of executable tasks in real time;

when an executable task exists in the message queue of executable tasks, the driving node acquires the executable task as the target task.
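One way to approximate real-time detection is a blocking read with a short timeout, repeated until a task arrives. The sketch below assumes an in-process `queue.Queue` as the message queue; the function name, the polling interval, and the `max_polls` limit are hypothetical choices made so the example terminates.

```python
import queue


def acquire_target_task(task_queue, poll_interval_s=0.05, max_polls=None):
    """Wait on the queue and return the first executable task that appears.

    Returns None if `max_polls` polling rounds pass without any task.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            # Blocks for at most poll_interval_s, so a new task is picked up
            # almost as soon as it is placed on the queue.
            return task_queue.get(timeout=poll_interval_s)
        except queue.Empty:
            polls += 1
    return None
```

With `max_polls=None` the driving node keeps waiting indefinitely, which matches a long-lived pre-started node.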

Based on the same inventive concept, an embodiment of the present application further provides another method for task processing. Fig. 10 is a flowchart of this method, which includes the following steps:

Step 1000: the client sends the executable task as a target task to the message queue, so that the driving node acquires the target task from the message queue;

Step 1001: the client detects the message queue and receives the execution result of the target task fed back by the driving node.
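The client side of this exchange can be sketched as follows. This is a minimal illustration under assumptions: in-process queues replace the message queue, a generated `task_id` is used to match a result to its task (the source does not specify how results are matched), and `echo_driver` is a hypothetical stand-in for the driving node.

```python
import queue
import threading
import uuid


def submit_task(task_queue, payload):
    """Step 1000: send the executable task to the queue and return its id."""
    task_id = str(uuid.uuid4())
    task_queue.put({"task_id": task_id, "payload": payload})
    return task_id


def wait_for_result(result_queue, task_id, timeout_s=5.0):
    """Step 1001: read results until the one for `task_id` arrives."""
    skipped = []
    try:
        while True:
            message = result_queue.get(timeout=timeout_s)  # queue.Empty on timeout
            if message["task_id"] == task_id:
                return message["result"]
            skipped.append(message)  # result of another task
    finally:
        for message in skipped:
            result_queue.put(message)  # leave other results for their owners


def echo_driver(task_queue, result_queue):
    """Hypothetical driving node: upper-cases the payload of one task."""
    task = task_queue.get()
    result_queue.put({"task_id": task["task_id"], "result": task["payload"].upper()})


tasks, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=echo_driver, args=(tasks, results))
worker.start()
task_id = submit_task(tasks, "count rows")
answer = wait_for_result(results, task_id)
worker.join()
```

The client never talks to the scheduler in this path; it only exchanges messages with the queue.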

In one possible implementation, before the client sends the executable task as the target task to the message queue, the client performs serialization processing on the executable task.
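The source does not state which serialization format is used. As a sketch only, the example below assumes JSON encoded as UTF-8 bytes and a hypothetical task layout with a `name` and a list of `args`; any format that both the client and the driving node understand would serve the same purpose.

```python
import json


def serialize_task(name, args):
    """Turn a task description into bytes suitable for a message queue."""
    return json.dumps({"name": name, "args": args}, sort_keys=True).encode("utf-8")


def deserialize_task(blob):
    """Inverse of serialize_task, as the driving node would apply it."""
    task = json.loads(blob.decode("utf-8"))
    return task["name"], task["args"]


# Round trip: what the client sends is what the driving node reconstructs.
blob = serialize_task("word_count", ["input-placeholder", 3])
restored = deserialize_task(blob)
```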

In one possible implementation manner, before the client sends the executable task as the target task to the message queue, the client sends a virtual task request to the Yarn scheduler, so that the Yarn scheduler selects a driving node for detecting the message queue from the Spark cluster.
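The selection triggered by the virtual task request can be illustrated with a small simulation. The classes below are hypothetical and do not correspond to any real Yarn or Spark API; the sketch only assumes that the scheduler tracks which computing nodes are idle and reserves the first idle one as the driving node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    busy: bool = False
    role: str = "compute"


@dataclass
class SchedulerStub:
    """Hypothetical stand-in for the scheduler managing the cluster."""
    cluster: list = field(default_factory=list)

    def handle_virtual_task_request(self):
        """Reserve the first idle node as driving node; None if all are busy."""
        for node in self.cluster:
            if not node.busy:
                node.busy = True      # reserved before any real task exists
                node.role = "driver"
                return node
        return None


scheduler = SchedulerStub([Node("n1", busy=True), Node("n2"), Node("n3")])
driver = scheduler.handle_virtual_task_request()
```

Since the virtual task carries no real work, the reserved node simply waits on the message queue until an executable task arrives.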

Based on the same inventive concept, an embodiment of the present application further provides a device 1100 for task processing. As shown in fig. 11, the device 1100 includes a pre-starting unit 1101, an execution unit 1102, and a first sending unit 1103, wherein:

a pre-starting unit 1101, configured to execute pre-starting according to a starting instruction of the Yarn scheduler, where the driving node is selected by the Yarn scheduler from the Spark cluster according to the virtual task request sent by the client;

an execution unit 1102, configured to acquire a target task from a message queue of an executable task, and execute the acquired target task, where the executable task in the message queue is sent by a client;

a first sending unit 1103, configured to send an execution result of the target task to the message queue, so that the client obtains the execution result of the target task from the message queue.

In a possible implementation manner, the execution unit 1102 is further configured to: detect the message queue of executable tasks in real time; and, when an executable task exists in the message queue of executable tasks, acquire the executable task as the target task.

In a possible implementation manner, the execution unit 1102 is further configured to: analyze the target task; and, when the data type of the target task is the resilient distributed dataset (RDD) data type, distribute the target task to the execution nodes, so that the driving node and the execution nodes jointly execute the target task.

Based on the same inventive concept, an embodiment of the present application further provides a device 1200 for task processing. As shown in fig. 12, the device 1200 includes a second sending unit 1201 and a receiving unit 1202, wherein:

a second sending unit 1201, configured to send the executable task as a target task to the message queue, so that the driving node acquires the target task from the message queue;

a receiving unit 1202, configured to detect the message queue and receive the execution result of the target task fed back by the driving node.

In a possible implementation manner, the second sending unit 1201 is further configured to: perform serialization processing on the executable task.

In a possible implementation manner, the second sending unit 1201 is further configured to: send a virtual task request to the Yarn scheduler, so that the Yarn scheduler selects a driving node for detecting the message queue from the Spark cluster.

For convenience of description, the above parts are separately described as units (or modules) according to functional division. Of course, the functionality of the various elements (or modules) may be implemented in the same one or more pieces of software or hardware in practicing the present application.

After the method and the apparatus for task processing according to the exemplary embodiment of the present application are introduced, a computing device for task processing according to another exemplary embodiment of the present application is introduced next.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."

In one possible implementation, a task processing computing device provided by an embodiment of the present application may include at least a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform any of the steps of the task processing methods of the various exemplary embodiments of this application.

A task processing computing device 1300 according to this embodiment of the present application is described below with reference to fig. 13. The task processing computing device 1300 of fig. 13 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present application.

As shown in fig. 13, the components of the task processing computing device 1300 may include, but are not limited to: at least one processor 1301, at least one memory 1302, and a bus 1303 connecting the different system components (including the memory 1302 and the processor 1301).

Bus 1303 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor bus, or a local bus using any of a variety of bus architectures.

The memory 1302 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 13021 and/or cache memory 13022, and may further include Read Only Memory (ROM) 13023.

Memory 1302 may also include a program/utility 13025 having a set (at least one) of program modules 13024, such program modules 13024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The task processing computing device 1300 can also communicate with one or more external devices 1304 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the task processing computing device 1300, and/or with any devices (e.g., router, modem, etc.) that enable the task processing computing device 1300 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1305. Also, the task processing computing device 1300 can communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through the network adapter 1306. As shown in fig. 13, the network adapter 1306 communicates with the other modules for the task processing computing device 1300 over the bus 1303. It should be understood that although not shown in FIG. 13, other hardware and/or software modules may be used in conjunction with the task processing computing device 1300, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

In some possible embodiments, the various aspects of the task processing method provided in the present application may also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps in the task processing method according to the various exemplary embodiments of the present application described above in this specification, when the program product is run on the computer device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for task processing of the embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a computing device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user computing device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided among and embodied by a plurality of units.

Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of task processing, the method comprising:

a driving node executes pre-starting according to a starting instruction of a Yarn scheduler, wherein the driving node is selected from a Spark cluster by the Yarn scheduler according to a virtual task request sent by a client;

the driving node acquires a target task from a message queue of executable tasks and executes the acquired target task, wherein the executable tasks in the message queue are sent by the client; and

the driving node sends an execution result of the target task to the message queue, so that the client acquires the execution result of the target task from the message queue.

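2. The method of claim 1, wherein the driving node executing the acquired target task comprises:

the driving node analyzes the target task;

when the data type of the target task is the resilient distributed dataset (RDD) data type, the driving node distributes the target task to an execution node, so that the driving node and the execution node execute the target task together;

wherein the execution node is obtained by the driving node sending a resource request to the Yarn scheduler after determining that the data type of the target task is the RDD data type.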

3. The method of claim 1, wherein the driving node acquiring the target task from the message queue of executable tasks comprises:

the driving node detects the message queue of the executable task in real time;

and when the executable task exists in the message queue of the executable task, the driving node acquires the executable task as a target task.

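4. A method of task processing, the method comprising:

a client sends an executable task as a target task to a message queue, so that a driving node acquires the target task from the message queue; and

the client detects the message queue and receives an execution result of the target task fed back by the driving node.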

5. The method of claim 4, wherein before the client sends the executable task as the target task to the message queue, the method further comprises:

the client performs serialization processing on the executable task.

6. A method according to any of claims 4 to 5, further comprising:

the client sends a virtual task request to the Yarn scheduler so that the Yarn scheduler selects a driving node for detecting the message queue from the Spark cluster.

7. An apparatus for task processing, the apparatus comprising a pre-starting unit, an execution unit, and a first sending unit, wherein:

the pre-starting unit is configured to execute pre-starting according to a starting instruction of a Yarn scheduler, wherein the driving node is selected from a Spark cluster by the Yarn scheduler according to a virtual task request sent by a client;

the execution unit is configured to acquire a target task from a message queue of executable tasks and execute the acquired target task, wherein the executable tasks in the message queue are sent by the client; and

the first sending unit is configured to send an execution result of the target task to the message queue, so that the client obtains the execution result of the target task from the message queue.

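8. An apparatus for task processing, the apparatus comprising a second sending unit and a receiving unit, wherein:

the second sending unit is configured to send an executable task as a target task to a message queue, so that a driving node acquires the target task from the message queue; and

the receiving unit is configured to detect the message queue and receive the execution result of the target task fed back by the driving node.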

9. A computing device comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 3 or 4 to 6.

10. A computer-readable medium storing a computer program for execution by a computing device, the program, when executed on the computing device, causing the computing device to perform the steps of the method of any of claims 1 to 3 or 4 to 6.