CN111274067A

CN111274067A - Method and device for executing calculation task

Info

Publication number: CN111274067A
Application number: CN201811473744.XA
Authority: CN
Inventors: 姚思雨
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2020-06-12
Anticipated expiration: 2038-12-04
Also published as: CN111274067B

Abstract

The invention discloses a method and a device for executing a computing task, and relates to the technical field of computers. One embodiment of the method comprises: sending a computing instruction to a master node, wherein the computing instruction carries a computing task, and receiving raw data that is returned by the master node and generated by executing the computing task; if it is detected that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule; and sending a new computing instruction to the new master node, wherein the new computing instruction carries the raw data, and receiving raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data. This embodiment secures both the high-performance advantage of the master node and high availability.

Description

Method and device for executing calculation task

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for executing a computing task.

Background

Spark is a computing engine. Spark clusters are mainly deployed as Spark on Yarn, and the Spark on Yarn deployment has two modes: Cluster mode and Client mode.

In the process of implementing the invention, the inventor found at least the following problems in the prior art: the Spark on Yarn Client mode has a single point of failure and therefore lacks high availability, while in the Spark on Yarn Cluster mode the computing task is executed by an arbitrary node, so the high-performance advantage of a particular node cannot be utilized.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for executing a computing task that secure both the high-performance advantage of the master node and high availability.

To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of performing a computing task.

The method for executing a computing task of an embodiment of the present invention comprises the following steps: sending a computing instruction to a master node, wherein the computing instruction carries a computing task, and receiving raw data that is returned by the master node and generated by executing the computing task; if it is detected that the master node cannot execute the computing task, selecting a new master node from the slave nodes according to a preset rule; and sending a new computing instruction to the new master node, wherein the new computing instruction carries the raw data, and receiving raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data.

In one embodiment, after sending the computing instruction to the master node, the method further comprises: sending a data recording instruction to the master node at intervals; and receiving the raw data that is returned by the master node and generated by executing the computing task comprises: receiving the raw data that is generated by executing the computing task and sent by the master node in response to the data recording instruction.

In one embodiment, the method further comprises: sorting the slave nodes in advance in descending order of performance; and selecting a new master node from the slave nodes according to a preset rule comprises: taking the first-ranked slave node in the sorted order as the new master node.

In one embodiment, detecting that the master node cannot execute the computing task comprises: sending a monitoring instruction to the master node, and, if no response message returned by the master node in reply to the monitoring instruction is received within a preset time, determining that the master node cannot execute the computing task.

In one embodiment, after sending the new computing instruction to the new master node, the method further comprises: receiving an end instruction returned by the new master node after it finishes executing the computing task, and clearing the raw data according to the end instruction.

To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an apparatus for performing a computing task.

The apparatus for executing a computing task of an embodiment of the present invention comprises: a first transceiver unit, configured to send a computing instruction to a master node, wherein the computing instruction carries a computing task, and to receive raw data that is returned by the master node and generated by executing the computing task; a processing unit, configured to select a new master node from the slave nodes according to a preset rule if it is detected that the master node cannot execute the computing task; and a second transceiver unit, configured to send a new computing instruction to the new master node, wherein the new computing instruction carries the raw data, and to receive raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data.

In one embodiment, the apparatus further comprises: a preprocessing unit, configured to send a data recording instruction to the master node at intervals after the computing instruction is sent to the master node; and the first transceiver unit is specifically configured to: receive the raw data that is generated by executing the computing task and sent by the master node in response to the data recording instruction.

In one embodiment, the preprocessing unit is further specifically configured to: sort the slave nodes in advance in descending order of performance; and the processing unit is specifically configured to: take the first-ranked slave node in the sorted order as the new master node.

In one embodiment, the processing unit is further specifically configured to: send a monitoring instruction to the master node, and, if no response message returned by the master node in reply to the monitoring instruction is received within a preset time, determine that the master node cannot execute the computing task.

In one embodiment, the processing unit is further specifically configured to: after the new computing instruction is sent to the new master node, if an end instruction returned by the new master node after it finishes executing the computing task is received, clear the raw data according to the end instruction.

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic device.

An electronic device of an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for executing a computing task provided by the embodiment of the invention.

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium.

A computer-readable medium of an embodiment of the present invention has a computer program stored thereon, and the computer program, when executed by a processor, implements the method for performing computing tasks provided by an embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits: by sending a computing instruction carrying the computing task to the master node, the master node is made to execute the computing task first, so that its high-performance advantages, such as strong scheduling capability and fast task execution, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing it according to the raw data returned by the master node, which ensures high availability. The high-performance advantage of the master node and high availability are thus secured at the same time. When execution of the computing task ends, all raw data is cleared, so that one computing task does not affect the execution of the next, which ensures that computing tasks are executed accurately.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is an exemplary framework diagram of a prior art method of performing a computing task;

FIG. 2 is a schematic diagram of a main flow of a method of performing a computing task according to a first embodiment of the invention;

FIG. 3 is an exemplary framework diagram of a method of performing a computing task according to a first embodiment of the invention;

FIG. 4 is a signaling interaction diagram of a method of performing a computational task according to a second embodiment of the present invention;

FIG. 5 is an exemplary framework diagram of a method of performing a computing task according to a second embodiment of the invention;

FIG. 6 is a schematic diagram of the main elements of an apparatus for performing computing tasks, according to an embodiment of the invention;

FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

Spark has become the most mainstream technology in the field of big-data distributed computing. (Spark is a general-purpose engine that can handle a variety of workloads, including SQL queries, text processing and machine learning; before Spark appeared, a separate engine generally had to be learned for each of these needs.) However, Spark on Yarn is itself divided into two modes: Cluster mode and Client mode.

As shown in FIG. 1, in the Spark on Yarn Client mode, the driver is deployed only on the client node, whose performance is far better than that of each slave node in the Spark cluster. The client node submits the computing task to the Spark cluster, and the Spark cluster completes the computing task by scheduling and coordinating the slave nodes through the driver on the client node. Because the client node outperforms every slave node, deploying the driver on the client node improves the driver's scheduling capability and speeds up execution of the computing task compared with deploying it on a slave node. In addition, all logs of the Spark cluster are recorded on the client node, so when the Spark cluster is large, problems can be located quickly in real time from the logs on the client node. However, in this mode, whenever the client node fails the computing task cannot be executed and has to wait until the client node recovers. The Spark on Yarn Client mode therefore has a single point of failure, that is, it lacks high availability.

As shown in FIG. 1, in the Spark on Yarn Cluster mode, drivers are deployed on a plurality of slave nodes. The client node submits the computing task to the Spark cluster, and the Spark cluster completes the computing task by scheduling and coordinating the slave nodes through the driver on any one of those slave nodes. When the slave node executing the computing task fails, another slave node on which a driver is deployed continues to execute it, so the computing task does not stop because of that failure, and high availability is ensured. However, because the slave node that executes the computing task is arbitrary, the high-performance client node may not take part in executing it, so the Spark on Yarn Cluster mode cannot fully utilize the high-performance advantage of the client node.

In order to solve the problems in the prior art, a first embodiment of the present invention provides a method for executing a computing task, as shown in FIGS. 2 and 3, applied to a distributed coordination server, the method including:

Step S201, sending a computing instruction to a master node, wherein the computing instruction carries a computing task, and receiving raw data that is returned by the master node and generated by executing the computing task.

In a specific implementation of this step, as shown in FIG. 3, the high-performance client node on which a driver is deployed serves as the master node, and the driver on the master node schedules and coordinates the slave nodes in the Spark cluster to complete the computing task. The high-performance master node thus executes the computing task first, which secures its high-performance advantages, such as strong scheduling capability, fast task execution and fast real-time problem location. In addition, the distributed coordination server may send the computing instruction to the master node using the address of the master node, and the master node may send the raw data to the distributed coordination server using the address of the distributed coordination server. Furthermore, the raw data may be breakpoint information of the job, and may specifically include: resource scheduler (DAGScheduler) information on the jobs in the computing task that have dependency relationships, and resource scheduler (TaskScheduler) information on the tasks in the computing task that depend on one another.

Step S202, if it is monitored that the master node cannot execute the calculation task, a new master node is selected from the slave nodes according to a preset rule.

In this step, if the high-performance nodes on which a driver is deployed include the client node and a plurality of slave nodes, and the client node serving as the master node cannot execute the computing task, a new master node may be selected from the high-performance slave nodes on which a driver is deployed, so that the driver's scheduling capability is not reduced and the computing task continues to execute quickly. If the only high-performance node on which a driver is deployed is the client node, and the client node serving as the master node cannot execute the computing task, a new master node may be selected from the slave nodes on which a driver is deployed (for example, slave node 1 or slave node 2). In that case the driver's scheduling capability and the execution speed of the computing task are both reduced, but the computing task is still executed. Either way, the computing task no longer stops merely because the client node serving as the master node cannot execute it, so the high-performance advantage of the master node and high availability are both secured. In a specific implementation, the new master node may be selected from the slave nodes in the manner of the second embodiment or at random; it should be understood that a person skilled in the art may set the preset rule flexibly without affecting the embodiments of the present invention.

Step S203, sending a new computing instruction to the new master node, wherein the new computing instruction carries the raw data, and receiving raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data.

In a specific implementation of this step, the distributed coordination server may send the new computing instruction to the new master node using the address of the new master node. It should be understood that the new master node continues to execute the computing task according to the raw data, that further raw data is generated while it does so, and that the new master node returns this further raw data to the distributed coordination server. It should be noted that the process can repeat. Suppose slave node 1 is selected as the new master node and continues to execute the computing task, and, before the computing task has finished, the distributed coordination server detects that slave node 1 cannot execute it. The distributed coordination server then selects another new master node from the slave nodes according to the preset rule, for example slave node 2, and sends it a new computing instruction that carries the raw data generated by slave node 1 while continuing to execute the computing task. Slave node 2, serving as the new master node, continues to execute the computing task until it is finished.
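Steps S201 to S203 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class `SimulatedNode`, the function `run_with_failover`, the step-based task and the dictionary used as raw data are all hypothetical and do not come from the embodiment, and a raised exception stands in for the monitoring described later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class NodeUnavailable(Exception):
    """Raised by a simulated node that can no longer execute the task."""


@dataclass
class SimulatedNode:
    """Hypothetical stand-in for a node on which a driver is deployed."""
    name: str
    fail_at_step: Optional[int] = None  # step index at which this node stops working

    def run_step(self, step: int, raw_data: Dict[str, int]) -> Dict[str, int]:
        if self.fail_at_step is not None and step >= self.fail_at_step:
            raise NodeUnavailable(self.name)
        # The "raw data" is a tiny breakpoint record: progress plus partial result.
        return {"completed_steps": step + 1,
                "partial_sum": raw_data["partial_sum"] + step}


def run_with_failover(
    master: SimulatedNode,
    slaves: List[SimulatedNode],
    total_steps: int,
    choose_new_master: Callable[[List[SimulatedNode]], SimulatedNode],
) -> Tuple[Dict[str, int], List[str]]:
    """Run the task on the master and resume on a slave whenever a node fails."""
    raw_data = {"completed_steps": 0, "partial_sum": 0}
    candidates = list(slaves)
    current = master
    history = [master.name]
    while raw_data["completed_steps"] < total_steps:
        try:
            # S201 / S203: the node works from the latest raw data and returns new raw data.
            raw_data = current.run_step(raw_data["completed_steps"], raw_data)
        except NodeUnavailable:
            # S202: pick a new master according to the preset rule.
            if not candidates:
                raise RuntimeError("no node left that can execute the task")
            current = choose_new_master(candidates)
            candidates.remove(current)
            history.append(current.name)
    return raw_data, history
```

Passing `lambda nodes: nodes[0]` as `choose_new_master` corresponds to taking the first-ranked slave node, while `random.choice` would correspond to the random rule mentioned above. Work already recorded in the raw data is not repeated after a failover.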

In order to solve the problems in the prior art, a second embodiment of the present invention provides a method for executing a computing task, which is applied to a distributed coordination server, and in the second embodiment, a specific process is described with reference to fig. 4 and 5 as follows:

First, a computing instruction carrying a computing task is sent to the master node, and the slave nodes are sorted in descending order of performance.

In a specific implementation of this step, the distributed coordination server may send the computing instruction as follows: the addresses of the client node, slave node 1 and slave node 2 are stored in the distributed coordination server in advance, and the distributed coordination server sends the computing instruction to the master node using the address of the client node, which serves as the master node. The distributed coordination server may be Zookeeper, which is used as the example below. (Zookeeper is an open-source distributed coordination service for distributed applications; it provides consistency services, including configuration maintenance, naming service, distributed synchronization and group services.) It should be noted that Zookeeper stores the performance of slave node 1 and of slave node 2 and sorts the two in descending order of performance, giving the order: slave node 1, slave node 2.

In addition, in the embodiment of the present invention, the client node in the Spark cluster submits the computing task entered by the user to the Spark cluster; Zookeeper obtains the computing task from the Spark cluster and generates a computing instruction that carries the computing task; Zookeeper then sends the computing instruction to the client node serving as the master node, and the client node executes the computing task first.
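The pre-stored addresses and the performance-based ordering of the first step might look as follows. This is a minimal sketch: the `NodeRecord` type, the numeric performance score and the example addresses are assumptions made for illustration, since the embodiment does not specify how performance is measured or stored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeRecord:
    address: str        # "host:port" string; the format is an assumption
    performance: float  # hypothetical score, higher means better performance


def rank_slaves(registry: Dict[str, NodeRecord]) -> List[str]:
    """Return slave node names sorted from highest to lowest performance."""
    return sorted(registry, key=lambda name: registry[name].performance, reverse=True)


# Example registry with made-up addresses and scores.
slave_registry = {
    "slave node 2": NodeRecord("10.0.0.12:7077", 0.6),
    "slave node 1": NodeRecord("10.0.0.11:7077", 0.9),
}
ranking = rank_slaves(slave_registry)  # slave node 1 first, then slave node 2
```

Sorting once in advance means that choosing a new master node later is a simple lookup of the first entry rather than a fresh comparison.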

Second, after the computing instruction is sent to the master node, a data recording instruction is sent to the master node at intervals.

In a specific implementation of this step, timing may start when Zookeeper sends the computing instruction to the master node, and Zookeeper then sends a data recording instruction to the master node every 5 minutes. The length of the interval is determined by the amount of computation in the computing task: for example, when the amount of computation is small, Zookeeper sends a data recording instruction to the master node every 5 minutes, and when it is large, every 15 minutes. The data recording instruction is an instruction provided by Zookeeper itself and is used to request, from the node executing the computing task, the raw data generated by executing it.
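The choice of interval can be illustrated with a short sketch. The 5-minute and 15-minute values come from the example above; the way the amount of computation is measured (`estimated_work_units`) and the threshold separating small from large tasks are assumptions introduced only for this illustration.

```python
from typing import List


def record_interval_minutes(estimated_work_units: int,
                            large_task_threshold: int = 1000) -> int:
    """Pick the interval between data recording instructions.

    The threshold is a hypothetical tuning value, not taken from the embodiment.
    """
    return 15 if estimated_work_units >= large_task_threshold else 5


def record_instruction_times(start_minute: int, end_minute: int,
                             interval: int) -> List[int]:
    """Minutes at which a data recording instruction would be sent.

    Timing starts when the computing instruction is sent (start_minute).
    """
    return list(range(start_minute + interval, end_minute + 1, interval))
```

For a small task that starts at minute 0 and runs for 20 minutes, `record_instruction_times(0, 20, record_interval_minutes(10))` yields instructions at minutes 5, 10, 15 and 20. A longer interval reduces messaging for large tasks, at the cost of more work to redo after a failure.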

Third, the raw data that is generated by executing the computing task and sent by the master node in response to the data recording instruction is received.

In this step, the address of Zookeeper, which consists of the IP address and port of the distributed coordination server, is stored in advance on the client node serving as the master node. The master node executes the computing task according to the received computing instruction; the task is executed by the driver deployed on the high-performance client node, so the advantages of that node, such as strong scheduling capability, fast task execution and fast real-time problem location, are fully utilized. If the master node receives a data recording instruction while executing the computing task, it sends the raw data generated by executing the task to Zookeeper using Zookeeper's address. Because Zookeeper sends a data recording instruction once per interval, the master node sends Zookeeper the raw data generated during each interval. Zookeeper receives the raw data that is generated by executing the computing task and sent by the master node in response to the data recording instruction.

Fourth, a monitoring instruction is sent to the master node, and if no response message returned by the master node in reply to the monitoring instruction is received within a preset time, it is determined that the master node cannot execute the computing task.

In a specific implementation of this step, Zookeeper's heartbeat detection mechanism may be used to determine that the master node cannot execute the computing task. The specific process is as follows: Zookeeper sends a monitoring instruction to the master node, and if no response message returned by the master node in reply to the monitoring instruction is received within a preset time, Zookeeper determines that the master node cannot execute the computing task. It should be understood that a person skilled in the art may flexibly choose how to determine that the master node cannot execute the computing task without affecting the embodiments of the present invention. It should be noted that if a response message returned by the master node in reply to the monitoring instruction is received within the preset time, Zookeeper determines that the master node can execute the computing task, and the master node continues to execute it.

In addition, the fourth step may be performed at any time after the computing instruction is sent in the first step; the labels first step, second step, fourth step and so on are used only for convenience of description and do not indicate the actual order in which the steps are executed.
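The timeout rule of the fourth step can be sketched with explicit timestamps. The class below is a hypothetical illustration and does not use any Zookeeper client API; the caller supplies the current time, which keeps the logic deterministic and easy to test.

```python
from typing import Optional


class MonitorState:
    """Tracks one outstanding monitoring instruction (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.sent_at: Optional[float] = None  # None means nothing is pending

    def on_instruction_sent(self, now: float) -> None:
        self.sent_at = now

    def on_response(self, now: float) -> None:
        # Only a response that arrives within the preset time counts.
        if self.sent_at is not None and now - self.sent_at <= self.timeout_seconds:
            self.sent_at = None

    def master_unavailable(self, now: float) -> bool:
        """True if a monitoring instruction has gone unanswered past the preset time."""
        return self.sent_at is not None and now - self.sent_at > self.timeout_seconds
```

In a running system the `now` arguments would typically come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments do not trigger a false failover.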

Fifth, after it is determined that the master node cannot execute the computing task, the first-ranked slave node in the sorted order is taken as the new master node.

In a specific implementation of this step, as can be seen from FIG. 5, drivers are deployed on the client node, slave node 1 and slave node 2, so these three nodes can execute the computing task; no driver is deployed on slave node 3, so slave node 3 cannot execute the computing task. The first-ranked slave node is slave node 1, so Zookeeper takes slave node 1 as the new master node. In this way, the high-performance client node executes the computing task for as long as it can, and when it cannot, Zookeeper selects a new master node that continues the task, so that the high-performance advantage of the client node is fully utilized while high availability is ensured.
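The selection rule of the fifth step can be sketched as a scan over the pre-sorted ranking. The function name and its arguments are hypothetical; filtering on driver deployment and skipping nodes already known to have failed are assumptions that follow from the description of FIG. 5 rather than an explicit rule in the embodiment.

```python
from typing import List, Optional, Set


def select_new_master(ranking: List[str],
                      has_driver: Set[str],
                      failed: Set[str]) -> Optional[str]:
    """Return the highest-ranked slave node that can still execute the task.

    ranking    -- slave node names, best performance first
    has_driver -- names of nodes on which a driver is deployed
    failed     -- names of nodes already found unable to execute the task
    """
    for name in ranking:
        if name in has_driver and name not in failed:
            return name
    return None  # no eligible slave node remains
```

With the ranking slave node 1, slave node 2 and drivers on both, the first call returns slave node 1; once slave node 1 is added to `failed`, the next call returns slave node 2.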

Sixth, a new computing instruction carrying the raw data is sent to the new master node, and raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data is received.

In a specific implementation of this step, Zookeeper sends the new computing instruction to the new master node using the pre-stored address of slave node 1, and the new computing instruction carries the raw data generated by the master node executing the computing task.

After sending the new computing instruction to the new master node, Zookeeper sends a data recording instruction to the new master node at intervals. Sending the data recording instruction to the new master node is implemented in the same way as in the second step and is not described again here.

Slave node 1, serving as the new master node, continues to execute the computing task according to the raw data in the new computing instruction. If it receives a data recording instruction from Zookeeper while doing so, it sends the raw data generated by continuing to execute the computing task to Zookeeper using the pre-stored address of Zookeeper. Because Zookeeper sends a data recording instruction once per interval, the new master node sends Zookeeper the raw data generated during each interval. Zookeeper receives the raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data.
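The relationship between the first computing instruction and the new computing instruction can be illustrated with a small message type. Everything here is an assumption made for the sketch: the field names, the JSON encoding and the helper functions are hypothetical, since the embodiment does not define a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ComputeInstruction:
    task_id: str                               # hypothetical task identifier
    task: str                                  # description of the computing task
    raw_data: Optional[Dict[str, Any]] = None  # None in the first instruction


def encode(instruction: ComputeInstruction) -> bytes:
    """Serialize an instruction for sending to a node's address."""
    return json.dumps(asdict(instruction), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> ComputeInstruction:
    """Rebuild an instruction on the receiving node."""
    return ComputeInstruction(**json.loads(payload.decode("utf-8")))


def new_instruction_for(original: ComputeInstruction,
                        latest_raw_data: Dict[str, Any]) -> ComputeInstruction:
    """Build the new computing instruction: same task, plus the latest raw data."""
    return ComputeInstruction(original.task_id, original.task, dict(latest_raw_data))
```

The only difference between the two instructions in this sketch is the `raw_data` field, which lets the new master node resume from the last recorded breakpoint instead of starting over.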

Seventh, an end instruction returned by the new master node after it finishes executing the computing task is received, and the raw data is cleared according to the end instruction.

In this step, it should be understood that slave node 1, serving as the new master node, may finish executing the computing task without failing. In that case it sends an end instruction to Zookeeper when it finishes, and Zookeeper clears all raw data according to the end instruction, where all raw data includes the raw data generated by the client node executing the computing task and the raw data generated by slave node 1 continuing to execute it. Alternatively, slave node 1 may fail while continuing to execute the computing task and be unable to go on. In that case slave node 2 is taken as the new master node and continues to execute the computing task in the manner described in the sixth and seventh steps until the task is finished. In addition, all raw data may carry the identifier of the computing task, so that all raw data can be cleared quickly by that identifier. Moreover, once all raw data has been cleared, Zookeeper can process the next computing task without it being affected by the task just processed, which improves the accuracy with which computing tasks are executed.
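Clearing raw data by task identifier, as described in the seventh step, can be sketched with a small in-memory store. The `RawDataStore` class and its method names are hypothetical; the embodiment states only that raw data may carry the identifier of the computing task.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Tuple


class RawDataStore:
    """Keeps raw data grouped by computing-task identifier (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_task: DefaultDict[str, List[Tuple[str, Dict[str, Any]]]] = defaultdict(list)

    def record(self, task_id: str, node: str, raw_data: Dict[str, Any]) -> None:
        """Store raw data returned by a node in response to a data recording instruction."""
        self._by_task[task_id].append((node, raw_data))

    def latest(self, task_id: str) -> Optional[Dict[str, Any]]:
        """Most recent raw data for a task, used when building a new computing instruction."""
        entries = self._by_task.get(task_id)
        return entries[-1][1] if entries else None

    def clear(self, task_id: str) -> int:
        """Handle an end instruction: drop all raw data of the task, return how many entries."""
        return len(self._by_task.pop(task_id, []))
```

Because entries are keyed by task identifier, clearing one task removes the raw data from every node that worked on it (the client node and slave node 1 in the example above) without touching the raw data of other tasks.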

In the embodiment of the invention, by sending a computing instruction carrying the computing task to the master node, the master node is made to execute the computing task first, so that its high-performance advantages, such as strong scheduling capability and fast task execution, are fully utilized; if the master node cannot execute the computing task, a new master node is designated to continue executing it according to the raw data returned by the master node, which ensures high availability. The high-performance advantage of the master node and high availability are thus secured at the same time. When execution of the computing task ends, all raw data is cleared, so that one computing task does not affect the execution of the next, which ensures that computing tasks are executed accurately.

A method of performing a computing task is described above in conjunction with fig. 2-5, and an apparatus for performing a computing task is described below in conjunction with fig. 6.

In order to solve the problems of the prior art, a third embodiment of the present invention provides an apparatus for performing a computing task, as shown in fig. 6, the apparatus including:

A first transceiver unit 601, configured to send a computing instruction to a master node, wherein the computing instruction carries a computing task, and to receive raw data that is returned by the master node and generated by executing the computing task.

A processing unit 602, configured to select a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the computation task.

A second transceiver unit 603, configured to send a new computing instruction to the new master node, wherein the new computing instruction carries the raw data, and to receive raw data that is returned by the new master node and generated by continuing to execute the computing task according to the raw data.

It should be understood that the manner of implementing the third embodiment is the same as that of implementing the first embodiment, and thus, the description thereof is omitted.

To solve the above problems, a fourth embodiment of the present invention provides an apparatus for performing a computing task, including:

A preprocessing unit, configured to send a data recording instruction to the master node at intervals after the calculation instruction is sent to the master node, and to rank the slave nodes in advance in descending order of performance.

A first transceiving unit, configured to send a calculation instruction to the master node, where the calculation instruction carries a calculation task, and to receive original data that is generated by executing the calculation task and sent by the master node according to the data recording instruction.

A processing unit, configured to take the slave node ranked first in the ranking as the new master node if it is monitored that the master node cannot execute the calculation task.

In a specific implementation, the processing unit is configured to send a monitoring instruction to the master node and, if no response message returned by the master node according to the monitoring instruction is received within a preset time, to determine that the master node cannot execute the calculation task. The processing unit is further specifically configured to, after a new calculation instruction is sent to the new master node, clear the original data according to an end instruction if the end instruction returned by the new master node after it finishes executing the calculation task is received.

A second transceiving unit, configured to send a new calculation instruction to the new master node, where the new calculation instruction carries the original data, and to receive original data returned by the new master node and generated by continuing to execute the calculation task according to the original data.

It should be understood that the manner of implementing the fourth embodiment is the same as that of implementing the second embodiment, and thus, the description thereof is omitted.
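The additional behaviour of the fourth embodiment (ranking slave nodes by performance, judging the master node by a response timeout, and clearing the original data on an end instruction) can be sketched as below. The performance scores, the `response_delay` model of a monitoring reply, and the `OriginalDataStore` class are hypothetical; the embodiment does not specify how performance is measured or how the data is stored.

```python
from typing import Dict, List, Optional


def rank_slaves(performance: Dict[str, float]) -> List[str]:
    """Rank slave nodes in descending order of an assumed performance score."""
    return sorted(performance, key=performance.get, reverse=True)


def master_unavailable(response_delay: Optional[float], preset_time: float) -> bool:
    """Decide availability from a monitoring instruction.

    `response_delay` is the time in seconds until the response message
    arrived, or None if no response arrived at all.
    """
    return response_delay is None or response_delay > preset_time


class OriginalDataStore:
    """Holds the original data returned after each data recording instruction."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def record(self, data: dict) -> None:
        self._records.append(data)

    def latest(self) -> Optional[dict]:
        return self._records[-1] if self._records else None

    def on_end_instruction(self) -> None:
        # Clear everything so the next calculation task starts clean.
        self._records.clear()


ranking = rank_slaves({"slave-a": 0.6, "slave-b": 0.9, "slave-c": 0.3})
store = OriginalDataStore()
store.record({"progress": 1})
store.record({"progress": 2})

master = "master"
if master_unavailable(response_delay=None, preset_time=2.0):
    master = ranking[0]            # promote the top-ranked slave node
carried = store.latest()           # original data carried by the new instruction
store.on_end_instruction()         # end instruction received from the new master
```

Ranking the slave nodes in advance means that, at failover time, choosing the new master node is a constant-time lookup rather than a fresh comparison.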

FIG. 7 illustrates an exemplary system architecture 700 for a method of performing computing tasks or an apparatus for performing computing tasks to which embodiments of the invention may be applied.

As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 701, 702, 703 to interact with the server 705 over the network 704, to receive or send messages or the like. The terminal devices 701, 702, 703 may have installed thereon various communication client applications, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).

The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 705 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users using the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.

It should be noted that the method for executing the computing task provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the device for executing the computing task is generally disposed in the server 705.

It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first transceiver unit, a processing unit, and a second transceiver unit. The names of these units do not form a limitation on the unit itself under certain circumstances, for example, the processing unit may be further described as "a unit that selects a new master node from the slave nodes according to a preset rule if it is detected that the master node cannot perform the calculation task".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: sending a calculation instruction to a master node, wherein the calculation instruction carries a calculation task, and receiving original data returned by the master node and generated by executing the calculation task; if it is monitored that the master node cannot execute the calculation task, selecting a new master node from the slave nodes according to a preset rule; and sending a new calculation instruction to the new master node, wherein the new calculation instruction carries the original data, and receiving original data returned by the new master node and generated by continuing to execute the calculation task according to the original data.

According to the technical solution of the embodiments of the invention, a calculation instruction carrying the calculation task is sent to the master node, so that the master node executes the calculation task first and its high-performance advantages, such as strong scheduling capability and fast task execution, are fully utilized. If the master node cannot execute the calculation task, a new master node is designated to continue executing the task from the original data returned by the previous master node, which ensures high availability. The high-performance advantages of the master node and high availability are thus both guaranteed. When execution of the calculation task ends, all the original data are cleared, so that one calculation task does not affect the execution of the next, which ensures the accuracy of task execution.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of performing a computing task, comprising:

sending a calculation instruction to a master node, wherein the calculation instruction carries a calculation task, and receiving original data returned by the master node and generated by executing the calculation task;

if it is monitored that the master node cannot execute the calculation task, selecting a new master node from the slave nodes according to a preset rule;

and sending a new calculation instruction to the new master node, wherein the new calculation instruction carries the original data, and receiving original data returned by the new master node and generated by continuing to execute the calculation task according to the original data.

2. The method of claim 1, wherein after sending the calculation instruction to the master node, the method further comprises: sending a data recording instruction to the master node at intervals;

wherein receiving the original data returned by the master node and generated by executing the calculation task comprises:

receiving the original data that is generated by executing the calculation task and sent by the master node according to the data recording instruction.

3. The method of claim 2, further comprising: ranking all the slave nodes in advance in descending order of their performance;

wherein selecting a new master node from the slave nodes according to a preset rule comprises:

taking the slave node ranked first in the ranking as the new master node.

4. The method of claim 1, wherein monitoring that the master node is unable to perform the computing task comprises:

sending a monitoring instruction to the master node, and if a response message returned by the master node according to the monitoring instruction is not received within a preset time, determining that the master node cannot execute the calculation task.

5. The method of claim 1, wherein after sending a new computing instruction to the new master node, the method further comprises:

receiving an end instruction returned by the new master node after the new master node finishes executing the calculation task, and clearing the original data according to the end instruction.

6. An apparatus for performing computing tasks, comprising:

a first transceiving unit, configured to send a calculation instruction to a master node, wherein the calculation instruction carries a calculation task, and to receive original data returned by the master node and generated by executing the calculation task;

a processing unit, configured to select a new master node from the slave nodes according to a preset rule if it is monitored that the master node cannot execute the calculation task;

7. The apparatus of claim 6, further comprising:

a preprocessing unit, configured to send a data recording instruction to the master node at intervals after the calculation instruction is sent to the master node;

the first transceiver unit is specifically configured to:

8. The apparatus of claim 7, wherein the preprocessing unit is further configured to:

sequencing all the slave nodes in advance according to the sequence of the performance of all the slave nodes from high to low;

the processing unit is specifically configured to:

9. The apparatus according to claim 6, wherein the processing unit is further specifically configured to:

10. The apparatus according to claim 6, wherein the processing unit is further specifically configured to:

after sending a new calculation instruction to the new master node, if an end instruction returned by the new master node after it finishes executing the calculation task is received, clearing the original data according to the end instruction.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.