CN111181774A

CN111181774A - High-availability method, system, terminal and storage medium for MapReduce task

Info

Publication number: CN111181774A
Application number: CN201911283083.9A
Authority: CN
Inventors: 道玉明; 张东东
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2020-05-19

Abstract

The invention provides a high-availability method, system, terminal and storage medium for a MapReduce task. The method comprises the following steps: selecting a standby node from a cluster and storing the standby node; monitoring the state of an execution node of the MapReduce task; and, if the state of the execution node is found to be abnormal, forwarding the task of the abnormal execution node to a standby node. The invention ensures that the task continues to execute without interruption, which saves human resources and safeguards product quality; logs are recorded throughout the whole process, which makes subsequent review faster and more convenient.

Description

High-availability method, system, terminal and storage medium for MapReduce task

Technical Field

The invention relates to the technical field of big data Insight platforms, and in particular to a high-availability method, system, terminal and storage medium for a MapReduce task.

Background

In the big data Insight platform, MapReduce is one of the core components. An Insight distribution needs to comprise two parts, namely a distributed file system (HDFS) and a distributed computing framework (MapReduce), and neither can be omitted. The Insight components rely on MapReduce to execute tasks, so MapReduce is an important factor for big data platforms. At present a MapReduce task is executed on a data node, but its execution is affected by conditions such as data node downtime, network unavailability and poor performance, any of which can cause the execution to fail. When a data node is down there is also a risk that its logs cannot be checked, so the cause of the failure cannot be traced, and labor, time and cluster resources are wasted.

Disclosure of Invention

In view of the above-mentioned deficiencies of the prior art, the present invention provides a high-availability method, system, terminal and storage medium for a MapReduce task, so as to solve the above-mentioned technical problems.

In a first aspect, the present invention provides a high-availability method for a MapReduce task, including:

selecting a standby node from a cluster and storing the standby node;

monitoring the state of an execution node of the MapReduce task;

and if the state of the execution node is monitored to be abnormal, forwarding the task of the abnormal execution node to a standby node.

Further, the selecting a standby node from the cluster includes:

collecting performance parameters of all nodes of a cluster;

selecting a plurality of idle nodes with optimal performance parameters as standby nodes, wherein the number of the standby nodes is not less than the number of the execution nodes.

Further, the monitoring of the state of the execution node of the MapReduce task includes:

acquiring a performance parameter of the execution node, wherein the performance parameter is a weighted sum of the I/O, Job, disk, CPU, network, memory, power supply and running time indexes;

judging whether the performance parameter of the execution node exceeds a preset threshold value;

and if so, judging that the state of the execution node is abnormal.

Further, the method further comprises:

saving the log storage path of the execution node;

and monitoring and storing the task execution progress of the execution node in real time.

Further, the forwarding the task of the abnormal execution node to the standby node includes:

forwarding the task execution progress and the task data of the abnormal execution node to a standby node;

and setting the log storage path of the abnormal execution node as the log storage path of the standby node.

In a second aspect, the present invention provides a high availability system for MapReduce task, including:

the standby selecting unit is configured to select a standby node from the cluster and store the standby node;

the state monitoring unit is configured to monitor the state of the execution node of the MapReduce task;

and the task forwarding unit is configured to forward the task of the abnormal execution node to the standby node if the abnormal state of the execution node is monitored.

Further, the standby selecting unit includes:

the parameter acquisition module is configured for acquiring performance parameters of all nodes of the cluster;

and the node screening module is configured to select a plurality of idle nodes with optimal performance parameters as standby nodes, wherein the number of the standby nodes is not less than the number of the execution nodes.

Further, the state monitoring unit includes:

the parameter calculation module is configured to acquire a performance parameter of the execution node, wherein the performance parameter is a weighted sum of the I/O, Job, disk, CPU, network, memory, power supply and running time indexes;

the parameter judgment module is configured to judge whether the performance parameter of the execution node exceeds a preset threshold value;

and the abnormity determining module is configured for determining that the state of the execution node is abnormal if the performance parameter of the execution node exceeds a preset threshold value.

In a third aspect, a terminal is provided, including:

a processor, a memory, wherein,

the memory is used for storing a computer program which,

the processor is used for calling and running the computer program from the memory so as to make the terminal execute the method described above.

In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.

The beneficial effects of the invention are as follows.

According to the high-availability method, system, terminal and storage medium for a MapReduce task provided by the invention, a standby node is selected from the cluster, the state of the execution node of the MapReduce task is monitored in real time, and the task of an abnormal execution node is transferred to the standby node as soon as the state of the execution node becomes abnormal, so that high availability of the MapReduce task is realized. The invention ensures that the task continues to execute without interruption, which saves human resources and safeguards product quality; logs are recorded throughout the whole process, which makes subsequent review faster and more convenient.

In addition, the invention has reliable design principle, simple structure and very wide application prospect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.

FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.

FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject of FIG. 1 may be a high-availability system for a MapReduce task.

As shown in fig. 1, the method 100 includes:

step 110, selecting a standby node from the cluster and storing the standby node;

step 120, monitoring the state of the execution node of the MapReduce task;

step 130, if the execution node state is monitored to be abnormal, forwarding the task of the abnormal execution node to the standby node.

In order to facilitate understanding of the present invention, the high-availability method for a MapReduce task provided by the present invention is further described below, based on the principle of the method and in combination with the process of scheduling and managing MapReduce tasks in the embodiment.

Specifically, the high-availability method for the MapReduce task comprises the following steps:

S1, selecting a standby node from the cluster and storing the standby node.

The data nodes recommended by the MapReduce task data node recommendation module are saved as standby nodes, the recommended nodes being those with optimal performance. The number of selected standby nodes is not less than the total number of nodes currently executing MapReduce tasks.

The number of standby nodes set in this embodiment is the preferred implementation; in other implementations, the number of standby nodes may be set as needed.
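The selection in step S1 can be illustrated with a minimal Python sketch. It is not the recommendation module itself: the `NodeInfo` record, its single `score` field (assumed here to be higher for better-performing nodes) and the function name `select_standby_nodes` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical snapshot of one cluster node (not defined in the source)."""
    name: str
    idle: bool     # True if the node is not currently executing a task
    score: float   # aggregated performance parameter; assumed higher is better


def select_standby_nodes(nodes, executing_count):
    """Return the best idle nodes, one per execution node.

    One spare per execution node is the minimum the text calls for;
    a larger pool could be returned by raising the slice bound.
    """
    idle = [n for n in nodes if n.idle]
    if len(idle) < executing_count:
        raise ValueError("not enough idle nodes to back up every execution node")
    ranked = sorted(idle, key=lambda n: n.score, reverse=True)
    return ranked[:executing_count]


cluster = [
    NodeInfo("dn1", idle=False, score=0.90),
    NodeInfo("dn2", idle=True, score=0.70),
    NodeInfo("dn3", idle=True, score=0.95),
    NodeInfo("dn4", idle=True, score=0.40),
]
standby = select_standby_nodes(cluster, executing_count=2)
print([n.name for n in standby])  # ['dn3', 'dn2']
```

Busy nodes are filtered out before ranking, so a well-performing node that is already executing a task (dn1 above) is never chosen as a spare.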

S2, monitoring the state of the execution node of the MapReduce task.

The I/O, Job, disk, CPU, network, memory, power supply and running time indexes of each node currently executing a MapReduce task are monitored, and a comprehensive parameter of the execution node is calculated from the collected index parameters. The calculation is a weighted summation of the index parameters, with the weight of each index parameter set according to the performance requirement that the task executed by the server places on that index. When the comprehensive parameter of an execution node exceeds the set threshold, the state of the execution node is judged to be abnormal.
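The weighted-sum check in step S2 can be sketched as follows. The weight values, the threshold, and the assumption that each index is normalised to the range 0 to 1 (with larger values meaning a more heavily loaded or degraded node) are illustrative choices, not values given in the source.

```python
# Hypothetical weights and threshold. The source only states that each weight
# reflects how much the executed task depends on the corresponding index.
WEIGHTS = {
    "io": 0.15, "job": 0.10, "disk": 0.15, "cpu": 0.20,
    "network": 0.15, "memory": 0.15, "power": 0.05, "runtime": 0.05,
}
THRESHOLD = 0.8


def composite_parameter(metrics, weights=WEIGHTS):
    """Weighted sum of the index parameters (each assumed normalised to 0..1)."""
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(weights[k] * metrics[k] for k in weights)


def is_abnormal(metrics, threshold=THRESHOLD, weights=WEIGHTS):
    """A node is abnormal when its composite parameter exceeds the threshold."""
    return composite_parameter(metrics, weights) > threshold


healthy = {k: 0.3 for k in WEIGHTS}
overloaded = {k: 0.9 for k in WEIGHTS}
print(is_abnormal(healthy), is_abnormal(overloaded))  # False True
```

Keeping the weights in one table makes it straightforward to tune them per workload, for example raising the disk and I/O weights for shuffle-heavy jobs.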

S3, if the state of the execution node is monitored to be abnormal, forwarding the task of the abnormal execution node to a standby node.

The storage paths of the execution logs of all current MapReduce tasks are saved, so that after the MapReduce task forwarding module is triggered and the task switches data nodes, logs can continue to be written to the same path.

The current task execution progress is monitored, so that after the MapReduce task forwarding module runs, the task continues from where it stopped instead of being executed from the beginning.

When an execution node in an abnormal state is detected in step S2, a standby node is randomly selected, the task execution progress and the task data of the abnormal execution node are forwarded to the selected standby node, and the log storage path of the abnormal execution node is set as the log storage path of the standby node. The standby node is then controlled to continue executing the task of the abnormal execution node.
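The forwarding in step S3 can be sketched in Python under some simplifying assumptions. The `TaskState` and `Node` records and the function `forward_task` are hypothetical; progress is modelled as a simple counter, and removing the chosen node from the standby pool is an assumption the source does not spell out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Hypothetical record saved for each running task."""
    task_id: str
    progress: int    # e.g. number of input records already processed
    data: list       # placeholder for the task data
    log_path: str    # saved log storage path of the execution node


@dataclass
class Node:
    name: str
    tasks: dict = field(default_factory=dict)


def forward_task(abnormal, standby_pool, task_id, rng=random):
    """Move one task from an abnormal node to a randomly chosen standby node."""
    if not standby_pool:
        raise RuntimeError("no standby node available")
    target = rng.choice(standby_pool)
    standby_pool.remove(target)  # assumption: a node in use is no longer a spare
    state = abnormal.tasks.pop(task_id)
    # Progress, data and log path travel together, so the standby node can
    # resume from the saved progress and keep writing to the same log path.
    target.tasks[task_id] = state
    return target


bad = Node("dn1", {"job-1": TaskState("job-1", 420, ["split-0"], "/logs/job-1.log")})
pool = [Node("dn3"), Node("dn2")]
new_node = forward_task(bad, pool, "job-1", rng=random.Random(0))
print(new_node.name, new_node.tasks["job-1"].progress, new_node.tasks["job-1"].log_path)
```

Passing the random generator as a parameter keeps the random choice described in the text while allowing a seeded generator in tests.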

As shown in fig. 2, the system 200 includes:

a standby selecting unit 210 configured to select a standby node from the cluster and store the standby node;

the state monitoring unit 220 is configured to monitor the state of the execution node of the MapReduce task;

and the task forwarding unit 230 is configured to forward the task of the abnormal execution node to the standby node if it is monitored that the state of the execution node is abnormal.
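How the three units of system 200 cooperate can be illustrated with a small sketch in which each unit is passed in as a plain callable. The class name `HighAvailabilitySystem`, its method names and the toy stand-ins in the usage lines are hypothetical; only the division into units 210, 220 and 230 comes from the text.

```python
class HighAvailabilitySystem:
    """Hypothetical wiring of units 210, 220 and 230 as plain callables."""

    def __init__(self, select_standby, is_abnormal, forward):
        self.select_standby = select_standby  # standby selecting unit 210
        self.is_abnormal = is_abnormal        # state monitoring unit 220
        self.forward = forward                # task forwarding unit 230
        self.standby = []

    def prepare(self, cluster, executing):
        """Select the standby nodes and store them."""
        self.standby = list(self.select_standby(cluster, executing))

    def check_once(self, executing):
        """Run one monitoring pass; return (failed_node, replacement) pairs."""
        moved = []
        for node in executing:
            if self.is_abnormal(node) and self.standby:
                replacement = self.standby.pop(0)
                self.forward(node, replacement)
                moved.append((node, replacement))
        return moved


# Toy stand-ins: nodes are plain strings and "dn1" is treated as abnormal.
events = []
system = HighAvailabilitySystem(
    select_standby=lambda cluster, executing: [n for n in cluster if n not in executing],
    is_abnormal=lambda node: node == "dn1",
    forward=lambda src, dst: events.append(f"{src}->{dst}"),
)
system.prepare(cluster=["dn1", "dn2", "dn3", "dn4"], executing=["dn1", "dn2"])
print(system.check_once(["dn1", "dn2"]))  # [('dn1', 'dn3')]
```

In a real deployment `check_once` would be called periodically, which corresponds to the real-time monitoring described for the state monitoring unit.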

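Optionally, as an embodiment of the present invention, the standby selecting unit includes:

a parameter acquisition module configured to acquire performance parameters of all nodes of the cluster;

and a node screening module configured to select a plurality of idle nodes with optimal performance parameters as standby nodes, wherein the number of the standby nodes is not less than the number of the execution nodes.

Optionally, as an embodiment of the present invention, the state monitoring unit includes:

a parameter calculation module configured to acquire a performance parameter of the execution node, wherein the performance parameter is a weighted sum of the I/O, Job, disk, CPU, network, memory, power supply and running time indexes;

a parameter judgment module configured to judge whether the performance parameter of the execution node exceeds a preset threshold value;

and an abnormity determining module configured to determine that the state of the execution node is abnormal if the performance parameter of the execution node exceeds the preset threshold value.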

FIG. 3 is a schematic structural diagram of a terminal system 300 according to an embodiment of the present invention; the terminal system 300 may be used to execute the high-availability method for a MapReduce task provided by the embodiment of the present invention.

The terminal system 300 may include: a processor 310, a memory 320, and a communication unit 330. The components communicate via one or more buses. Those skilled in the art will appreciate that the server architecture shown in the figure does not limit the invention: it may be a bus architecture or a star architecture, and it may include more or fewer components than those shown, combine some components, or arrange the components differently.

The memory 320 may be used for storing instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in the memory 320, when executed by the processor 310, enable the terminal 300 to perform some or all of the steps in the method embodiments described above.

The processor 310 is the control center of the storage terminal. It connects the various parts of the entire electronic terminal using various interfaces and lines, and performs the functions of the electronic terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or a plurality of connected packaged ICs with the same or different functions. For example, the processor 310 may include only a central processing unit (CPU). In the embodiment of the present invention, the CPU may have a single operation core or multiple operation cores.

The communication unit 330 is configured to establish a communication channel so that the storage terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.

The present invention also provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).

Therefore, in the present invention the standby nodes are selected from the cluster, the states of the execution nodes of the MapReduce task are monitored in real time, and the task of an abnormal execution node is transferred to a standby node as soon as the state of the execution node becomes abnormal, so that high availability of the MapReduce task is realized. The invention ensures that the task continues to execute without interruption, which saves human resources and safeguards product quality; logs are recorded throughout the whole process, which makes subsequent review faster and more convenient. For the technical effects that this embodiment can achieve, reference may be made to the description above, and details are not repeated here.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus the required general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to perform all or part of the steps of the method in the embodiments of the present invention.

The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. The above-described system embodiments are merely illustrative. For example, the division of the units is only one logical functional division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention, and any changes or substitutions that a person skilled in the art can easily conceive of within the technical scope disclosed herein fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A high-availability method for a MapReduce task is characterized by comprising the following steps:

selecting a standby node from a cluster and storing the standby node;

monitoring the state of an execution node of the MapReduce task;

2. The method of claim 1, wherein selecting the standby node from the cluster comprises:

collecting performance parameters of all nodes of a cluster;

3. The method according to claim 1, wherein the monitoring of the state of the executing node of the MapReduce task comprises:

and if so, judging that the state of the execution node is abnormal.

4. The method of claim 1, further comprising:

and saving the log storage path of the execution node.

5. The method of claim 4, wherein forwarding the task of the abnormal execution node to the standby node comprises:

6. A high availability system for a MapReduce task, comprising:

7. The system of claim 6, wherein the standby selecting unit comprises:

8. The system of claim 6, wherein the state monitoring unit comprises:

9. A terminal, comprising:

a processor;

a memory for storing instructions for execution by the processor;

wherein the processor is configured to perform the method of any one of claims 1-5.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.