CN105243125A

CN105243125A - PrestoDB cluster running method and apparatus, cluster and data query method and apparatus

Info

Publication number: CN105243125A
Application number: CN201510633927.3A
Authority: CN
Inventors: 吕信
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2015-09-29
Filing date: 2015-09-29
Publication date: 2016-01-13
Anticipated expiration: 2035-09-29
Also published as: CN105243125B

Abstract

The present invention discloses a PrestoDB cluster running method and apparatus, a PrestoDB cluster, and a data query method and apparatus. The running method comprises: a ZooKeeper cluster receiving the respective IP addresses and ports sent by at least two coordinator nodes; determining the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, taking the other coordinator nodes as standby coordinator nodes, and notifying the compute nodes of the surviving coordinator node; during query execution, detecting the current liveness of the surviving coordinator node; and, if the surviving coordinator node is detected to have failed, selecting one of the standby coordinator nodes as the new surviving coordinator node and notifying the compute nodes of it. The methods and apparatuses provided by the present invention improve the service availability of the PrestoDB cluster and achieve high availability of the PrestoDB cluster.

Description

PrestoDB cluster running method and apparatus, PrestoDB cluster, and data query method and apparatus

Technical field

The embodiments of the present invention relate to computer technology, and in particular to a running method and apparatus for a PrestoDB cluster, a PrestoDB cluster, and a data query method and apparatus for a PrestoDB cluster.

Background art

With the rise of big data, the volume of business data at Internet companies grows year by year. Major Internet companies are therefore developing big data technology in-house and building data warehouses for their core business systems. Current data warehouses fall into two types: offline data warehouses and real-time data warehouses. The representative offline data warehouse product is Hive; because its underlying computing framework is MapReduce, it is suited to offline analysis and computation over very large data sets, but not to data analysis and computation with strict real-time requirements. The representative real-time data warehouse product is PrestoDB, developed by Facebook. It adopts a pipelined model for distributed data computation and transmission, and can complete analysis and computation over large data in roughly 100 milliseconds to 20 minutes, meeting the requirements of real-time data analysis and computation.

Because PrestoDB is a memory-based distributed computing framework, during data analysis and computation all compute (Worker) nodes in a PrestoDB cluster perform the actual data processing and computation, while the coordinator (Coordinator) node is mainly responsible for scheduling query tasks, performing heartbeat detection between nodes, and collecting and aggregating the state and information of the computation tasks running on each Worker node. The Coordinator node is thus effectively the management node of the whole PrestoDB cluster, overseeing all Worker nodes as well as all query and computation tasks.

In the prior art, a PrestoDB cluster can designate only one Coordinator node, which makes the Coordinator node a single point of failure: once the server hosting the Coordinator node suffers a hardware fault, the PrestoDB cluster must be taken out of service, the cluster configuration file must be modified to designate a new server as the Coordinator node, and the cluster must be restarted. Only after this series of operations can the PrestoDB cluster resume normal service, so the cluster stops serving for a period of time, which reduces its service availability.

Summary of the invention

In view of this, the embodiments of the present invention provide a running method and apparatus for a PrestoDB cluster, a PrestoDB cluster, and a data query method and apparatus for a PrestoDB cluster, so as to improve the service availability of the PrestoDB cluster.

In a first aspect, the embodiments of the present invention provide a running method for a PrestoDB cluster, the method comprising:

The ZooKeeper cluster receives the respective IP addresses and ports sent by at least two coordinator nodes;

The ZooKeeper cluster determines the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, takes the coordinator nodes other than the surviving coordinator node as standby coordinator nodes, and notifies the compute nodes of the surviving coordinator node; the surviving coordinator node accepts query commands and dispatches computation tasks to the compute nodes;

During query execution, the ZooKeeper cluster detects the current liveness of the surviving coordinator node;

If the ZooKeeper cluster detects that the surviving coordinator node has failed, it elects one of the standby coordinator nodes as the new surviving coordinator node and notifies the compute nodes of the new surviving coordinator node.

In a second aspect, the embodiments of the present invention further provide a running apparatus for a PrestoDB cluster, the apparatus comprising:

An address receiving module, configured to receive the respective IP addresses and ports sent by at least two coordinator nodes;

A node determination module, configured to determine the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, take the coordinator nodes other than the surviving coordinator node as standby coordinator nodes, and notify the compute nodes of the surviving coordinator node, wherein the surviving coordinator node accepts query commands and dispatches computation tasks to the compute nodes;

A liveness detection module, configured to detect the current liveness of the surviving coordinator node during query execution;

An election module, configured to, if the surviving coordinator node is detected to have failed, elect one of the standby coordinator nodes as the new surviving coordinator node and notify the compute nodes of the new surviving coordinator node.

In a third aspect, the embodiments of the present invention further provide a PrestoDB cluster comprising at least two coordinator nodes, compute nodes, and a ZooKeeper cluster;

The ZooKeeper cluster comprises the running apparatus for a PrestoDB cluster according to any embodiment of the present invention.

In a fourth aspect, the embodiments of the present invention further provide a data query method for a PrestoDB cluster, performed using the PrestoDB cluster according to any embodiment of the present invention, the method comprising:

A client specifies the IP addresses and ports of the ZooKeeper cluster;

The client receives a query command and obtains the current surviving coordinator node from the ZooKeeper cluster;

The client submits the query command to the surviving coordinator node, which processes the query command into computation tasks and dispatches them to the compute nodes for query computation;

The client obtains the query result from the surviving coordinator node.

In a fifth aspect, the embodiments of the present invention further provide a data query apparatus for a PrestoDB cluster, the apparatus comprising:

An address specifying module, configured to specify the IP addresses and ports of the ZooKeeper cluster;

A query receiving module, configured to receive a query command and obtain the current surviving coordinator node from the ZooKeeper cluster;

A query submission module, configured to submit the query command to the surviving coordinator node, which processes the query command into computation tasks and dispatches them to the compute nodes for query computation;

A result acquisition module, configured to obtain the query result from the surviving coordinator node.

In the running method and apparatus, the PrestoDB cluster, and the data query method and apparatus provided by the embodiments of the present invention, at least two coordinator nodes, compute nodes, and a ZooKeeper cluster are configured in the PrestoDB cluster. The ZooKeeper cluster elects one of the coordinator nodes as the surviving coordinator node, with the others serving as standby coordinator nodes, and detects the current liveness of the surviving coordinator node during query execution. Once the surviving coordinator node is detected to have failed, one of the standby coordinator nodes is elected as the new surviving coordinator node. This prevents the PrestoDB cluster from stopping service when a coordinator node fails, improves the service availability of the PrestoDB cluster, and achieves high availability of the PrestoDB cluster.

Brief description of the drawings

Fig. 1 is a flowchart of a running method for a PrestoDB cluster provided by Embodiment one of the present invention;

Fig. 2 is a schematic structural diagram of a running apparatus for a PrestoDB cluster provided by Embodiment two of the present invention;

Fig. 3 is a deployment diagram of a PrestoDB cluster provided by Embodiment three of the present invention;

Fig. 4 is a flowchart of a data query method for a PrestoDB cluster provided by Embodiment four of the present invention;

Fig. 5 is a schematic structural diagram of a data query apparatus for a PrestoDB cluster provided by Embodiment five of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

In the embodiments of the present invention, the addresses of the at least two coordinator nodes and the address of the ZooKeeper cluster are specified in the configuration file of the PrestoDB cluster. For example, with two coordinator nodes and a ZooKeeper cluster of three servers, the configuration is written as: coordinator node address = IP address 1:port 1; IP address 2:port 2, which designates the two nodes at IP address 1 and IP address 2, with ports port 1 and port 2 respectively, as candidate coordinator nodes; and ZooKeeper address = IP address 1:port 1; IP address 2:port 2; IP address 3:port 3, which designates the ZooKeeper cluster formed by the three servers at IP address 1, IP address 2, and IP address 3, where port 1, port 2, and port 3 are generally the same. Specifying the addresses of the coordinator nodes and of the ZooKeeper cluster in this way facilitates subsequent communication between the nodes. The addresses can be specified by adding two configuration items to the configuration file: one specifies the IP addresses and ports of the at least two coordinator nodes, and the other specifies the IP addresses and ports of the ZooKeeper cluster.
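
The two configuration items described above can be read with a small parser. This is a minimal sketch, not the implementation in this description: the key names `coordinator.address` and `zookeeper.address` are hypothetical (the description does not name the items), the addresses are placeholder examples, and the file is assumed to use simple `key=value` lines with `ip:port` entries separated by semicolons.

```python
from typing import Dict, List, Tuple

# Hypothetical key names: the description only says that two configuration
# items are added, not what they are called.
COORDINATOR_KEY = "coordinator.address"
ZOOKEEPER_KEY = "zookeeper.address"


def parse_address_list(value: str) -> List[Tuple[str, int]]:
    """Parse 'ip1:port1;ip2:port2' into [(ip1, port1), (ip2, port2)]."""
    addresses = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        host, _, port = entry.rpartition(":")
        host, port = host.strip(), port.strip()
        if not host or not port.isdigit():
            raise ValueError(f"malformed address entry: {entry!r}")
        addresses.append((host, int(port)))
    return addresses


def load_ha_config(text: str) -> Dict[str, List[Tuple[str, int]]]:
    """Read key=value lines and parse the two address items."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    coordinators = parse_address_list(props[COORDINATOR_KEY])
    if len(coordinators) < 2:
        raise ValueError("at least two coordinator addresses are required")
    return {
        "coordinators": coordinators,
        "zookeeper": parse_address_list(props[ZOOKEEPER_KEY]),
    }


example = """
coordinator.address=10.0.0.1:8080;10.0.0.2:8080
zookeeper.address=10.0.1.1:2181;10.0.1.2:2181;10.0.1.3:2181
"""
config = load_ha_config(example)
print(config["coordinators"])  # [('10.0.0.1', 8080), ('10.0.0.2', 8080)]
```

Rejecting a configuration with fewer than two coordinator addresses mirrors the requirement that at least two coordinator nodes be configured.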

Embodiment one

Fig. 1 is a flowchart of a running method for a PrestoDB cluster provided by Embodiment one of the present invention. This embodiment is applicable to electing a new surviving coordinator node when the surviving coordinator node in a PrestoDB cluster fails. The method can be performed by the ZooKeeper cluster and specifically comprises the following steps:

Step 110: the ZooKeeper cluster receives the respective IP addresses and ports sent by at least two coordinator nodes.

In this embodiment, the PrestoDB cluster comprises at least two coordinator nodes, multiple compute nodes, and a ZooKeeper cluster. Configuring at least two coordinator nodes makes it possible, when the surviving coordinator node suffers a fault such as a crash, to elect a new surviving coordinator node from the other coordinator nodes without stopping the PrestoDB cluster's service, thereby ensuring high availability of the PrestoDB cluster. Preferably, two coordinator nodes are configured in the PrestoDB cluster, which is sufficient to achieve this function.

After a coordinator node starts, it sends its own IP address and port to the ZooKeeper cluster. Upon receiving them, the ZooKeeper cluster saves the IP address and port to a coordinator node list, which is used in the subsequent election of the surviving coordinator node.
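
The coordinator node list described above only needs to remember which addresses arrived and in what order. In an actual ZooKeeper deployment, arrival order is commonly recorded by having each coordinator create an ephemeral sequential znode; the sketch below merely simulates that list in memory. The class names are hypothetical, and ignoring a repeated announcement from the same address is an assumption of this sketch, not something the description specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoordinatorEntry:
    seq: int   # arrival order, analogous to a sequential znode suffix
    host: str
    port: int


class CoordinatorRegistry:
    """In-memory stand-in for the coordinator node list kept in ZooKeeper."""

    def __init__(self) -> None:
        self._entries: List[CoordinatorEntry] = []
        self._next_seq = 0

    def register(self, host: str, port: int) -> CoordinatorEntry:
        # Assumption: re-sending the same address keeps the original slot.
        for entry in self._entries:
            if (entry.host, entry.port) == (host, port):
                return entry
        entry = CoordinatorEntry(self._next_seq, host, port)
        self._next_seq += 1
        self._entries.append(entry)
        return entry

    def coordinators(self) -> List[CoordinatorEntry]:
        """Registered coordinators, earliest arrival first."""
        return sorted(self._entries, key=lambda e: e.seq)


registry = CoordinatorRegistry()
registry.register("10.0.0.2", 8080)   # this coordinator started first
registry.register("10.0.0.1", 8080)
registry.register("10.0.0.2", 8080)   # duplicate announcement, ignored
print([(e.seq, e.host) for e in registry.coordinators()])
# [(0, '10.0.0.2'), (1, '10.0.0.1')]
```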

ZooKeeper is a distributed, open-source coordination service for distributed applications. Its goal is to encapsulate complex, error-prone key services and to provide users with an easy-to-use interface and an efficient, stable system. When deploying a ZooKeeper cluster, an odd number of nodes is preferred, because the ZooKeeper cluster as a whole becomes unavailable only when more than half of its machines are down; an even-sized cluster therefore tolerates no more failed machines than the odd-sized cluster one machine smaller.
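
The preference for an odd number of ZooKeeper servers follows from majority arithmetic: a cluster of n servers stays available while a strict majority, n // 2 + 1, is up. The short calculation below (function names are illustrative) shows that 3 and 4 servers both tolerate one failure, and 5 and 6 both tolerate two.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

This is also why three servers is a sensible minimum: it is the smallest cluster that survives the loss of any single server.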

Step 120: the ZooKeeper cluster determines the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, takes the coordinator nodes other than the surviving coordinator node as standby coordinator nodes, and notifies the compute nodes of the surviving coordinator node; the surviving coordinator node accepts query commands and dispatches computation tasks to the compute nodes.

When the ZooKeeper cluster starts electing the surviving coordinator node, it determines the coordinator node indicated by the first-received IP address and port as the surviving coordinator node and the coordinator nodes indicated by subsequently received IP addresses and ports as standby coordinator nodes, and notifies the compute nodes of the surviving coordinator node. The surviving coordinator node accepts query commands, processes them into computation tasks, and dispatches the computation tasks to the compute nodes, which perform the actual computation.
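
The "first received wins" rule of step 120 can be sketched as follows. This is an illustration under assumptions: the class name is hypothetical, and notifying the compute nodes is simulated by a `worker_views` dictionary recording which coordinator each compute node was told about (a real deployment might instead let compute nodes watch a znode).

```python
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]


class SurvivingCoordinatorElection:
    """First address received becomes surviving; later ones become standby."""

    def __init__(self, worker_ids: List[str]) -> None:
        self.surviving: Optional[Address] = None
        self.standby: List[Address] = []
        # Simulated notification: the coordinator each compute node was told about.
        self.worker_views: Dict[str, Optional[Address]] = {w: None for w in worker_ids}

    def on_address_received(self, address: Address) -> None:
        if self.surviving is None:
            self.surviving = address          # first-received address wins
            self._notify_workers()
        elif address != self.surviving and address not in self.standby:
            self.standby.append(address)      # kept in arrival order

    def _notify_workers(self) -> None:
        for worker_id in self.worker_views:
            self.worker_views[worker_id] = self.surviving


election = SurvivingCoordinatorElection(["worker-1", "worker-2"])
election.on_address_received(("10.0.0.1", 8080))
election.on_address_received(("10.0.0.2", 8080))
print(election.surviving, election.standby)
```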

Step 130: during query execution, the ZooKeeper cluster detects the current liveness of the surviving coordinator node.

While the PrestoDB cluster executes queries, the ZooKeeper cluster detects the current liveness of the surviving coordinator node in real time through its communication with the surviving coordinator node.

Detecting the current liveness of the surviving coordinator node by the ZooKeeper cluster preferably includes:

The ZooKeeper cluster receives the status information sent by the surviving coordinator node once every set time;

If the ZooKeeper cluster has not received the status information of the surviving coordinator node when the set time elapses, the waiting period is extended to a second set time; if the status information has still not been received when the second set time elapses, the surviving coordinator node is determined to have failed.

The surviving coordinator node sends its own status information to the ZooKeeper cluster once every set time. If the ZooKeeper cluster receives the status information when the set time elapses, the surviving coordinator node is considered to be working normally, without a fault. If the ZooKeeper cluster has not received the status information when the set time elapses, it extends the time for receiving the status information to the second set time; if the status information has still not been received when the second set time elapses, the surviving coordinator node is determined to have failed.
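
The two-stage timeout described above can be sketched as a small failure detector. This is a minimal illustration under stated assumptions: both the set time and the second set time are measured from the last status message received, time is passed in explicitly so the logic is testable, and the names `TwoStageFailureDetector` and `Liveness` are hypothetical. Real ZooKeeper deployments more commonly track liveness through session timeouts and ephemeral znodes; the description does not say which mechanism it relies on.

```python
from enum import Enum


class Liveness(Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"   # set time missed, waiting until the second set time
    FAILED = "failed"


class TwoStageFailureDetector:
    def __init__(self, first_timeout: float, second_timeout: float, now: float) -> None:
        if second_timeout <= first_timeout:
            raise ValueError("the second set time must extend the first one")
        self.first_timeout = first_timeout
        self.second_timeout = second_timeout
        self.last_status_at = now

    def on_status(self, now: float) -> None:
        """Record a status message from the surviving coordinator."""
        self.last_status_at = now

    def check(self, now: float) -> Liveness:
        silence = now - self.last_status_at
        if silence <= self.first_timeout:
            return Liveness.ALIVE
        if silence <= self.second_timeout:
            return Liveness.SUSPECT   # grace period: not yet declared failed
        return Liveness.FAILED


detector = TwoStageFailureDetector(first_timeout=3.0, second_timeout=10.0, now=0.0)
print(detector.check(now=2.0).name)   # ALIVE
print(detector.check(now=5.0).name)   # SUSPECT
detector.on_status(now=6.0)
print(detector.check(now=17.0).name)  # FAILED (11 s of silence)
```

The intermediate SUSPECT state is what the second set time buys: a single delayed status message does not immediately trigger a failover.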

Step 140: if the ZooKeeper cluster detects that the surviving coordinator node has failed, it elects one of the standby coordinator nodes as the new surviving coordinator node and notifies the compute nodes of the new surviving coordinator node.

When the ZooKeeper cluster detects that the surviving coordinator node has suffered a fault such as a crash, it elects one of the standby coordinator nodes as the new surviving coordinator node and notifies the compute nodes in the PrestoDB cluster of it. From then on, the new surviving coordinator node accepts query commands and is responsible for dispatching computation tasks.
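
Step 140 can be sketched in the same illustrative style. The description does not say which standby coordinator is chosen, so promoting the earliest-registered standby is an assumption here, as are the class name `CoordinatorFailover` and the `worker_views` dictionary that simulates notifying the compute nodes.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class CoordinatorFailover:
    def __init__(self, surviving: Address, standby: List[Address], worker_ids: List[str]) -> None:
        self.surviving = surviving
        self.standby = list(standby)
        self.worker_views: Dict[str, Address] = {w: surviving for w in worker_ids}

    def on_surviving_failed(self) -> Address:
        """Promote a standby coordinator, notify every compute node, return the failed one."""
        if not self.standby:
            raise RuntimeError("no standby coordinator available")
        failed = self.surviving
        # Assumption: the earliest-registered standby is promoted.
        self.surviving = self.standby.pop(0)
        for worker_id in self.worker_views:
            self.worker_views[worker_id] = self.surviving
        return failed


failover = CoordinatorFailover(("10.0.0.1", 8080), [("10.0.0.2", 8080)], ["worker-1", "worker-2"])
failed = failover.on_surviving_failed()
print(failed, failover.surviving)
```

In this sketch the failed coordinator is not returned to the standby list; if it restarts, it would re-register like any newly started coordinator.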

In this embodiment, the ZooKeeper cluster receives the respective IP addresses and ports sent by at least two coordinator nodes, determines the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, and takes the other coordinator nodes as standby coordinator nodes. The compute nodes are notified of the surviving coordinator node, which accepts query commands and dispatches computation tasks to them. During query execution, the ZooKeeper cluster detects the current liveness of the surviving coordinator node; once it detects a failure, it immediately elects one of the standby coordinator nodes as the new surviving coordinator node and notifies the compute nodes of it. This prevents the PrestoDB cluster from stopping service after a coordinator node fails, improves the service availability of the PrestoDB cluster, and achieves high availability of the PrestoDB cluster.

On the basis of the above technical solution, after the compute nodes are notified of the new surviving coordinator node, the method further comprises:

The ZooKeeper cluster instructs the compute nodes to force-fail the computation tasks submitted by the failed surviving coordinator node.

After electing the new surviving coordinator node and notifying the compute nodes of it, the ZooKeeper cluster instructs the compute nodes to force-fail the computation tasks submitted by the failed surviving coordinator node, i.e. the previous surviving coordinator node. This stops the compute nodes from continuing to execute those tasks, which frees compute-node memory and further improves the service availability of the PrestoDB cluster.
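
Force-failing the tasks of the previous surviving coordinator amounts to filtering a compute node's tasks by the coordinator that submitted them. The sketch below is hypothetical throughout (`Task`, `Worker`, `force_fail_from`, and the per-task memory figures are illustrative); it only shows that once those tasks are failed, the memory they held no longer counts against the compute node.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Address = Tuple[str, int]


@dataclass
class Task:
    task_id: str
    submitted_by: Address   # coordinator that dispatched the task
    memory_mb: int = 0
    state: str = "RUNNING"


class Worker:
    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def submit(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def used_memory_mb(self) -> int:
        """Memory still held by running tasks."""
        return sum(t.memory_mb for t in self.tasks.values() if t.state == "RUNNING")

    def force_fail_from(self, coordinator: Address) -> List[str]:
        """Fail every running task submitted by the given coordinator."""
        failed = []
        for task in self.tasks.values():
            if task.submitted_by == coordinator and task.state == "RUNNING":
                task.state = "FAILED"
                failed.append(task.task_id)
        return failed


old, new = ("10.0.0.1", 8080), ("10.0.0.2", 8080)
worker = Worker()
worker.submit(Task("t1", old, memory_mb=512))
worker.submit(Task("t2", old, memory_mb=256))
worker.submit(Task("t3", new, memory_mb=128))
print(worker.force_fail_from(old))  # ['t1', 't2']
print(worker.used_memory_mb())      # 128
```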

Embodiment two

Fig. 2 is a schematic structural diagram of a running apparatus for a PrestoDB cluster provided by Embodiment two of the present invention. As shown in Fig. 2, the running apparatus of this embodiment is configured in the ZooKeeper cluster and comprises: an address receiving module 210, a node determination module 220, a liveness detection module 230, and an election module 240.

The address receiving module 210 is configured to receive the respective IP addresses and ports sent by at least two coordinator nodes;

The node determination module 220 is configured to determine the coordinator node indicated by the first-received IP address and port as the surviving coordinator node, take the coordinator nodes other than the surviving coordinator node as standby coordinator nodes, and notify the compute nodes of the surviving coordinator node, wherein the surviving coordinator node accepts query commands and dispatches computation tasks to the compute nodes;

The liveness detection module 230 is configured to detect the current liveness of the surviving coordinator node during query execution;

The election module 240 is configured to, if the surviving coordinator node is detected to have failed, elect one of the standby coordinator nodes as the new surviving coordinator node and notify the compute nodes of the new surviving coordinator node.

Preferably, the apparatus further comprises:

An indication module, configured to instruct the compute nodes, after they are notified of the new surviving coordinator node, to force-fail the computation tasks submitted by the failed surviving coordinator node.

Preferably, the liveness detection module comprises:

A receiving unit, configured to receive the status information sent by the surviving coordinator node once every set time;

A determination unit, configured to extend the waiting period to a second set time if the status information of the surviving coordinator node has not been received when the set time elapses, and to determine that the surviving coordinator node has failed if the status information has still not been received when the second set time elapses.

The above product can perform the running method for a PrestoDB cluster provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Embodiment three

Fig. 3 is a deployment diagram of a PrestoDB cluster provided by Embodiment three of the present invention. As shown in Fig. 3, the PrestoDB cluster of this embodiment comprises at least two coordinator nodes 310, compute nodes 320, and a ZooKeeper cluster 330.

The ZooKeeper cluster 330 comprises the running apparatus for a PrestoDB cluster according to any embodiment of the present invention. The ZooKeeper cluster elects the surviving coordinator node from among the at least two coordinator nodes, detects the liveness of the surviving coordinator node, and, upon detecting that the surviving coordinator node has failed, elects another coordinator node as the new surviving coordinator node and notifies the compute nodes of it. The ZooKeeper cluster preferably comprises at least three servers.

When a query is executed, the surviving coordinator node schedules query tasks, performs heartbeat detection with each compute node, and collects and aggregates the state and information of the computation tasks running on each compute node; the compute nodes perform the data processing and computation.

In this embodiment, configuring at least two coordinator nodes, compute nodes, and a ZooKeeper cluster in the PrestoDB cluster achieves high availability of the PrestoDB cluster and improves its service availability.

Embodiment four

Fig. 4 is a flowchart of a data query method for a PrestoDB cluster provided by Embodiment four of the present invention. This embodiment is applicable to querying data with the PrestoDB cluster according to any embodiment of the present invention. The method can be performed by a client and specifically comprises the following steps:

Step 410: the client specifies the IP addresses and ports of the ZooKeeper cluster.

When submitting a query, the client needs to specify the IP addresses and ports of the ZooKeeper cluster so that it can subsequently obtain the surviving coordinator node from the ZooKeeper cluster.

Step 420: the client receives a query command and obtains the current surviving coordinator node from the ZooKeeper cluster.

The client receives the query command entered by the user and, using the IP addresses and ports of the ZooKeeper cluster, obtains the current surviving coordinator node from the ZooKeeper cluster.

Step 430: the client submits the query command to the surviving coordinator node, which processes the query command into computation tasks and dispatches them to the compute nodes for query computation.

The client submits the query command to the surviving coordinator node. The surviving coordinator node parses the query command into a query execution plan, generates query execution stages from the plan, splits the stages into multiple computation tasks, and dispatches those tasks to the compute nodes, which perform the query computation according to the tasks. The surviving coordinator node then collects and aggregates the compute nodes' results to obtain the query result.
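
The stage-to-task flow described above can be illustrated with a toy "sum" query. This is only a sketch of the shape of the flow and not PrestoDB's actual planner or scheduler: the plan is reduced to a single stage over pre-partitioned data, tasks are assigned round-robin (an assumption), each task computes a partial sum, and the coordinator aggregates the partial results. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StageTask:
    stage_id: int
    task_id: int
    partition: List[int]


def split_stage(stage_id: int, partitions: List[List[int]]) -> List[StageTask]:
    """Split one execution stage into one computation task per data partition."""
    return [StageTask(stage_id, i, part) for i, part in enumerate(partitions)]


def dispatch(tasks: List[StageTask], workers: List[str]) -> Dict[str, List[StageTask]]:
    """Assign tasks to compute nodes round-robin (an assumption of this sketch)."""
    if not workers:
        raise ValueError("no compute nodes available")
    assignment: Dict[str, List[StageTask]] = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


def run_on_worker(tasks: List[StageTask]) -> List[int]:
    """Each task computes a partial aggregate (here, a sum) over its partition."""
    return [sum(task.partition) for task in tasks]


def execute_sum_query(partitions: List[List[int]], workers: List[str]) -> int:
    tasks = split_stage(stage_id=0, partitions=partitions)
    assignment = dispatch(tasks, workers)
    partials: List[int] = []
    for worker_tasks in assignment.values():
        partials.extend(run_on_worker(worker_tasks))
    return sum(partials)   # the coordinator aggregates the partial results


print(execute_sum_query([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]], ["worker-1", "worker-2"]))  # 55
```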

Step 440: the client obtains the query result from the surviving coordinator node.

The client obtains the final query result from the surviving coordinator node.
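
Steps 410 to 440 can be sketched from the client's side. This is a hedged illustration: a real client would read the surviving coordinator's address from ZooKeeper with a ZooKeeper client library and submit the query to the coordinator over HTTP, whereas here both are replaced by in-memory stand-ins (`FakeZooKeeper` and a dictionary of callables), and every name is hypothetical. The point it shows is that resolving the coordinator per query lets the client follow a failover without being reconfigured.

```python
from typing import Callable, Dict, List, Tuple

Address = Tuple[str, int]
Rows = List[tuple]


class FakeZooKeeper:
    """Stand-in for the ZooKeeper cluster: only tracks the surviving coordinator."""

    def __init__(self, surviving: Address) -> None:
        self.surviving = surviving

    def get_surviving_coordinator(self) -> Address:
        return self.surviving


class HaQueryClient:
    def __init__(self, zookeeper_addresses: List[Address], zookeeper: FakeZooKeeper,
                 coordinators: Dict[Address, Callable[[str], Rows]]) -> None:
        if not zookeeper_addresses:
            raise ValueError("the ZooKeeper addresses must be specified")
        self.zookeeper_addresses = list(zookeeper_addresses)   # step 410
        self._zookeeper = zookeeper
        self._coordinators = coordinators

    def query(self, sql: str) -> Rows:
        # Step 420: look up the surviving coordinator for every query.
        coordinator = self._zookeeper.get_surviving_coordinator()
        # Steps 430-440: submit to that coordinator and return its result.
        return self._coordinators[coordinator](sql)


c1, c2 = ("10.0.0.1", 8080), ("10.0.0.2", 8080)
zk = FakeZooKeeper(surviving=c1)
client = HaQueryClient(
    [("10.0.1.1", 2181), ("10.0.1.2", 2181), ("10.0.1.3", 2181)],
    zk,
    {c1: lambda sql: [("served by c1", sql)], c2: lambda sql: [("served by c2", sql)]},
)
print(client.query("SELECT 1"))  # [('served by c1', 'SELECT 1')]
zk.surviving = c2                # ZooKeeper has elected a new surviving coordinator
print(client.query("SELECT 1"))  # [('served by c2', 'SELECT 1')]
```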

In this embodiment, the client specifies the IP addresses and ports of the ZooKeeper cluster; after receiving a query command, it obtains the current surviving coordinator node from the ZooKeeper cluster and submits the query command to it. The surviving coordinator node processes the query command into computation tasks and dispatches them to the compute nodes for query computation, and the client obtains the query result from the surviving coordinator node. This enables data queries on the PrestoDB cluster while ensuring its high availability and improving its service availability.

Embodiment five

Fig. 5 is a schematic structural diagram of a data query apparatus for a PrestoDB cluster provided by Embodiment five of the present invention. As shown in Fig. 5, the data query apparatus of this embodiment is configured in the client and comprises: an address specifying module 510, a query receiving module 520, a query submission module 530, and a result acquisition module 540.

The address specifying module 510 is configured to specify the IP addresses and ports of the ZooKeeper cluster;

The query receiving module 520 is configured to receive a query command and obtain the current surviving coordinator node from the ZooKeeper cluster;

The query submission module 530 is configured to submit the query command to the surviving coordinator node, which processes the query command into computation tasks and dispatches them to the compute nodes for query computation;

The result acquisition module 540 is configured to obtain the query result from the surviving coordinator node.

The above product can perform the method provided by Embodiment four of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims

1. A running method for a PrestoDB cluster, characterized by comprising:

2. The method according to claim 1, characterized in that, after the compute nodes are notified of the new surviving coordinator node, the method further comprises:

3. The method according to claim 1 or 2, characterized in that detecting, by the ZooKeeper cluster, the current liveness of the surviving coordinator node comprises:

4. A running apparatus for a PrestoDB cluster, characterized by comprising:

5. The apparatus according to claim 4, characterized by further comprising:

6. The apparatus according to claim 4 or 5, characterized in that the liveness detection module comprises:

7. A PrestoDB cluster, characterized by comprising at least two coordinator nodes, compute nodes, and a ZooKeeper cluster;

The ZooKeeper cluster comprises the running apparatus for a PrestoDB cluster according to any one of claims 4-6.

8. The PrestoDB cluster according to claim 7, characterized in that the ZooKeeper cluster comprises at least three servers.

9. A data query method for a PrestoDB cluster, performed using the PrestoDB cluster according to claim 7 or 8, characterized in that the method comprises:

A client specifies the IP addresses and ports of the ZooKeeper cluster;

The client obtains the query result from the surviving coordinator node.

10. A data query apparatus for a PrestoDB cluster, characterized by comprising: