WO2016065776A1 - Method for tightly coupled scalable big-data interaction

Info

Publication number: WO2016065776A1
Application number: PCT/CN2015/072975
Authority: WO
Inventors: 王恩东; 张东; 亓开元; 刘成平; 辛国茂; 杨勇; 卢俊佐
Original assignee: 浪潮电子信息产业股份有限公司
Priority date: 2014-10-28
Filing date: 2015-02-13
Publication date: 2016-05-06
Also published as: CN104348913B; CN104348913A

Abstract

Provided is a method for tightly coupled, scalable big-data interaction. By constructing a distributed, tightly coupled client driver layer, the method avoids a single point of failure at the client or the server while ensuring consistency, and reduces the communication overhead between clients, so that the system scales near-linearly in scenarios dominated by metadata queries and satisfies the requirements of online, high-concurrency interactive analysis. The method ensures data read/write consistency: although a plain read operation may be delayed, the order in which versions are read remains consistent. When the newest version must be read, a data synchronization process may be executed proactively. The method also has good fault tolerance: as long as fewer than half of the nodes have failed, reads and writes on the other nodes are not affected, and after a failed node responds again, a single successful read/write operation is enough for it to synchronize.

Description

A tightly coupled and scalable big data interaction method

Technical field

The invention relates to the field of big data technology, in particular to a tightly coupled and scalable big data interaction method.

Background art

With the advent of the big data era, computing models and systems for data-intensive applications have emerged to meet the needs of big data business applications in industry, such as the offline batch processing system MapReduce, the high-concurrency massive-data processing system HBase, the in-memory computing framework Spark, and the stream processing framework Storm, as well as the traditional high-performance computing framework MPI. Because these processing modes introduce new programming models with a high learning cost, the most widespread demand is for an interactive analysis mode, built on the various big data processing systems, whose usage and effect resemble those of traditional database applications. In interactive analysis, data is stored in tables and SQL statements serve as the programming interface, supporting retrieval, statistics, joins, sorting, and similar operations with high concurrency and low latency. The MapReduce-based Hive and the Spark-based Shark are interactive analysis engines of this type.

However, although existing interactive analysis engines support table structures and SQL statements, and the underlying data systems adopt a distributed architecture, their interactive analysis performance in practical applications is still poor. For example, Hive uses the MapReduce engine, which applies strict synchronization and step-by-step materialization at each processing stage, so the processing delay is large. Shark is based on an in-memory computing engine and optimizes processing performance through pipelining and caching of intermediate results, but it follows the traditional Client/Server model, and the server side that performs SQL parsing, path planning, and metadata processing supports only single-point deployment, which limits high-concurrency interactive processing. Therefore, a new driver architecture is needed that meets the needs of online, high-concurrency interactive analysis of big data.

Summary of the invention

It is an object of the present invention to provide a tightly coupled and scalable big data interaction method.

The object of the present invention is achieved in the following manner. By constructing a distributed, tightly coupled client driver layer, a single point of failure at the client or the server is avoided while consistency is ensured, and the communication overhead between clients is reduced. As a result, the system has near-linear scalability in scenarios dominated by metadata queries, which satisfies the online high-concurrency interactive analysis requirements of big data. The specific steps are as follows:

1) Deploy multiple application instances on the application server and perform load balancing among them;

2) Dynamically link the client driver into the process space of each instance; the client driver accepts the interaction request sent by the application and completes SQL parsing, execution path optimization, task scheduling, sending of the operation request, and result aggregation;

3) The application instance gets the returned result and processes it in the business logic layer, which avoids a single point of failure at the client or the server and reduces the communication overhead between clients. Because the client driver in this architecture only needs to keep a small amount of system metadata state, and metadata operations are predominantly reads and queries, the driver layer can be scaled out effectively and supports high concurrency. When a metadata write operation occurs, metadata must be synchronized, so read/write consistency has to be ensured through inter-node interaction;

4) In the read/write synchronization process, each read or write first reads the current version from the local node; after the data is updated, the version number is incremented by 1 and a write (data update) request is sent to all nodes; when a node receives the new version, it returns its approval if it has not previously agreed to a higher version, and otherwise notifies the sender of its latest version number;

5) If approval votes from more than half of the nodes are not received, the sender takes the maximum version number returned by the nodes. If this maximum equals the version number the sender itself issued, an update conflict has occurred and the sender waits for the latest version data to be synchronized; otherwise, it reads the latest version data from more than half of the nodes and, after receiving it, resets its current version and continues the update;

6) After receiving approval votes from more than half of the nodes, the sender submits the result to all nodes; once confirmations from more than half of the nodes have been received, the read/write operation is complete.

The method scales well in scenarios dominated by metadata read operations. When a metadata write operation occurs, however, metadata must be synchronized, so read/write consistency has to be ensured through inter-node interaction. The read/write synchronization process of the tightly coupled system is as follows:

(1) On each read or write, first read the current version d_v from the local node;

(2) After the data is updated, set the version number to v+1 and send a write request d_{v+1} to all n nodes;

(3) After node n_i receives d_{v+1}, it returns its approval if it has not previously agreed to a higher version, that is, if v_i < v+1; otherwise it notifies the sender of its latest version number v_i;

(4) If approval votes from more than half of the nodes are not received, take the maximum version number v_m returned by the nodes;

4.1) If v_m = v+1, an update conflict has occurred; wait for the latest version v_m to be synchronized;

4.2) Otherwise, read the latest version v_m from n/2+1 nodes;

4.3) After receiving the latest version, set the current version v = v_m and continue from step (2);

(5) Otherwise, once approval votes from n/2+1 nodes have been received, submit the result to all nodes;

(6) After confirmations from n/2+1 nodes have been received, the read/write operation is complete;

(7) A plain read operation is affected by step (6) and may therefore be delayed, but the order in which versions are read is guaranteed to be consistent. When the latest version must be read, step 4.1) is performed proactively to synchronize the data;

(8) As long as the number of failed nodes is less than n/2+1, reads and writes on the other nodes are not affected. When a failed node responds again, a single read/write operation is enough for it to synchronize through steps 4.2) and 4.3).

The beneficial effect of the present invention is that the above method ensures the read/write consistency of data: although a read operation may be delayed, the order in which versions are read remains consistent. When the latest version must be read, a data synchronization process can be performed proactively. In addition, the method has good fault tolerance: as long as fewer than half of the nodes have failed, reads and writes on the other nodes are not affected, and when a failed node responds again, a single read/write operation is enough for it to synchronize through the steps above.
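
The majority rule used throughout the steps above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that n/2+1 is meant with integer division; the function names are illustrative and do not come from the specification.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that is more than half of n (n/2+1, integer division)."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that may be down while the rest can still form a majority."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, a 5-node deployment needs 3 approvals or confirmations per write and keeps working with up to 2 failed nodes.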

Description of the drawings

Figure 1 is a single client, single server system architecture diagram;

Figure 2 is a single client, multi-server system architecture diagram;

Figure 3 is a multi-client, multi-server separation system architecture diagram;

Figure 4 is a multi-node tightly coupled system architecture diagram;

Figure 5 is a diagram of the read/write synchronization process of the multi-node driver architecture.

Detailed description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the way in which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. It should be noted that, provided they do not conflict, the embodiments of the present invention and the features of each embodiment may be combined with one another, and all resulting solutions fall within the protection scope of the present invention.

In the traditional client/server model, the single-client, single-server system shown in Figure 1 has a single point of failure and a performance bottleneck on the server side. The single-client, multi-server system shown in Figure 2 establishes a cluster on the server side, but the client remains a single point of failure and a performance bottleneck. The separated multi-client, multi-server system shown in Figure 3 establishes a cluster on both the client side and the server side and can perform load balancing at both ends. Although this avoids a single point of failure and improves concurrent performance, physically isolating the clients from the servers demands too many node resources, and even with physically centralized deployment the topology is still a complex many-to-many one: the system and communication overhead of routing messages and of sending and receiving data increases exponentially with the number of nodes, which seriously degrades the performance of a big data system in environments with limited network bandwidth.

The multi-node tight coupling system is shown in Figure 4:

(1) Deploy n application instances on the application server and perform load balancing among them;

(2) Dynamically link the client driver into the process space of each instance;

(3) The client driver accepts the interaction request sent by the application, completes SQL parsing, operation compilation, and path optimization, and sends an operation request to the distributed big data processing system;

(4) The big data processing system executes the request on each processing node and returns the results to the client driver for aggregation;

(5) The application instance gets the returned result and processes it in the business logic layer.

The above architecture avoids a single point of failure at the client or the server and reduces the communication overhead between clients. Because the client driver in this architecture only needs to keep a small amount of system metadata state, and metadata operations are predominantly reads and queries, it can be scaled out effectively and supports high concurrency.
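
Steps (1) to (5) can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: the class names (`ClientDriver`, `AppInstance`, `RoundRobinBalancer`), the round-robin policy, and the row-count "query" are all hypothetical stand-ins, and real SQL parsing and path optimization are replaced by a simple string normalization.

```python
from itertools import cycle


class ClientDriver:
    """Hypothetical in-process driver: parse, plan, dispatch, aggregate."""

    def __init__(self, processing_nodes):
        # Each processing node is modeled as a callable returning a partial result.
        self.processing_nodes = processing_nodes

    def execute(self, sql):
        # Stand-in for SQL parsing, compilation, and path optimization.
        plan = " ".join(sql.strip().rstrip(";").lower().split())
        # Send the operation request to every processing node.
        partials = [node(plan) for node in self.processing_nodes]
        # Aggregate the partial results inside the driver.
        return sum(partials)


class AppInstance:
    """Application instance with the driver linked into its own process space."""

    def __init__(self, name, processing_nodes):
        self.name = name
        self.driver = ClientDriver(processing_nodes)

    def handle(self, sql):
        result = self.driver.execute(sql)
        # Business-logic layer: here it only labels the result.
        return {"instance": self.name, "result": result}


class RoundRobinBalancer:
    """Assumed load-balancing policy; the specification does not name one."""

    def __init__(self, instances):
        self._instances = cycle(instances)

    def route(self, sql):
        return next(self._instances).handle(sql)


# Toy "big data system": three partitions, each answering with its row count.
partitions = [[1, 2, 3], [4, 5], [6]]
processing_nodes = [lambda plan, rows=rows: len(rows) for rows in partitions]

balancer = RoundRobinBalancer(
    [AppInstance(f"app-{i}", processing_nodes) for i in range(2)]
)
responses = [balancer.route("SELECT COUNT(*) FROM t;") for _ in range(3)]
```

Because every instance carries its own driver, no request passes through a shared client or server process; the only shared state in this sketch is the set of processing nodes.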

1. Multi-node tightly coupled system read and write synchronization method

The above architecture scales well in scenarios dominated by metadata read operations. When a metadata write operation occurs, however, metadata must be synchronized, so read/write consistency has to be ensured through inter-node interaction. The read/write synchronization process of the multi-node tightly coupled system is shown in Figure 5:

(1) On each read or write, first read the current version d_v from the local node;

(2) After the data is updated, set the version number to v+1 and send a write request d_{v+1} to all n nodes;

(3) After node n_i receives d_{v+1}, it returns its approval if it has not previously agreed to a higher version, that is, if v_i < v+1; otherwise it notifies the sender of its latest version number v_i;

(4) If approval votes from more than half of the nodes are not received, take the maximum version number v_m returned by the nodes;

4.1) If v_m = v+1, an update conflict has occurred; wait for the latest version v_m to be synchronized;

4.2) Otherwise, read the latest version v_m from n/2+1 nodes;

4.3) After receiving the latest version, set the current version v = v_m and continue from step (2);

(5) Otherwise, once approval votes from n/2+1 nodes have been received, submit the result to all nodes;

(6) After confirmations from n/2+1 nodes have been received, the read/write operation is complete.
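
The write path in steps (1) to (6) can be sketched as a single-process simulation. The code below is an illustration under assumptions, not the patented implementation: the `Node` class, its `agreed` field (the highest version a node has approved), the `max_rounds` limit, and the choice to return `None` on a conflict instead of blocking are all introduced here for the example.

```python
class Node:
    """One tightly coupled node holding a replica of the metadata."""

    def __init__(self):
        self.version = 0    # committed version v_i
        self.value = None   # committed data d_v
        self.agreed = 0     # highest version this node has agreed to
        self.alive = True

    def vote(self, version):
        """Step (3): approve only if the agreed version is lower than the proposed one."""
        if self.agreed < version:
            self.agreed = version
            return True, version
        return False, self.agreed

    def commit(self, version, value):
        """Apply a submitted result and acknowledge it."""
        if version > self.version:
            self.version, self.value = version, value
        self.agreed = max(self.agreed, version)
        return True


def write(local, nodes, value, max_rounds=5):
    """Return the committed version, or None on conflict or if no majority is reachable."""
    quorum = len(nodes) // 2 + 1
    for _ in range(max_rounds):
        live = [n for n in nodes if n.alive]
        proposed = local.version + 1                      # steps (1) and (2)
        replies = [n.vote(proposed) for n in live]
        approvals = sum(1 for ok, _ in replies if ok)
        if approvals >= quorum:                           # step (5)
            acks = sum(1 for n in live if n.commit(proposed, value))
            return proposed if acks >= quorum else None   # step (6)
        rejected = [v for ok, v in replies if not ok]
        if not rejected:
            return None                                   # too few nodes reachable
        v_m = max(rejected)                               # step (4)
        if v_m == proposed:
            return None                                   # step 4.1: conflict, wait for sync
        newest = max(live, key=lambda n: n.version)       # step 4.2
        local.value = newest.value
        local.version = local.agreed = v_m                # step 4.3, then retry from (2)
    return None


nodes = [Node() for _ in range(5)]
nodes[4].alive = False                  # one node is down during two writes
v1 = write(nodes[0], nodes, "a")
v2 = write(nodes[1], nodes, "b")
nodes[4].alive = True                   # the node comes back two versions behind
v3 = write(nodes[4], nodes, "c")        # catches up via 4.2/4.3, then commits
```

In this run the returning node first proposes version 1, is told that version 2 already exists, adopts it, and succeeds on the retry with version 3. A node that is exactly one version behind would instead hit the conflict branch of step 4.1 and return `None` in this sketch.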

The above method ensures the read/write consistency of the data. A plain read operation is affected by step (6) and may therefore be delayed, but the order in which versions are read remains consistent. When the latest version must be read, step 4.1) can be performed proactively to synchronize the data. The method also has good fault tolerance: as long as the number of failed nodes is less than n/2+1, reads and writes on the other nodes are not affected. When a failed node responds again, a single read/write operation is enough for it to synchronize through steps 4.2) and 4.3).
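
The difference between a plain read and a read that synchronizes first can be sketched as follows. This is an assumed interpretation: the specification does not spell out the synchronization call, so `read_latest`, the `Replica` class, and the choice of the first n/2+1 replicas as the contacted set are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


def read_local(local):
    """Plain read: served from the local node, so it may lag behind the newest commit."""
    return local.version, local.value


def read_latest(local, replicas):
    """Proactive synchronization before reading (assumed interpretation).

    Any n/2+1 nodes overlap with the majority that confirmed the last write,
    so the highest version among them is the newest committed one.
    """
    quorum = len(replicas) // 2 + 1
    sample = replicas[:quorum]  # stand-in for "any n/2+1 reachable nodes"
    newest = max(sample, key=lambda r: r.version)
    if newest.version > local.version:
        # Versions only ever move forward, which keeps the read order consistent.
        local.version, local.value = newest.version, newest.value
    return local.version, local.value


replicas = [Replica(1, "old") for _ in range(5)]
for r in replicas[2:]:          # version 2 was confirmed by nodes 2, 3, and 4 only
    r.version, r.value = 2, "new"
local = replicas[0]
stale = read_local(local)
fresh = read_latest(local, replicas)
```

Here the plain read returns version 1, while the synchronized read contacts three nodes, finds version 2 on one of them, and updates the local copy before answering.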

The driver architecture and synchronization method for big data interactive processing proposed by the present invention can be applied to big data processing systems such as MapReduce, Spark, and HBase. By constructing a client driver layer, the invention gives the client driver layer near-linear scalability in scenarios dominated by metadata queries while ensuring consistency, meeting the needs of online, high-concurrency interactive analysis of big data. Taking a driver architecture built on MapReduce as an example, where the original single-point Hive mode supports a concurrency of only 100, a 5-node tightly coupled driver architecture can reach a concurrency of 500.
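
The figures in this example are what linear scaling would predict. As a back-of-the-envelope check, assume each driver node sustains the same concurrency C_1 as the single-point deployment; this is an idealization, and the symbols C_1, n, and C(n) are introduced here and do not appear in the specification.

```latex
C(n) \approx n \cdot C_1, \qquad C(5) \approx 5 \times 100 = 500
```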

Except for the technical features described in the specification, all other features are known to those skilled in the art.

Claims

A tightly coupled and scalable big data interaction method, characterized in that, by constructing a distributed tightly coupled client driver layer, a single point of failure at the client or the server is avoided while consistency is ensured, and the communication overhead between clients is reduced, so that the system has near-linear scalability in scenarios dominated by metadata queries and meets the requirements of online high-concurrency interactive analysis of big data. The specific steps are as follows:

1) Deploy multiple application instances on the application server and perform load balancing among them;

2) Dynamically link the client driver into the process space of each instance; the client driver accepts the interaction request sent by the application and completes SQL parsing, execution path optimization, task scheduling, sending of the operation request, and result aggregation;

3) The application instance gets the returned result and processes it in the business logic layer, which avoids a single point of failure at the client or the server and reduces the communication overhead between clients. Because the client driver in this architecture only needs to keep a small amount of system metadata state, and metadata operations are predominantly reads and queries, the driver layer can be scaled out effectively and supports high concurrency. When a metadata write operation occurs, metadata must be synchronized, so read/write consistency has to be ensured through inter-node interaction;

4) In the read/write synchronization process, each read or write first reads the current version from the local node; after the data is updated, the version number is incremented by 1 and a write (data update) request is sent to all nodes; when a node receives the new version, it returns its approval if it has not previously agreed to a higher version, and otherwise notifies the sender of its latest version number;

5) If approval votes from more than half of the nodes are not received, the sender takes the maximum version number returned by the nodes. If this maximum equals the version number the sender itself issued, an update conflict has occurred and the sender waits for the latest version data to be synchronized; otherwise, it reads the latest version data from more than half of the nodes and, after receiving it, resets its current version and continues the update;

6) After receiving approval votes from more than half of the nodes, the sender submits the result to all nodes; once confirmations from more than half of the nodes have been received, the read/write operation is complete.

A distributed multi-node tightly coupled big data interaction method according to claim 1, characterized in that the method scales well in scenarios dominated by metadata read operations, but when a metadata write operation occurs, metadata must be synchronized, so read/write consistency has to be ensured through inter-node interaction. The read/write synchronization process of the multi-node tightly coupled system is as follows:

(1) On each read or write, first read the current version d_v from the local node;

(2) After the data is updated, set the version number to v+1 and send a write request d_{v+1} to all n nodes;

(3) After node n_i receives d_{v+1}, it returns its approval if it has not previously agreed to a higher version, that is, if v_i < v+1; otherwise it notifies the sender of its latest version number v_i;

(4) If approval votes from more than half of the nodes are not received, take the maximum version number v_m returned by the nodes;

4.1) If v_m = v+1, an update conflict has occurred; wait for the latest version v_m to be synchronized;

4.2) Otherwise, read the latest version v_m from n/2+1 nodes;

4.3) After receiving the latest version, set the current version v = v_m and continue from step (2);

(5) Otherwise, once approval votes from n/2+1 nodes have been received, submit the result to all nodes;

(6) After confirmations from n/2+1 nodes have been received, the read/write operation is complete;

(7) A plain read operation is affected by step (6) and may therefore be delayed, but the order in which versions are read is guaranteed to be consistent. When the latest version must be read, step 4.1) is performed proactively to synchronize the data;

(8) As long as the number of failed nodes is less than n/2+1, reads and writes on the other nodes are not affected. When a failed node responds again, a single read/write operation is enough for it to synchronize through steps 4.2) and 4.3).