WO2016065776A1 - Method for tightly coupled scalable big-data interaction - Google Patents

Method for tightly coupled scalable big-data interaction

Info

Publication number
WO2016065776A1
Authority
WO
WIPO (PCT)
Prior art keywords
read
nodes
write
node
data
Prior art date
Application number
PCT/CN2015/072975
Other languages
French (fr)
Chinese (zh)
Inventor
王恩东
张东
亓开元
刘成平
辛国茂
杨勇
卢俊佐
Original Assignee
浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority date
2014-10-28
Filing date
2015-02-13
Publication date
2016-05-06
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2016065776A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes


Abstract

Provided is a tightly coupled, scalable big-data interaction method. By constructing a distributed, tightly coupled client driver layer, the method avoids a single point of failure on the client or server side while guaranteeing consistency, and reduces the communication overhead between clients, so that the system scales nearly linearly in scenarios dominated by metadata queries and meets the requirements of online, highly concurrent interactive analysis. The method guarantees read/write consistency of the data: although a plain read operation may lag, the order in which versions are read is guaranteed to be consistent, and when the newest version must be read, a data synchronization pass can be executed proactively. The method is also fault tolerant: as long as fewer than half of the nodes have failed, reads and writes on the other nodes are unaffected, and after a failed node recovers, a single successful read/write operation is enough to resynchronize it.

Description

A tightly coupled, scalable big-data interaction method
Technical field
The invention relates to the field of big data technology, and in particular to a tightly coupled, scalable big-data interaction method.
Background
With the advent of the big-data era, and driven by the needs of industry big-data applications, computing models and systems for data-intensive applications keep emerging: the offline batch-processing system MapReduce, the high-concurrency massive-data system HBase, the in-memory computing framework Spark, the stream-processing framework Storm, and the traditional high-performance computing framework MPI. Because each of these processing models introduces a new programming model with a high learning cost, the most widespread demand is for an interactive analysis mode, built on these big-data systems, that behaves and performs like a traditional database application. In interactive analysis, data is stored as tables and SQL statements serve as the programming interface, supporting retrieval, statistics, joins, sorting, and similar operations with high concurrency and low latency. Hive (built on MapReduce) and Shark (built on Spark) are current examples of this kind of interactive analysis engine.
However, although existing interactive analysis engines support table structures and SQL statements and run on distributed underlying data systems, their interactive analysis performance in practice remains poor. Hive, for example, runs on the MapReduce engine, which strictly synchronizes and materializes results at every processing stage, so processing latency is high. Shark runs on an in-memory engine and improves performance through pipelining and caching of intermediate results, but it uses the traditional client/server model, and its server side, which performs SQL parsing, path planning, and metadata processing, supports only single-node deployment, which limits highly concurrent interactive processing. A new driver architecture is therefore needed to meet the demands of online, highly concurrent interactive analysis of big data.
Summary of the invention
An object of the present invention is to provide a tightly coupled, scalable big-data interaction method.
The object is achieved as follows. A distributed, tightly coupled client driver layer is constructed which, while guaranteeing consistency, avoids a single point of failure on the client or server side and reduces the communication overhead between clients, so that the system scales nearly linearly in scenarios dominated by metadata queries and meets the demands of online, highly concurrent interactive analysis of big data. The specific steps are as follows:
1) Deploy multiple application instances on the application server and load-balance across the application instances;
2) Dynamically link the client driver into the process space of each instance; the client driver accepts interaction requests sent by the application and performs SQL parsing, execution-path optimization, task scheduling, dispatch of operation requests, and aggregation of results;
3) The application instance receives the returned results and processes them in the business-logic layer. This avoids a single point of failure on the client or server side and reduces communication overhead between clients. Because the client driver in this architecture only needs to hold a small amount of system metadata state, and metadata access is dominated by reads and queries, the architecture scales effectively and supports high concurrency. When a metadata write occurs, however, the metadata must be synchronized, so read/write consistency has to be guaranteed through interaction between nodes;
4) In the read/write synchronization process, each read or write first reads the current version from the local node; after the data is updated, the version number is incremented by 1 and a write (update) request is sent to all nodes; when a node receives the new version, it votes in favor if it has not previously agreed to a higher version, and otherwise informs the sender of its latest version number;
5) If votes in favor are not received from more than half of the nodes, take the largest version number returned by the nodes. If it equals the version the writer itself proposed, there is an update conflict, and the writer waits for the latest version to be synchronized; otherwise the writer reads the latest version from more than half of the nodes and, once it has been received, resets its current version and continues the update;
6) Once votes in favor have been received from more than half of the nodes, commit the result to all nodes; once acknowledgements have been received from half of the nodes, the read/write operation is complete;
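The voting rule in step 4) can be illustrated with a minimal Python sketch. It assumes, beyond what the source states, that each node tracks a single integer, the highest version it has agreed to, and that voting in favor records the proposed version as agreed; `Node` and `on_write_request` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node metadata state used only for voting."""
    name: str
    agreed_version: int = 0  # highest version this node has agreed to

    def on_write_request(self, proposed: int) -> tuple[bool, int]:
        """Vote on an update request carrying version `proposed` (= v+1).

        Agrees when no version >= proposed was agreed before, and records it;
        otherwise returns the node's latest version so the sender can react.
        """
        if self.agreed_version < proposed:
            self.agreed_version = proposed
            return True, proposed
        return False, self.agreed_version


node = Node("n1", agreed_version=3)
print(node.on_write_request(4))  # (True, 4): first proposal of version 4
print(node.on_write_request(4))  # (False, 4): same version proposed again
```

Because a node that has just agreed to version 4 refuses a second proposal for version 4, two writers starting from the same version cannot both collect a majority; the loser sees its own proposed version come back as the largest returned version, which is the conflict case of step 5).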
This architecture scales well in scenarios dominated by metadata reads, but metadata writes raise a synchronization problem, so read/write consistency must be guaranteed through interaction between nodes. The read/write synchronization process of the multi-node tightly coupled system is as follows:
(1) For each read or write, first read the current version $d_v$ from the local node;
(2) After updating the data, increment the version number to $v+1$ and send the write request $d_{v+1}$ to all $n$ nodes;
(3) When node $n_i$ receives $d_{v+1}$, it votes in favor if it has not previously agreed to a higher version, i.e. if $v_i < v+1$; otherwise it informs the sender of its latest version number $v_i$;
(4) If votes in favor are not received from more than half of the nodes, take the largest version number $v_m$ returned by the nodes;
4.1) If $v_m = v+1$, there is an update conflict; wait for the latest version $v_m$ to be synchronized;
4.2) Otherwise, read the latest version $v_m$ from $n/2+1$ nodes;
4.3) After the largest version has been received, set the current version $v = v_m$ and continue from step (2);
(5) Otherwise, once votes in favor have been received from at least $n/2+1$ nodes, commit the result to all nodes;
(6) Once acknowledgements have been received from $n/2+1$ nodes, the read/write operation is complete;
(7) A plain read is affected by step (6) and may therefore lag, but the order in which versions are read is guaranteed to be consistent; when the latest version must be read, perform step 4.1) once proactively to synchronize the data;
(8) As long as fewer than $n/2+1$ nodes have failed, reads and writes on the other nodes are unaffected; after a failed node recovers, a single read/write operation through steps 4.2) and 4.3) is enough to resynchronize it.
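Step (4) and its sub-steps amount to a small decision on the writer's side. The Python sketch below assumes that only rejecting nodes report a version number (nodes voting in favor simply agree), and treats an empty report list as an error because the source does not cover that case; `Action` and `resolve_rejection` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    WAIT_FOR_SYNC = "wait"        # step 4.1: another writer already proposed v+1
    CATCH_UP_AND_RETRY = "retry"  # steps 4.2-4.3: fetch v_m, set v = v_m, retry


def resolve_rejection(v: int, returned_versions: list[int]) -> tuple[Action, int]:
    """Choose the next move after fewer than n/2+1 nodes voted in favor.

    v is the writer's current version (it proposed v+1); returned_versions are
    the latest version numbers v_i reported by the nodes that rejected it.
    """
    if not returned_versions:
        raise ValueError("no rejecting node reported a version")
    v_m = max(returned_versions)  # step (4): largest returned version
    if v_m == v + 1:
        return Action.WAIT_FOR_SYNC, v_m
    return Action.CATCH_UP_AND_RETRY, v_m


print(resolve_rejection(3, [4, 4]))     # conflict on version 4
print(resolve_rejection(3, [4, 6, 5]))  # writer is behind; catch up to 6
```

Since a node rejects only when $v_i \ge v+1$, every reported version is at least $v+1$; equality means a concurrent writer claimed the same version, and anything larger means the writer's local copy is stale.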
The beneficial effects of the invention are as follows. The method above guarantees read/write consistency of the data: although a plain read may lag, the order in which versions are read is guaranteed to be consistent, and when the latest version must be read, a data synchronization pass can be performed proactively. The method is also fault tolerant: as long as fewer than half of the nodes have failed, reads and writes on the other nodes are unaffected, and after a failed node recovers, a single read/write operation through the synchronization steps is enough to resynchronize it.
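The fault-tolerance claim rests on majority-quorum arithmetic, which a few lines of Python make concrete. The sketch assumes that a write needs votes and acknowledgements from $n/2+1$ live nodes, with $n/2$ taken as integer division; `majority` and `tolerated_failures` are hypothetical helper names. For a 5-node cluster, for instance, writes need 3 nodes and survive 2 failures, which is fewer than half of the nodes.

```python
def majority(n: int) -> int:
    """Quorum size n/2+1 for n driver nodes, with n/2 as integer division."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Most nodes that can fail while a majority of the n nodes stays live."""
    return n - majority(n)


for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 7 4 3
```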
Brief description of the drawings
Figure 1 is an architecture diagram of a single-client, single-server system;
Figure 2 is an architecture diagram of a single-client, multi-server system;
Figure 3 is an architecture diagram of a separated multi-client, multi-server system;
Figure 4 is an architecture diagram of the multi-node tightly coupled system;
Figure 5 is a diagram of the read/write synchronization process of the multi-node driver architecture.
Detailed description
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. Provided they do not conflict, the embodiments of the invention and the individual features within them may be combined with one another, and all such combinations fall within the scope of protection of the invention.
In the traditional client/server model, the single-client, single-server system shown in Figure 1 has a single point of failure and a performance bottleneck on the server side. The single-client, multi-server system shown in Figure 2 builds a cluster on the server side but still has a single point of failure and a performance bottleneck on the client side. The separated multi-client, multi-server system shown in Figure 3 builds separate clusters on the client and server sides and can load-balance at both ends; this avoids single points of failure and improves concurrency, but if clients and servers are deployed on physically separate machines, the node resource requirements become very large, and even with a physically centralized deployment the two sides still form a complex many-to-many topology. The system and communication overhead consumed by routing messages and by sending and receiving data grows exponentially with the number of nodes, which severely degrades the performance of big-data systems when network bandwidth is limited.
The multi-node tightly coupled system is shown in Figure 4:
(1) Deploy n application instances on the application server and load-balance across them;
(2) Dynamically link the client driver into the process space of each instance;
(3) The client driver accepts interaction requests sent by the application, performs SQL parsing, operation compilation, and path optimization, and sends operation requests to the distributed big-data processing system;
(4) The big-data processing system processes the requests on its processing nodes and returns the results to the client driver, which aggregates them;
(5) The application instance receives the returned results and processes them in the business-logic layer.
This architecture avoids a single point of failure on the client or server side and reduces communication overhead between clients. Because the client driver only needs to hold a small amount of system metadata state, and metadata access is dominated by reads and queries, the architecture scales effectively and supports high concurrency.
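The request flow in steps (1) to (5) can be sketched as a toy Python program, assuming round-robin load balancing and a single hard-coded aggregate query in place of real SQL parsing, compilation, and path optimization. `ClientDriver`, `processing_node`, and the sample data are hypothetical; the sketch only shows that each application instance carries its own driver, so no separate server tier sits between the application and the big-data system.

```python
from __future__ import annotations

from itertools import cycle


def processing_node(rows: list[int]) -> int:
    """Stand-in for one big-data processing node: returns a partial sum."""
    return sum(rows)


class ClientDriver:
    """Hypothetical driver linked into one application instance (step (2))."""

    def __init__(self, partitions: list[list[int]]) -> None:
        self.partitions = partitions  # data spread over the processing nodes

    def execute(self, sql: str) -> int:
        # Step (3): a real driver parses the SQL, compiles the operation and
        # optimizes the path; this toy only accepts one fixed aggregate query.
        if sql.strip().upper() != "SELECT SUM(X) FROM T":
            raise NotImplementedError("toy driver supports a single query")
        # Step (4): every processing node computes a partial result, which
        # the driver then aggregates.
        return sum(processing_node(p) for p in self.partitions)


# Step (1): n = 3 application instances, each with its own embedded driver,
# behind a round-robin load balancer.
partitions = [[1, 2, 3], [4, 5], [6]]
instances = [ClientDriver(partitions) for _ in range(3)]
balancer = cycle(instances)

# Step (5): the chosen instance gets the result back for its business logic.
result = next(balancer).execute("select sum(x) from t")
print(result)  # 21
```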
1. Read/write synchronization method for the multi-node tightly coupled system
The architecture above scales well in scenarios dominated by metadata reads, but metadata writes raise a synchronization problem, so read/write consistency must be guaranteed through interaction between nodes. The read/write synchronization process of the multi-node tightly coupled system is shown in Figure 5:
(1) For each read or write, first read the current version $d_v$ from the local node;
(2) After updating the data, increment the version number to $v+1$ and send the write request $d_{v+1}$ to all $n$ nodes;
(3) When node $n_i$ receives $d_{v+1}$, it votes in favor if it has not previously agreed to a higher version, i.e. if $v_i < v+1$; otherwise it informs the sender of its latest version number $v_i$;
(4) If votes in favor are not received from more than half of the nodes, take the largest version number $v_m$ returned by the nodes;
4.1) If $v_m = v+1$, there is an update conflict; wait for the latest version $v_m$ to be synchronized;
4.2) Otherwise, read the latest version $v_m$ from $n/2+1$ nodes;
4.3) After the largest version has been received, set the current version $v = v_m$ and continue from step (2);
(5) Otherwise, once votes in favor have been received from at least $n/2+1$ nodes, commit the result to all nodes;
(6) Once acknowledgements have been received from $n/2+1$ nodes, the read/write operation is complete.
The method above guarantees read/write consistency of the data. A plain read is affected by step (6) and may lag, but the order in which versions are read is guaranteed to be consistent. When the latest version must be read, step 4.1) can be performed once proactively to synchronize the data. The method is also fault tolerant: as long as fewer than n/2+1 nodes have failed, reads and writes on the other nodes are unaffected, and after a failed node recovers, a single read/write operation through steps 4.2) and 4.3) is enough to resynchronize it.
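Putting the steps of Figure 5 together, the following single-process Python simulation runs the whole write path against in-memory replicas. It is a sketch under several assumptions the source does not spell out: every node answers immediately, a replica keeps its committed `version` separately from the highest version it has voted for (`promised`), step 4.1) is signaled with an exception instead of actually waiting, and step 4.2) reads the first $n/2+1$ replicas. `Replica`, `quorum_read`, `write`, and `UpdateConflict` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class UpdateConflict(Exception):
    """Step 4.1): another writer already proposed the same version v+1."""


@dataclass
class Replica:
    """Hypothetical in-memory copy of the driver metadata on one node."""
    version: int = 0   # committed version
    promised: int = 0  # highest version this node has voted for (v_i)
    data: dict[str, str] = field(default_factory=dict)

    def vote(self, proposed: int) -> tuple[bool, int]:
        # Step (3): agree only if v_i < v+1, otherwise report v_i.
        if self.promised < proposed:
            self.promised = proposed
            return True, proposed
        return False, self.promised

    def commit(self, version: int, data: dict[str, str]) -> None:
        # Step (5): install the result unless a newer one is already present.
        if version > self.version:
            self.version, self.data = version, dict(data)
        self.promised = max(self.promised, version)


def quorum_read(cluster: list[Replica]) -> Replica:
    # Step 4.2): read n/2+1 replicas and keep the one with the newest version.
    return max(cluster[: len(cluster) // 2 + 1], key=lambda r: r.version)


def write(local: Replica, cluster: list[Replica], key: str, value: str,
          max_rounds: int = 5) -> int:
    """Write key=value starting from `local`; returns the committed version."""
    quorum = len(cluster) // 2 + 1
    v, base = local.version, dict(local.data)          # step (1)
    for _ in range(max_rounds):
        proposed = v + 1                                # step (2)
        votes = [r.vote(proposed) for r in cluster]
        if sum(ok for ok, _ in votes) >= quorum:
            new_data = {**base, key: value}
            # Step (5) commits to all nodes; in-memory replicas always
            # acknowledge, so step (6)'s n/2+1 acknowledgements are reached.
            for r in cluster:
                r.commit(proposed, new_data)
            return proposed
        v_m = max(ver for ok, ver in votes if not ok)   # step (4)
        if v_m == v + 1:
            raise UpdateConflict(f"version {v_m} is being written elsewhere")
        base = dict(quorum_read(cluster).data)          # step 4.2)
        v = v_m                                         # step 4.3)
    raise RuntimeError("no majority after max_rounds")


cluster = [Replica() for _ in range(5)]
for r in cluster[1:]:  # four nodes already committed version 3
    r.version, r.promised, r.data = 3, 3, {"t": "old"}
print(write(cluster[0], cluster, "t", "new"))  # 4
```

In the usage example the local replica missed version 3, so its first proposal (version 1) is rejected with $v_m = 3$; it catches up through steps 4.2) and 4.3) and commits version 4 on the second round.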
The driver architecture and synchronization method for interactive big-data processing proposed by the invention can be applied to big-data processing systems such as MapReduce, Spark, and HBase. By constructing a client driver layer, it gives that layer near-linear scalability in scenarios dominated by metadata queries while guaranteeing consistency, meeting the demands of online, highly concurrent interactive analysis of big data. Taking a driver architecture built on MapReduce as an example: where the original single-point Hive deployment supports only 100 concurrent requests, a 5-node tightly coupled driver architecture reaches 500.
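The reported MapReduce figure is consistent with ideal linear scaling. Writing $C(k)$ for the number of concurrent requests supported by $k$ tightly coupled driver nodes (notation introduced here for illustration, not taken from the source), and treating the single-point Hive deployment as one node with $C(1) = 100$:

```latex
C(k) \approx k \cdot C(1), \qquad C(5) = 5 \times 100 = 500
```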
Technical features other than those described in this specification are known to those skilled in the art.

Claims (2)

  1. A tightly coupled, scalable big-data interaction method, characterized in that a distributed, tightly coupled client driver layer is constructed which, while guaranteeing consistency, avoids a single point of failure on the client or server side and reduces the communication overhead between clients, so that the system scales nearly linearly in scenarios dominated by metadata queries and meets the demands of online, highly concurrent interactive analysis of big data; the specific steps are as follows:
    1) Deploying multiple application instances on the application server and load-balancing across the application instances;
    2) Dynamically linking the client driver into the process space of each instance, the client driver accepting interaction requests sent by the application and performing SQL parsing, execution-path optimization, task scheduling, dispatch of operation requests, and aggregation of results;
    3) The application instance receiving the returned results and processing them in the business-logic layer, which avoids a single point of failure on the client or server side and reduces communication overhead between clients; because the client driver of this architecture only needs to hold a small amount of system metadata state, and metadata access is dominated by reads and queries, the architecture scales effectively and supports high concurrency; when a metadata write occurs, the metadata must be synchronized, so read/write consistency is guaranteed through interaction between nodes;
    4) In the read/write synchronization process, for each read or write, first reading the current version from the local node; after the data is updated, incrementing the version number by 1 and sending a write (update) request to all nodes; a node that receives the new version voting in favor if it has not previously agreed to a higher version, and otherwise informing the sender of its latest version number;
    5) If votes in favor are not received from more than half of the nodes, taking the largest version number returned by the nodes; if it equals the version the writer itself proposed, there is an update conflict and the writer waits for the latest version to be synchronized; otherwise the writer reads the latest version from more than half of the nodes and, once it has been received, resets its current version and continues the update;
    6) Once votes in favor have been received from more than half of the nodes, committing the result to all nodes; once acknowledgements have been received from half of the nodes, the read/write operation is complete.
  2. 根据权利要求1所述的一种分布式多节点紧耦合大数据交互方法,其特征在于,在以元数据读操作为主的类场景下具有很好的可扩展性,但发生元数据写操作时,存在着元数据同步问题,因此需要通过节点间交互保障读写一致性,多节点紧耦合系统的读写同步过程如下:A distributed multi-node tightly coupled big data interaction method according to claim 1, characterized in that it has good scalability in a class scenario dominated by metadata read operations, but a metadata write operation occurs. At the time, there is a problem of metadata synchronization. Therefore, it is necessary to ensure read and write consistency through inter-node interaction. The read-write synchronization process of a multi-node tightly coupled system is as follows:
    (1)每次读写时,先从本节点读到当前版本dv(1) When reading and writing each time, first read the current version d v from this node;
    (2)进行数据更新后,版本号v+1,向所有n个节点发送写请求dv+1(2) After the data update, the version number v+1, send a write request d v+1 to all n nodes;
    (3)节点ni收到dv+1后,若之前没有同意更高的版本,即vi<v+1则赞成返回,(3) After node n i receives d v+1 , if it does not agree to a higher version before, v i <v+1 is in favor of returning,
    否则通知发送方最新的版本号viOtherwise notify the sender of the latest version number v i ;
    (4)当未收到半数以上同意票后,取各节点返回的最大的版本号vm(4) When no more than half of the consent tickets have been received, the maximum version number v m returned by each node is taken;
    4.1)当vm=v+1,表明更新冲突,等待最新版本vm同步;4.1) When v m = v+1, it indicates an update conflict, waiting for the latest version v m synchronization;
    4.2)否则,向n/2+1个节点读取最新版本vm4.2) Otherwise, read the latest version v m to n/2+1 nodes;
    4.3)当收到最大版本号后,设置当前版本v=vm继续执行步骤(2);4.3) After receiving the maximum version number, set the current version v=v m to continue the step (2);
    (5)否则,当收到半数n/2+1个节点以上个同意票后,向所有节点提交结果;(5) Otherwise, when half of the n/2+1 nodes are received, the result is submitted to all nodes;
    (6)收到n/2+1个节点的确认后,读写操作完成;(6) After receiving the confirmation of n/2+1 nodes, the read and write operations are completed;
    (7)单纯读操作受步骤(6)影响,会出现延迟现象,但能保证读取版本的顺序一致,在需要读取最新版本情况下,主动执行一次步骤4.1)以同步数据;(7) The simple read operation is affected by the step (6), and the delay phenomenon occurs, but the order of reading the versions is guaranteed to be consistent. In the case that the latest version needs to be read, step 4.1) is actively performed to synchronize the data;
    (8) As long as the number of failed nodes is less than n/2+1, reads and writes on the other nodes are unaffected; when a failed node recovers, a single read/write operation through steps 4.2) and 4.3) is enough to resynchronize it.
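The write path in steps (1) to (6), together with the recovery in step (8), can be sketched as a single-process Python simulation. This is an illustration under stated assumptions, not the patented implementation: network messages are replaced by direct method calls, there is no real concurrency, and the names Node, vote, commit, write and UpdateConflict are hypothetical. The claim does not say how the "wait" in step 4.1) ends; the sketch assumes it is over once a read from n/2+1 nodes already shows v_m committed, and otherwise raises UpdateConflict so the caller can retry later.

```python
from dataclasses import dataclass
from typing import Optional


class UpdateConflict(Exception):
    """Hypothetical signal for step 4.1): version v_m = v+1 is held by another writer."""

    def __init__(self, v_m: int):
        super().__init__(f"update conflict at version {v_m}")
        self.v_m = v_m


@dataclass
class Node:
    """One metadata replica; every field name here is an illustrative assumption."""
    promised: int = 0            # v_i: highest version this node has agreed to
    committed: int = 0           # highest version committed at this node
    value: Optional[str] = None  # metadata payload of the committed version
    alive: bool = True

    def vote(self, version: int) -> tuple[bool, int]:
        # Step (3): approve only if no equal or higher version was agreed (v_i < v+1).
        if self.promised < version:
            self.promised = version
            return True, version
        return False, self.promised  # otherwise report the latest version v_i

    def commit(self, version: int, value: Optional[str]) -> bool:
        # Steps (5)-(6): apply the submitted result and acknowledge it.
        if version > self.committed:
            self.committed, self.value = version, value
        return True


def quorum(n: int) -> int:
    return n // 2 + 1  # "n/2+1", read with integer division (assumption)


def write(local: Node, nodes: list[Node], value: str) -> int:
    """Run the write path for one update; returns the committed version."""
    q = quorum(len(nodes))
    live = [nd for nd in nodes if nd.alive]
    if len(live) < q:
        raise RuntimeError("fewer than n/2+1 live nodes")
    v = local.committed                      # step (1): current version d_v from this node
    while True:
        proposal = v + 1                     # step (2): version v+1 sent to every live node
        replies = [nd.vote(proposal) for nd in live]
        if sum(ok for ok, _ in replies) >= q:
            # Step (5): submit to all nodes; step (6): finished once n/2+1 acknowledge.
            acks = sum(nd.commit(proposal, value) for nd in live)
            assert acks >= q
            return proposal
        v_m = max(ver for _, ver in replies)  # step (4): largest version returned
        newest = max(live[:q], key=lambda nd: nd.committed)  # read n/2+1 nodes
        if v_m == proposal and newest.committed < v_m:
            # 4.1) (assumed reading): v_m is not yet visible at a quorum, caller waits
            raise UpdateConflict(v_m)
        if newest.committed > local.committed:  # 4.2): sync the local copy
            local.committed, local.value = newest.committed, newest.value
        v = v_m                               # 4.3): set v = v_m and redo step (2)
```

For example, with five Node() objects, a node that missed one write while it was down catches up on its next write through steps 4.2) and 4.3), while a competing, still uncommitted proposal for the same version surfaces as UpdateConflict.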
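Steps 4.2) and (8) rely on majority quorums: any two groups of n/2+1 nodes share at least one member, so a read from n/2+1 nodes reaches at least one node that confirmed the last completed write in step (6). The short Python check below is a sketch that assumes "n/2+1" means n // 2 + 1 for odd n; the function names are hypothetical. It brute-forces the overlap property for small clusters and counts how many failed nodes still leave a full quorum of n/2+1 live nodes (for example, 2 of 5 or 1 of 4).

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Quorum size "n/2+1", read as integer division (an assumption for odd n)."""
    return n // 2 + 1


def quorums_always_intersect(n: int) -> bool:
    """Brute-force check that every two quorums of an n-node cluster share a node."""
    groups = [frozenset(c) for c in combinations(range(n), quorum(n))]
    return all(a & b for a in groups for b in groups)


def max_tolerated_failures(n: int) -> int:
    """Most failed nodes that still leave n/2+1 live nodes to vote and read."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, quorum(n), max_tolerated_failures(n), quorums_always_intersect(n))
```

The brute force is only practical for small n, but it makes the intersection argument concrete without relying on a proof.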
PCT/CN2015/072975 2014-10-28 2015-02-13 Method for tightly coupled scalable big-data interaction WO2016065776A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410585403.7 2014-10-28
CN201410585403.7A CN104348913B (en) 2014-10-28 2014-10-28 Tightly coupled, extensible big-data interaction method

Publications (1)

Publication Number Publication Date
WO2016065776A1 (en) 2016-05-06

Family

ID=52503695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072975 WO2016065776A1 (en) 2014-10-28 2015-02-13 Method for tightly coupled scalable big-data interaction

Country Status (2)

Country Link
CN (1) CN104348913B (en)
WO (1) WO2016065776A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104348913B (en) * 2014-10-28 2016-08-24 浪潮电子信息产业股份有限公司 Tightly coupled, extensible big-data interaction method
CN108063780B (en) * 2016-11-08 2021-02-19 中国电信股份有限公司 Method and system for dynamically replicating data
CN106599195B (en) * 2016-12-14 2020-07-31 北京邮电大学 Metadata synchronization method and system under massive network data environment
CN108234641B (en) * 2017-12-29 2021-01-29 北京奇元科技有限公司 Data reading and writing method and device based on distributed consistency protocol
CN110825309B (en) * 2018-08-08 2021-06-29 华为技术有限公司 Data reading method, device and system and distributed system
CN109542872B (en) * 2018-10-26 2021-01-22 金蝶软件(中国)有限公司 Data reading method and device, computer equipment and storage medium
CN111090665A (en) * 2019-11-15 2020-05-01 广东数果科技有限公司 Data task scheduling method and scheduling system
CN116483739B (en) * 2023-06-21 2023-08-25 深存科技(无锡)有限公司 KV pair quick writing architecture based on hash calculation

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103218210A (en) * 2013-04-28 2013-07-24 北京航空航天大学 File level partitioning system suitable for big data high concurrence access
CN103235807A (en) * 2013-04-19 2013-08-07 浪潮集团山东通用软件有限公司 Data extracting and processing method supporting high-concurrency large-volume data
CN103428292A (en) * 2013-08-20 2013-12-04 浪潮集团有限公司 Device and method for effectively storing big data
CN104348913A (en) * 2014-10-28 2015-02-11 浪潮电子信息产业股份有限公司 Tight-coupling extensible big data interaction method

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102023920B (en) * 2010-10-27 2012-09-05 西安交通大学 Method for gathering messages in remote parallel program debugging system based on tree form
CN102521044B (en) * 2011-12-30 2013-12-25 北京拓明科技有限公司 Distributed task scheduling method and system based on messaging middleware
CN103188346A (en) * 2013-03-05 2013-07-03 北京航空航天大学 Distributed decision making supporting massive high-concurrency access I/O (Input/output) server load balancing system
CN103227754B (en) * 2013-04-16 2017-02-08 浪潮(北京)电子信息产业有限公司 Dynamic load balancing method of high-availability cluster system, and node equipment

Also Published As

Publication number Publication date
CN104348913B (en) 2016-08-24
CN104348913A (en) 2015-02-11

Similar Documents

Publication Publication Date Title
WO2016065776A1 (en) Method for tightly coupled scalable big-data interaction
US9367410B2 (en) Failover mechanism in a distributed computing system
US10027748B2 (en) Data replication in a tree based server architecture
US8639786B2 (en) Consistency domains for replication in distributed computing
US20150331910A1 (en) Methods and systems of query engines and secondary indexes implemented in a distributed database
US11068499B2 (en) Method, device, and system for peer-to-peer data replication and method, device, and system for master node switching
US20160028806A1 (en) Halo based file system replication
US9367261B2 (en) Computer system, data management method and data management program
US10127077B2 (en) Event distribution pattern for use with a distributed data grid
CN105493474B (en) System and method for supporting partition level logging for synchronizing data in a distributed data grid
US11595474B2 (en) Accelerating data replication using multicast and non-volatile memory enabled nodes
CN109639773B (en) Dynamically constructed distributed data cluster control system and method thereof
US10826812B2 (en) Multiple quorum witness
WO2017008506A1 (en) Command processing method and server
WO2017092384A1 (en) Clustered database distributed storage method and device
CN102937964A (en) Intelligent data service method based on distributed system
CN110807039A (en) Data consistency maintenance system and method in cloud computing environment
CN111913837A (en) System for realizing distributed middleware message recovery policy management in big data environment
CN108462737B (en) Batch processing and pipeline-based hierarchical data consistency protocol optimization method
WO2023246236A1 (en) Node configuration method, transaction log synchronization method and node for distributed database
KR101696911B1 (en) Distributed Database Apparatus and Method for Processing Stream Data Thereof
Suneja Scylladb optimizes database architecture to maximize hardware performance
CN112751789A (en) Method and system for realizing asymmetric SDN controller cluster
Lu et al. Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores
JP5449471B2 (en) Method for synchronous processing of update processing for shared data, data sharing system, and data sharing program

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 15855578

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 15855578

Country of ref document: EP

Kind code of ref document: A1