WO2016101759A1 - Data routing method, data management device and distributed storage system - Google Patents


Info

Publication number
WO2016101759A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2015/095507
Other languages
French (fr)
Chinese (zh)
Inventor
陈营
李明昊
宋昭
陈宗志
王超
Original Assignee
北京奇虎科技有限公司 (Beijing Qihoo Technology Co., Ltd.)
奇智软件(北京)有限公司 (Qizhi Software (Beijing) Co., Ltd.)
Priority to CN201410832192.2A (CN104580428B)
Application filed by 北京奇虎科技有限公司 (Beijing Qihoo Technology Co., Ltd.) and 奇智软件(北京)有限公司 (Qizhi Software (Beijing) Co., Ltd.)
Publication of WO2016101759A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L29/00: Arrangements, apparatus, circuits or systems, not covered by a single one of groups H04L1/00 - H04L27/00
    • H04L29/02: Communication control; Communication processing
    • H04L29/06: Communication control; Communication processing characterised by a protocol
    • H04L29/08: Transmission control procedure, e.g. data link level control procedure

Abstract

Provided are a data routing method, a data management device and a distributed storage system. The method comprises the following steps. A data node receives a metadata routing request from a client, and the request carries first keyword information of data. The data node matches the first keyword information against its own metadata table to obtain the data node information corresponding to that keyword information. The metadata table stores the data node information corresponding to each keyword and is maintained through communication between data nodes. The data node then returns the data node information to the client. The embodiments of the present invention can ensure that correct data node information is fed back to the client, which improves the accuracy of metadata routing. They can also reduce the operation and maintenance costs of a distributed storage system.

Description

Data routing method, data management device and distributed storage system
Technical field
The present invention relates to the field of distributed storage technologies, and in particular to a data routing method, a data management device and a distributed storage system.
Background
GFS (Google File System) is a large-scale distributed file system. It provides massive storage for Google's cloud computing and is tightly integrated with MapReduce.
Referring to FIG. 1, a schematic structural diagram of a prior-art GFS is shown. GFS divides the nodes of the whole system into three roles: Client, Master (master server) and Chunk Server (data block server). The Client is the access interface that GFS provides to applications. The Master is the management node of GFS. Logically there is only one Master, which stores the system's metadata and manages the entire file system. Chunk Servers do the actual storage work. Data is stored on them in the form of files, and a system can have many Chunk Servers.
When accessing GFS, the client first sends the Master a metadata routing request that carries key information. The Master looks up the Chunk Servers corresponding to the key in its stored metadata table, and the client then accesses those Chunk Servers directly to read or write the data. This design separates the control flow from the data flow. Only control flow passes between the Client and the Master, with no data flow. This greatly reduces the Master's load and keeps it from becoming a system performance bottleneck. Data flows directly between the Client and the Chunk Servers. Files are split into multiple chunks for distributed storage, so the Client can access several Chunk Servers at the same time. As a result, I/O across the whole system is highly parallel and overall system performance improves.
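The following minimal Python sketch makes the control-flow/data-flow separation described above concrete. The classes `Master` and `ChunkServer` and the function `client_read` are hypothetical names chosen for illustration. They are not the GFS interface, and the sketch ignores replication, leases and failures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChunkServer:
    """Holds chunk data; modelled here as a dict from key to bytes."""
    server_id: str
    chunks: Dict[str, bytes] = field(default_factory=dict)

    def read(self, key: str) -> bytes:
        return self.chunks[key]


@dataclass
class Master:
    """The single logical master; its metadata table maps key -> chunk server ids."""
    metadata: Dict[str, List[str]] = field(default_factory=dict)

    def route(self, key: str) -> List[str]:
        # Control flow only: the master answers with locations, never with data.
        return self.metadata[key]


def client_read(master: Master, servers: Dict[str, ChunkServer], key: str) -> bytes:
    # 1. Metadata routing request to the master.
    locations = master.route(key)
    # 2. Data flow goes straight to a chunk server, bypassing the master.
    return servers[locations[0]].read(key)


master = Master({"report.txt": ["cs2"]})
servers = {"cs1": ChunkServer("cs1"), "cs2": ChunkServer("cs2", {"report.txt": b"hello"})}
```

Because every lookup goes through `Master.route`, a stale entry in `master.metadata` sends the client to the wrong server. That is the weakness discussed next.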
However, factors such as network jitter and node failures can easily change the Chunk Server that corresponds to a key. The Master, as the management node, cannot learn of such metadata changes in time, so there is no guarantee that correct Chunk Server information is returned to the client. Typically, when a metadata request fails, the client must also send the Master a request to update the metadata table. It then has to keep waiting for the Chunk Server information that the Master returns based on the updated table.
Summary of the invention
In view of the above problems, the present invention provides a data routing method, a data management device and a distributed storage system that overcome, or at least partially solve, those problems.
According to one aspect of the present invention, a data routing method is provided, comprising:
a data node receiving a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
the data node matching the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes; and
the data node returning the data node information to the client.
According to another aspect of the present invention, a computer program is provided. It comprises computer-readable code which, when run on a computing device, causes the computing device to perform the data routing method described above.
According to still another aspect of the present invention, a computer-readable medium is provided, in which the computer program described above is stored.
According to yet another aspect of the present invention, a data management device is provided, comprising:
a first receiving module, configured to receive a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
a first matching module, configured to match the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes; and
a first returning module, configured to return the data node information to the client.
With the data routing method, data management device and distributed storage system according to the embodiments of the present invention, a data node can use the metadata table it stores to process a client's metadata routing request. The data node maintains the metadata table through communication between data nodes, so the table can reflect changes in node status in time. This ensures that correct data node information is fed back to the client and improves the accuracy of metadata routing. In the prior art, a dedicated Master stores and maintains the metadata table. The embodiments eliminate the Master role and can therefore reduce the operation, maintenance and deployment costs of a distributed storage system.
The above description is only an overview of the technical solutions of the present invention. Specific embodiments are set forth below so that the technical means of the present invention can be understood more clearly and implemented according to this description. They also make the above and other objects, features and advantages of the present invention more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the optional embodiments. The drawings only illustrate the optional embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference signs denote the same components. In the drawings:
FIG. 1 is a schematic structural diagram of a prior-art GFS;
FIG. 2 is a flowchart of the steps of a data routing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a distributed storage system according to an example of the present invention;
FIG. 4 is a flowchart of the steps of a data routing method according to an embodiment of the present invention;
FIG. 5 is a flowchart of the steps of a data routing method according to an embodiment of the present invention;
FIG. 6 is a flowchart of the steps by which a data node maintains the metadata table based on communication between data nodes, according to an embodiment of the present invention;
FIG. 7 is a flowchart of the steps of a data routing method according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a data management device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data management device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a data management device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 12 schematically shows a block diagram of a computing device for performing the method according to the present invention; and
FIG. 13 schematically shows a storage unit for holding or carrying program code that implements the method according to the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Referring to FIG. 2, a flowchart of the steps of a data routing method according to an embodiment of the present invention is shown. The method may comprise the following steps:
Step 201: a data node receives a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
Step 202: the data node matches the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes;
Step 203: the data node returns the data node information to the client.
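Steps 201 to 203 can be sketched as follows. For illustration, the sketch assumes that the metadata table is an in-memory dict mapping each keyword to a data node identifier. `RoutingRequest`, `DataNode` and `handle_routing_request` are hypothetical names. Returning `None` for an unknown keyword is also an assumption, because the text does not say what happens on a miss.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RoutingRequest:
    key: str  # the "first keyword information" carried by the request


@dataclass
class DataNode:
    node_id: str
    # Metadata table: keyword -> id of the data node holding that data.
    # The described method keeps this table current through node-to-node
    # communication; here it is just a dict.
    metadata_table: Dict[str, str] = field(default_factory=dict)

    def handle_routing_request(self, request: RoutingRequest) -> Optional[str]:
        # Step 201: the request has been received (this method call).
        # Step 202: match the keyword against this node's own metadata table.
        node_info = self.metadata_table.get(request.key)
        # Step 203: return the matched data node information to the client.
        return node_info


node = DataNode("node-1", {"user:42": "node-3", "user:7": "node-1"})
```

Any data node can answer the request. The answer may name the node itself or another node.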
The embodiments of the present invention can be applied to distributed storage systems for various services to improve the accuracy of metadata routing.
Referring to FIG. 3, a schematic structural diagram of a distributed storage system according to an example of the present invention is shown. The system may comprise a client 301 and data nodes 302. The client 301 initiates service requests. It can send a metadata routing request to a data node 302 to obtain the data node information corresponding to that request. It can then access the data node 302 identified by that information to complete the data access operation. Each data node 302 can store a metadata table, and the data nodes 302 can communicate with one another. The metadata table can therefore be maintained through communication between the data nodes 302, so that it reflects changes in node status in time.
In an optional embodiment of the present invention, the step in which the data node matches the first keyword information against its own metadata table to obtain the corresponding data node information may comprise:
Sub-step A1: calculating a hash value of the first keyword information;
Sub-step A2: matching the hash value against the metadata table to obtain the data node information corresponding to the hash value.
In a specific implementation, a hash algorithm such as a one-way hash algorithm may be used to calculate the hash value of the first keyword information. The embodiments of the present invention do not limit the specific hash algorithm.
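One possible realisation of sub-steps A1 and A2 is sketched below. It uses SHA-1 as the one-way hash, purely for illustration. It also assumes that the metadata table is indexed by a fixed number of hash slots (`NUM_SLOTS`, a hypothetical parameter), found by reducing the hash modulo the slot count. The text itself leaves both the hash algorithm and the table layout open.

```python
import hashlib
from typing import Dict

NUM_SLOTS = 16  # hypothetical number of hash slots in the metadata table


def key_hash(key: str) -> int:
    """Sub-step A1: hash the keyword with a one-way hash (SHA-1 in this sketch)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


def match_metadata_table(key: str, table: Dict[int, str]) -> str:
    """Sub-step A2: map the hash onto a slot and look that slot up in the table."""
    slot = key_hash(key) % NUM_SLOTS
    return table[slot]


# Example table spreading the 16 slots over three hypothetical nodes.
table = {slot: f"node-{slot % 3}" for slot in range(NUM_SLOTS)}
```

Hashing into slots means the table only needs one entry per slot rather than one per key. This is a common design choice, but it is not one the text prescribes.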
In practical applications, the data node information may include one or more of the following: node number information, node attribute information and node communication rate information. The node attribute information may indicate, for example, whether the node is available or unavailable. The node communication rate information may be the node's communication rate value, among other things.
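One way to represent the data node information listed above is a small immutable record. The field names, the Mbit/s unit for the communication rate and the dict-based reply format are assumptions made for this sketch.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataNodeInfo:
    node_number: int        # node number information
    available: bool         # node attribute information: available or not
    comm_rate_mbps: float   # node communication rate (unit assumed: Mbit/s)


def to_response(info: DataNodeInfo) -> dict:
    """Serialise the record for the reply to the client (format is an assumption)."""
    return asdict(info)


# One metadata table entry: keyword -> data node information.
metadata_table = {"user:42": DataNodeInfo(node_number=3, available=True, comm_rate_mbps=940.0)}
```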
In an application example of the present invention, the node number in the data node information corresponding to the hash value may be the number of the data node itself or the number of another data node. In either case, the data node can return that node number to the client, so that the client can make its subsequent node access request.
In summary, a data node can use the metadata table it stores to process a client's metadata routing request. The data node maintains the metadata table through communication between data nodes, so the table can reflect changes in node status in time. This ensures that correct data node information is fed back to the client and improves the accuracy of metadata routing. In the prior art, a dedicated Master stores and maintains the metadata table. The embodiments of the present invention eliminate the Master role and can therefore reduce the operation, maintenance and deployment costs of a distributed storage system.
Referring to FIG. 4, a flowchart of the steps of a data routing method according to an embodiment of the present invention is shown. The method may comprise the following steps:
Step 401: a data node receives a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
Step 402: the data node matches the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes;
Step 403: the data node returns the data node information to the client;
Step 404: when the data node information corresponding to the first keyword information does not identify the data node itself, the data node returns the metadata table to the client.
Compared with the embodiment shown in FIG. 2, this embodiment adds one behaviour. When the data node information corresponding to the first keyword information does not identify the data node itself, the data node can also return the metadata table to the client. In other words, it gives the client the current, up-to-date metadata table, so the client can conveniently route metadata by querying its own copy of the table.
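The extra behaviour of step 404 might look like the following minimal sketch. The names are hypothetical. Copying the whole table into the reply, rather than sending a delta, is an assumption. So is treating an unknown keyword as a case where no table is returned.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RoutingResponse:
    node_id: Optional[str]
    # Filled in only when the key belongs to another node (step 404).
    metadata_table: Optional[Dict[str, str]] = None


@dataclass
class DataNode:
    node_id: str
    metadata_table: Dict[str, str] = field(default_factory=dict)

    def handle_routing_request(self, key: str) -> RoutingResponse:
        target = self.metadata_table.get(key)  # step 402
        if target is not None and target != self.node_id:
            # Step 404: the key maps to another node, so also send a copy of
            # this node's current table for the client to keep.
            return RoutingResponse(target, dict(self.metadata_table))
        return RoutingResponse(target)         # step 403 only
```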
A client can route metadata in either of two ways. In the first, it sends a metadata routing request to a data node. In the second, it queries its own metadata table. The first has the advantage of high accuracy, and the second has the advantage of saving traffic. Those skilled in the art can choose either way or both according to actual needs. For example, they can use the first when accuracy requirements are strict, the second when traffic constraints are strict, or both together to ensure the success rate of routing.
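The two client-side routing approaches, and one possible way of combining them, can be sketched as below. `RoutingClient`, the `mode` values and the `ask_node` callback are hypothetical. The text only says that both approaches may be used together. The policy for `mode="both"` (try the local table first, then ask a data node on a miss) is therefore an assumption.

```python
from typing import Callable, Dict, Optional


class RoutingClient:
    MODES = ("remote", "local", "both")

    def __init__(self, ask_node: Callable[[str], Optional[str]], mode: str = "remote"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.ask_node = ask_node               # sends a routing request to a data node
        self.local_table: Dict[str, str] = {}  # client's own copy of the metadata table
        self.mode = mode

    def route(self, key: str) -> Optional[str]:
        if self.mode == "remote":              # approach 1: favours accuracy
            return self.ask_node(key)
        cached = self.local_table.get(key)     # approach 2: saves traffic
        if self.mode == "local" or cached is not None:
            return cached
        return self.ask_node(key)              # "both": ask a data node on a miss
```

The client could refresh `local_table` from the copy that a data node returns under step 404.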
Referring to FIG. 5, a flowchart of the steps of a data routing method according to an embodiment of the present invention is shown. The method may comprise the following steps:
Step 501: a data node receives a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
Step 502: the data node matches the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes;
Step 503: the data node returns the data node information to the client;
Step 504: the data node maintains the metadata table based on communication between data nodes.
Referring to FIG. 6, a flowchart is shown of the steps by which a data node maintains the metadata table based on communication between data nodes, according to an embodiment of the present invention. The process may comprise the following steps:
Sub-step 541: a coordinator node sends a prepare message to all participant nodes executing transaction T;
Sub-step 542: each participant node determines whether it can commit transaction T; if so, it returns a ready message to the coordinator node; otherwise, it returns an abort message to the coordinator node;
Sub-step 543: when the messages obtained from all participant nodes are ready messages, the coordinator node sends a formal commit message to all participant nodes;
Sub-step 544: after receiving the formal commit message, each participant node formally completes transaction T, releases the resources occupied throughout transaction T, and sends a completion message to the coordinator node;
Sub-step 545: the coordinator node completes the transaction after receiving the completion messages from all participant nodes;
Sub-step 546: when the messages the coordinator node obtains from the participant nodes include an abort message, or when it cannot obtain response messages from all participant nodes before a timeout, the coordinator node sends a rollback message to all participant nodes;
Here, a timeout may mean that more than a preset period has elapsed since the formal commit message was sent.
Sub-step 547: after receiving the rollback message, each participant node rolls back transaction T, releases the resources occupied throughout transaction T, and sends a rollback completion message to the coordinator node;
Sub-step 548: after receiving the rollback completion messages from all participant nodes, the coordinator node cancels transaction T.
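Sub-steps 541 to 548 follow the classic two-phase commit pattern. The sketch below runs the coordinator and the participants in one process and applies an update to the metadata table. It assumes that a participant which cannot answer in time raises Python's built-in `TimeoutError`. It models the completion and rollback-completion messages simply as the `commit()` and `rollback()` calls returning. All class and function names are hypothetical, and a real deployment would exchange these messages over the network.

```python
from enum import Enum
from typing import Dict, List, Optional


class Vote(Enum):
    READY = "ready"  # sub-step 542: participant is able to commit
    ABORT = "abort"  # sub-step 542: participant must abort


class Participant:
    """A data node holding a copy of the metadata table (hypothetical API)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.table: Dict[str, str] = {}
        self.pending: Dict[str, str] = {}

    def prepare(self, update: Dict[str, str]) -> Vote:
        if not self.can_commit:
            return Vote.ABORT
        self.pending = dict(update)      # stage the change until phase 2
        return Vote.READY

    def commit(self) -> None:
        self.table.update(self.pending)  # sub-step 544: apply the change and release
        self.pending = {}

    def rollback(self) -> None:
        self.pending = {}                # sub-step 547: discard the staged change


def two_phase_commit(participants: List[Participant], update: Dict[str, str]) -> bool:
    """Coordinator side; returns True if transaction T committed everywhere."""
    # Phase 1 (sub-steps 541-542): send prepare and collect one vote per participant.
    votes: List[Optional[Vote]] = []
    for p in participants:
        try:
            votes.append(p.prepare(update))
        except TimeoutError:             # no answer before the timeout
            votes.append(None)
    # Phase 2: commit only if every participant answered READY (sub-steps 543-545).
    if all(v is Vote.READY for v in votes):
        for p in participants:
            p.commit()
        return True
    # Any abort or missing answer: roll everyone back (sub-steps 546-548).
    for p in participants:
        p.rollback()
    return False
```

Every copy of the table either receives the update or none does. That is the consistency property the text relies on.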
Compared with the embodiment shown in FIG. 2, the data nodes in this embodiment may comprise a coordinator node and participant nodes. This embodiment also adds a step in which the data nodes maintain the metadata table based on communication between data nodes. The metadata maintenance process of sub-steps 541 to 548 uses a two-phase commit protocol to guarantee the consistency and integrity of the metadata tables maintained by all participant nodes.
The main idea of the two-phase commit protocol is as follows. When a transaction T operates on multiple databases, it can succeed only if the databases of all the participant nodes commit successfully. The coordinator node therefore first sends a pre-commit to the participant nodes, and each participant node replies whether it can commit. The coordinator node formally commits transaction T only if all participant nodes can commit.
In the metadata maintenance process of sub-steps 541 to 548, sub-steps 541 and 542 form the first phase, and sub-steps 543 to 548 form the second phase. In the first phase, the coordinator node notifies the participant nodes of the transaction to prepare to commit or cancel it. Each participant node then informs the coordinator node of its decision: agree (ready message) or cancel (abort message). In the second phase, the coordinator node decides whether to commit or cancel based on the feedback from all participant nodes. It notifies all participant nodes to commit the transaction if and only if all of them agreed to commit; otherwise, it notifies all participant nodes to cancel the transaction.
It should be noted that using a two-phase commit protocol to guarantee the consistency and integrity of the metadata tables maintained by all participant nodes is only one option. Those skilled in the art may adopt other schemes according to actual needs, such as a three-phase commit protocol. The embodiments of the present invention do not limit the specific scheme used to guarantee that consistency and integrity.
Referring to FIG. 7, a flowchart of the steps of a data routing method according to an embodiment of the present invention is shown. The method may comprise the following steps:
Step 701: a data node receives a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
Step 702: the data node matches the first keyword information against its own metadata table to obtain the data node information corresponding to the first keyword information, wherein the metadata table stores the data node information corresponding to keywords and is maintained through communication between data nodes;
Step 703: the data node returns the data node information to the client;
Step 704: the data node receives a read request from the client, wherein the read request carries second keyword information of data;
Step 705: the data node matches the second keyword information carried in the read request against its own metadata table to obtain the data node information corresponding to the second keyword information;
Step 706: based on the data node information corresponding to the second keyword information, the data node determines whether the data requested by the read request is on the data node itself;
Step 707: when the data node information corresponding to the second keyword information identifies the data node itself, the data node queries its own data engine according to the read request and returns the queried data to the client;
Step 708: when the data node information corresponding to the second keyword information does not identify the data node itself, the data node forwards the read request to the first data node identified by that data node information;
Step 709: the data node receives the data corresponding to the read request returned by the first data node and returns it to the client.
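Steps 704 to 709, in which a data node proxies a read for data it does not hold, can be sketched as follows. The in-memory `engine` dict standing in for the data engine, the `peers` map and the method names are assumptions for illustration. Returning `None` for an unknown key is also an assumption, since the text does not cover that case. The forwarded call goes straight to the owner's local read. As a result, two nodes with inconsistent tables cannot bounce a request back and forth.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataNode:
    node_id: str
    metadata_table: Dict[str, str] = field(default_factory=dict)  # key -> node id
    engine: Dict[str, bytes] = field(default_factory=dict)        # local data engine
    peers: Dict[str, "DataNode"] = field(default_factory=dict)    # node id -> node

    def read_local(self, key: str) -> Optional[bytes]:
        # Step 707: query this node's own data engine.
        return self.engine.get(key)

    def handle_read(self, key: str) -> Optional[bytes]:
        target = self.metadata_table.get(key)  # step 705
        if target is None:
            return None                        # unknown key (assumed behaviour)
        if target == self.node_id:             # step 706
            return self.read_local(key)        # step 707
        # Steps 708-709: act as a proxy, forward the read to the owning
        # node and relay its answer back to the client.
        return self.peers[target].read_local(key)


n2 = DataNode("n2", engine={"b": b"B"})
n1 = DataNode("n1", {"a": "n1", "b": "n2"}, {"a": b"A"}, {"n2": n2})
```

The client sends a single request to `n1` and gets the data back whichever node holds it.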
Compared with the embodiment shown in FIG. 2, this embodiment processes a metadata routing request from the client through steps 701 to 703, and can also process a read request from the client through steps 704 to 709. In particular, when processing the read request, the data node information corresponding to the second keyword information may not identify the data node itself. In that case, the data node can act as a network proxy. It forwards the read request to the first data node identified by that information, receives the data corresponding to the read request from the first data node, and returns it to the client. Because the data node forwards read requests and returns the read data on the client's behalf, the client does not need to try sending read requests to several different data nodes. This saves client traffic.
It should be noted that a data node can also forward write requests on behalf of the client and return the response results to it. Forwarding write requests is similar to forwarding read requests, so it is not described again here; the two descriptions can be read together.
For simplicity, the method embodiments are described as a series of combined actions. However, those skilled in the art should know that the embodiments of the present invention are not limited by the described order of actions, because some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
FIG. 8 is a schematic structural diagram of a data management device according to an embodiment of the present invention. The device may specifically include the following modules:
a first receiving module 801, configured to receive a metadata routing request from a client, where the metadata routing request carries first keyword information of the data;
a first matching module 802, configured to match its own metadata table according to the first keyword information to obtain the data node information corresponding to the first keyword information, where the metadata table stores the data node information corresponding to each keyword and is maintained through communication between data nodes; and
a first returning module 803, configured to return the data node information to the client.
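The receive, match, and return flow of modules 801 to 803 can be sketched as below. The fields of `NodeInfo` follow the examples listed in claim 6 (node number, node attributes, and node communication rate). Their types, the units of the rate, and all class names are assumptions made for illustration. This sketch also assumes a plain key lookup; the hash-based variant is described next.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeInfo:
    # Fields follow the examples in claim 6; types and units are assumptions.
    node_number: int
    attributes: str
    rate_mbps: float


@dataclass
class MetadataRoutingRequest:
    key: str  # the "first keyword information" carried by the request


class RoutingModule:
    """Hypothetical sketch of modules 801-803: receive, match, return."""

    def __init__(self, metadata_table: Dict[str, NodeInfo]):
        self.metadata_table = metadata_table  # maintained via inter-node communication

    def handle(self, request: MetadataRoutingRequest) -> Optional[NodeInfo]:
        # 801 has received the request; 802 matches the key in the local table.
        info = self.metadata_table.get(request.key)
        # 803 returns the data node information (None if the key is unknown).
        return info
```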
In practice, the data management device may be the data node itself or a device that manages the data nodes; the embodiments of the present invention do not limit where the data management device is located.
In an optional embodiment of the present invention, the first matching module 802 may specifically include:
a hash calculation submodule, configured to calculate a hash value of the first keyword information; and
a hash matching submodule, configured to match the metadata table according to the hash value to obtain the data node information corresponding to the hash value.
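The two hash submodules can be illustrated with a fixed number of hash slots. The text does not state which hash function is used or how hash values are laid out in the metadata table. This sketch assumes MD5 truncated to 64 bits and a table in which slot `i` names the node that owns it, so both choices are hypothetical. A stable hash matters here, because Python's built-in `hash()` for strings is salted per process, and different nodes would then disagree about where a key lives.

```python
import hashlib
from typing import List


def key_hash(key: str) -> int:
    """Hash calculation submodule: a stable 64-bit hash of the keyword."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def match_by_hash(key: str, slot_table: List[str]) -> str:
    """Hash matching submodule: map the hash to a slot, then to its node.

    slot_table[i] holds the node that owns slot i (assumed table layout).
    """
    slot = key_hash(key) % len(slot_table)
    return slot_table[slot]
```

Every node that holds the same `slot_table` resolves a given key to the same node, so a client can send its routing request to any of them.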
In another optional embodiment of the present invention, the data node may further include:
a second returning module, configured to return the metadata table to the client when the data node information corresponding to the first keyword information does not identify the data node itself.
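The second returning module can be sketched as a reply that carries a copy of the metadata table only when the key belongs to another node. The text does not say what the client does with the table. The names `RoutingReply` and `route` are hypothetical, and this sketch assumes a key-to-node-id table. A plausible use is refreshing a client-side routing cache. The table is copied so that the client cannot mutate the node's own table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RoutingReply:
    owner: str                                       # data node information for the key
    metadata_table: Optional[Dict[str, str]] = None  # set only when owner != this node


def route(self_id: str, key: str, table: Dict[str, str]) -> RoutingReply:
    """Hypothetical routing handler with the optional second returning module."""
    owner = table[key]  # raises KeyError for unknown keys in this sketch
    if owner != self_id:
        # The key lives elsewhere: also return (a copy of) the metadata table.
        return RoutingReply(owner, dict(table))
    return RoutingReply(owner)
```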
FIG. 9 is a schematic structural diagram of a data management device according to an embodiment of the present invention. The device may specifically include a coordinator node 901 and a participant node 902.
The coordinator node 901 may specifically include a first maintenance module 911, configured to maintain the metadata table based on communication between data nodes.
The participant node 902 may specifically include the following modules:
a first receiving module 921, configured to receive a metadata routing request from a client, where the metadata routing request carries first keyword information of the data;
a first matching module 922, configured to match its own metadata table according to the first keyword information to obtain the data node information corresponding to the first keyword information, where the metadata table stores the data node information corresponding to each keyword and is maintained through communication between data nodes;
a first returning module 923, configured to return the data node information to the client; and
a second maintenance module 924, configured to maintain the metadata table based on communication between data nodes.
The first maintenance module 911 may specifically include:
a prepare sending submodule 9111, configured to send a prepare message to all participant nodes that execute transaction T;
a commit sending submodule 9112, configured to send a formal commit message to all participant nodes when the messages obtained from all participant nodes are ready messages;
a transaction completion submodule 9113, configured to complete the transaction after receiving the completion messages fed back by all participant nodes;
a rollback sending submodule 9114, configured to send a rollback message to all participant nodes when the messages obtained from the participant nodes include an abort message, or when the response messages of all participant nodes cannot be obtained before a timeout; and
a transaction cancellation submodule 9115, configured to cancel transaction T after receiving the rollback completion messages fed back by all participant nodes.
The second maintenance module 924 may specifically include:
a prepare response submodule 9241, configured to determine whether to commit transaction T and, if so, return a ready message to the coordinator node, or otherwise return an abort message to the coordinator node;
a transaction execution submodule 9242, configured to formally complete transaction T after receiving the formal commit message, release the resources occupied during the entire transaction T, and send a completion message to the coordinator node; and
a rollback submodule 9243, configured to, after receiving the rollback message, perform a rollback operation on transaction T, release the resources occupied during the entire transaction T, and send a rollback completion message to the coordinator node.
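The maintenance modules above describe a classic two-phase commit. The sketch below runs that protocol in a single process, under the following assumptions:

* transaction T is an update to metadata-table entries;
* "resources occupied during T" are modeled as a pending copy of that update;
* a participant that fails to answer before the timeout is modeled by `prepare` raising `TimeoutError`.

All class and function names are hypothetical. A real deployment would exchange these messages over the network and retry the commit and rollback acknowledgements.

```python
from enum import Enum
from typing import Dict, List, Optional


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Participant:
    """Participant side (submodules 9241-9243), hypothetical sketch."""

    def __init__(self, name: str, will_commit: bool = True, responsive: bool = True):
        self.name = name
        self.will_commit = will_commit
        self.responsive = responsive
        self.table: Dict[str, str] = {}               # local copy of the metadata table
        self.pending: Optional[Dict[str, str]] = None  # resources held during T

    def prepare(self, updates: Dict[str, str]) -> Vote:
        if not self.responsive:
            raise TimeoutError(self.name)   # no answer before the timeout
        if not self.will_commit:
            return Vote.ABORT               # 9241: decide not to commit T
        self.pending = dict(updates)        # hold the change until commit/rollback
        return Vote.READY

    def commit(self) -> str:
        self.table.update(self.pending or {})  # 9242: formally complete T
        self.pending = None                    # release resources held during T
        return "done"

    def rollback(self) -> str:
        self.pending = None                    # 9243: undo T and release resources
        return "rollback-done"


def run_transaction(participants: List[Participant], updates: Dict[str, str]) -> bool:
    """Coordinator side (submodules 9111-9115). True = committed, False = cancelled."""
    votes = []
    for p in participants:                     # phase 1: send "prepare" to everyone
        try:
            votes.append(p.prepare(updates))
        except TimeoutError:
            votes.append(Vote.ABORT)           # a missing reply counts as an abort
    if all(v is Vote.READY for v in votes):
        acks = [p.commit() for p in participants]   # phase 2: formal commit
        return all(a == "done" for a in acks)       # complete T once all acks arrive
    for p in participants:                          # phase 2: rollback
        p.rollback()
    return False                                    # T cancelled
```

A single abort vote or a single timeout is enough to roll back every participant. This is what keeps all copies of the metadata table identical, so any data node can answer a routing request correctly.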
FIG. 10 is a schematic structural diagram of a data management device according to an embodiment of the present invention. The device may specifically include the following modules:
a first receiving module 1001, configured to receive a metadata routing request from a client, where the metadata routing request carries first keyword information of the data;
a first matching module 1002, configured to match its own metadata table according to the first keyword information to obtain the data node information corresponding to the first keyword information, where the metadata table stores the data node information corresponding to each keyword and is maintained through communication between data nodes;
a first returning module 1003, configured to return the data node information to the client;
a second receiving module 1004, configured to receive a read request from the client, where the read request carries second keyword information of the data;
a second matching module 1005, configured to match its own metadata table according to the second keyword information carried in the read request to obtain the data node information corresponding to the second keyword information;
a determination module 1006, configured to determine, according to the data node information corresponding to the second keyword information, whether the data requested by the read request resides on the data node itself;
a query module 1007, configured to, when the data node information corresponding to the second keyword information identifies the data node itself, query the data node's own data engine according to the read request and return the queried data to the client;
a forwarding module 1008, configured to, when the data node information corresponding to the second keyword information does not identify the data node itself, forward the read request to the first data node identified by that data node information; and
a third returning module 1009, configured to receive the data corresponding to the read request returned by the first data node and return the data to the client.
The present invention further provides a distributed storage system, which may specifically include a client and the data management device described above.
FIG. 11 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention. The system may specifically include a client 1101 and a plurality of data nodes 1102.
The data node 1102 may specifically include the following modules:
a first receiving module 1121, configured to receive a metadata routing request from the client, where the metadata routing request carries first keyword information of the data;
a first matching module 1122, configured to match its own metadata table according to the first keyword information to obtain the data node information corresponding to the first keyword information, where the metadata table stores the data node information corresponding to each keyword and is maintained through communication between data nodes; and
a first returning module 1123, configured to return the data node information to the client.
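The interaction between client 1101 and the data nodes 1102 can be sketched from the client's side. The client sends a metadata routing request to any node it already knows, then reads directly from the node that the reply names. The `Node` and `StorageClient` classes, the key-to-node-id table layout, and the client-side routing cache are all assumptions for illustration; the text does not say that the client caches routes.

```python
from typing import Dict, Optional


class Node:
    """Minimal stand-in for data node 1102 (hypothetical interface)."""

    def __init__(self, node_id: str, table: Dict[str, str], data: Dict[str, bytes]):
        self.node_id = node_id
        self.table = table   # metadata table shared by all nodes: key -> node_id
        self.data = data     # stand-in for the node's data engine

    def route(self, key: str) -> Optional[str]:
        return self.table.get(key)   # modules 1121-1123: receive, match, return

    def read_local(self, key: str) -> Optional[bytes]:
        return self.data.get(key)


class StorageClient:
    """Client 1101: asks one entry node for routing, then reads from the owner."""

    def __init__(self, nodes: Dict[str, Node], entry_node: str):
        self.nodes = nodes
        self.entry_node = entry_node
        self.cache: Dict[str, str] = {}   # client-side routing cache (assumption)

    def read(self, key: str) -> Optional[bytes]:
        owner = self.cache.get(key)
        if owner is None:
            owner = self.nodes[self.entry_node].route(key)  # metadata routing request
            if owner is None:
                return None
            self.cache[key] = owner
        return self.nodes[owner].read_local(key)
```

Because every node holds the same metadata table, the choice of entry node does not affect the answer. That is the property the two-phase-commit maintenance is meant to preserve.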
The device embodiments are basically similar to the method embodiments, so they are described relatively briefly; for related details, refer to the corresponding parts of the method embodiments.
The component embodiments of the present invention may be implemented in hardware, as software modules running on one or more processors, or as a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the data routing method, data management device, and distributed storage system according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet platform, provided on a carrier signal, or provided in any other form.
For example, FIG. 12 shows a computing device, such as a search engine server, that can implement the above method according to the present invention. The computing device conventionally includes a processor 1210 and a computer program product or computer-readable medium in the form of a memory 1230. The memory 1230 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 1230 has a storage space 1250 for program code 1251 that performs any of the method steps described above. For example, the storage space 1250 may include individual program codes 1251, each implementing one of the steps of the above methods. The program code may be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit such as that shown in FIG. 13. The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 1230 in the computing device of FIG. 12. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 1251' for performing the method steps according to the present invention, that is, code that can be read by a processor such as 1210. When this code is run by a server, it causes the server to perform each step of the method described above.
Reference herein to "one embodiment", "an embodiment", or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Furthermore, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
The description provided herein sets forth numerous specific details. However, it is understood that the embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
In addition, it should be noted that the language used in this specification has been chosen mainly for readability and instructional purposes, not to delineate or limit the subject matter of the present invention. Therefore, many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is illustrative rather than restrictive with respect to its scope, and the scope of the present invention is defined by the appended claims.

Claims (15)

  1. A data routing method, comprising:
    receiving, by a data node, a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
    matching, by the data node, its own metadata table according to the first keyword information to obtain data node information corresponding to the first keyword information, wherein the metadata table stores data node information corresponding to keywords and is maintained based on communication between data nodes; and
    returning, by the data node, the data node information to the client.
  2. The method according to claim 1, further comprising:
    when the data node information corresponding to the first keyword information does not identify the data node itself, returning, by the data node, the metadata table to the client.
  3. The method according to claim 1 or 2, wherein the data nodes comprise a coordinator node and participant nodes;
    the method further comprises: maintaining, by the data nodes, the metadata table based on communication between the data nodes;
    wherein the step of maintaining, by the data nodes, the metadata table based on communication between the data nodes comprises:
    sending, by the coordinator node, a prepare message to all participant nodes that execute a transaction T;
    determining, by each participant node, whether to commit the transaction T, and if so, returning a ready message to the coordinator node, or otherwise returning an abort message to the coordinator node;
    when the messages obtained from all participant nodes are ready messages, sending, by the coordinator node, a formal commit message to all participant nodes;
    after receiving the formal commit message, formally completing, by each participant node, the transaction T, releasing the resources occupied during the entire transaction T, and sending a completion message to the coordinator node;
    completing, by the coordinator node, the transaction after receiving the completion messages fed back by all participant nodes;
    when the messages obtained from the participant nodes include an abort message, or when the response messages of all participant nodes cannot be obtained before a timeout, sending, by the coordinator node, a rollback message to all participant nodes;
    after receiving the rollback message, performing, by each participant node, a rollback operation on the transaction T, releasing the resources occupied during the entire transaction T, and sending a rollback completion message to the coordinator node; and
    cancelling, by the coordinator node, the transaction T after receiving the rollback completion messages fed back by all participant nodes.
  4. The method according to claim 1, 2, or 3, further comprising:
    receiving, by the data node, a read request from the client, wherein the read request carries second keyword information of data;
    matching, by the data node, its own metadata table according to the second keyword information carried in the read request to obtain data node information corresponding to the second keyword information;
    determining, by the data node according to the data node information corresponding to the second keyword information, whether the data corresponding to the read request resides on the data node itself;
    when the data node information corresponding to the second keyword information identifies the data node itself, querying, by the data node, its own data engine according to the read request, and returning the queried data to the client;
    when the data node information corresponding to the second keyword information does not identify the data node itself, forwarding the read request to a first data node corresponding to that data node information; and
    receiving the data corresponding to the read request returned by the first data node, and returning the data to the client.
  5. The method according to claim 1, 2, or 3, wherein the step of matching, by the data node, its own metadata table according to the first keyword information to obtain the data node information corresponding to the first keyword information comprises:
    calculating a hash value of the first keyword information; and
    matching the metadata table according to the hash value to obtain the data node information of the data node corresponding to the hash value.
  6. The method according to claim 1, 2, or 3, wherein the data node information comprises one or more of the following: node number information, node attribute information, and node communication rate information.
  7. A computer program, comprising computer-readable code which, when run on a computing device, causes the computing device to perform the data routing method according to any one of claims 1 to 6.
  8. A computer-readable medium storing the computer program according to claim 7.
  9. A data management device, comprising:
    a first receiving module, configured to receive a metadata routing request from a client, wherein the metadata routing request carries first keyword information of data;
    a first matching module, configured to match its own metadata table according to the first keyword information to obtain data node information corresponding to the first keyword information, wherein the metadata table stores data node information corresponding to keywords and is maintained based on communication between data nodes; and
    a first returning module, configured to return the data node information to the client.
  10. The data management device according to claim 9, further comprising:
    a second returning module, configured to return, by the data node, the metadata table to the client when the data node information corresponding to the first keyword information does not identify the data node itself.
  11. The data management device according to claim 9 or 10, wherein the data management device comprises a coordinator node and a participant node;
    the coordinator node comprises a first maintenance module configured to maintain the metadata table based on communication between data nodes;
    the participant node comprises the aforementioned first receiving module, first matching module, and first returning module, and a second maintenance module configured to maintain the metadata table based on communication between data nodes;
    wherein the first maintenance module comprises:
    a prepare sending submodule, configured to send a prepare message to all participant nodes that execute a transaction T;
    a commit sending submodule, configured to send a formal commit message to all participant nodes when the messages obtained from all participant nodes are ready messages;
    a transaction completion submodule, configured to complete the transaction after receiving the completion messages fed back by all participant nodes;
    a rollback sending submodule, configured to send, by the coordinator node, a rollback message to all participant nodes when the messages obtained from the participant nodes include an abort message, or when the response messages of all participant nodes cannot be obtained before a timeout; and
    a transaction cancellation submodule, configured to cancel the transaction T after receiving the rollback completion messages fed back by all participant nodes;
    wherein the second maintenance module comprises:
    a prepare response submodule, configured to determine whether to commit the transaction T and, if so, return a ready message to the coordinator node, or otherwise return an abort message to the coordinator node;
    a transaction execution submodule, configured to formally complete the transaction T after receiving the formal commit message, release the resources occupied during the entire transaction T, and send a completion message to the coordinator node; and
    a rollback submodule, configured to, after each participant node receives the rollback message, perform a rollback operation on the transaction T, release the resources occupied during the entire transaction T, and send a rollback completion message to the coordinator node.
  12. The data management device according to claim 9, 10, or 11, further comprising:
    a second receiving module, configured to receive a read request from the client, wherein the read request carries second keyword information of data;
    a second matching module, configured to match its own metadata table according to the second keyword information carried in the read request to obtain data node information corresponding to the second keyword information;
    a determination module, configured to determine, according to the data node information corresponding to the second keyword information, whether the data corresponding to the read request resides on the data node itself;
    a query module, configured to, when the data node information corresponding to the second keyword information identifies the data node itself, query, by the data node, its own data engine according to the read request and return the queried data to the client;
    a forwarding module, configured to, when the data node information corresponding to the second keyword information does not identify the data node itself, forward the read request to a first data node corresponding to that data node information; and
    a third returning module, configured to receive the data corresponding to the read request returned by the first data node and return the data to the client.
  13. The data management device according to claim 9, 10, or 11, wherein the first matching module comprises:
    a hash calculation submodule, configured to calculate a hash value of the first keyword information; and
    a hash matching submodule, configured to match the metadata table according to the hash value to obtain the data node information of the data node corresponding to the hash value.
  14. The data management device according to claim 9, 10, or 11, wherein the data node information comprises one or more of the following: node number information, node attribute information, and node communication rate information.
  15. A distributed storage system, comprising a client and the data management device according to any one of claims 9 to 14.
PCT/CN2015/095507 2014-12-27 2015-11-25 Data routing method, data management device and distributed storage system WO2016101759A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410832192.2A CN104580428B (en) 2014-12-27 2014-12-27 A kind of data routing method, data administrator and distributed memory system
CN201410832192.2 2014-12-27

Publications (1)

Publication Number Publication Date
WO2016101759A1 true WO2016101759A1 (en) 2016-06-30

Family

ID=53095585

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095507 WO2016101759A1 (en) 2014-12-27 2015-11-25 Data routing method, data management device and distributed storage system

Country Status (2)

Country Link
CN (1) CN104580428B (en)
WO (1) WO2016101759A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020037625A1 (en) * 2018-08-23 2020-02-27 袁振南 Distributed storage system and data read-write method therefor, and storage terminal and storage medium
CN109783204A (en) * 2018-12-28 2019-05-21 咪咕文化科技有限公司 A kind of distributed transaction processing method, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984280A (en) * 2012-12-18 2013-03-20 北京工业大学 Data backup system and method for social cloud storage network application
CN103019960A (en) * 2012-12-03 2013-04-03 华为技术有限公司 Distributed cache method and system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20080209007A1 (en) * 2007-02-27 2008-08-28 Tekelec Methods, systems, and computer program products for accessing data associated with a plurality of similarly structured distributed databases
CN100576809C (en) * 2007-08-21 2009-12-30 北京航空航天大学 Access in the large scale dynamic heterogeneous mixed wireless self-organizing network and route computing method

Also Published As

Publication number Publication date
CN104580428A (en) 2015-04-29
CN104580428B (en) 2018-09-04

Similar Documents

Publication Publication Date Title
US9183271B2 (en) Big-fast data connector between in-memory database system and data warehouse system
CN105516284B (en) Method and apparatus for distributed storage in a cluster database
JP5686034B2 (en) Cluster system, synchronization control method, server device, and synchronization control program
KR20160147909A (en) System and method for supporting common transaction identifier (xid) optimization and transaction affinity based on resource manager (rm) instance awareness in a transactional environment
US9952940B2 (en) Method of operating a shared nothing cluster system
JP6700308B2 (en) Data copy method and device
WO2020134615A1 (en) Cross-chain evidence preservation method and access method, apparatus, and electronic device
US20110016349A1 (en) Replication in a network environment
US20160034191A1 (en) Grid oriented distributed parallel computing platform
WO2019001017A1 (en) Inter-cluster data migration method and system, server, and computer storage medium
EP3188051B1 (en) Systems and methods for search template generation
CN105574010B (en) Data query method and device
TW201727517A (en) Data storage and service processing method and device
WO2016101759A1 (en) Data routing method, data management device and distributed storage system
WO2017124933A1 (en) Information processing method, device and system
US10733176B2 (en) Detecting phantom items in distributed replicated database
US20200249876A1 (en) System and method for data storage management
US8862544B2 (en) Grid based replication
US7933962B1 (en) Reducing reliance on a central data store while maintaining idempotency in a multi-client, multi-server environment
JP6475852B2 (en) Method, apparatus and system for processing service data
US9684668B1 (en) Systems and methods for performing lookups on distributed deduplicated data systems
US20170161352A1 (en) Scalable snapshot isolation on non-transactional nosql
WO2016101662A1 (en) Data processing method and relevant server
US10127270B1 (en) Transaction processing using a key-value store
CN110543448A (en) data synchronization method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 15871828; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase in:
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 15871828; Country of ref document: EP; Kind code of ref document: A1