CN103761167B - A kind of method and apparatus for realizing data center backup - Google Patents

Publication number
CN103761167B
Authority
CN
China
Prior art keywords
data
range
hash values
value
data center
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410032550.1A
Other languages
Chinese (zh)
Other versions
CN103761167A (en)
Inventor
刘璧怡
邓强
吴楠
邓鹏飞
宗栋瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410032550.1A
Publication of CN103761167A
Application granted
Publication of CN103761167B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention proposes a method and apparatus for implementing data center backup, comprising: determining the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up; and selecting one data node within the determined storage range to store the data block. The invention solves the problem that, when HBase data is backed up across data centers, data blocks with the same column family name are stored in a scattered manner. Data blocks with the same column family name that are backed up to the target data center are stored in a more concentrated manner, which improves read speed.

Description

A method and apparatus for implementing data center backup

Technical field

The invention relates to the field of big data, and in particular to a backup method and apparatus for an HBase-based data center.

Background

HBase data storage is usually built on the Hadoop Distributed File System (HDFS). When HDFS stores data in the original data center, it usually replicates the data, keeping three copies by default: two copies on two different data nodes in the same rack, and the third copy on a data node in a different rack. In addition, to ensure that service can continue normally when a data center fails, the data center itself needs to be backed up.
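The three-copy layout described above (two copies on different data nodes of one rack, one copy on another rack) can be sketched as follows. This is a minimal illustration of the layout as the text describes it, not HDFS's actual placement code; the function name `place_three_replicas` and the `racks` dictionary layout are hypothetical.

```python
import random


def place_three_replicas(racks, rng=random):
    """Pick three data nodes: two on one rack and one on a different rack.

    `racks` maps a rack name to a list of data node names (hypothetical layout).
    Returns a list of (rack, node) pairs.
    """
    # Only racks with at least two data nodes can hold the first two copies.
    pair_racks = [rack for rack, nodes in racks.items() if len(nodes) >= 2]
    if not pair_racks:
        raise ValueError("no rack has two data nodes")
    first_rack = rng.choice(pair_racks)
    node_a, node_b = rng.sample(racks[first_rack], 2)

    # The third copy goes to a data node on any other non-empty rack.
    other_racks = [rack for rack, nodes in racks.items()
                   if rack != first_rack and nodes]
    if not other_racks:
        raise ValueError("no second rack available for the third copy")
    other_rack = rng.choice(other_racks)
    node_c = rng.choice(racks[other_rack])
    return [(first_rack, node_a), (first_rack, node_b), (other_rack, node_c)]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when experimenting with different rack layouts.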

Existing methods for implementing data center backup include:

Obtain the data blocks to be backed up in the original data center; randomly select one data node in the target data center to store the backup of each data block, and then select two further data nodes for backup according to the existing replication method.

In the above method, one data node in the target data center is selected at random during data center backup, whereas HBase stores data by column family: in the original data center, the data blocks belonging to the same column family of a table are stored together on one data node or on a few neighboring data nodes. When data is read, all data nodes holding the column family must be located by the column family name of the data being read, and the data nodes found may be spread across all data nodes of the target data center. Consequently, using the above method for HBase cross-data-center backup fails to exploit the characteristics of column family storage: data blocks with the same column family name are stored in a scattered, non-contiguous manner in the target data center, which slows down reads.

Summary of the invention

To solve the above technical problem, the present invention proposes a data center backup method and apparatus that make full use of the characteristics of column family storage, so that data blocks with the same column family name that are backed up to the target data center are stored in a more concentrated manner, thereby improving read speed.

To achieve the above object, the present invention proposes a method for implementing data center backup, comprising:

determining the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up;

selecting one data node within the determined storage range to store the data block.

Preferably, determining the storage range of the data block in the target data center according to the table name and column family name of the data block to be backed up comprises:

determining, according to the table name, the range of racks in which the data node storing the data block is located;

determining the physical address range of the data node according to the column family name;

and selecting one data node from the storage range to store the data block comprises:

selecting one rack from the range of racks, and selecting one physical address from the physical address range.

Preferably, determining, according to the table name, the range of racks in which the data node storing the data block is located comprises:

calculating the hash value of the table name, and calculating the hash value of each rack in the target data center;

determining the range of rack hash values as follows: the absolute value of the difference between the hash value of the rack and the hash value of the table name is less than or equal to a preset ratio of the maximum hash value of all racks in the target data center;

wherein the maximum hash value of all table names in the original data center is equal to the maximum hash value of all racks in the target data center.
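The rack-range rule above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the source does not name a hash function, so `angle_hash` (an MD5 digest scaled to the interval [0, 2π)) is a hypothetical stand-in, and `rack_range`, `ratio`, and `max_hash` are illustrative names.

```python
import hashlib
import math

TWO_PI = 2 * math.pi


def angle_hash(name: str) -> float:
    """Map a string to a value in [0, 2*pi). MD5 is only an example choice."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:6], "big") / 2**48  # in [0, 1)
    return fraction * TWO_PI


def rack_range(table_name, racks, ratio, max_hash=TWO_PI, hash_fn=angle_hash):
    """Return the racks whose hash is within ratio * max_hash of the table-name hash."""
    target = hash_fn(table_name)
    limit = ratio * max_hash  # the "preset ratio of the maximum hash value"
    return [rack for rack in racks if abs(hash_fn(rack) - target) <= limit]
```

Because table names and racks share the same hash range, every block of a given table maps to the same subset of racks. The sketch follows the text in using the plain absolute difference, with no wrap-around at the ends of the range.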

Preferably, selecting one data node from the storage range to store the data block comprises:

randomly selecting one rack from the range of racks, or selecting the rack whose hash value has the smallest absolute difference from the hash value of the table name.

Preferably, determining the physical address range of the data node according to the column family name comprises:

calculating the hash value of the column family name, and calculating the hash value of the physical address of each data node in the selected rack;

determining the range of physical address hash values as follows: the absolute value of the difference between the hash value of the physical address and the hash value of the column family name is less than or equal to a preset ratio of the maximum hash value of the physical addresses of all data nodes in the selected rack; wherein the maximum hash value of all column family names in the table corresponding to the table name is equal to the maximum hash value of the physical addresses of all data nodes in the selected rack.

Preferably, selecting one data node from the storage range to store the data block comprises:

randomly selecting the data node corresponding to one physical address from the physical address range, or selecting the data node whose physical-address hash value has the smallest absolute difference from the hash value of the column family name.
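The two selection strategies above (random, or smallest absolute hash difference) can be illustrated with one helper that works for racks and for physical addresses alike. This is a sketch, not the patented implementation: `select_candidate` and its parameters are hypothetical names, the hash values are assumed to be precomputed, and returning `None` for an empty range is an assumption, since the source does not say what happens in that case.

```python
import math
import random


def select_candidate(target_hash, candidate_hashes, ratio,
                     max_hash=2 * math.pi, strategy="closest", rng=random):
    """Pick one candidate whose hash is within ratio * max_hash of target_hash.

    `candidate_hashes` maps a candidate (for example a data node's physical
    address) to its precomputed hash value. Returns None if the range is empty.
    """
    limit = ratio * max_hash
    in_range = {c: h for c, h in candidate_hashes.items()
                if abs(h - target_hash) <= limit}
    if not in_range:
        return None
    if strategy == "random":
        # Sort first so a seeded rng gives a reproducible choice.
        return rng.choice(sorted(in_range))
    if strategy == "closest":
        return min(in_range, key=lambda c: abs(in_range[c] - target_hash))
    raise ValueError(f"unknown strategy: {strategy}")
```

The "closest" strategy always sends blocks of one column family to the same data node, which concentrates storage the most; the "random" strategy spreads load across the range while still keeping the blocks within a bounded set of nodes.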

Preferably, the maximum value is 2π.

The present invention also proposes an apparatus for implementing data center backup, comprising at least:

a determining module, configured to determine the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up;

a selection module, configured to select one data node within the determined storage range to store the data block.

Preferably, the determining module is specifically configured to:

determine, according to the table name, the range of racks in which the data node storing the data block is located; and determine the physical address range of the data node according to the column family name;

and the selection module is specifically configured to:

select one rack from the range of racks, and select one physical address from the physical address range.

Preferably, the determining module is specifically configured to:

calculate the hash value of the table name, and calculate the hash value of each rack in the target data center;

determine the range of rack hash values as follows: the absolute value of the difference between the hash value of the rack and the hash value of the table name is less than or equal to a preset ratio of the maximum hash value of all racks in the target data center;

wherein the maximum hash value of all table names in the original data center is equal to the maximum hash value of all racks in the target data center.

Preferably, the determining module is specifically configured to:

calculate the hash value of the column family name, and calculate the hash value of the physical address of each data node in the selected rack;

determine the range of physical address hash values as follows: the absolute value of the difference between the hash value of the physical address and the hash value of the column family name is less than or equal to a preset ratio of the maximum hash value of the physical addresses of all data nodes in the selected rack; wherein the maximum hash value of all column family names in the table corresponding to the table name is equal to the maximum hash value of the physical addresses of all data nodes in the selected rack.

Compared with the prior art, the present invention comprises: determining the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up; and selecting one data node within the determined storage range to store the data block. The technical solution of the invention makes full use of the characteristics of column family storage and solves the problem that data blocks with the same column family name are stored in a scattered manner when HBase data is backed up across data centers. Data blocks with the same column family name that are backed up to the target data center are stored in a more concentrated manner, which improves read speed.

Description of drawings

The accompanying drawings of the embodiments of the present invention are described below. The drawings are provided for a further understanding of the invention and, together with the description, serve to explain the invention; they do not limit its protection scope.

Fig. 1 is a flowchart of the method for implementing data center backup according to the present invention;

Fig. 2 is a flowchart of an embodiment of the method for implementing data center backup according to the present invention;

Fig. 3 is a schematic structural diagram of the apparatus for implementing data center backup according to the present invention.

Detailed description

To make the invention easier for those skilled in the art to understand, it is further described below with reference to the accompanying drawings; this description does not limit the protection scope of the invention.

Referring to Fig. 1, the present invention proposes a method for implementing data center backup, comprising:

Step 100: Determine the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up.

In this step, the range of racks in which the data node storing the data block is located may be determined according to the table name, and the physical address range of the data node may be determined according to the column family name. Specifically:

Determining, according to the table name, the range of racks in which the data node storing the data block is located comprises:

Calculate the hash value of the table name, and calculate the hash value of each rack in the target data center. Determine the range of rack hash values as follows: the absolute value of the difference between the hash value of the rack and the hash value of the table name is less than or equal to a preset ratio of the maximum hash value of all racks in the target data center. The maximum hash value of all table names in the original data center is equal to the maximum hash value of all racks in the target data center.

Determining the physical address range of the data node according to the column family name comprises:

Calculate the hash value of the column family name, and calculate the hash value of the physical address of each data node in the selected rack. Determine the range of physical address hash values as follows: the absolute value of the difference between the hash value of the physical address and the hash value of the column family name is less than or equal to a preset ratio of the maximum hash value of the physical addresses of all data nodes in the selected rack. The maximum hash value of all column family names in the table corresponding to the table name is equal to the maximum hash value of the physical addresses of all data nodes in the selected rack.

The maximum value may be, but is not limited to, 2π. The maximum hash value of all table names and the maximum hash value of all column family names in the table corresponding to the table name may or may not be equal.

Step 101: Select one data node within the determined storage range to store the data block.

In this step, one rack may be selected at random from the range of racks, and one physical address may be selected at random from the physical address range. The data node corresponding to the selected rack and physical address is the storage location of the data block. After the data block is stored, two other data nodes are selected for backup according to the existing replication method.

Alternatively, in this step, the rack whose hash value has the smallest absolute difference from the hash value of the table name may be selected from the range of racks, and the data node whose physical-address hash value has the smallest absolute difference from the hash value of the column family name may be selected from the physical address range.

The present invention stores data blocks with the same table name and the same column family name within the same storage range, so that data blocks with the same column family name are stored in a more concentrated manner, which improves read speed.

A specific embodiment is used below to explain how a data node is selected for backup.

Step 200: Obtain the table name and column family name of the data block to be backed up.

How the table name and column family name of the data block to be backed up are obtained in this step belongs to the prior art and does not limit the protection scope of the invention. For example, when a data block is updated, the rack can record the table name and column family name of the updated data block, so that they can be read directly when the backup is performed.

Step 201: Establish a spherical coordinate system, and map the obtained table name and column family name of the data block onto a sphere of fixed radius in that coordinate system.

In this step, a point on the sphere is represented by two coordinate values: the angle θ between the Z axis and the line connecting the point to the coordinate center, and the angle φ between the X axis and the projection onto the XoY plane of the line connecting the point to the coordinate center.

In this step, a point on the sphere represents the position of the data block in the original data center: the table name is taken as φ, and the column family name is taken as θ.

Here φ may be the hash value of the table name, and θ may be the hash value of the column family name. The hash values may be calculated by existing methods, for example with an existing hash algorithm; the specific implementation does not limit the protection scope of the invention and is not described further here.
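The mapping in step 201 can be sketched as below, with the azimuth φ derived from the table name and the polar angle θ from the column family name. Several details are assumptions rather than part of the source: SHA-256 as the hash, scaling θ to [0, π) so that it is a valid polar angle (the text only says the maximum may be, but is not limited to, 2π), and the helper names `to_sphere` and `to_cartesian`.

```python
import hashlib
import math


def _unit_fraction(name: str) -> float:
    """Hash a string to a float in [0, 1). SHA-256 is only an example choice."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def to_sphere(table_name: str, column_family: str):
    """Return (phi, theta): phi from the table name, theta from the column family."""
    phi = _unit_fraction(table_name) * 2 * math.pi   # azimuth in [0, 2*pi)
    theta = _unit_fraction(column_family) * math.pi  # polar angle in [0, pi), assumed scaling
    return phi, theta


def to_cartesian(phi: float, theta: float, radius: float = 1.0):
    """Convert spherical angles to (x, y, z) on a sphere of the given radius."""
    return (radius * math.sin(theta) * math.cos(phi),
            radius * math.sin(theta) * math.sin(phi),
            radius * math.cos(theta))
```

All blocks of one table share the same φ, and all blocks of one column family within that table share the same θ, so they land on the same point of the sphere.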

Step 202: Taking the line from the point mapped onto the sphere to the coordinate center as an axis, take a straight line that passes through the coordinate center at a preset angle to the axis; the portion of the sphere cut off by rotating that line once around the axis is the storage area of the data block corresponding to the point.

In this step, let OP be the line connecting the data's point P on the sphere to the coordinate center O. A straight line through O at angle γ to OP is rotated once around OP, and the portion of the sphere cut off by the resulting conical region is the storage area corresponding to (hash value of the table name, hash value of the column family name). Projecting the conical region onto the YoZ plane gives the range of θ' as θ' ∈ [θ − γ, θ + γ]; projecting the conical region onto the XoY plane gives formula (1):

from which the range of φ' is obtained, as shown in formula (2):

Here b is the distance to OP from the point where the sphere intersects the straight line through O at angle γ to OP, r is the length of OP, and γ' is the angle between the projection onto the XoY plane of the straight line through O at angle γ to OP and the projection of OP onto the XoY plane.

The ratio of the area of the sphere cut off by the conical region to the total area of the sphere is shown in formula (3):

The value of γ' is obtained as shown in formula (4):

The region that satisfies both the range of θ' and the range of φ' is the storage area of the data block.
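For reference, the quantities named in step 202 can be related by standard geometry. The block below is a textbook derivation under the definitions given in the text (sphere radius r, cone half-angle γ, distance b from the intersection point to OP); it is offered as a sketch and is not asserted to match the patent's own formulas term for term.

```latex
% Distance from the intersection point to the axis OP
b = r \sin\gamma

% Area of the spherical cap cut off by a cone of half-angle \gamma
A_{\text{cap}}
  = \int_{0}^{2\pi}\!\int_{0}^{\gamma} r^{2}\sin\theta \, d\theta \, d\varphi
  = 2\pi r^{2}\,(1-\cos\gamma)

% Ratio of the cap area to the total area of the sphere
\frac{A_{\text{cap}}}{A_{\text{sphere}}}
  = \frac{2\pi r^{2}\,(1-\cos\gamma)}{4\pi r^{2}}
  = \frac{1-\cos\gamma}{2}
```

For example, a half-angle of γ = 60° gives a ratio of (1 − 0.5)/2 = 0.25, so a quarter of the sphere would be eligible as storage area. Reading the ratio this way, γ acts as the knob that trades concentration of storage against the number of candidate data nodes.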

Step 203: Map the rack and the physical address of every data node in the target data center onto a sphere of fixed radius in the spherical coordinate system.

In this step, a point on the sphere represents the position of the data node in the target data center: the rack is taken as φ, and the physical address is taken as θ.

Here φ may be the hash value of the rack, and θ may be the hash value of the physical address. How the hash values are calculated belongs to the prior art and does not limit the protection scope of the invention.

Step 204: From the storage area of the data block, select the data node corresponding to one of the points as the storage location of the data block.

In this step, the data node corresponding to the point closest to the point of the data block may be selected as the storage location of the data block.
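Steps 202 to 204 can be sketched by testing whether a data node's point lies inside the cone around the data block's point and then taking the nearest one. The sketch assumes that points are given as (φ, θ) with θ measured from the Z axis, and it tests membership directly with the central angle between two points rather than with the projected θ' and φ' ranges. The names `central_angle` and `pick_node` are hypothetical, and returning `None` for an empty area is an assumption.

```python
import math


def central_angle(p, q):
    """Angle at the sphere center between two points given as (phi, theta)."""
    (phi1, theta1), (phi2, theta2) = p, q
    cos_angle = (math.sin(theta1) * math.sin(theta2) * math.cos(phi1 - phi2)
                 + math.cos(theta1) * math.cos(theta2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def pick_node(block_point, node_points, gamma):
    """Return the node nearest to block_point among those within angle gamma.

    `node_points` maps a data node name to its (phi, theta) point.
    Returns None if no node lies inside the storage area.
    """
    angles = {node: central_angle(block_point, point)
              for node, point in node_points.items()}
    inside = {node: angle for node, angle in angles.items() if angle <= gamma}
    if not inside:
        return None
    return min(inside, key=inside.get)
```

Since the sphere has a fixed radius, the smallest central angle is also the smallest distance along the surface, so `min` over the angles implements the "closest point" choice of step 204.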

Referring to Fig. 3, the present invention also proposes an apparatus for implementing data center backup, comprising at least:

a determining module, configured to determine the storage range of a data block in the target data center according to the table name and column family name of the data block to be backed up;

a selection module, configured to select one data node within the determined storage range to store the data block.

In the apparatus of the present invention, the determining module is specifically configured to:

determine, according to the table name, the range of racks in which the data node storing the data block is located; and determine the physical address range of the data node according to the column family name;

and the selection module is specifically configured to:

select one rack from the range of racks, and select one physical address from the physical address range.

In the apparatus of the present invention, the determining module is specifically configured to:

calculate the hash value of the table name, and calculate the hash value of each rack in the target data center;

determine the range of rack hash values as follows: the absolute value of the difference between the hash value of the rack and the hash value of the table name is less than or equal to a preset ratio of the maximum hash value of all racks in the target data center;

wherein the maximum hash value of all table names in the original data center is equal to the maximum hash value of all racks in the target data center.

In the apparatus of the present invention, the determining module is specifically configured to:

calculate the hash value of the column family name, and calculate the hash value of the physical address of each data node in the selected rack;

determine the range of physical address hash values as follows: the absolute value of the difference between the hash value of the physical address and the hash value of the column family name is less than or equal to a preset ratio of the maximum hash value of the physical addresses of all data nodes in the selected rack; wherein the maximum hash value of all column family names in the table corresponding to the table name is equal to the maximum hash value of the physical addresses of all data nodes in the selected rack.

It should be noted that the embodiments described above are provided only to help those skilled in the art understand the invention and do not limit its protection scope. Any obvious substitution or improvement made to the invention by those skilled in the art without departing from its inventive concept falls within the protection scope of the invention.

Claims (9)

1. A method for implementing data center backup is characterized by comprising the following steps:
determining the storage range of the data block in the target data center according to the table name and the column family name of the data block to be backed up,
the method specifically comprises the following steps:
determining the range of the rack where the data node stored in the data block is located according to the table name,
the method specifically comprises the following steps:
calculating hash values of the table names, and calculating hash values of all racks in the target data center respectively;
determining the range of the hash value of the rack as follows: the absolute value of the difference between the hash values of the racks and the hash values of the table names is smaller than or equal to the preset proportion of the maximum value of the hash values of all the racks in the target data center;
the maximum value of the hash values of all table names in the original data center is equal to the maximum value of the hash values of all racks in the target data center;
and selecting one data node in the determined storage range to store the data block.
2. The method of claim 1, wherein determining the storage range of the data block in the target data center according to the table name and the column family name of the data block to be backed up further comprises:
determining a physical address range of the data node according to the column family name;
the selecting one of the data nodes from the storage range to store the data block comprises:
one rack is selected from the range of racks, and one physical address is selected from the range of physical addresses.
3. The method of claim 1, wherein selecting one of the data nodes from the storage range to store the data block comprises:
randomly selecting one rack from the range of racks, or selecting the rack whose hash value has the smallest absolute difference from the hash value of the table name.
4. The method of claim 1, wherein determining the physical address range of the data node according to the column family name comprises:
calculating hash values of the column family names, and respectively calculating hash values of physical addresses of all data nodes in the selected rack;
determining the range of the hash value of the physical address as follows: the absolute value of the difference between the hash value of the physical address and the hash value of the column family name is smaller than or equal to the preset proportion of the maximum value of the hash values of the physical addresses of all the data nodes in the selected rack; and the maximum value of the hash values of all column family names in the table corresponding to the table name is equal to the maximum value of the hash values of the physical addresses of all the data nodes in the selected rack.
5. The method of claim 4, wherein selecting one of the data nodes from the storage range to store the data block comprises:
randomly selecting the data node corresponding to one physical address from the physical address range, or selecting the data node whose physical-address hash value has the smallest absolute difference from the hash value of the column family name.
6. A method according to any one of claims 3 to 5, wherein the maximum value is 2π.
7. An apparatus for implementing data center backup, comprising at least:
the determining module is used for determining the storage range of the data block in the target data center according to the table name and the column family name of the data block to be backed up;
the determining module is specifically configured to:
determining the range of the rack where the data node stored in the data block is located according to the table name,
the determining module being specifically configured for:
calculating hash values of the table names, and calculating hash values of all racks in the target data center respectively;
determining the range of the hash value of the rack as follows: the absolute value of the difference between the hash values of the racks and the hash values of the table names is smaller than or equal to the preset proportion of the maximum value of the hash values of all the racks in the target data center;
the maximum value of the hash values of all table names in the original data center is equal to the maximum value of the hash values of all racks in the target data center; and the selection module is used for selecting one data node from the determined storage range to store the data block.
8. The apparatus of claim 7, wherein the determining module is specifically configured to:
determine a physical address range of the data node according to the column family name;
and the selection module is specifically configured to:
select one rack from the range of racks, and select one physical address from the physical address range.
9. The apparatus of claim 7, wherein the determining module is specifically configured to:
calculate the hash value of the column family name, and calculate the hash value of the physical address of each data node in the selected rack;
determine the range of hash values of the physical addresses as follows: the absolute value of the difference between the hash value of a physical address and the hash value of the column family name is less than or equal to a preset proportion of the maximum value of the hash values of the physical addresses of all data nodes in the selected rack; wherein the maximum value of the hash values of all column family names in the table corresponding to the table name is equal to the maximum value of the hash values of the physical addresses of all data nodes in the selected rack.
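The two-stage selection recited in the claims (table name to rack range, then column family name to physical address range) can be illustrated with a minimal Python sketch. It is not the patented implementation. The claims do not name a hash function, so `ring_hash` (SHA-256 scaled onto [0, 2π)) is an assumption, as are the names `in_range`, `nearest`, `choose_data_node`, the default proportion of 0.25, the choice of the nearest rack (the claims only say one rack is selected), and returning `None` when a range is empty.

```python
import hashlib
import math

MAX_HASH = 2 * math.pi  # claim 6: the maximum hash value is 2π


def ring_hash(key: str) -> float:
    """Hypothetical hash: map a string onto [0, 2π) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the ratio stays strictly below 1.
    return int.from_bytes(digest[:6], "big") / 2**48 * MAX_HASH


def in_range(candidates, name, proportion):
    """Keep candidates with |hash(candidate) - hash(name)| <= proportion * MAX_HASH."""
    target = ring_hash(name)
    limit = proportion * MAX_HASH
    return [c for c in candidates if abs(ring_hash(c) - target) <= limit]


def nearest(candidates, name):
    """Candidate whose hash has the smallest absolute difference from hash(name)."""
    target = ring_hash(name)
    return min(candidates, key=lambda c: abs(ring_hash(c) - target))


def choose_data_node(table, family, racks, proportion=0.25):
    """Pick (rack, physical address) for a block of `table`/`family`.

    `racks` maps a rack identifier to the physical addresses of its data nodes.
    Returns None when a range is empty (assumed behavior, not from the claims).
    """
    rack_range = in_range(list(racks), table, proportion)
    if not rack_range:
        return None
    rack = nearest(rack_range, table)  # assumption: take the closest rack
    addr_range = in_range(racks[rack], family, proportion)
    if not addr_range:
        return None
    # Claim 5 allows a random choice or the closest hash; this uses the closest.
    return rack, nearest(addr_range, family)
```

Because both stages depend only on the table name and the column family name, every block with the same pair maps to the same rack and node, for example `choose_data_node("orders", "cf1", {"rack-a": ["10.0.0.1", "10.0.0.2"], "rack-b": ["10.0.1.1"]}, proportion=1.0)`. Replacing the final `nearest` call with `random.choice(addr_range)` would give the random variant of claim 5.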
CN201410032550.1A 2014-01-23 2014-01-23 A kind of method and apparatus for realizing data center backup Active CN103761167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410032550.1A CN103761167B (en) 2014-01-23 2014-01-23 A kind of method and apparatus for realizing data center backup

Publications (2)

Publication Number Publication Date
CN103761167A CN103761167A (en) 2014-04-30
CN103761167B true CN103761167B (en) 2017-04-05

Family

ID=50528409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410032550.1A Active CN103761167B (en) 2014-01-23 2014-01-23 A kind of method and apparatus for realizing data center backup

Country Status (1)

Country Link
CN (1) CN103761167B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105592178B (en) * 2015-09-17 2018-12-25 新华三技术有限公司 A kind of back end method for determining position and device
CN105511801B (en) * 2015-11-12 2018-11-16 长春理工大学 The method and apparatus of data storage
CN107463342B (en) * 2017-08-28 2021-04-20 北京奇艺世纪科技有限公司 CDN edge node file storage method and device
CN112379840B (en) * 2020-11-17 2023-02-24 海光信息技术股份有限公司 Terminal data protection method and device and terminal
CN114780298B (en) * 2022-06-16 2022-09-06 深圳市慧为智能科技股份有限公司 File data processing method and device, computer terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750356A (en) * 2012-06-11 2012-10-24 清华大学 Construction and management method for secondary indexes of key value library
CN103281291A (en) * 2013-02-19 2013-09-04 电子科技大学 Application layer protocol identification method based on Hadoop

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
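"Research on Unstructured Data Replication Methods for Multiple Data Centers" (《多数据中心非结构化数据复制方法研究》); Wang Kaixuan (王开煊); China Master's Theses Full-text Database, Information Science and Technology; 2012-10-15 (No. 10); pp. 14-22 *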

Also Published As

Publication number Publication date
CN103761167A (en) 2014-04-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant