WO2022001883A1 - Method and apparatus for data redistribution - Google Patents

Method and apparatus for data redistribution

Info

Publication number
WO2022001883A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
rule table
redistribution
key
rule
Prior art date
Application number
PCT/CN2021/102448
Other languages
English (en)
French (fr)
Inventor
严俊
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP21831776.6A (EP4174676A4)
Priority to JP2022581648A (JP2023532352A)
Priority to KR1020237002277A (KR20230025019A)
Publication of WO2022001883A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/24568 Data stream processing; Continuous queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the embodiments of the present application relate to the field of databases, and in particular, to a method and apparatus for data redistribution.
  • The more common method is to create a new table, export all the data stored in the old table, import it into the new table according to the new rules, append the incremental data, lock the old table once that is complete, and switch to the new table.
  • This approach is inefficient, puts heavy pressure on the network, and requires a lot of storage space. The time needed to catch up on incremental data varies with business volume, and the process may never catch up at all.
  • Distributed database redistribution scenarios in addition to capacity expansion, also include scenarios such as capacity reduction, modification of distribution methods, modification of distribution fields, etc. These scenarios are often faced by database operation and maintenance personnel, and are also functions that mature database products should have.
  • An embodiment of the present application provides a data redistribution method, including: in response to a data redistribution request sent by a computing node, creating a rule table corresponding to the request according to original table information; generating rule table data and sending it to the storage node corresponding to the rule table data; importing the rule table data into the rule table; completing the data of the rule table according to the original table; and accessing data in the manner of the rule table.
  • An embodiment of the present application also provides a data redistribution apparatus, including: a processor, configured to respond to a data redistribution request sent by a computing node and create a rule table corresponding to the request according to original table information; a storage node, configured to generate rule table data and send it to the storage node corresponding to the rule table data, import the rule table data into the rule table, and complete the data of the rule table according to the original table; and a computing node, configured to access data in the manner of the rule table.
  • Embodiments of the present application further provide a data redistribution system, including the above-mentioned data redistribution apparatus.
  • Embodiments of the present application further provide a device, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the above data redistribution method.
  • An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and the above-mentioned data redistribution method is implemented when the computer program is executed by a processor.
  • FIG. 2 is a flowchart of a method for data redistribution provided according to Embodiment 2 of the present application;
  • FIG. 3 is a schematic diagram of a program module of an apparatus for data redistribution provided according to Embodiment 3 of the present application;
  • FIG. 4 is a schematic diagram of a program module of an apparatus for data redistribution provided according to Embodiment 4 of the present application;
  • FIG. 5 is a schematic structural diagram of a device provided according to an embodiment of the present application.
  • Databases in current use feature high concurrency, fast data growth, and 7*24-hour service.
  • As business expands, data may grow by orders of magnitude, so distributed databases with good horizontal scalability are widely used.
  • Distributed databases generally use a share nothing architecture, that is, each node has independent storage, and no storage is shared between nodes. Data is distributed to multiple nodes according to rules, and nodes are generally connected by optical fibers and other networks.
  • the distributed database has good scalability and can flexibly distribute node data according to business scenarios. For example, with the increase in the amount of data, there are further requirements for data storage space and query. It is necessary to expand the original distributed database, add new data nodes to the distributed database, and perform data migration.
  • the method and apparatus for data redistribution provided by the embodiments of the present application solve the problems of low efficiency and high overhead of data redistribution processing methods in some situations.
  • The core idea of this embodiment is to create a rule table according to the new sharding rule after receiving a data redistribution request; the rule table stores the shard key and metadata information.
  • The metadata information is the node information corresponding to the shard key data. For example, if there are 3 nodes, they can be represented by 1, 2, and 3 respectively.
  • Before the redistribution task ends, the computing nodes of the distributed database access data in the manner of the old table.
  • After the redistribution task ends, the computing nodes access data according to the rule table.
  • the data redistribution method involved in this embodiment includes:
  • Step 102 In response to the data redistribution request sent by the computing node, create a rule table corresponding to the request according to the original table information.
  • Step 104 Generate rule table data and send it to a storage node corresponding to the rule table data.
  • Step 106 Import the rule table data into the rule table.
  • Step 108 Complete the data of the rule table according to the original table.
  • Step 110 Access the data in the manner of the rule table.
  • The data redistribution method provided in Embodiment 1 of the present application creates a new rule table instead of creating a new table to migrate data, which improves the efficiency of data redistribution and effectively reduces the impact of data migration during redistribution.
  • the fields of the rule table only include shard key and metadata information, which requires extremely low storage space, reduces system overhead, and further improves efficiency.
  • Embodiment 2:
  • the data redistribution method involved in this embodiment may include:
  • Step 202 Create a rule table.
  • a new rule table is created based on the original table information.
  • The shard key of the new rule table is consistent with the sharding scheme of the original table named in the redistribution request.
  • For example, the original table is hashed to 3 nodes according to the field col1.
  • If the redistribution requires hashing to 4 nodes, the rule table is hashed to 4 nodes according to the field col1.
  • This embodiment is described using a redistribution in which the distribution mode changes from 3 nodes to 4 nodes.
  • The original table is named tbs_info_detail, and its distribution mode is hash(uuid)(g1,g2,g3).
  • The processor creates a hidden rule table tbs_info_detail_res_rule, whose distribution mode is hash(uuid)(g1,g2,g3,g4).
  • The rule table has two fields, uuid and groupid. The uuid field has exactly the same attributes as in the original table, and the groupid field can use the unsigned tinyint type (one byte only). The rule table also needs an index on uuid (if uuid is the primary key of the original table, a primary key is created on uuid).
  • Step 204 Generate rule table data.
  • Each storage node contains a redistribution control module for generating and importing metadata rule table data. These redistribution control modules export the distribution key value from each storage node in parallel, and carry the storage node number information at the same time.
  • The exported files are split into per-storage-node files according to the new distribution rule and transmitted to the corresponding storage nodes.
  • Step 206 Import data into the rule table.
  • the redistribution control module of each storage node can import the transmitted data into the rule table.
  • Step 208 Add a read lock.
  • Step 210 Complete the data.
  • Step 212 Modify the access mode.
  • Step 214 Release the read lock.
  • Step 216 Align the distribution rules.
  • the data redistribution apparatus 300 involved in this embodiment includes:
  • Computing node 301, processor 302, and storage node 303.
  • the processor 302 is configured to respond to the data redistribution request sent by the computing node 301, and create a rule table corresponding to the request according to the original table information.
  • the computing node 301 is used to access data in the manner of a rule table.
  • the data redistribution device provided in the third embodiment of the present application migrates data by creating a rule table instead of creating a new table, thereby improving the efficiency of data redistribution and reducing the impact of data migration.
  • Embodiment 4:
  • the data redistribution apparatus 400 involved in this embodiment includes:
  • Computing node 401, processor 402, and storage node 403.
  • There are four storage nodes 403.
  • the processor 402 creates a new rule table according to the original table information, and the fragmentation key of the new rule table is consistent with the fragmentation method of the original table of the redistribution request.
  • Each of the four storage nodes 403 includes a redistribution control module 403a for generating and importing metadata rule table data. These redistribution control modules 403a derive distribution keys from each storage node 403 in parallel, and carry the storage node number information at the same time.
  • The exported files are split into files for the different storage nodes 403 according to the new distribution rule and transmitted to the corresponding storage nodes 403.
  • the computing node 401 records all INSERT/UPDATE/DELETE statements that involve the shard key and successfully executed, and records the shard key value and the corresponding groupid.
  • the redistribution control module 403a of the storage node 403 imports the transmitted data into the rule table.
  • the processor 402 notifies the computing node 401 to add a read lock to the table that needs to be redistributed.
  • The redistribution control module 403a compares the shard key values recorded by the computing node 401 from the INSERT/UPDATE/DELETE statements involving the shard key with the shard keys in the original table and the rule table.
  • If the rule table lacks the data corresponding to a shard key value, the data corresponding to that shard key value in the original table is inserted. If the rule table has extra data corresponding to a shard key value, that data is deleted. If the rule table data is inconsistent with the original table, it is modified to match the original table.
  • The processor 402 notifies the computing node 401 to modify the access mode of the table, changing the computing node 401's own distribution calculation so that access goes through the rule table. INSERT/UPDATE/DELETE statements involving the table must also modify the rule table, and new data is preferentially inserted into the newly added data node.
  • the computing node 401 releases the added read lock. So far, the redistribution task has been completed.
  • If the distribution rules need to be aligned, step 216 of Embodiment 2 is followed.
  • The data redistribution apparatus provided in Embodiment 4 of the present application creates a rule table that contains only two fields, which improves redistribution efficiency, reduces the impact of data migration, and further reduces the overhead of redistribution.
  • the present application further provides a data redistribution system, including the data redistribution device of the fourth embodiment, and the system can efficiently perform the data redistribution task.
  • The functional modules/units in the systems and apparatuses can be implemented as software (which can be realized by computer program code executable by a computing device), firmware, hardware, or appropriate combinations thereof.
  • The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components.
  • Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • FIG. 5 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • The device includes a processor 51, a memory 52, an input device 53, an output device 54, and a communication device 55. The device may have one or more processors 51; one processor 51 is taken as an example in FIG. 5. The processor 51, memory 52, input device 53, and output device 54 in the device may be connected by a bus or in other ways; connection through a bus is taken as an example in FIG. 5.
  • the memory 52 may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the data redistribution method in the embodiments of the present application.
  • the processor 51 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 52, ie, implements any method provided by the embodiments of the present application.
  • the memory 52 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Additionally, memory 52 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some instances, memory 52 may further include memory located remotely from processor 51, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 53 may be used to receive input numerical or character information, and to generate key signal input related to user settings and function control of the device.
  • the output device 54 may include a display device such as a display screen.
  • the communication device 55 may include a receiver and a transmitter.
  • the communication device 55 is configured to transmit and receive information according to the control of the processor 51 .
  • In one embodiment, the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a data redistribution method, including: obtaining a stored procedure of a source database; parsing and translating the stored procedure to obtain a corresponding grammar block list; and processing the grammar block list to obtain a stored procedure that meets the requirements of a target database.
  • In the storage medium containing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the method operations above and can also perform related operations in the data redistribution method provided by any embodiment of the present application.
  • the present application can be implemented by means of software and necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is a better implementation manner .
  • The technical solutions of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to execute the data redistribution method of each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

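A method and apparatus for data redistribution. The method includes: in response to a data redistribution request sent by a computing node (301), creating a rule table corresponding to the request according to original table information; generating rule table data and sending it to the storage node (303) corresponding to the rule table data; importing the rule table data into the rule table; completing the data of the rule table according to the original table; and accessing data in the manner of the rule table.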

Description

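Method and apparatus for data redistribution
Cross-reference
This application is based on, and claims priority to, Chinese patent application No. 202010600815.9 filed on June 28, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of databases, and in particular to a method and apparatus for data redistribution.
Background
The industry mainly uses two methods for data redistribution:
1. Lock the table to be redistributed so that no read or write operations can be performed until all data in the table has been redistributed. Depending on the data volume and resource limits, redistribution may take hours or even days, which may leave the business unable to operate on the table for a long time and cause serious impact.
2. The more common method is to create a new table, export all the data stored in the old table, import it into the new table according to the new rules, append the incremental data, lock the old table once that is complete, and switch to the new table. This approach is inefficient, puts heavy pressure on the network, and requires a lot of storage space. The time needed to catch up on incremental data varies with business volume, and the process may never catch up at all.
When a database system is running, using either of these two existing redistribution methods places high demands on the live network environment and even higher demands on the emergency-handling ability of operation and maintenance staff.
Besides capacity expansion, redistribution scenarios in a distributed database also include capacity reduction, changing the distribution method, and changing the distribution field. Database operation and maintenance staff face these scenarios frequently, and a mature database product should support them.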
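Summary
An embodiment of the present application provides a data redistribution method, including: in response to a data redistribution request sent by a computing node, creating a rule table corresponding to the request according to original table information; generating rule table data and sending it to the storage node corresponding to the rule table data; importing the rule table data into the rule table; completing the data of the rule table according to the original table; and accessing data in the manner of the rule table.
An embodiment of the present application also provides a data redistribution apparatus, including: a processor, configured to respond to a data redistribution request sent by a computing node and create a rule table corresponding to the request according to original table information; a storage node, configured to generate rule table data and send it to the storage node corresponding to the rule table data, import the rule table data into the rule table, and complete the data of the rule table according to the original table; and a computing node, configured to access data in the manner of the rule table.
An embodiment of the present application also provides a data redistribution system, including the above data redistribution apparatus.
An embodiment of the present application also provides a device, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the above data redistribution method.
An embodiment of the present application also provides a storage medium storing a computer program; when the computer program is executed by a processor, the above data redistribution method is implemented.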
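Brief description of the drawings
FIG. 1 is a flowchart of a method for data redistribution provided according to Embodiment 1 of the present application;
FIG. 2 is a flowchart of a method for data redistribution provided according to Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of the program modules of an apparatus for data redistribution provided according to Embodiment 3 of the present application;
FIG. 4 is a schematic diagram of the program modules of an apparatus for data redistribution provided according to Embodiment 4 of the present application;
FIG. 5 is a schematic structural diagram of a device provided according to an embodiment of the present application.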
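Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below through specific implementations with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Databases in current use feature high concurrency, fast data growth, and 7*24-hour service. As business expands, data may grow by orders of magnitude, so distributed databases with good horizontal scalability are widely used. Distributed databases generally use a share nothing architecture: each node has independent storage, no storage is shared between nodes, data is distributed to multiple nodes according to rules, and nodes are generally connected by networks such as optical fiber.
A distributed database scales well and can distribute node data flexibly according to the business scenario. For example, as the data volume grows by orders of magnitude, the requirements on data storage space and queries increase, so the original distributed database must be expanded by adding new data nodes and migrating data.
The method and apparatus for data redistribution provided by the embodiments of the present application solve the problem that data redistribution is inefficient and costly in some situations.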
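Embodiment 1:
To address the inefficiency caused by building a new table in some situations, the core idea of this embodiment is that, after a data redistribution request is received, a rule table is created according to the new sharding rule. The rule table stores the shard key and metadata information, where the metadata information is the node information corresponding to the shard key data; for example, if there are 3 nodes, they can be represented by 1, 2, and 3 respectively. Before the redistribution task ends, the computing nodes of the distributed database access data in the manner of the old table; after the redistribution task ends, the computing nodes access data according to the rule table.
Referring to FIG. 1, the data redistribution method of this embodiment includes:
Step 102: In response to a data redistribution request sent by a computing node, create a rule table corresponding to the request according to original table information.
Step 104: Generate rule table data and send it to the storage node corresponding to the rule table data.
Step 106: Import the rule table data into the rule table.
Step 108: Complete the data of the rule table according to the original table.
Step 110: Access data in the manner of the rule table.
The data redistribution method provided in Embodiment 1 of the present application creates a new rule table instead of creating a new table to migrate data, which improves the efficiency of data redistribution and effectively reduces the impact of data migration during redistribution. The fields of the rule table include only the shard key and metadata information, so it needs very little storage space, which reduces system overhead and further improves efficiency.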
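The switch from old-table access to rule-table access described in Embodiment 1 can be sketched roughly as follows. This is an illustration only: the `Router` class, the `hash_node` helper, and the use of CRC32 as the hash function are assumptions made for the example and do not come from the application.

```python
import zlib


def hash_node(key: str, node_count: int) -> int:
    """Hypothetical hash distribution: returns a node number starting at 1."""
    return zlib.crc32(key.encode("utf-8")) % node_count + 1


class Router:
    """Illustrative computing-node router (not the application's implementation)."""

    def __init__(self, old_node_count: int):
        self.old_node_count = old_node_count
        self.rule_table = None  # shard key -> node number, set once redistribution ends

    def switch_to_rule_table(self, rule_table: dict) -> None:
        # Called when the redistribution task has finished (step 110).
        self.rule_table = dict(rule_table)

    def locate(self, key: str) -> int:
        if self.rule_table is None:
            # Before the task ends: access in the manner of the old table.
            return hash_node(key, self.old_node_count)
        # After the task ends: look the node up in the rule table.
        return self.rule_table[key]
```

The point of the sketch is that only the lookup path changes at switch-over; the user data itself does not have to move for the new access mode to take effect.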
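Embodiment 2:
Referring to FIG. 2, the data redistribution method of this embodiment may include:
Step 202: Create the rule table.
After a data redistribution request is initiated, a new rule table is created according to the original table information. The shard key of the new rule table is consistent with the sharding scheme of the original table named in the redistribution request. For example, if the original table is hashed to 3 nodes according to the field col1 and the redistribution requires hashing to 4 nodes, the rule table is hashed to 4 nodes according to the field col1.
This embodiment is described using a redistribution in which the distribution mode changes from 3 nodes to 4 nodes. The original table is named tbs_info_detail and its distribution mode is hash(uuid)(g1,g2,g3). The processor creates a hidden rule table tbs_info_detail_res_rule whose distribution mode is hash(uuid)(g1,g2,g3,g4). The rule table has two fields, uuid and groupid. The uuid field has exactly the same attributes as in the original table, while the groupid field can use the unsigned tinyint type (one byte only). The rule table also needs an index on uuid (if uuid is the primary key of the original table, a primary key is created on uuid).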
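As a rough illustration of the two-field rule table, the sketch below builds it in an in-memory SQLite database from Python. The table and column names follow the text; everything else is an assumption: SQLite has no unsigned tinyint type, so a CHECK constraint stands in for the one-byte range, the uuid column is assumed to be a text primary key, and the distribution clause hash(uuid)(g1,g2,g3,g4) is specific to the distributed database and is not modelled here.

```python
import sqlite3


def create_rule_table(conn: sqlite3.Connection) -> None:
    """Create the hidden rule table with its two fields (illustrative schema)."""
    conn.execute(
        """
        CREATE TABLE tbs_info_detail_res_rule (
            uuid    TEXT PRIMARY KEY,  -- assumed: uuid is the original table's primary key
            groupid INTEGER NOT NULL CHECK (groupid BETWEEN 0 AND 255)  -- one-byte range
        )
        """
    )


def lookup_group(conn: sqlite3.Connection, uuid: str):
    """Return the node number recorded for a shard key value, or None if absent."""
    row = conn.execute(
        "SELECT groupid FROM tbs_info_detail_res_rule WHERE uuid = ?", (uuid,)
    ).fetchone()
    return None if row is None else row[0]
```

Because each row carries only a key and a one-byte node number, the rule table stays small compared with a full copy of the original table.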
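Step 204: Generate the rule table data.
Each storage node contains a redistribution control module, which generates and imports the metadata rule table data. These redistribution control modules export the distribution key values from the storage nodes in parallel, together with the storage node number. The exported files are then split into per-storage-node files according to the new distribution rule and transmitted to the corresponding storage nodes.
During this step, all successfully executed INSERT/UPDATE/DELETE statements that involve the shard key are recorded, along with the shard key value and the corresponding groupid.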
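The export-and-split part of step 204 might look like the following sketch. The function names and the CRC32-based `target_node` rule are hypothetical stand-ins for the database's own hash(uuid) distribution; in-memory lists stand in for the exported files.

```python
import zlib
from collections import defaultdict


def target_node(key: str, node_count: int) -> int:
    """Hypothetical new distribution rule: node number in 1..node_count."""
    return zlib.crc32(key.encode("utf-8")) % node_count + 1


def export_rule_rows(node_no: int, keys: list) -> list:
    """A storage node exports its distribution key values tagged with its own node number."""
    return [(key, node_no) for key in keys]


def split_by_new_rule(rows: list, new_node_count: int) -> dict:
    """Group exported (key, current node) rows by the node that will hold the rule table row."""
    files = defaultdict(list)
    for key, current_node in rows:
        files[target_node(key, new_node_count)].append((key, current_node))
    return dict(files)
```

Each resulting bucket corresponds to one file sent to one storage node; only keys and node numbers travel over the network, not the full rows.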
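Step 206: Import the data into the rule table.
The redistribution control module of each storage node imports the transmitted data into the rule table.
Step 208: Add a read lock.
The computing node is notified to add a read lock to the table that needs to be redistributed.
Step 210: Complete the data.
The shard key values recorded from the INSERT/UPDATE/DELETE statements involving the shard key are compared with the shard keys in the original table and the rule table. If the rule table lacks the data corresponding to a shard key value, the data corresponding to that shard key value in the original table is inserted. If the rule table has extra data corresponding to a shard key value, that data is deleted. If the rule table data is inconsistent with the original table, it is modified to match the original table.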
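The three completion cases of step 210 (missing, extra, inconsistent) can be sketched as below. Plain dictionaries mapping shard key value to node number stand in for the rule table and the original table; that representation, and the function name, are assumptions for illustration.

```python
def complete_rule_table(rule_table: dict, original: dict, touched_keys: list) -> dict:
    """Reconcile rule table entries for keys touched by recorded DML statements.

    rule_table:   shard key value -> node number as stored in the rule table
    original:     shard key value -> node number according to the original table
    touched_keys: shard key values recorded from INSERT/UPDATE/DELETE statements
    """
    for key in touched_keys:
        in_original = key in original
        in_rule = key in rule_table
        if in_original and not in_rule:
            rule_table[key] = original[key]   # missing in rule table: insert
        elif in_rule and not in_original:
            del rule_table[key]               # extra in rule table: delete
        elif in_original and rule_table[key] != original[key]:
            rule_table[key] = original[key]   # inconsistent: follow the original table
    return rule_table
```

Only the keys touched while the rule table was being built need checking, which keeps the time spent under the read lock short.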
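Step 212: Modify the access mode.
The computing node is notified to modify the access mode of the table, changing its own distribution calculation so that access goes through the rule table. INSERT/UPDATE/DELETE statements involving the table must also modify the rule table, and new data is preferentially inserted into the newly added data node.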
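A minimal sketch of rule-table access after step 212 follows, assuming an in-memory model. The `RuleTableAccess` class and its round-robin choice among new nodes are hypothetical; the text only states that DML must keep the rule table in step and that new data goes preferentially to newly added nodes.

```python
class RuleTableAccess:
    """Illustrative access layer: every DML keeps the rule table in sync."""

    def __init__(self, rule_table: dict, new_nodes: list):
        self.rule_table = dict(rule_table)  # shard key value -> node number
        self.new_nodes = list(new_nodes)    # newly added data nodes, e.g. [4]
        self.data = {}                      # node number -> {key: row}
        self._next = 0

    def insert(self, key: str, row: dict) -> int:
        # New data is placed on a newly added node first (round-robin is an assumption).
        node = self.new_nodes[self._next % len(self.new_nodes)]
        self._next += 1
        self.data.setdefault(node, {})[key] = row
        self.rule_table[key] = node         # the rule table is modified in the same operation
        return node

    def delete(self, key: str) -> None:
        node = self.rule_table.pop(key)     # the rule table entry is removed with the row
        self.data.get(node, {}).pop(key, None)

    def locate(self, key: str) -> int:
        return self.rule_table[key]
```

Sending new rows to the new node lets the cluster rebalance gradually through normal traffic instead of through a bulk copy.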
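Step 214: Release the read lock.
The read lock that was added is released.
At this point, the redistribution task is complete.
Step 216: Align the distribution rules.
Where the distribution of the rule table is inconsistent with that of the computing node, this can be resolved by running batch jobs during idle time, for example:
1. First, query node 1 for the rows whose metadata field in the rule table is 2, migrate this data to node 2 in batches, and update the rule table at the same time.
2. Once most of the data has been migrated successfully, lock the table and migrate the remaining data.
3. Notify the computing node to change the access mode of the table.
4. Clean up the rule table.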
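One possible reading of the batch alignment in step 216 is sketched below: rows whose recorded node differs from the node given by the hash rule are moved in small batches, and the rule table entry is updated with each move. The selection criterion, the CRC32 hash, and all names are assumptions; the locking in item 2 and the final clean-up in items 3 and 4 are not modelled.

```python
import zlib


def hash_node(key: str, node_count: int) -> int:
    """Hypothetical hash rule used by the computing node (node numbers start at 1)."""
    return zlib.crc32(key.encode("utf-8")) % node_count + 1


def align_in_batches(stores: dict, rule_table: dict, node_count: int, batch_size: int) -> int:
    """Move misplaced rows batch by batch; returns the number of batches run.

    stores:     node number -> {key: row}
    rule_table: shard key value -> node number currently holding the row
    """
    pending = [k for k, node in rule_table.items() if node != hash_node(k, node_count)]
    batches = 0
    for start in range(0, len(pending), batch_size):
        for key in pending[start:start + batch_size]:
            src, dst = rule_table[key], hash_node(key, node_count)
            stores[dst][key] = stores[src].pop(key)  # migrate the row
            rule_table[key] = dst                    # update the rule table alongside
        batches += 1  # a real job could pause here until the next idle window
    return batches
```

Once every rule table entry matches the hash rule, the rule table carries no extra information, which is consistent with the clean-up in item 4.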
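The data redistribution method provided in Embodiment 2 of the present application creates a rule table that contains only two fields, which improves redistribution efficiency, reduces the impact of data migration, and further reduces the overhead of redistribution.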
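Embodiment 3:
Referring to FIG. 3, the data redistribution apparatus 300 of this embodiment includes:
a computing node 301, a processor 302, and storage nodes 303. The processor 302 is configured to respond to the data redistribution request sent by the computing node 301 and create a rule table corresponding to the request according to the original table information. There are multiple storage nodes 303, which are configured to generate rule table data and send it to the storage node 303 corresponding to the rule table data, import the rule table data into the rule table, and complete the data of the rule table according to the original table. The computing node 301 is configured to access data in the manner of the rule table.
The data redistribution apparatus provided in Embodiment 3 of the present application creates a rule table instead of creating a new table to migrate data, which improves the efficiency of data redistribution and reduces the impact of data migration.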
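Embodiment 4:
Referring to FIG. 4, the data redistribution apparatus 400 of this embodiment includes:
a computing node 401, a processor 402, and storage nodes 403. There are four storage nodes 403.
After the computing node 401 initiates a data redistribution request, the processor 402 creates a new rule table according to the original table information. The shard key of the new rule table is consistent with the sharding scheme of the original table named in the redistribution request.
Each of the four storage nodes 403 contains a redistribution control module 403a, which generates and imports the metadata rule table data. These redistribution control modules 403a export the distribution key values from the storage nodes 403 in parallel, together with the storage node number. The exported files are split into files for the different storage nodes 403 according to the new distribution rule and transmitted to the corresponding storage nodes 403. The computing node 401 records all successfully executed INSERT/UPDATE/DELETE statements that involve the shard key, along with the shard key value and the corresponding groupid.
The redistribution control module 403a of each storage node 403 imports the transmitted data into the rule table.
The processor 402 notifies the computing node 401 to add a read lock to the table that needs to be redistributed.
The redistribution control module 403a compares the shard key values recorded by the computing node 401 from the INSERT/UPDATE/DELETE statements involving the shard key with the shard keys in the original table and the rule table. If the rule table lacks the data corresponding to a shard key value, the data corresponding to that shard key value in the original table is inserted. If the rule table has extra data corresponding to a shard key value, that data is deleted. If the rule table data is inconsistent with the original table, it is modified to match the original table.
The processor 402 notifies the computing node 401 to modify the access mode of the table, changing the computing node 401's own distribution calculation so that access goes through the rule table. INSERT/UPDATE/DELETE statements involving the table must also modify the rule table, and new data is preferentially inserted into the newly added data node.
The computing node 401 releases the read lock that was added. At this point, the redistribution task is complete.
If the distribution rules need to be aligned, step 216 of Embodiment 2 is followed.
The data redistribution apparatus provided in Embodiment 4 of the present application creates a rule table that contains only two fields, which improves redistribution efficiency, reduces the impact of data migration, and further reduces the overhead of redistribution.
The present application also provides a data redistribution system, including the data redistribution apparatus of Embodiment 4; the system can perform data redistribution tasks efficiently.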
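Those skilled in the art should understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses, can be implemented as software (which can be realized by computer program code executable by a computing device), firmware, hardware, or appropriate combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
An embodiment of the present application also provides a device. FIG. 5 is a schematic structural diagram of a device provided by an embodiment of the present application. As shown in FIG. 5, the device includes a processor 51, a memory 52, an input device 53, an output device 54, and a communication device 55. The device may have one or more processors 51; one processor 51 is taken as an example in FIG. 5. The processor 51, memory 52, input device 53, and output device 54 in the device may be connected by a bus or in other ways; connection through a bus is taken as an example in FIG. 5.
As a computer-readable storage medium, the memory 52 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the data redistribution method in the embodiments of the present application. The processor 51 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 52, that is, it implements any method provided by the embodiments of the present application.
The memory 52 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the device, and the like. In addition, the memory 52 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, the memory 52 may further include memory located remotely from the processor 51, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 53 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the device. The output device 54 may include a display device such as a display screen.
The communication device 55 may include a receiver and a transmitter. The communication device 55 is configured to send and receive information under the control of the processor 51.
In one embodiment, the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a data redistribution method, including: obtaining a stored procedure of a source database; parsing and translating the stored procedure to obtain a corresponding grammar block list; and processing the grammar block list to obtain a stored procedure that meets the requirements of a target database.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the method operations above and can also perform related operations in the data redistribution method provided by any embodiment of the present application.
From the above description of the implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and of course can also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to execute the data redistribution method of each embodiment of the present application.
The above is a further detailed description of the embodiments of the present application in combination with specific implementations, and the specific implementation of the present application should not be considered limited to these descriptions. For those of ordinary skill in the technical field of the present application, several simple deductions or substitutions may be made without departing from the concept of the present application, and all of them should be regarded as falling within the protection scope of the present application.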

Claims (13)

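  1. A data redistribution method, comprising:
    in response to a data redistribution request sent by a computing node, creating a rule table corresponding to the request according to original table information;
    generating rule table data and sending it to the storage node corresponding to the rule table data;
    importing the rule table data into the rule table;
    completing the data of the rule table according to the original table; and
    accessing data in the manner of the rule table.
  2. The data redistribution method according to claim 1, wherein the rule table includes a shard key and metadata information.
  3. The data redistribution method according to any one of claims 1 to 2, wherein generating the rule table data comprises:
    exporting distribution key values and node number information, and splitting them into files for the corresponding storage nodes according to the distribution rule contained in the request; and
    recording the shard key and distribution mode, as well as the INSERT/UPDATE/DELETE statements successfully executed on the corresponding shard key.
  4. The data redistribution method according to claim 3, wherein completing the data of the rule table according to the original table comprises:
    comparing the shard keys of the rule table and the original table with the shard keys recorded from the INSERT/UPDATE/DELETE statements; if the rule table lacks the data corresponding to the shard key, inserting the data corresponding to the shard key in the original table; if the rule table has extra data corresponding to the shard key, deleting the data corresponding to the shard key; and if the data corresponding to the shard key in the rule table is inconsistent with the corresponding data in the original table, modifying it to the data corresponding to the shard key in the original table.
  5. The data redistribution method according to any one of claims 1 to 4, wherein, before completing the data of the rule table according to the original table, the method further comprises:
    adding a read lock to the table that needs data redistribution;
    and, after accessing data in the manner of the rule table, the method further comprises:
    releasing the read lock.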
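  6. A data redistribution apparatus, comprising:
    a processor, configured to respond to a data redistribution request sent by a computing node and create a rule table corresponding to the request according to original table information;
    a storage node, configured to generate rule table data and send it to the storage node corresponding to the rule table data, import the rule table data into the rule table, and complete the data of the rule table according to the original table; and
    a computing node, configured to access data in the manner of the rule table.
  7. The data redistribution apparatus according to claim 6, wherein the rule table includes a shard key and metadata information.
  8. The data redistribution apparatus according to any one of claims 6 to 7, wherein the storage node is further configured to export distribution key values and node number information, and split them into files for the corresponding storage nodes according to the distribution rule contained in the request;
    and the computing node is further configured to record the shard key and distribution mode, as well as the INSERT/UPDATE/DELETE statements successfully executed on the corresponding shard key.
  9. The data redistribution apparatus according to claim 8, wherein the storage node is further configured to compare the shard keys of the rule table and the original table with the shard keys recorded from the INSERT/UPDATE/DELETE statements; if the rule table lacks the data corresponding to the shard key, insert the data corresponding to the shard key in the original table; if the rule table has extra data corresponding to the shard key, delete the data corresponding to the shard key; and if the data corresponding to the shard key in the rule table is inconsistent with the corresponding data in the original table, modify it to the data corresponding to the shard key in the original table.
  10. The data redistribution apparatus according to any one of claims 6 to 9, wherein the computing node is further configured to add a read lock to the table that needs data redistribution before the storage node completes the data, and to release the read lock after data is accessed in the manner of the rule table.
  11. A data redistribution system, comprising the data redistribution apparatus according to any one of claims 6 to 10.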
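  12. A device, comprising:
    one or more processors; and
    a memory for storing one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 5.
  13. A storage medium storing a computer program, wherein the method according to any one of claims 1 to 5 is implemented when the computer program is executed by a processor.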
PCT/CN2021/102448 2020-06-28 2021-06-25 Method and apparatus for data redistribution WO2022001883A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21831776.6A EP4174676A4 (en) 2020-06-28 2021-06-25 METHOD AND APPARATUS FOR DATA REDISTRIBUTION
JP2022581648A JP2023532352A (ja) 2020-06-28 2021-06-25 データ再配分の方法及び装置
KR1020237002277A KR20230025019A (ko) 2020-06-28 2021-06-25 데이터 재배포 방법 및 장치

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010600815.9A CN113849496A (zh) 2020-06-28 2020-06-28 Method and apparatus for data redistribution
CN202010600815.9 2020-06-28

Publications (1)

Publication Number Publication Date
WO2022001883A1 (zh) 2022-01-06

Family

ID=78972778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102448 WO2022001883A1 (zh) 2020-06-28 2021-06-25 Method and apparatus for data redistribution

Country Status (5)

Country Link
EP (1) EP4174676A4 (zh)
JP (1) JP2023532352A (zh)
KR (1) KR20230025019A (zh)
CN (1) CN113849496A (zh)
WO (1) WO2022001883A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239417A (zh) * 2014-08-19 2014-12-24 天津南大通用数据技术股份有限公司 一种分布式数据库数据分片后动态调整方法及装置
CN106034144A (zh) * 2015-03-12 2016-10-19 中国人民解放军国防科学技术大学 一种基于负载均衡的虚拟资产数据存储方法
CN106407308A (zh) * 2016-08-31 2017-02-15 天津南大通用数据技术股份有限公司 一种分布式数据库的扩容方法及装置
CN108319623A (zh) * 2017-01-18 2018-07-24 华为技术有限公司 一种数据重分布方法、装置及数据库集群
CN108932256A (zh) * 2017-05-25 2018-12-04 中兴通讯股份有限公司 分布式数据重分布控制方法、装置及数据管理服务器
US20200026624A1 (en) * 2016-11-22 2020-01-23 Nutanix, Inc. Executing resource management operations in distributed computing systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870602B (zh) * 2014-04-03 2017-05-31 中国科学院地理科学与资源研究所 数据库空间分片复制方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239417A (zh) * 2014-08-19 2014-12-24 天津南大通用数据技术股份有限公司 一种分布式数据库数据分片后动态调整方法及装置
CN106034144A (zh) * 2015-03-12 2016-10-19 中国人民解放军国防科学技术大学 一种基于负载均衡的虚拟资产数据存储方法
CN106407308A (zh) * 2016-08-31 2017-02-15 天津南大通用数据技术股份有限公司 一种分布式数据库的扩容方法及装置
US20200026624A1 (en) * 2016-11-22 2020-01-23 Nutanix, Inc. Executing resource management operations in distributed computing systems
CN108319623A (zh) * 2017-01-18 2018-07-24 华为技术有限公司 一种数据重分布方法、装置及数据库集群
CN108932256A (zh) * 2017-05-25 2018-12-04 中兴通讯股份有限公司 分布式数据重分布控制方法、装置及数据管理服务器

Also Published As

Publication number Publication date
EP4174676A4 (en) 2023-11-08
EP4174676A1 (en) 2023-05-03
KR20230025019A (ko) 2023-02-21
JP2023532352A (ja) 2023-07-27
CN113849496A (zh) 2021-12-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21831776

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022581648

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20237002277

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021831776

Country of ref document: EP

Effective date: 20230130