WO2022127866A1 - Data processing method and apparatus, and electronic device and storage medium - Google Patents

Data processing method and apparatus, and electronic device and storage medium

Publication number
WO2022127866A1
Authority
WO
WIPO (PCT)
Prior art keywords
sql
sql statement
combined
node
statement
Prior art date
Application number
PCT/CN2021/138821
Other languages
French (fr)
Chinese (zh)
Inventor
买建华
刘志文
付裕
黄健
许振华
李从兵
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2022127866A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • The embodiments of the present application relate to the field of databases, and in particular to a data processing method, apparatus, electronic device, and storage medium.
  • During data redistribution, the data of the old node is distributed to the corresponding new nodes according to the configured distribution rules. During this process the business still commits data on the old node, so the data written during this period must be appended to the new nodes; this is referred to as incremental catch-up.
  • During incremental catch-up in the later stage of data redistribution, the database cluster parses the logs of the old node to obtain SQL (Structured Query Language) statements and plays back the parsed SQL statements on the new nodes in the distributed database cluster, thereby catching up on the incremental data.
  • An embodiment of the present application provides a data processing method, including: acquiring a logical transaction log of a first node; acquiring SQL statements according to the logical transaction log; and merging the SQL statements that meet preset conditions to generate a combined SQL statement, so that the second node can play back the combined SQL statement concurrently.
  • An embodiment of the present application also provides a data processing device, including: a log acquisition module, configured to acquire a logical transaction log from a first node; an SQL statement acquisition module, configured to acquire SQL statements according to the logical transaction log; and an SQL statement merging module, configured to combine the SQL statements that meet the preset conditions and generate a combined SQL statement for the second node to play back concurrently.
  • An embodiment of the present application further provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the data processing method described above.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the foregoing data processing method is implemented.
  • FIG. 1 is a flowchart of a data processing method according to a first embodiment of the present application.
  • FIG. 2 is a flowchart of a data processing method according to a second embodiment of the present application.
  • FIG. 3 is a schematic diagram of incremental catch-up according to a second embodiment of the present application.
  • FIG. 4 is a flowchart of a data processing apparatus according to a third embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
  • The first embodiment of the present application relates to a data processing method, which can be applied to electronic devices such as servers.
  • This embodiment includes: acquiring the logical transaction log of the first node; acquiring SQL statements according to the logical transaction log; and merging the SQL statements that satisfy preset conditions to generate a combined SQL statement for the second node to play back concurrently.
  • By merging SQL statements, this embodiment reduces the number of SQL statements output to the new node during incremental catch-up, and therefore the number of SQL statements run on the database node. Merging also reduces the coupling between SQL statements, which increases the playback efficiency of SQL statements and improves the success rate of incremental catch-up.
  • The cluster manager receives a redistribution request from an upper-layer service and performs a full export of the data that needs to be redistributed. It splits the exported file according to the number of lines per split file and the number of distribution batches configured in the cluster manager, verifies the distribution field and the column data of the distribution field in each split file, constructs a calculation object from the distribution field and the verified column data, uses the distribution algorithm to calculate the destination node, and writes the data to the shard data cache of the corresponding node. When the number of split files reaches the number of sending batches, the split files are sent, so that the data of the old node is imported into each configured new node.
  • The cluster manager receives cluster-related requests from upper-layer services, such as the data redistribution request above, manages the distributed cluster, coordinates the database (DB) status reported by the resource managers, and notifies the resource managers to execute commands such as switchover, backup, and redistribution.
  • The resource manager, usually the upper-level agent of the database, is a local database monitoring program that performs complex operations on the database in response to upper-level requests. In this embodiment, its main function is to respond to the redistribution request of the cluster manager, execute the redistribution process, split the SQL according to the distribution rules during incremental catch-up, and connect to the DB to play back the SQL.
  • Both the first node and the second node are database nodes, which are basic nodes for storing data.
  • This embodiment proposes a data processing method that processes the incremental data so as to improve the efficiency of incremental catch-up.
  • The flowchart of the data processing method of this embodiment is shown in FIG. 1.
  • Step 101: Obtain the logical transaction log of the first node.
  • The resource manager obtains the logical transaction log (binlog) of the first node.
  • Step 102: Obtain SQL statements according to the logical transaction log.
  • The logical transaction log (binlog) may be parsed by the SQL consolidator.
  • The SQL consolidator first caches each SQL statement parsed from the binlog in the memory of the server.
  • The SQL consolidator can be understood as a process. Each database node under the resource manager corresponds to one SQL consolidator; that is, the SQL consolidator is the process that parses the logical transaction log of the first node and merges SQL during incremental catch-up for the first node.
  • Step 103: Combine the SQL statements that meet the preset conditions to generate a combined SQL statement for the second node to play back concurrently.
  • The SQL statements that satisfy the preset condition are SQL statements that operate on the same primary key in the same database table.
  • The SQL statements that operate on the same primary key of the same database table are merged. Because a primary key identifies one row of data, the merged statements all operate on different rows, so the row operations of the SQL statements are no longer associated with one another.
  • This further reduces the strong coupling of the original SQL statements, facilitates batch concurrent playback, shortens the playback time, and improves the performance of incremental catch-up.
  • For each SQL statement parsed from the logical transaction log, the SQL consolidator first caches it in the memory of the server and then checks whether the memory holds another SQL statement, other than the newly parsed one, that operates on the same primary key in the same database table. If so, the two SQL statements are merged and the combined SQL statement is put into memory, so that the next parsed SQL statement can be merged against the SQL statements stored in memory.
  • For example, suppose the memory holds SQL statements s1, s2, and s3, and the SQL statement currently parsed from the logical transaction log is s4. s4 is put into memory, and s1, s2, and s3 are checked for a statement that can be merged with s4. If, say, s4 can be merged with s1, then s4 and s1 are merged to generate s5, and s5 is put into memory. The SQL statements stored in memory are then s2, s3, and s5.
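The cache-and-merge loop described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `Stmt`, `merge_pair`, and `consolidate` are hypothetical, and the merge itself is reduced to a placeholder that only labels the result, since the verb and field-value rules are described separately in the text.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    table: str  # database table, e.g. "db1.tb1"
    pk: int     # primary key value identifying the row
    text: str   # placeholder for the parsed SQL statement


def merge_pair(older: Stmt, newer: Stmt) -> Stmt:
    # Placeholder merge (assumption): only labels the result so the flow is visible.
    return Stmt(older.table, older.pk, f"merge({older.text},{newer.text})")


def consolidate(parsed):
    """Cache each parsed statement and merge it with a cached one on the same (table, pk)."""
    cache = {}  # (table, pk) -> statement currently held in memory
    for stmt in parsed:
        key = (stmt.table, stmt.pk)
        if key in cache:
            cache[key] = merge_pair(cache[key], stmt)
        else:
            cache[key] = stmt
    return list(cache.values())


# Mirrors the s1..s4 example: s4 touches the same row as s1, so the two are merged.
s1 = Stmt("db1.tb1", 1, "s1")
s2 = Stmt("db1.tb1", 2, "s2")
s3 = Stmt("db1.tb2", 1, "s3")
s4 = Stmt("db1.tb1", 1, "s4")
result = consolidate([s1, s2, s3, s4])
print([s.text for s in result])
```

Keying the cache on the pair (table, primary key) makes the lookup for a merge candidate a single dictionary access instead of a scan over everything held in memory.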
  • During incremental catch-up for redistribution, the SQL statements that satisfy the preset condition are merged according to their execution times in the logical transaction log, so that the SQL statements depend less on one another's execution order; that is, the originally ordered SQL becomes unordered and the original dependency between execution steps is released.
  • This further reduces the strong coupling of SQL statements, facilitates parallel playback of SQL statements, and creates room for improving SQL playback efficiency.
  • The verb and field values of the combined SQL statement are determined according to the SQL statements that satisfy the preset condition and their execution times, and the combined SQL statement is generated according to the determined verb and field values.
  • In this implementation, the SQL statements are combined according to their verbs and execution times, so that the combined SQL statement has the same effect as executing the original SQL statements in order of execution time.
  • Verbs of SQL statements include INSERT, UPDATE, and DELETE.
  • The two SQL statements that satisfy the preset condition are the first SQL statement and the second SQL statement, and the execution time of the first SQL statement in the logical transaction log is earlier than that of the second SQL statement. They are merged as follows:
  • If the verb of the first SQL statement is INSERT and the verb of the second SQL statement is UPDATE, the verb of the combined SQL statement is INSERT.
  • If the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is UPDATE, the verb of the combined SQL statement is UPDATE.
  • If the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is DELETE, the verb of the combined SQL statement is DELETE.
  • If the verb of the first SQL statement is DELETE and the verb of the second SQL statement is INSERT, the verb of the combined SQL statement is UPDATE.
  • If the verb of the first SQL statement is INSERT and the verb of the second SQL statement is DELETE, the combined result of the first SQL statement and the second SQL statement is that no SQL statement is generated.
  • As an example, suppose that table db1.tb1 has only three fields, a, b, and c, all of type int (integer), that a is the primary key, and that the execution time of the first SQL statement is earlier than that of the second SQL statement.
  • The merged SQL statement retains the latest data information.
  • The value of the statement is the value in the before image of the first SQL statement.
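The verb rules and the remarks on field values can be illustrated with a short Python sketch. It assumes that each parsed statement carries a before image and an after image of the row, and that the merged statement keeps the before image of the earlier statement and the after image of the later one. That field-value rule is an interpretation of the two sentences above, not something the text spells out. `RowChange`, `MERGED_VERB`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowChange:
    verb: str               # "INSERT", "UPDATE" or "DELETE"
    before: Optional[dict]  # row values before the change (None for INSERT)
    after: Optional[dict]   # row values after the change (None for DELETE)


# (verb of earlier statement, verb of later statement) -> verb of merged statement
MERGED_VERB = {
    ("INSERT", "UPDATE"): "INSERT",
    ("UPDATE", "UPDATE"): "UPDATE",
    ("UPDATE", "DELETE"): "DELETE",
    ("DELETE", "INSERT"): "UPDATE",
    ("INSERT", "DELETE"): None,  # the two statements cancel out
}


def merge(first: RowChange, second: RowChange) -> Optional[RowChange]:
    """Merge two changes to the same row; `first` executed earlier than `second`.

    Raises KeyError for verb pairs that the rules above do not list.
    """
    verb = MERGED_VERB[(first.verb, second.verb)]
    if verb is None:
        return None
    # Assumption: keep the earliest before image and the latest after image.
    before = first.before if verb != "INSERT" else None
    after = second.after if verb != "DELETE" else None
    return RowChange(verb, before, after)


# db1.tb1 with int fields a (primary key), b, c: an INSERT followed by an UPDATE of b.
ins = RowChange("INSERT", None, {"a": 1, "b": 2, "c": 3})
upd = RowChange("UPDATE", {"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5, "c": 3})
merged = merge(ins, upd)
print(merged)
```

Under these assumptions the INSERT followed by the UPDATE collapses into a single INSERT of the row's final values, which has the same effect on the new node as replaying both statements in order.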
  • The description of step 103 above, in which the SQL statements that meet the preset conditions are combined to generate the combined SQL statement for the second node to play back concurrently, takes as an example SQL statements that operate on the same primary key in the same database table.
  • The SQL statements that satisfy the preset condition can also be SQL statements that operate on different primary keys in the same database table, for example SQL statements that operate on the same field.
  • For example, if the first SQL statement changes the value of field A in the first row of the table to 2 and the second SQL statement changes the value of field A in the second row of the table to 3, the first SQL statement can be combined with the second SQL statement, and the combined SQL statement changes the value of field A in the first row to 2 and the value of field A in the second row to 3.
  • The two SQL statements become one, which reduces the number of SQL statements output to the new node, that is, the second node, and improves the playback efficiency.
  • In addition, the two SQL statements operate on the same column (field) of the table; merging SQL statements that operate on the same field reduces the coupling of SQL statements on fields and facilitates concurrent execution.
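One possible form of such a combined statement is a single UPDATE with a CASE expression over the primary key. The text does not say which SQL form is used, so the Python sketch below is only an assumption. The function name `merge_same_field_updates`, the table name `tb1`, and the primary key column `id` are hypothetical, and values are limited to integers so that no quoting is needed.

```python
def merge_same_field_updates(table: str, pk_col: str, field: str, updates: dict) -> str:
    """Build one UPDATE that sets `field` per row; `updates` maps primary key -> new value.

    Only integer keys and values are handled here; production code should use
    bound parameters rather than string formatting.
    """
    if not updates:
        raise ValueError("nothing to merge")
    cases = " ".join(f"WHEN {int(pk)} THEN {int(val)}" for pk, val in updates.items())
    keys = ", ".join(str(int(pk)) for pk in updates)
    return (f"UPDATE {table} SET {field} = CASE {pk_col} {cases} END "
            f"WHERE {pk_col} IN ({keys})")


# Row 1: A becomes 2; row 2: A becomes 3.
sql = merge_same_field_updates("tb1", "id", "A", {1: 2, 2: 3})
print(sql)
```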
  • In this embodiment, the logical transaction log of the first node is acquired, SQL statements are acquired according to the logical transaction log, and the SQL statements satisfying the preset conditions are combined to generate a combined SQL statement for the second node to play back concurrently.
  • Combining reduces the number of SQL statements output, and therefore the number of SQL statements played back by the database nodes, which speeds up incremental catch-up.
  • It also reduces the coupling between SQL statements, which increases SQL playback efficiency and improves the success rate of incremental catch-up.
  • The second embodiment of the present application relates to a data processing method.
  • This embodiment is substantially the same as the first embodiment, except that after the SQL statements that meet the preset conditions are combined and the combined SQL statement is generated, the method further includes: obtaining a hash value according to the database table and primary key value in the combined SQL statement; and determining, according to the hash value, a hash bucket for storing the combined SQL statement.
  • A flowchart of the data processing method according to the second embodiment of the present application is shown in FIG. 2.
  • Step 201: Obtain the logical transaction log of the first node.
  • Step 202: Obtain SQL statements according to the logical transaction log.
  • Step 203: Combine the SQL statements that meet the preset conditions to generate a combined SQL statement.
  • Steps 201 to 203 are substantially the same as steps 101 to 103 of the first embodiment of the present application, and details are not repeated here.
  • Step 204: Determine a hash bucket for storing the combined SQL statement according to the database table and the primary key value in the combined SQL statement.
  • A hash value is obtained according to the database table and the primary key value in the combined SQL statement, and a hash bucket for storing the combined SQL statement is determined according to the hash value.
  • Storing the SQL statements in hash buckets keeps the number of SQL statements in each hash bucket file as uniform as possible, which improves the playback efficiency.
  • The database table and the primary key value in the combined SQL statement are input into a hash function, the hash function yields the hash value, and the hash bucket is determined according to the hash value.
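A minimal Python sketch of step 204 follows. The text does not name a hash function or a bucket count, so the use of CRC32 and the value of `NUM_BUCKETS` are assumptions made only for illustration, and `bucket_for` and `fill_buckets` are hypothetical names.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count


def bucket_for(table: str, pk) -> int:
    """Map (database table, primary key value) to a hash bucket index."""
    key = f"{table}:{pk}".encode("utf-8")
    # CRC32 stands in for the unspecified hash function (assumption).
    return zlib.crc32(key) % NUM_BUCKETS


def fill_buckets(statements):
    """Distribute (table, pk, sql_text) tuples over the hash buckets."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for table, pk, sql in statements:
        buckets[bucket_for(table, pk)].append(sql)
    return buckets


stmts = [("db1.tb1", pk, f"stmt_{pk}") for pk in range(100)]
buckets = fill_buckets(stmts)
print([len(b) for b in buckets])
```

Because the bucket index depends only on the table and the primary key value, statements touching the same row always land in the same bucket, while different rows spread across the buckets.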
  • Step 205: Determine, according to the combined SQL statements in the hash bucket, the second node that is to play back each combined SQL statement.
  • Step 206: Split the hash bucket file according to the determined second nodes to obtain SQL files.
  • The hash bucket file is a file that includes all combined SQL statements in the hash bucket.
  • The statements in the hash bucket file that are destined for the same node are merged into one SQL file, which reduces the number of times SQL statements are sent and further improves the playback efficiency.
  • Step 207: Send each SQL file to its determined second node. The combined SQL statements in one SQL file are all to be played back on the same second node.
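Steps 205 and 206 can be sketched in Python as grouping the statements of one hash bucket by target node. The distribution rule used here (primary key modulo the number of new nodes) is a stand-in for the configured distribution rules, which the text does not specify. `target_node`, `split_bucket`, and the node names are hypothetical.

```python
from collections import defaultdict


def target_node(pk: int, nodes: list) -> str:
    # Hypothetical distribution rule: primary key modulo the number of new nodes.
    return nodes[pk % len(nodes)]


def split_bucket(bucket, nodes):
    """Group the (pk, sql) entries of one hash bucket into one SQL file (list) per node."""
    files = defaultdict(list)
    for pk, sql in bucket:
        files[target_node(pk, nodes)].append(sql)
    return dict(files)


bucket = [(1, "stmt_row1"), (2, "stmt_row2"), (3, "stmt_row3")]
files = split_bucket(bucket, ["node_a", "node_b"])
print(files)
```

Each resulting list plays the role of one SQL file, so every node receives a single transfer per hash bucket rather than one transfer per statement.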
  • The concurrent playback includes: concurrent playback between the SQL files and concurrent playback of the SQL statements within each SQL file.
  • Concurrent playback improves the playback efficiency and increases the success rate of incremental catch-up.
  • The resource manager splits the hash bucket file generated by the SQL consolidator according to the distribution rules: it calculates the second node corresponding to each SQL statement in the hash bucket file and obtains SQL files, each of which is to be played back on a single node.
  • It then connects to the remote DB for playback.
  • Different SQL files are played back concurrently, and the SQL statements within the same SQL file are also played back concurrently.
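The two levels of concurrency can be sketched with Python thread pools. The function `execute` is only a stand-in for sending one statement to the database on a node, and the worker counts are arbitrary. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def execute(node: str, sql: str) -> str:
    # Stand-in for executing one statement on the database of `node` (assumption).
    return f"{node}:ok:{sql}"


def play_file(node: str, statements: list) -> list:
    """Play back the statements of one SQL file concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda s: execute(node, s), statements))


def play_all(files: dict) -> dict:
    """Play back all SQL files concurrently; `files` maps node -> list of statements."""
    with ThreadPoolExecutor(max_workers=max(1, len(files))) as pool:
        futures = {node: pool.submit(play_file, node, stmts)
                   for node, stmts in files.items()}
        return {node: fut.result() for node, fut in futures.items()}


results = play_all({"node_a": ["s1", "s2"], "node_b": ["s3"]})
print(results)
```

Running the statements of one file in parallel is only safe because the merged statements operate on different rows, which is the property the merging step is meant to establish.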
  • After the playback is completed, the resource manager returns the playback result to the cluster manager. After the cluster manager receives the reply, it starts a new round of incremental catch-up, and repeats this until the time taken by a round of incremental catch-up is less than the incremental catch-up threshold.
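The round-based termination condition can be sketched in Python as follows. `run_round` is a hypothetical callable that performs one parse, merge, and playback round, and the `max_rounds` guard is an added assumption that does not appear in the text.

```python
import time


def catch_up(run_round, threshold_seconds: float, max_rounds: int = 100) -> int:
    """Repeat incremental rounds until one round finishes faster than the threshold.

    Returns the number of rounds executed. `max_rounds` is a safety guard
    added for this sketch only.
    """
    for round_no in range(1, max_rounds + 1):
        start = time.monotonic()
        run_round()
        elapsed = time.monotonic() - start
        if elapsed < threshold_seconds:
            return round_no
    raise RuntimeError("incremental catch-up did not converge")


rounds = catch_up(lambda: None, threshold_seconds=1.0)
print(rounds)
```

The idea behind the threshold is that a short round means little data accumulated while the previous round was being replayed, so the new nodes have nearly caught up with the old node.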
  • A schematic diagram of incremental catch-up in this embodiment is shown in FIG. 3, in which DBA denotes the database administrator (Database Administrator).
  • The cluster manager obtains the redistribution request and performs a full export of the data that needs to be redistributed; it then splits the fully exported file according to the number of lines and the number of distribution batches configured in the cluster manager.
  • The distribution field and the column data of the distribution field in each resulting file are verified, a calculation object is constructed from the distribution field and the verified column data, the distribution algorithm is used to calculate the destination node, and the data is written to the shard data cache of the corresponding node. When the number of split files reaches the number of sending batches, the split files are sent, and the data of the old node is imported into each configured new node.
  • The cluster manager then initiates the incremental catch-up process.
  • The cluster manager first queries and records the current logical transaction log position of each new node, and then sends an incremental catch-up request to the resource manager of the old node.
  • Based on the backup position, the resource manager scans the logical transaction log and notifies the SQL consolidator to parse the logical transaction log, combine SQL statements, and generate from the combined SQL statements a hash bucket file that organizes the data with hash buckets as its data structure.
  • The hash bucket file is shown in FIG. 3.
  • The resource manager splits the SQL statements in the hash bucket file according to the distribution key; that is, each hash bucket file is split into multiple SQL files, and each SQL file is transferred to a particular node. The resource manager connects to the DB node remotely.
  • The DB plays back the SQL files concurrently, executes the SQL statements within each SQL file concurrently, and returns the execution result after execution.
  • In this embodiment, the combined SQL statements are placed into hash bucket files, and the resource manager splits each hash bucket file into multiple SQL files, each of which groups multiple new SQL statements.
  • The generated SQL files are transferred to the corresponding new nodes, and the new nodes play back the SQL files concurrently and return the execution results.
  • Records of the same database table and the same primary key, that is, SQL statements operating on the same database table and the same primary key, hash to the same bucket.
  • The hash buckets keep the number of SQL statements in each bucket as uniform as possible, so that subsequent playback is more efficient.
  • SQL merging allows playback to be parallelized along three dimensions, which improves playback efficiency: 1. after merging, SQL statements in the same hash bucket that operate on different primary keys of the same database table are played back concurrently; 2. after merging, SQL statements in the same hash bucket that operate on different database tables are played back concurrently; 3. SQL statements in different hash buckets are played back concurrently. Through SQL merging, all SQL statements are decoupled from one another and can be played back concurrently in batches, which shortens the playback time and significantly improves the performance of incremental catch-up.
  • The third embodiment of the present application relates to a data processing device, including: a log acquisition module 401, configured to acquire a logical transaction log of a first node; an SQL statement acquisition module 402, configured to acquire SQL statements according to the logical transaction log; and an SQL statement merging module 403, configured to combine the SQL statements that meet the preset conditions and generate a combined SQL statement for the second node to play back concurrently.
  • The SQL statements that meet the preset condition in the SQL statement merging module 403 are SQL statements that operate on the same primary key in the same database table.
  • The SQL statement merging module 403 is further configured to determine the execution times, in the logical transaction log, of the SQL statements that satisfy the preset condition, and to combine the SQL statements according to the execution times to obtain a combined SQL statement.
  • The SQL statement merging module 403 is further configured to determine the verb and field values of the combined SQL statement according to the SQL statements satisfying the preset condition and the execution times, and to generate the combined SQL statement according to the determined verb and field values.
  • The SQL statement merging module 403 is further configured to: obtain a hash value according to the database table and the primary key value in the combined SQL statement; determine, according to the hash value, a hash bucket for storing the combined SQL statement; determine, according to the combined SQL statement in the hash bucket, a second node that plays back the combined SQL statement; and send the combined SQL statement to the second node.
  • The SQL statement merging module 403 is further configured to split the hash bucket file according to the determined second node to obtain an SQL file, and to send the SQL file to the determined second node. The hash bucket file is a file including all the combined SQL statements in the hash bucket, and the combined SQL statements in one SQL file are all to be played back on the same second node.
  • The concurrent playback in the SQL statement merging module 403 includes: concurrent playback between the SQL files and concurrent playback of the SQL statements within each SQL file.
  • This embodiment is a system example corresponding to the first embodiment, and this embodiment can be implemented in cooperation with the first embodiment.
  • The relevant technical details mentioned in the first embodiment are still valid in this embodiment and are not repeated here.
  • The related technical details mentioned in this embodiment can also be applied to the first embodiment.
  • Each module involved in this embodiment is a logical module.
  • A logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • In order to highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by the present application, but this does not mean that there are no other units in this embodiment.
  • The fourth embodiment of the present application relates to an electronic device which, as shown in FIG. 5, comprises at least one processor 501 and a memory 502 communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above-mentioned data processing method.
  • The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects one or more processors and the various circuits of the memory.
  • The bus may also connect together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein.
  • The bus interface provides the interface between the bus and the transceiver.
  • A transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other devices over a transmission medium.
  • The data processed by the processor is transmitted over the wireless medium through the antenna, and the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor in performing operations.
  • The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • The above method embodiments are implemented when the computer program is executed by the processor.
  • The storage medium stores several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method and apparatus, and an electronic device and a storage medium, which relate to the field of databases. The method comprises: acquiring a logical transaction log of a first node (101); acquiring SQL statements according to the logical transaction log (102); and merging SQL statements that satisfy a preset condition so as to generate a merged SQL statement, such that a second node plays back the merged SQL statement concurrently (103).

Description

Data processing method and apparatus, and electronic device and storage medium
Cross-Reference
This application is based on, and claims priority to, Chinese patent application No. 202011498751.2, filed on December 17, 2020, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of databases, and in particular to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
During data redistribution, the data on an old node is distributed to the corresponding new nodes according to the configured distribution rules. While this is happening, services continue to commit data on the old node, so the data written during this period must also be appended to the new nodes; this is referred to as catching up incremental data. When incremental data is caught up in the later stage of data redistribution, the database cluster parses the logs of the old node to obtain SQL (Structured Query Language) statements and plays back the parsed SQL statements on the new nodes of the distributed database cluster, thereby catching up the incremental data.
However, when incremental data is caught up in the later stage of data redistribution, service concurrency is high and the load is heavy. If SQL playback is too inefficient, incremental catch-up cannot keep pace with the rate at which service data is written, and the catch-up fails.
Summary
An embodiment of the present application provides a data processing method, including: acquiring a logical transaction log of a first node; acquiring SQL statements according to the logical transaction log; and merging the SQL statements that satisfy a preset condition to generate a merged SQL statement, so that a second node can play back the merged SQL statement concurrently.
An embodiment of the present application further provides a data processing apparatus, including: a log acquisition module, configured to acquire a logical transaction log from a first node; an SQL statement acquisition module, configured to acquire SQL statements according to the logical transaction log; and an SQL statement merging module, configured to merge the SQL statements that satisfy a preset condition to generate a merged SQL statement, so that a second node can play back the merged SQL statement concurrently.
An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data processing method described above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the data processing method described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of the data processing method according to the first embodiment of the present application;
FIG. 2 is a flowchart of the data processing method according to the second embodiment of the present application;
FIG. 3 is a schematic diagram of incremental catch-up according to the second embodiment of the present application;
FIG. 4 is a flowchart of the data processing apparatus according to the third embodiment of the present application;
FIG. 5 is a schematic structural diagram of the electronic device according to the fourth embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader understand the present application. The technical solutions claimed in the present application can nevertheless be realized without these details and with various changes and modifications based on the following embodiments. The embodiments are divided as they are for convenience of description only; the division does not limit the specific implementation of the present application, and the embodiments may be combined with and refer to one another where they do not contradict each other.
A first embodiment of the present application relates to a data processing method that can be applied to electronic devices such as servers. The method includes: acquiring a logical transaction log of a first node; acquiring SQL statements according to the logical transaction log; and merging the SQL statements that satisfy a preset condition to generate a merged SQL statement, so that a second node can play back the merged SQL statement concurrently. Merging reduces the number of SQL statements output to the new node during incremental catch-up, and therefore the number of SQL statements the database node has to run. Merging also reduces the coupling between SQL statements, which improves playback efficiency and raises the success rate of incremental catch-up.
In one example, the cluster manager receives a redistribution request from an upper-layer service and performs a full export of the data to be redistributed. That is, the fully exported file is split according to the number of lines per split file and the distribution batch size configured in the cluster manager; the distribution field and its column data in each split file are verified; a calculation object is constructed from the distribution field and the column data that passed verification; a distribution algorithm computes the destination node; and the data is written to the corresponding node, that is, to the shard data cache. Each time the number of split files reaches one distribution batch, the split files are sent, so that the data of the old node is imported into the configured new nodes.
The cluster manager described above receives cluster-related requests from upper-layer services, such as the data redistribution request mentioned above; manages the distributed cluster; coordinates the database (DB) status reports of the resource managers; and instructs the resource managers to perform operations such as switchover, backup, and redistribution. A resource manager is typically the upper-layer agent of a database: a local database monitoring program that performs complex operations on the database in response to upper-layer requests. In this embodiment, its main role is to respond to the redistribution request from the cluster manager, execute the redistribution procedure, and, during incremental catch-up, split the SQL according to the distribution rules and connect to the DB to play back the SQL. The first node and the second node are both database nodes, which are the basic nodes that store data.
During the data redistribution described above, services continue to commit data on the old node, so the data written during this period is appended to the new node. This embodiment provides a data processing method that processes this incremental data and improves the efficiency of incremental catch-up.
The flowchart of the data processing method of this embodiment is shown in FIG. 1.
Step 101: acquire the logical transaction log of the first node.
For example, the resource manager acquires the logical transaction log (binlog) of the first node.
Step 102: acquire SQL statements according to the logical transaction log.
For example, an SQL merger may parse the binlog. The SQL merger first caches each SQL statement parsed from the binlog in the server's memory. The SQL merger can be understood as a process, and each database node under the resource manager has its own SQL merger. In other words, during incremental catch-up for the first node, the SQL merger is the process that parses the first node's logical transaction log and merges the resulting SQL.
Step 103: merge the SQL statements that satisfy the preset condition to generate a merged SQL statement, so that the second node can play back the merged SQL statement concurrently.
In one example, the SQL statements that satisfy the preset condition are SQL statements that operate on the same primary key in the same database table. In this implementation, SQL statements that operate on the same primary key of the same table are merged. Because a primary key identifies one row of data, every merged statement operates on a different row; that is, the row operations of the SQL statements no longer depend on one another. This further loosens the strong coupling among the original SQL statements, makes batch concurrent playback practical, shortens playback time, and improves incremental catch-up performance.
For example, each time a new SQL statement is parsed from the logical transaction log, the SQL merger first caches it in the server's memory and then checks whether memory already holds another SQL statement that operates on the same primary key in the same database table. If so, the two statements are merged and the merged SQL statement is kept in memory, so that statements parsed later can be merged against it. Suppose memory holds SQL statements s1, s2, and s3, and the statement currently parsed from the logical transaction log is s4. Then s4 is placed in memory and s1, s2, and s3 are searched for a statement that can be merged with s4. If, say, s4 can be merged with s1, the two are merged into s5, which is also kept in memory; memory then holds s2, s3, and s5.
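The caching behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class name `MergeBuffer`, the `(table, primary key)` tuple used as the lookup key, and the caller-supplied `merge` function are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Hypothetical lookup key: (qualified table name, primary-key value).
Key = Tuple[str, Hashable]


class MergeBuffer:
    """In-memory cache that keeps at most one pending statement per row."""

    def __init__(self, merge: Callable[[str, str], Optional[str]]) -> None:
        self._merge = merge
        self._pending: Dict[Key, str] = {}

    def add(self, key: Key, statement: str) -> None:
        """Cache a newly parsed statement, merging it with any earlier one for the same row."""
        if key not in self._pending:
            self._pending[key] = statement
            return
        merged = self._merge(self._pending.pop(key), statement)
        if merged is not None:  # None means the two statements cancel out
            self._pending[key] = merged

    def statements(self) -> List[str]:
        """Return the statements currently held in memory."""
        return list(self._pending.values())


# s1, s2, s3 touch rows 1, 2, 3; s4 touches row 1 again and is merged with s1 into s5.
buffer = MergeBuffer(merge=lambda earlier, later: "s5")
for key, stmt in [(("db1.tb1", 1), "s1"), (("db1.tb1", 2), "s2"),
                  (("db1.tb1", 3), "s3"), (("db1.tb1", 1), "s4")]:
    buffer.add(key, stmt)
print(buffer.statements())  # ['s2', 's3', 's5']
```

Indexing pending statements by table and primary key makes the lookup for a merge partner a single dictionary access instead of a scan over everything in memory.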
In one example, the execution times, in the logical transaction log, of the SQL statements that satisfy the preset condition are determined, and those statements are merged according to the determined execution times to obtain the merged SQL statement. For example, the execution times of SQL statements that operate on the same primary key of the same table are determined from the logical transaction log; if the first SQL statement executed earlier and the second SQL statement executed later, the merged SQL statement has the same effect as executing the first and second SQL statements in chronological order. In this implementation, merging by execution time during incremental catch-up reduces how strongly the SQL statements depend on execution order. SQL that originally had to run in order can then run in any order, which removes the step-by-step dependency among the original statements, further loosens their strong coupling, allows the SQL statements to be played back in parallel, and leaves room to improve SQL playback efficiency.
In one example, the verb and field values of the merged SQL statement are determined from the SQL statements that satisfy the preset condition and their execution times, and the merged SQL statement is generated from the determined verb and field values. In this implementation, SQL statements are merged according to their verbs and execution times, so that the merged SQL statement has the same effect as executing the original statements in order of execution time. The verbs of SQL statements include INSERT, UPDATE, and DELETE.
For example, take two SQL statements that satisfy the preset condition, a first SQL statement and a second SQL statement, where the first SQL statement executes earlier than the second in the logical transaction log:
If the verb of the first SQL statement is INSERT and the verb of the second SQL statement is UPDATE, the verb of the merged SQL statement is INSERT;
If the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is UPDATE, the verb of the merged SQL statement is UPDATE;
If the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is DELETE, the verb of the merged SQL statement is DELETE;
If the verb of the first SQL statement is DELETE and the verb of the second SQL statement is INSERT, the verb of the merged SQL statement is UPDATE.
Note that if the verb of the first SQL statement is INSERT and the verb of the second SQL statement is DELETE, merging the two produces no SQL statement at all.
The following examples use table tb1 in database db1. db1.tb1 has exactly three fields, a, b, and c, all of type int (integer); a is the primary key; and the first SQL statement executes earlier than the second SQL statement.
First SQL statement: INSERT INTO db1.tb1 VALUES (1, 2, 3), which inserts a row with a=1, b=2, c=3 into db1.tb1. Second SQL statement: UPDATE db1.tb1 SET a=4, b=5, c=6 WHERE a=1, which updates the row of db1.tb1 whose primary key is a=1 to a=4, b=5, c=6. The merged SQL statement keeps the latest data: INSERT INTO db1.tb1 VALUES (4, 5, 6), which inserts a row with a=4, b=5, c=6 into db1.tb1.
First SQL statement: UPDATE db1.tb1 SET a=4, b=5, c=6 WHERE a=1, which updates the row of db1.tb1 whose primary key is a=1 to a=4, b=5, c=6. Second SQL statement: UPDATE db1.tb1 SET a=7, b=8, c=9 WHERE a=4, which updates the row whose primary key is a=4 to a=7, b=8, c=9. Merged SQL statement: UPDATE db1.tb1 SET a=7, b=8, c=9 WHERE a=1, which updates the row whose primary key is a=1 to a=7, b=8, c=9. The merged statement keeps the column values of the first statement's before image (a=1 after WHERE) and the column values of the second statement's after image (a=7, b=8, c=9 after SET).
First SQL statement: UPDATE db1.tb1 SET a=4, b=5, c=6 WHERE a=1, which updates the row of db1.tb1 whose primary key is a=1 to a=4, b=5, c=6. Second SQL statement: DELETE FROM db1.tb1 WHERE a=1, which deletes the row of db1.tb1 whose primary key is a=1. Merged SQL statement: DELETE FROM db1.tb1 WHERE a=1. The value a=1 after WHERE in the merged statement is the value after WHERE in the first SQL statement; that is, the merged statement takes its value from the before image of the first SQL statement.
First SQL statement: DELETE FROM db1.tb1 WHERE a=1, which deletes the row of db1.tb1 whose primary key is a=1. Second SQL statement: INSERT INTO db1.tb1 VALUES (4, 5, 6), which inserts a row with a=4, b=5, c=6 into db1.tb1. Merged SQL statement: UPDATE db1.tb1 SET a=4, b=5, c=6 WHERE a=1. The before image of the merged statement comes from the first SQL statement and the after image comes from the second SQL statement.
Note that if the first SQL statement is INSERT INTO db1.tb1 VALUES (1, 2, 3), which inserts a row with a=1, b=2, c=3 into db1.tb1, and the second SQL statement is DELETE FROM db1.tb1 WHERE a=1, which deletes the row whose primary key is a=1, merging them produces no SQL statement.
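The verb rules and the before/after image handling in the examples above can be expressed as a small Python sketch. It is an illustration under assumptions rather than the method as implemented: `Change`, `merge`, and `to_sql` are hypothetical names, every column is assumed to be an integer (so no quoting or escaping is needed), and verb pairs the text does not cover are rejected.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Row = Dict[str, int]


@dataclass
class Change:
    verb: str               # "INSERT", "UPDATE" or "DELETE"
    before: Optional[Row]   # before image (None for INSERT)
    after: Optional[Row]    # after image (None for DELETE)


def merge(first: Change, second: Change) -> Optional[Change]:
    """Merge two changes to the same row; `first` executed earlier than `second`."""
    pair = (first.verb, second.verb)
    if pair == ("INSERT", "UPDATE"):
        return Change("INSERT", None, second.after)
    if pair == ("UPDATE", "UPDATE"):
        return Change("UPDATE", first.before, second.after)
    if pair == ("UPDATE", "DELETE"):
        return Change("DELETE", first.before, None)
    if pair == ("DELETE", "INSERT"):
        return Change("UPDATE", first.before, second.after)
    if pair == ("INSERT", "DELETE"):
        return None  # the row never needs to exist on the new node
    raise ValueError(f"verb pair not covered by the described rules: {pair}")


def to_sql(change: Change, table: str = "db1.tb1", pk: str = "a") -> str:
    """Render a change as SQL text (integer columns only)."""
    if change.verb == "INSERT":
        values = ", ".join(str(v) for v in change.after.values())
        return f"INSERT INTO {table} VALUES ({values})"
    where = f"WHERE {pk} = {change.before[pk]}"
    if change.verb == "DELETE":
        return f"DELETE FROM {table} {where}"
    sets = ", ".join(f"{col} = {val}" for col, val in change.after.items())
    return f"UPDATE {table} SET {sets} {where}"


update_1 = Change("UPDATE", {"a": 1}, {"a": 4, "b": 5, "c": 6})
update_2 = Change("UPDATE", {"a": 4}, {"a": 7, "b": 8, "c": 9})
print(to_sql(merge(update_1, update_2)))
# UPDATE db1.tb1 SET a = 7, b = 8, c = 9 WHERE a = 1
```

Working on row images rather than SQL text keeps the rules short: the merged change always takes its before image from the earlier statement and its after image from the later one.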
Through step 103 above, the SQL statements that satisfy the preset condition are merged into a merged SQL statement, which the second node can then play back concurrently.
It can be understood that this embodiment is described using SQL statements that operate on the same primary key in the same database table as the statements that satisfy the preset condition. In practice, the statements that satisfy the preset condition may also operate on different primary keys in the same database table; for example, they may be SQL statements that operate on the same field. Suppose the first SQL statement changes the value of field A in the first row of a table to 2, and the second SQL statement changes the value of field A in the second row to 3. The two statements can be merged, and executing the merged SQL statement sets field A to 2 in the first row and to 3 in the second row. Replacing two SQL statements with one reduces the number of SQL statements output to the new node (the second node) and improves playback efficiency. In addition, both statements operate on the same column (field) of the table, so merging them reduces the field-level coupling between SQL statements and makes concurrent execution easier.
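The text does not say what form the merged statement takes when the original statements touch different rows. One possible form, shown here purely as an assumption, is a single UPDATE with a CASE expression keyed on the primary key. The function name `merge_field_updates`, the table name, and the primary-key column `id` are hypothetical, and values are assumed to be integers so that no quoting is needed.

```python
from typing import Dict


def merge_field_updates(table: str, pk: str, field: str, new_values: Dict[int, int]) -> str:
    """Combine per-row updates of one field into a single UPDATE ... CASE statement.

    `new_values` maps a primary-key value to the new value of `field` for that row.
    """
    cases = " ".join(f"WHEN {key} THEN {value}" for key, value in new_values.items())
    keys = ", ".join(str(key) for key in new_values)
    return f"UPDATE {table} SET {field} = CASE {pk} {cases} END WHERE {pk} IN ({keys})"


# Row 1 gets A = 2 and row 2 gets A = 3, in one statement.
print(merge_field_updates("db1.tb1", "id", "A", {1: 2, 2: 3}))
# UPDATE db1.tb1 SET A = CASE id WHEN 1 THEN 2 WHEN 2 THEN 3 END WHERE id IN (1, 2)
```

The WHERE clause restricts the update to the listed rows, so rows outside the merge are left unchanged.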
This embodiment acquires the logical transaction log of the first node, acquires SQL statements according to the logical transaction log, and merges the SQL statements that satisfy the preset condition to generate a merged SQL statement for the second node to play back concurrently. This reduces the number of SQL statements output, and therefore the number of SQL statements the database node plays back, which speeds up incremental catch-up. Merging also reduces the coupling between SQL statements, which improves SQL playback efficiency and raises the success rate of incremental catch-up.
A second embodiment of the present application relates to a data processing method. This embodiment is substantially the same as the first embodiment, with the following difference: after the SQL statements that satisfy the preset condition are merged to generate the merged SQL statement, the method further includes obtaining a hash value from the database table and primary key value in the merged SQL statement, and determining, according to the hash value, the hash bucket used to store the merged SQL statement.
The flowchart of the data processing method of the second embodiment of the present application is shown in FIG. 2.
Step 201: acquire the logical transaction log of the first node.
Step 202: acquire SQL statements according to the logical transaction log.
Step 203: merge the SQL statements that satisfy the preset condition to generate a merged SQL statement.
Steps 201 to 203 are substantially the same as steps 101 to 103 of the first embodiment of the present application and are not described again here.
Step 204: determine, from the database table and primary key value in the merged SQL statement, the hash bucket used to store the merged SQL statement.
In one example, a hash value is obtained from the database table and primary key value in the merged SQL statement, and the hash bucket used to store the merged SQL statement is determined according to the hash value. Storing the SQL statements in a hash-bucket data structure keeps the number of SQL statements in each hash bucket file as even as possible, which improves playback efficiency.
For example, the database table and primary key value in the merged SQL statement are fed into a hash function; the hash function yields the hash value, and the hash value determines the hash bucket.
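Step 204 can be sketched in Python as follows. The text does not name a hash function or a bucket count, so the use of CRC32, the `table:pk` key format, and the names `bucket_index` and `fill_buckets` are assumptions made for illustration. CRC32 is chosen here because it gives the same result in every process, unlike Python's built-in `hash()` for strings.

```python
import zlib
from typing import Dict, List, Tuple


def bucket_index(table: str, pk_value: object, num_buckets: int) -> int:
    """Map (database table, primary-key value) to a bucket number in [0, num_buckets)."""
    key = f"{table}:{pk_value}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


def fill_buckets(statements: List[Tuple[str, object, str]],
                 num_buckets: int) -> Dict[int, List[str]]:
    """Place each merged statement, given as (table, pk_value, sql), in its hash bucket."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(num_buckets)}
    for table, pk_value, sql in statements:
        buckets[bucket_index(table, pk_value, num_buckets)].append(sql)
    return buckets


merged = [("db1.tb1", pk, f"DELETE FROM db1.tb1 WHERE a={pk}") for pk in range(100)]
buckets = fill_buckets(merged, num_buckets=4)
print({index: len(sqls) for index, sqls in buckets.items()})
```

Because the bucket depends only on the table and primary key, the assignment is repeatable across catch-up rounds, and a reasonable hash function spreads statements across buckets roughly evenly.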
Step 205: determine, from the merged SQL statements in the hash bucket, the second node that will play back each merged SQL statement.
Step 206: split the hash bucket file according to the determined second nodes to obtain SQL files. A hash bucket file is a file containing all merged SQL statements in a hash bucket. In this implementation, the statements in a hash bucket file that target the same node are gathered into one SQL file, which reduces the number of times SQL statements have to be sent and further improves playback efficiency.
Step 207: send each SQL file to its determined second node. The merged SQL statements in one SQL file are all played back on the same second node.
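Steps 205 and 206 amount to grouping the statements of one hash bucket by destination node. A minimal Python sketch follows; the distribution rule used here (distribution key modulo 2) and the names `split_by_node` and `node_for` are hypothetical stand-ins for whatever distribution rules the cluster is configured with.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def split_by_node(bucket: Iterable[Tuple[int, str]],
                  node_for: Callable[[int], str]) -> Dict[str, List[str]]:
    """Group (distribution_key, sql) pairs from one hash bucket into one SQL file per node."""
    files: Dict[str, List[str]] = defaultdict(list)
    for dist_key, sql in bucket:
        files[node_for(dist_key)].append(sql)
    return dict(files)


bucket = [
    (1, "DELETE FROM db1.tb1 WHERE a=1"),
    (2, "INSERT INTO db1.tb1 VALUES (2, 5, 6)"),
    (3, "DELETE FROM db1.tb1 WHERE a=3"),
]
# Hypothetical distribution rule: odd keys go to node1, even keys to node0.
sql_files = split_by_node(bucket, node_for=lambda key: f"node{key % 2}")
print(sql_files)
```

Each value in the returned dictionary corresponds to one SQL file, so every file can be sent to its node in a single transfer.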
In one example, concurrent playback includes concurrent playback across SQL files and concurrent playback of the SQL statements within an SQL file. In this implementation, concurrent playback improves playback efficiency and raises the success rate of incremental catch-up.
For example, the resource manager reworks the hash bucket files generated by the SQL merger according to the distribution rules: it computes the second node for each SQL statement in a hash bucket file and produces SQL files, each of which is played back on a single node. It then connects to the remote DB for playback. During playback, different SQL files are played back concurrently, and the SQL statements within one SQL file are also played back concurrently.
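The two levels of concurrency can be sketched with Python's standard thread pools. This is an illustration under assumptions: `execute` stands in for sending one statement to the remote DB, the names `play_back_file` and `play_back_all` and the worker count are hypothetical, and a real implementation would also need to manage database connections per worker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def play_back_file(statements: List[str], execute: Callable[[str], str],
                   workers: int = 4) -> List[str]:
    """Play back the statements of one SQL file concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, statements))  # results keep the input order


def play_back_all(sql_files: Dict[str, List[str]], execute: Callable[[str], str],
                  workers: int = 4) -> Dict[str, List[str]]:
    """Play back all SQL files concurrently, one task per file."""
    with ThreadPoolExecutor(max_workers=max(1, len(sql_files))) as pool:
        futures = {node: pool.submit(play_back_file, statements, execute, workers)
                   for node, statements in sql_files.items()}
        return {node: future.result() for node, future in futures.items()}


# Stand-in for real execution: report each statement as done.
results = play_back_all(
    {"node0": ["stmt-a", "stmt-b"], "node1": ["stmt-c"]},
    execute=lambda sql: f"ok:{sql}",
)
print(results)
```

Running statements within a file in parallel is only safe because merging has already left at most one statement per row, so no two statements in flight touch the same row.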
After playback completes, the resource manager returns the playback result to the cluster manager. On receiving the reply, the cluster manager starts a new round of incremental catch-up, and repeats until a round takes less time than the catch-up threshold.
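The round-based termination rule can be sketched as a simple loop. The function name `catch_up`, the injectable `clock`, and the `max_rounds` safety limit are assumptions added for illustration; the text only states that rounds repeat until one finishes in less time than the threshold.

```python
import time
from typing import Callable


def catch_up(run_round: Callable[[], None], threshold_seconds: float,
             max_rounds: int = 100,
             clock: Callable[[], float] = time.monotonic) -> int:
    """Repeat catch-up rounds until one takes less than the threshold; return the round count."""
    for round_no in range(1, max_rounds + 1):
        started = clock()
        run_round()  # parse the log, merge, split, and play back one round of increments
        if clock() - started < threshold_seconds:
            return round_no
    raise RuntimeError("incremental catch-up did not converge within max_rounds")


# Simulated clock: rounds take 5 s, 3 s, then 0.5 s, with a 1 s threshold.
ticks = iter([0.0, 5.0, 5.0, 8.0, 8.0, 8.5])
rounds = catch_up(lambda: None, threshold_seconds=1.0, clock=lambda: next(ticks))
print(rounds)  # 3
```

Each round has less to replay than the one before only if playback outpaces incoming writes, which is why playback efficiency decides whether the loop terminates.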
A schematic diagram of incremental catch-up in this embodiment is shown in FIG. 3, where DBA stands for database administrator.
The cluster manager receives the redistribution request and performs a full export of the data to be redistributed. That is, the fully exported file is split according to the number of lines per split file and the distribution batch size configured in the cluster manager; the distribution field and its column data in each split file are verified; a calculation object is constructed from the distribution field and the column data that passed verification; a distribution algorithm computes the destination node; and the data is written to the corresponding node, that is, to the shard data cache. Each time the number of split files reaches one sending batch, the split files are sent, so that the data of the old node is imported into the configured new nodes. The cluster manager then starts the incremental catch-up procedure. It first queries and records the current logical transaction log position of each new node, and then sends an incremental catch-up request to the resource manager of the old node. Starting from the backed-up position, the resource manager scans the logical transaction log and notifies the SQL merger to parse the log and merge the SQL statements. From the merged SQL statements, a hash bucket file that organizes the data in hash buckets is generated; this hash bucket file is the redo SQL file in FIG. 3. The resource manager splits the SQL statements in each hash bucket file by distribution key, so that each hash bucket file is split into multiple SQL files, and each SQL file is transferred to its determined node. The resource manager connects remotely to the DB nodes; the DB executes the SQL files concurrently, executes the SQL statements within each SQL file concurrently, and returns the execution results.
以MySQL分布式集群数据的文档管理系统为例,在该系统中,假设是以文档类型分表保存各文档,如果某个类型的文档增多,导致该类型文档的数据库表承受的数据量大,此时分布式数据库集群可以执行重分布操作,按照分发规则,对文档数据进行拆分,然后存储到对应节点。完成此步骤之后,进行数据追增量操作,集群管理器给各旧库表对应的资源管理器发送追增量请求,资源管理器收到追增量请求后,获取逻辑事务日志,即MySQL的binlog文件,SQL合并器进程解析binlog文件并合并解析得到SQL语句,根据将各SQL语句形成哈希桶文件,资源管理器将哈希桶文件进行拆分成多个SQL文件,然后将多个新生成的SQL文件传输到对应的新节点,新节点并发回放SQL文件并返回执行结果。Take the document management system of MySQL distributed cluster data as an example. In this system, it is assumed that each document is stored in a table by document type. If the number of documents of a certain type increases, the database table of this type of document will bear a large amount of data. At this time, the distributed database cluster can perform the redistribution operation, split the document data according to the distribution rules, and then store it on the corresponding node. After this step is completed, the data increment operation is performed. The cluster manager sends an increment request to the resource manager corresponding to each old database table. After receiving the increment request, the resource manager obtains the logical transaction log, that is, the MySQL database. binlog file, the SQL combiner process parses the binlog file and merges and parses to obtain SQL statements. According to forming each SQL statement into a hash bucket file, the resource manager splits the hash bucket file into multiple SQL files, and then combines multiple new SQL statements. The generated SQL file is transferred to the corresponding new node, and the new node plays back the SQL file concurrently and returns the execution result.
It is worth mentioning that, during binlog parsing in this embodiment, SQL statements that operate on the same primary key of the same database table are combined, which reduces the number of SQL statements output to the hash buckets. The hash buckets keep the number of SQL statements in each bucket as even as possible, which makes the subsequent playback more efficient. In addition, SQL merging allows playback to run concurrently along three dimensions, which further improves playback efficiency: first, after merging, SQL statements for different primary keys of the same database table in the same hash bucket are played back concurrently; second, after merging, SQL statements for different database tables in the same hash bucket are played back concurrently; third, SQL statements in different hash buckets are played back concurrently. SQL merging therefore decouples all the SQL statements: no statement depends on another, so they can be played back concurrently in batches, which shortens the playback time and significantly improves the performance of incremental data catch-up.
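The concurrency argument above can be illustrated with a minimal Python sketch. It assumes the statements have already been merged, so each (table, primary key) pair appears at most once and no statement depends on another. The names `execute_on_new_node` and `play_back_buckets` are hypothetical and do not come from the application; a real system would send each statement to the database instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_on_new_node(statement):
    # Hypothetical stand-in for executing one merged statement on the new node.
    return f"ok: {statement}"


def play_back_buckets(buckets, max_workers=4):
    """Play back every merged statement of every hash bucket concurrently.

    Because merged statements are independent, one thread pool can cover all
    three dimensions at once: different primary keys, different tables, and
    different buckets.
    """
    statements = [stmt for bucket in buckets for stmt in bucket]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, even though the calls
        # themselves may run in any order.
        return list(pool.map(execute_on_new_node, statements))


if __name__ == "__main__":
    buckets = [
        ["UPDATE doc SET title='b' WHERE id=1", "DELETE FROM img WHERE id=7"],
        ["INSERT INTO doc (id, title) VALUES (2, 'x')"],
    ]
    for result in play_back_buckets(buckets):
        print(result)
```

A flat pool is used here only to keep the sketch short; without merging, statements on the same row would have to keep their log order and could not be submitted this way.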
The steps of the above methods are divided only for clarity of description. In implementation, they may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, also falls within the protection scope of this patent.
The third embodiment of the present application relates to a data processing apparatus, including: a log acquisition module 401, configured to acquire a logical transaction log of a first node; an SQL statement acquisition module 402, configured to acquire SQL statements according to the logical transaction log; and an SQL statement combining module 403, configured to combine the SQL statements that satisfy a preset condition and generate a combined SQL statement, for a second node to play back the combined SQL statement concurrently.
In an example, the SQL statements that satisfy the preset condition in the SQL statement combining module 403 are the SQL statements that operate on the same primary key in the same database table.
In an example, the SQL statement combining module 403 is further configured to determine the execution time, in the logical transaction log, of the SQL statements that satisfy the preset condition, and to combine those SQL statements according to the determined execution time to obtain the combined SQL statement.
In an example, the SQL statement combining module 403 is further configured to determine the verb and field values of the combined SQL statement according to the SQL statements that satisfy the preset condition and the execution time, and to generate the combined SQL statement according to the determined verb and field values.
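As an illustration of how a verb and field values might be determined from the execution order, the following Python sketch folds all statements on the same (table, primary key) pair into at most one statement. The folding rules are assumptions made for this example and are not taken from the application: an INSERT followed by UPDATEs stays an INSERT with the latest values, an INSERT followed by a DELETE cancels out, and a DELETE followed by an INSERT becomes a REPLACE. The dictionary layout and the name `merge_statements` are likewise hypothetical, and updates to the primary key itself are not handled.

```python
def merge_statements(statements):
    """Fold statements on the same (table, primary key) into at most one.

    Each statement is a dict with the keys "time", "table", "pk", "verb"
    and, for INSERT and UPDATE, "values" (field name -> value).
    """
    merged = {}
    # Process statements in the order they were executed in the log.
    for stmt in sorted(statements, key=lambda s: s["time"]):
        key = (stmt["table"], stmt["pk"])
        verb = stmt["verb"]
        values = dict(stmt.get("values", {}))
        prev = merged.get(key)
        if prev is None:
            merged[key] = {"table": stmt["table"], "pk": stmt["pk"],
                           "verb": verb, "values": values}
        elif verb == "UPDATE" and prev["verb"] != "DELETE":
            # Keep the earlier verb, overwrite fields with the later values.
            prev["values"].update(values)
        elif verb == "DELETE" and prev["verb"] == "INSERT":
            # Row was created and removed inside the log window: no net effect.
            del merged[key]
        elif verb == "DELETE" and prev["verb"] != "DELETE":
            prev["verb"], prev["values"] = "DELETE", {}
        elif verb == "INSERT" and prev["verb"] == "DELETE":
            # Row existed, was deleted, then re-created: replace it.
            prev["verb"], prev["values"] = "REPLACE", values
        else:
            raise ValueError(f"unexpected {verb} after {prev['verb']} for {key}")
    return list(merged.values())


if __name__ == "__main__":
    log = [
        {"time": 1, "table": "doc", "pk": 1, "verb": "INSERT", "values": {"title": "a"}},
        {"time": 2, "table": "doc", "pk": 1, "verb": "UPDATE", "values": {"title": "b"}},
        {"time": 3, "table": "doc", "pk": 2, "verb": "INSERT", "values": {"title": "x"}},
        {"time": 4, "table": "doc", "pk": 2, "verb": "DELETE"},
    ]
    print(merge_statements(log))
```

In this example four logged statements collapse into a single INSERT for primary key 1, which is the reduction in statement count that merging is meant to achieve.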
In an example, the SQL statement combining module 403 is further configured to: obtain a hash value according to the database table and the primary key value in the combined SQL statement; determine, according to the hash value, a hash bucket for storing the combined SQL statement; determine, according to the combined SQL statement in the hash bucket, a second node that plays back the combined SQL statement; and send the combined SQL statement to the second node.
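The bucket selection described above could look roughly like the following Python sketch. The application does not name a hash function; CRC32 from the standard `zlib` module is used here purely as an assumption, because it gives the same value across processes (unlike Python's built-in `hash` for strings). The names `bucket_index` and `fill_buckets` and the statement layout are hypothetical.

```python
import zlib


def bucket_index(table, pk, bucket_count):
    """Map a (table, primary key) pair to a hash bucket index."""
    key = f"{table}:{pk}".encode("utf-8")
    # CRC32 is an assumed choice; any stable, well-spread hash would do.
    return zlib.crc32(key) % bucket_count


def fill_buckets(merged_statements, bucket_count):
    """Place each merged statement into the bucket chosen by its hash."""
    buckets = [[] for _ in range(bucket_count)]
    for stmt in merged_statements:
        index = bucket_index(stmt["table"], stmt["pk"], bucket_count)
        buckets[index].append(stmt)
    return buckets


if __name__ == "__main__":
    merged = [{"table": "doc", "pk": pk, "verb": "UPDATE"} for pk in range(10)]
    for i, bucket in enumerate(fill_buckets(merged, 4)):
        print(i, [stmt["pk"] for stmt in bucket])
```

Hashing on both the table and the primary key spreads statements across buckets regardless of which table is busiest, which is consistent with the goal of keeping bucket sizes even.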
In an example, the SQL statement combining module 403 is further configured to split a hash bucket file according to the determined second node to obtain SQL files, and to send each SQL file to the determined second node; where the hash bucket file is a file including all the combined SQL statements in the hash bucket, and the combined SQL statements in one SQL file are all played back by the same second node.
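A minimal Python sketch of this split is shown below. It assumes a very simple distribution rule, routing each statement by its integer primary key modulo the number of nodes; the actual distribution rule is not specified in this passage. The names `node_for` and `split_bucket` and the node labels are hypothetical.

```python
from collections import defaultdict


def node_for(stmt, nodes):
    # Hypothetical distribution rule: integer primary key modulo node count.
    return nodes[stmt["pk"] % len(nodes)]


def split_bucket(bucket, nodes):
    """Split one hash bucket into per-node lists of merged statements.

    Each returned list plays the role of one SQL file: all of its statements
    are destined for the same second node.
    """
    files = defaultdict(list)
    for stmt in bucket:
        files[node_for(stmt, nodes)].append(stmt)
    return dict(files)


if __name__ == "__main__":
    bucket = [{"table": "doc", "pk": pk, "verb": "UPDATE"} for pk in (1, 2, 3, 4)]
    for node, statements in split_bucket(bucket, ["node_a", "node_b"]).items():
        print(node, [stmt["pk"] for stmt in statements])
```

Nodes that receive no statement from a bucket simply get no SQL file for that bucket.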
In an example, the concurrent playback in the SQL statement combining module 403 includes: concurrent playback between the SQL files and concurrent playback of the SQL statements within each SQL file.
It is not difficult to see that this embodiment is a system embodiment corresponding to the first embodiment, and this embodiment can be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here to avoid repetition. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the first embodiment.
It is worth mentioning that each module involved in this embodiment is a logical module. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units that are not closely related to solving the technical problem raised by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
The fourth embodiment of the present application relates to an electronic device which, as shown in Figure 5, includes at least one processor 501 and a memory 502 communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above data processing method.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides the interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data that the processor uses when performing operations.
The fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The above method embodiments are implemented when the computer program is executed by a processor.
That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific examples of implementing the present application, and that in practical applications various changes can be made to them in form and detail without departing from the spirit and scope of the present application.

Claims (10)

  1. A data processing method, comprising:
    acquiring a logical transaction log of a first node;
    acquiring SQL statements according to the logical transaction log;
    combining the SQL statements that satisfy a preset condition to generate a combined SQL statement, for a second node to play back the combined SQL statement concurrently.
  2. The data processing method according to claim 1, wherein the SQL statements that satisfy the preset condition are the SQL statements that operate on the same primary key in the same database table.
  3. The data processing method according to claim 2, wherein the combining the SQL statements that satisfy the preset condition to generate a combined SQL statement comprises:
    determining the execution time, in the logical transaction log, of the SQL statements that satisfy the preset condition;
    combining the SQL statements that satisfy the preset condition according to the determined execution time to obtain the combined SQL statement.
  4. The data processing method according to claim 3, wherein the combining the SQL statements that satisfy the preset condition according to the determined execution time comprises:
    determining the verb and field values of the combined SQL statement according to the SQL statements that satisfy the preset condition and the execution time;
    generating the combined SQL statement according to the determined verb and field values.
  5. The data processing method according to any one of claims 1 to 4, further comprising, after the combining the SQL statements that satisfy the preset condition to generate a combined SQL statement:
    obtaining a hash value according to the database table and the primary key value in the combined SQL statement;
    determining, according to the hash value, a hash bucket for storing the combined SQL statement;
    determining, according to the combined SQL statement in the hash bucket, a second node that plays back the combined SQL statement;
    sending the combined SQL statement to the second node.
  6. The data processing method according to claim 5, wherein the sending the combined SQL statement to the second node comprises:
    splitting a hash bucket file according to the determined second node to obtain SQL files, and sending each SQL file to the determined second node;
    wherein the hash bucket file is a file including all the combined SQL statements in the hash bucket, and the combined SQL statements in one SQL file are played back by the same second node.
  7. The data processing method according to claim 6, wherein the concurrent playback comprises: concurrent playback between the SQL files and concurrent playback of the SQL statements within each SQL file.
  8. A data processing apparatus, comprising:
    a log acquisition module, configured to acquire a logical transaction log of a first node;
    an SQL statement acquisition module, configured to acquire SQL statements according to the logical transaction log;
    an SQL statement combining module, configured to combine the SQL statements that satisfy a preset condition to generate a combined SQL statement, for a second node to play back the combined SQL statement concurrently.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the data processing method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.
PCT/CN2021/138821 2020-12-17 2021-12-16 Data processing method and apparatus, and electronic device and storage medium WO2022127866A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011498751.2 2020-12-17
CN202011498751.2A CN114647659A (en) 2020-12-17 2020-12-17 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022127866A1 true WO2022127866A1 (en) 2022-06-23

Family

ID=81990726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138821 WO2022127866A1 (en) 2020-12-17 2021-12-16 Data processing method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114647659A (en)
WO (1) WO2022127866A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644086A (en) * 2023-05-24 2023-08-25 上海沄熹科技有限公司 SST-based Insert SQL statement implementation method
CN117573730A (en) * 2024-01-16 2024-02-20 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, readable storage medium, and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1205853A1 (en) * 2000-11-08 2002-05-15 International Business Machines Corporation Reduced lock contention in SQL transactions
CN102591982A (en) * 2011-01-07 2012-07-18 赛门铁克公司 Method and system of performing incremental sql server database backups
CN109101627A (en) * 2018-08-14 2018-12-28 交通银行股份有限公司 heterogeneous database synchronization method and device
WO2019148713A1 (en) * 2018-01-30 2019-08-08 平安科技(深圳)有限公司 Sql statement processing method and apparatus, computer device, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102346775A (en) * 2011-09-26 2012-02-08 苏州博远容天信息科技有限公司 Method for synchronizing multiple heterogeneous source databases based on log
US10303654B2 (en) * 2015-02-23 2019-05-28 Futurewei Technologies, Inc. Hybrid data distribution in a massively parallel processing architecture
CN105955970A (en) * 2015-11-12 2016-09-21 中国银联股份有限公司 Log analysis-based database copying method and device
CN107169094B (en) * 2017-05-12 2020-10-13 北京小米移动软件有限公司 Information aggregation method and device
CN109408589B (en) * 2018-09-14 2020-08-14 新华三大数据技术有限公司 Data synchronization method and device
CN109271450B (en) * 2018-10-10 2020-12-04 北京百度网讯科技有限公司 Database synchronization method, device, server and storage medium
CN111767340B (en) * 2020-05-29 2024-01-05 中国工商银行股份有限公司 Data processing method, device, electronic equipment and medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644086A (en) * 2023-05-24 2023-08-25 上海沄熹科技有限公司 SST-based Insert SQL statement implementation method
CN116644086B (en) * 2023-05-24 2024-02-20 上海沄熹科技有限公司 SST-based Insert SQL statement implementation method
CN117573730A (en) * 2024-01-16 2024-02-20 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, readable storage medium, and program product
CN117573730B (en) * 2024-01-16 2024-04-05 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, readable storage medium, and program product

Also Published As

Publication number Publication date
CN114647659A (en) 2022-06-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 241023)