CN114647659A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114647659A CN202011498751.2A
- Authority
- CN
- China
- Prior art keywords
- sql
- merged
- sql statement
- node
- data processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2308—Concurrency control
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application relate to the field of databases, and in particular to a data processing method, a data processing apparatus, an electronic device and a storage medium. The method acquires the logical transaction log of a first node, obtains SQL statements from the logical transaction log, and merges the SQL statements that meet a preset condition to generate merged SQL statements for a second node to play back concurrently. This reduces the number of SQL statements that are output and that the database node has to play back, and so speeds up incremental catch-up.
Description
Technical Field
Embodiments of the present application relate to the field of databases, and in particular to a data processing method, a data processing apparatus, an electronic device and a storage medium.
Background
During data redistribution, data on the old node is distributed to the corresponding new nodes according to the configured distribution rule. The service keeps committing data on the old node while this happens, so the data written during that period must also be applied to the new nodes; this is referred to as incremental catch-up. In the incremental catch-up phase late in data redistribution, the database cluster parses the log of the old node to obtain SQL (Structured Query Language) statements, and the parsed SQL statements are played back on the new nodes of the distributed database cluster to catch up on the incremental data.
However, during incremental catch-up late in data redistribution, service concurrency and load are high, and SQL playback in the related art is too inefficient: incremental data is replayed more slowly than the service writes new data, so the catch-up fails.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a data processing method, a data processing apparatus, an electronic device and a storage medium that improve the playback efficiency of SQL statements and thereby raise the success rate of incremental catch-up.
To achieve the above object, an embodiment of the present application provides a data processing method, including: acquiring a logical transaction log of a first node; obtaining SQL statements according to the logical transaction log; and merging the SQL statements that meet a preset condition to generate merged SQL statements for a second node to play back concurrently.
To achieve the above object, an embodiment of the present application further provides a data processing apparatus, including: a log acquisition module, configured to acquire a logical transaction log from a first node; an SQL statement acquisition module, configured to obtain SQL statements according to the logical transaction log; and an SQL statement merging module, configured to merge the SQL statements that meet a preset condition to generate merged SQL statements for a second node to play back concurrently.
To achieve the above object, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data processing method described above.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method described above.
The data processing method provided by the embodiments of the present application acquires the logical transaction log of the first node, obtains SQL statements according to the logical transaction log, and merges the SQL statements that meet the preset condition to generate merged SQL statements for the second node to play back concurrently. This reduces the number of SQL statements that are output and that the database node has to play back, and so speeds up incremental catch-up.
Drawings
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of incremental catch-up according to the second embodiment of the present invention;
Fig. 4 is a flowchart of a data processing apparatus according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to help the reader understand the application; the technical solution claimed in the present application can nevertheless be implemented without these details, and with various changes and modifications based on the following embodiments. The embodiments are divided for convenience of description only. The division should not limit the specific implementation of the present application, and the embodiments may be combined with and refer to one another where they do not contradict.
The first embodiment of the present invention relates to a data processing method that can be applied to electronic devices such as servers. The embodiment includes: acquiring a logical transaction log of a first node; obtaining SQL statements according to the logical transaction log; and merging the SQL statements that meet a preset condition to generate merged SQL statements for a second node to play back concurrently. Merging reduces the number of SQL statements output to the new node during incremental catch-up, and therefore the number of SQL statements the database node has to run. It also reduces the coupling between SQL statements, which raises playback efficiency and improves the success rate of incremental catch-up.
In one example, the cluster manager receives a redistribution request from an upper-layer service and performs a full export of the data to be redistributed. The exported file is split according to the number of rows per split file and the distribution batch size configured in the cluster manager. The distribution field, and the column data of the distribution field, in each split file are verified; a calculation object is constructed from the column data that passes verification; a distribution algorithm calculates the target distribution node; and the data is written to the cache of the corresponding node, that is, the shard data cache. When the number of split files reaches one distribution batch, the split files are sent, so that the data of the old node is imported into each configured new node.
The cluster manager can receive cluster-related requests from upper-layer services, such as the data redistribution request. It manages the distributed cluster, coordinates the database (DB) status reports of the resource managers, and issues commands such as switchover, backup and redistribution to the resource managers. The resource manager is typically the upper-level agent of a database: a local database monitor that performs complex operations on the database in response to upper-level requests. In this embodiment, its main function is to respond to the redistribution request of the cluster manager and execute the redistribution flow and, during incremental catch-up, to split SQL according to the distribution rule and connect to the DB to play back SQL. The first node and the second node are both database nodes, the basic nodes that store data.
During data redistribution, the service keeps committing data on the old node, so the data written during that period must also be applied to the new nodes.
A flowchart of the data processing method of the present embodiment is shown in Fig. 1.
Step 101: acquire the logical transaction log of the first node.
Illustratively, the resource manager obtains the logical transaction log (binlog) of the first node.
Step 102: obtain SQL statements according to the logical transaction log.
Illustratively, the logical transaction log (binlog) may be parsed by the SQL merger, which first caches each parsed SQL statement in the memory of the server. The SQL merger can be understood as a process, and each database node under the resource manager has its own SQL merger; that is, during incremental catch-up for the first node, the SQL merger is the process that parses the logical transaction log of the first node and merges the SQL.
Step 103: merge the SQL statements that meet the preset condition to generate merged SQL statements for the second node to play back concurrently.
In one example, the SQL statements that meet the preset condition are SQL statements that operate on the same primary key in the same database table. Because a primary key identifies one row of data, merging the statements that operate on the same primary key of the same table leaves merged statements that each operate on a different row. The row operations of the statements are then no longer related to one another, which removes the strong coupling between the original SQL statements, makes batch concurrent playback possible, shortens playback time and improves incremental catch-up performance.
Illustratively, the SQL merger caches each SQL statement newly parsed from the logical transaction log in the memory of the server and checks whether the memory already holds a statement that operates on the same primary key in the same database table. If it does, the two statements are merged and the merged statement is kept in memory, so that the next parsed statement can be merged against the statements stored there. For example, suppose memory holds SQL statements s1, s2 and s3, and the statement currently parsed from the logical transaction log is s4. s4 is put into memory, and s1, s2 and s3 are searched for a statement that can be merged with s4. If, say, s4 can be merged with s1, the two are merged to produce s5, which is also put into memory; the statements stored in memory are then s2, s3 and s5.
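The caching behaviour described above can be sketched as a dictionary keyed by table and primary key. This is only an illustrative sketch: the class name `MergeCache`, the `RowKey` shape and the placeholder merge rule are assumptions made for the example and do not come from the patent, which does not specify the in-memory data structure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key: (database.table, primary-key value). The caller is assumed
# to extract it from each parsed statement.
RowKey = Tuple[str, int]


class MergeCache:
    """Holds at most one pending statement per row, merging as new ones arrive."""

    def __init__(self, merge: Callable[[str, str], Optional[str]]) -> None:
        self._merge = merge
        self._pending: Dict[RowKey, str] = {}

    def add(self, key: RowKey, statement: str) -> None:
        cached = self._pending.get(key)
        if cached is None:
            # Nothing cached for this row yet: just remember the statement.
            self._pending[key] = statement
            return
        merged = self._merge(cached, statement)
        if merged is None:
            # The two statements cancel out, so nothing remains for this row.
            del self._pending[key]
        else:
            self._pending[key] = merged

    def statements(self) -> List[str]:
        return list(self._pending.values())


# Placeholder merge rule that always yields "s5", mirroring the s1 + s4 example.
cache = MergeCache(merge=lambda first, second: "s5")
cache.add(("db1.tb1", 1), "s1")
cache.add(("db1.tb1", 2), "s2")
cache.add(("db1.tb1", 3), "s3")
cache.add(("db1.tb1", 1), "s4")  # same row as s1, so the two are merged
print(sorted(cache.statements()))
```

Keying the cache by row makes the lookup for a mergeable statement a single dictionary access rather than a scan over every cached statement.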
In one example, the execution time in the logical transaction log of each SQL statement that meets the preset condition is determined, and the statements are merged according to the determined execution times to obtain the merged SQL statement. For example, the execution times of the SQL statements that operate on the same table and primary key are determined from the logical transaction log; if the first SQL statement executes earlier and the second executes later, the merged SQL statement has the same effect as executing the first and then the second in that order. Merging according to execution time during redistribution catch-up removes the dependence of each SQL statement on execution order: the originally ordered SQL becomes unordered, the step-by-step relationship between the original statements disappears, and the strong coupling between statements is further reduced. This makes parallel playback easier and creates room to improve SQL playback efficiency.
In one example, the verb and the field values of the merged SQL statement are determined according to the SQL statements that meet the preset condition and their execution times, and the merged SQL statement is generated from the determined verb and field values. Because the statements are merged according to their verbs and execution times, the merged SQL statement has the same effect as executing the original statements in order of execution time. The verbs of an SQL statement include INSERT, UPDATE and DELETE.
Illustratively, take two SQL statements that meet the preset condition, a first SQL statement and a second SQL statement, where the first executes earlier than the second in the logical transaction log:
if the verb of the first SQL statement is INSERT and the verb of the second SQL statement is UPDATE, the verb of the merged SQL statement is INSERT;
if the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is UPDATE, the verb of the merged SQL statement is UPDATE;
if the verb of the first SQL statement is UPDATE and the verb of the second SQL statement is DELETE, the verb of the merged SQL statement is DELETE;
if the verb of the first SQL statement is DELETE and the verb of the second SQL statement is INSERT, the verb of the merged SQL statement is UPDATE.
It should be noted that if the verb of the first SQL statement is INSERT and the verb of the second SQL statement is DELETE, merging the two statements produces no SQL statement at all.
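The verb rules above can be written as a small lookup table and applied in execution order. The sketch below is illustrative only: the names `MERGED_VERB`, `merge_verbs` and `fold_verbs`, and the use of an integer log position to stand for execution time, are assumptions for the example. Verb pairs that the text does not list raise an error rather than being guessed.

```python
from typing import Iterable, Optional, Tuple

# (verb of earlier statement, verb of later statement) -> verb of merged statement.
# None means the pair cancels out and no statement is generated.
MERGED_VERB = {
    ("INSERT", "UPDATE"): "INSERT",
    ("UPDATE", "UPDATE"): "UPDATE",
    ("UPDATE", "DELETE"): "DELETE",
    ("DELETE", "INSERT"): "UPDATE",
    ("INSERT", "DELETE"): None,
}


def merge_verbs(first: str, second: str) -> Optional[str]:
    """Return the merged verb for two statements on the same row."""
    try:
        return MERGED_VERB[(first, second)]
    except KeyError:
        raise ValueError(f"no merge rule for {first} followed by {second}") from None


def fold_verbs(events: Iterable[Tuple[int, str]]) -> Optional[str]:
    """Merge (log_position, verb) pairs for one row in execution order."""
    result: Optional[str] = None
    for _, verb in sorted(events):
        # After a cancelling pair there is nothing pending, so start afresh.
        result = verb if result is None else merge_verbs(result, verb)
    return result


# The events may arrive out of order; sorting by log position restores the order.
print(fold_verbs([(2, "UPDATE"), (1, "INSERT")]))
```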
The following examples use table tb1 in database db1. db1.tb1 has only three fields, a, b and c, all of type int (integer); a is the primary key, and the first SQL statement executes earlier than the second.
First SQL statement: INSERT INTO db1.tb1 VALUES (1, 2, 3), which inserts a row with a = 1, b = 2, c = 3 into db1.tb1. Second SQL statement: UPDATE db1.tb1 SET a = 4, b = 5, c = 6 WHERE a = 1, which updates the row of db1.tb1 whose primary key a = 1 to a = 4, b = 5, c = 6. The merged SQL statement keeps the latest data: INSERT INTO db1.tb1 VALUES (4, 5, 6), which inserts a row with a = 4, b = 5, c = 6 into db1.tb1.
First SQL statement: UPDATE db1.tb1 SET a = 4, b = 5, c = 6 WHERE a = 1, which updates the row of db1.tb1 whose primary key a = 1 to a = 4, b = 5, c = 6. Second SQL statement: UPDATE db1.tb1 SET a = 7, b = 8, c = 9 WHERE a = 4, which updates the row whose primary key a = 4 to a = 7, b = 8, c = 9. Merged SQL statement: UPDATE db1.tb1 SET a = 7, b = 8, c = 9 WHERE a = 1, which updates the row whose primary key a = 1 to a = 7, b = 8, c = 9. The merged statement keeps the before-image column value of the first SQL statement (a = 1 after WHERE) and the after-image column values of the second SQL statement (a = 7, b = 8, c = 9 after SET).
First SQL statement: UPDATE db1.tb1 SET a = 4, b = 5, c = 6 WHERE a = 1, which updates the row of db1.tb1 whose primary key a = 1 to a = 4, b = 5, c = 6. Second SQL statement: DELETE FROM db1.tb1 WHERE a = 1, which deletes the row whose primary key a = 1 from db1.tb1. Merged SQL statement: DELETE FROM db1.tb1 WHERE a = 1, which deletes the row whose primary key a = 1 from db1.tb1. The value a = 1 after WHERE in the merged statement is the value after WHERE in the first SQL statement; that is, the merged statement takes its value from the before-image of the first SQL statement.
First SQL statement: DELETE FROM db1.tb1 WHERE a = 1, which deletes the row whose primary key a = 1 from db1.tb1. Second SQL statement: INSERT INTO db1.tb1 VALUES (4, 5, 6), which inserts a row with a = 4, b = 5, c = 6 into db1.tb1. Merged SQL statement: UPDATE db1.tb1 SET a = 4, b = 5, c = 6 WHERE a = 1. The before-image of the merged statement comes from the first SQL statement, and the after-image comes from the second.
It should be noted that if the first SQL statement is INSERT INTO db1.tb1 VALUES (1, 2, 3), which inserts a row with a = 1, b = 2, c = 3 into db1.tb1, and the second SQL statement is DELETE FROM db1.tb1 WHERE a = 1, which deletes the row whose primary key a = 1 from db1.tb1, then no SQL statement is generated after merging.
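The db1.tb1 examples above can be reproduced with a small model of before-images and after-images. The sketch below is a simplified illustration: the `Change` dataclass, the `merge` and `render` functions, and the idea of carrying each statement as a pair of row images are assumptions for the example, not the patent's data model. The table name `db1.tb1` and primary key `a` are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Row = Dict[str, int]


@dataclass(frozen=True)
class Change:
    verb: str
    before: Optional[Row] = None  # row image before the change; None for INSERT
    after: Optional[Row] = None   # row image after the change; None for DELETE


def merge(first: Change, second: Change) -> Optional[Change]:
    """Merge two changes to the same row; `first` executed earlier."""
    pair = (first.verb, second.verb)
    if pair == ("INSERT", "UPDATE"):
        return Change("INSERT", after=second.after)
    if pair == ("UPDATE", "UPDATE"):
        return Change("UPDATE", before=first.before, after=second.after)
    if pair == ("UPDATE", "DELETE"):
        return Change("DELETE", before=first.before)
    if pair == ("DELETE", "INSERT"):
        return Change("UPDATE", before=first.before, after=second.after)
    if pair == ("INSERT", "DELETE"):
        return None  # the row never needs to exist on the target node
    raise ValueError(f"no merge rule for {pair}")


def render(change: Change, table: str = "db1.tb1", pk: str = "a") -> str:
    """Turn a change back into SQL text (integer columns only)."""
    if change.verb == "INSERT":
        values = ", ".join(str(v) for v in change.after.values())
        return f"INSERT INTO {table} VALUES ({values})"
    where = f"WHERE {pk} = {change.before[pk]}"
    if change.verb == "DELETE":
        return f"DELETE FROM {table} {where}"
    sets = ", ".join(f"{col} = {val}" for col, val in change.after.items())
    return f"UPDATE {table} SET {sets} {where}"


first = Change("UPDATE", before={"a": 1}, after={"a": 4, "b": 5, "c": 6})
second = Change("UPDATE", before={"a": 4}, after={"a": 7, "b": 8, "c": 9})
print(render(merge(first, second)))
```

In this model the merged statement always takes its WHERE clause from the earliest before-image and its values from the latest after-image, which is the pattern the four worked examples share.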
Through step 103, the SQL statements that meet the preset condition are merged to generate merged SQL statements for the second node to play back concurrently.
It can be understood that this embodiment uses SQL statements operating on the same primary key in the same database table as the example of statements that meet the preset condition. In practical applications, the statements that meet the preset condition may also be SQL statements that operate on different primary keys in the same database table, for example statements that operate on the same field. Illustratively, if the first SQL statement changes the value of field a in the first row of the table to 2, and the second SQL statement changes the value of field a in the second row to 3, the two statements may be merged into one statement that changes field a to 2 in the first row and to 3 in the second row. Two SQL statements become one, which reduces the number of SQL statements output to the new node (the second node) and improves playback efficiency. In addition, the two statements operate on the same column, that is, the same field, of the table; merging statements that operate on the same field reduces the coupling between SQL statements on that field and makes concurrent execution easier.
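The patent does not say what the single merged statement looks like. One possible form, shown here purely as an assumption, is an UPDATE with a CASE expression keyed on the primary key. The function name `merge_field_updates`, the table `tb1` and the column `id` are hypothetical, and SQLite is used only as a convenient in-memory test bench to check that the merged statement has the same effect as the two originals.

```python
import sqlite3
from typing import Dict


def merge_field_updates(table: str, pk: str, field: str,
                        new_values: Dict[int, int]) -> str:
    """Build one UPDATE that sets `field` per primary key (integers only)."""
    whens = " ".join(f"WHEN {key} THEN {value}" for key, value in new_values.items())
    keys = ", ".join(str(key) for key in new_values)
    return (f"UPDATE {table} SET {field} = CASE {pk} {whens} END "
            f"WHERE {pk} IN ({keys})")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany("INSERT INTO tb1 VALUES (?, ?)", [(1, 0), (2, 0), (3, 0)])

# Row 1 gets a = 2 and row 2 gets a = 3, in a single statement.
sql = merge_field_updates("tb1", "id", "a", {1: 2, 2: 3})
conn.execute(sql)
rows = conn.execute("SELECT id, a FROM tb1 ORDER BY id").fetchall()
print(sql)
print(rows)
```

The WHERE clause restricts the update to the listed rows, so rows that neither original statement touched keep their values.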
In this embodiment, the logical transaction log of the first node is acquired, SQL statements are obtained according to the logical transaction log, and the SQL statements that meet the preset condition are merged to generate merged SQL statements for the second node to play back concurrently. This reduces the number of SQL statements that are output and that the database node has to play back, and so speeds up incremental catch-up.
A second embodiment of the present invention relates to a data processing method. This embodiment is substantially the same as the first embodiment, except that merging the SQL statements that meet the preset condition to generate merged SQL statements further involves: obtaining a hash value according to the database table and the primary key value in the merged SQL statement; and determining, according to the hash value, the hash bucket in which the merged SQL statement is stored.
A flowchart of the data processing method according to the second embodiment of the present invention is shown in Fig. 2.
Step 203: merge the SQL statements that meet the preset condition to generate merged SQL statements.
Step 204: determine the hash bucket in which each merged SQL statement is stored according to the database table and the primary key value in the merged SQL statement.
In one example, a hash value is obtained according to the database table and the primary key value in the merged SQL statement, and the hash bucket in which the merged SQL statement is stored is determined according to the hash value. Storing the SQL statements in a hash-bucket data structure keeps the number of SQL statements in each hash bucket file as uniform as possible, which improves playback efficiency.
Illustratively, the database table and the primary key value in the merged SQL statement are input into a hash function, the hash function yields a hash value, and the hash bucket is determined from the hash value.
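A minimal sketch of the bucket assignment follows. The patent does not name a hash function or a bucket count, so the use of CRC32 from Python's standard `zlib` module, the modulo step, and the names `bucket_index` and `fill_buckets` are all assumptions for illustration.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def bucket_index(table: str, pk_value: int, num_buckets: int) -> int:
    """Map (table, primary key) to a bucket number in [0, num_buckets)."""
    key = f"{table}:{pk_value}".encode("utf-8")
    # CRC32 is a stand-in; any deterministic, well-spread hash would do.
    return zlib.crc32(key) % num_buckets


def fill_buckets(statements: Iterable[Tuple[str, int, str]],
                 num_buckets: int) -> Dict[int, List[str]]:
    """Group (table, pk_value, sql) triples into hash buckets."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(num_buckets)}
    for table, pk_value, sql in statements:
        buckets[bucket_index(table, pk_value, num_buckets)].append(sql)
    return buckets


merged = [("db1.tb1", pk, f"stmt-{pk}") for pk in range(1, 9)]
buckets = fill_buckets(merged, num_buckets=4)
for index, stmts in buckets.items():
    print(index, stmts)
```

Because the bucket depends only on the table and the primary key, any later statement for the same row lands in the same bucket as the earlier ones.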
Step 206: split the hash bucket file according to the determined second node to obtain SQL files. The hash bucket file is a file containing all the merged SQL statements in a hash bucket. Combining the statements in a hash bucket file that target the same node into one SQL file reduces the number of times SQL statements are sent and further improves playback efficiency.
In one example, concurrent playback includes concurrent playback across SQL files and concurrent playback of the SQL statements within each SQL file. Concurrent playback improves playback efficiency and raises the success rate of incremental catch-up.
Illustratively, the resource manager processes the hash bucket file generated by the SQL merger according to the distribution rule, calculates the second node corresponding to each SQL statement in the hash bucket file, and obtains the SQL files. Each SQL file is played back on a single node, and the resource manager connects to the remote DB for playback. During playback, different SQL files are played back concurrently, and the SQL statements within the same SQL file are played back concurrently.
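The split-and-replay step can be sketched with a thread pool. Everything named here is an assumption for illustration: `split_by_node`, `replay`, the modulo distribution rule and the `fake_execute` stub (which stands in for a real connection to the remote DB) are hypothetical and not taken from the patent.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Statement = Tuple[int, str]  # (distribution-key value, SQL text)


def split_by_node(bucket: Iterable[Statement],
                  node_of: Callable[[int], int]) -> Dict[int, List[str]]:
    """Split one hash bucket file into one SQL file (list) per target node."""
    files: Dict[int, List[str]] = defaultdict(list)
    for key, sql in bucket:
        files[node_of(key)].append(sql)
    return dict(files)


def replay(files: Dict[int, List[str]],
           execute: Callable[[int, str], None], workers: int = 4) -> int:
    """Submit every statement of every file to a pool; return the count run."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(execute, node, sql)
                   for node, stmts in files.items() for sql in stmts]
        for future in futures:
            future.result()  # re-raise any playback error
    return len(futures)


executed: List[Tuple[int, str]] = []
lock = threading.Lock()


def fake_execute(node: int, sql: str) -> None:
    # Stand-in for running `sql` on database node `node`.
    with lock:
        executed.append((node, sql))


bucket = [(1, "stmt-a"), (2, "stmt-b"), (3, "stmt-c"), (4, "stmt-d")]
files = split_by_node(bucket, node_of=lambda key: key % 2)  # hypothetical rule
count = replay(files, fake_execute)
print(files, count)
```

Submitting statements individually is only safe because merging has left at most one statement per row, so no two submitted statements depend on each other's order.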
After playback finishes, the resource manager returns the playback result to the cluster manager. On receiving the reply, the cluster manager starts a new round of incremental catch-up, and repeats this until the duration of a round falls below the catch-up threshold.
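The round-based loop can be sketched as follows. The function name `catch_up`, the `max_rounds` safety limit and the injectable `clock` parameter are assumptions for the example; the patent only states that rounds repeat until one round takes less time than the threshold.

```python
import time
from typing import Callable


def catch_up(run_round: Callable[[], None], threshold_seconds: float,
             max_rounds: int = 100,
             clock: Callable[[], float] = time.monotonic) -> int:
    """Repeat catch-up rounds until one finishes faster than the threshold.

    Returns the number of rounds that were run. `max_rounds` is a safety
    limit added for this sketch so the loop cannot spin forever.
    """
    for round_no in range(1, max_rounds + 1):
        started = clock()
        run_round()  # parse the log, merge, split and replay once
        if clock() - started < threshold_seconds:
            return round_no
    raise RuntimeError("incremental catch-up did not converge")


# Simulated clock: the three rounds take 5.0 s, 3.0 s and 0.5 s.
ticks = iter([0.0, 5.0, 5.0, 8.0, 8.0, 8.5])
rounds = catch_up(run_round=lambda: None, threshold_seconds=1.0,
                  clock=lambda: next(ticks))
print(rounds)
```

Each round only has to replay what was written during the previous round, so round durations shrink as long as playback outpaces the service's writes.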
A schematic diagram of the incremental catch-up of this embodiment is shown in Fig. 3.
The cluster manager receives a redistribution request and performs a full export of the data to be redistributed. The exported file is split according to the number of rows per split file and the distribution batch size configured in the cluster manager. The distribution field, and the column data of the distribution field, in each split file are verified; a calculation object is constructed from the column data that passes verification; a distribution algorithm calculates the target distribution node; and the data is written to the cache of the corresponding node, that is, the shard data cache. When the number of split files reaches one transmission batch, the split files are sent, so that the data of the old node is imported into each configured new node. The cluster manager then initiates the incremental catch-up process. It first queries and records the current logical transaction log position of each new node, and then sends an incremental catch-up request to the resource manager of the old node. The resource manager scans the logical transaction log from the recorded position and notifies the SQL merger to parse the logical transaction log and merge the SQL statements. From the merged SQL statements, a hash bucket file is generated that organizes the data in a hash-bucket data structure; this hash bucket file is the redoSQL file in Fig. 3. The resource manager splits the SQL statements in each hash bucket file according to the distribution key, so that each hash bucket file is divided into several SQL files, and each SQL file is sent to its determined node. The resource manager connects remotely to the DB nodes; the DB plays back the SQL files concurrently and executes the SQL statements within each SQL file concurrently, and returns the execution result after execution.
Take a document management system built on a MySQL distributed database cluster as an example. Assume that documents are stored in separate tables by document type. If documents of a certain type increase and the table for that type has to hold a large amount of data, the distributed database cluster can perform a redistribution operation, splitting the document data according to the distribution rule and storing it on the corresponding nodes. After this step is completed, incremental catch-up is carried out. The cluster manager sends an incremental catch-up request to the resource manager corresponding to each old database table. On receiving the request, the resource manager obtains the logical transaction log, that is, the MySQL binlog file. The SQL merger process parses the binlog file and merges the resulting SQL statements, and a hash bucket file is formed from the merged SQL statements. The resource manager splits the hash bucket file into several SQL files and sends the newly generated SQL files to the corresponding new nodes, which play them back in parallel and return the execution results.
It is worth mentioning that, in this embodiment, when the binlog is parsed, records with the same primary key in the same database table, that is, SQL statements operating on the same primary key of the same database table, are merged, which reduces the number of SQL statements output to the hash buckets. The hash buckets keep the number of SQL statements in each bucket as uniform as possible, which makes subsequent playback more efficient. SQL merging also allows playback to be concurrent in three dimensions: first, after merging, SQL statements for different primary keys of the same database table in the same hash bucket are played back concurrently; second, SQL statements for different database tables in the same hash bucket are played back concurrently; third, SQL statements in different hash buckets are played back concurrently. SQL merging thus fully decouples the SQL statements, so that no statement depends on another and they can be played back concurrently in batches, which shortens playback time and significantly improves incremental catch-up performance.
The steps of the above methods are divided as they are for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variations are within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithms or processes without changing their core design is also within the protection scope of this patent.
A third embodiment of the present invention relates to a data processing apparatus including: a log obtaining module 401, configured to obtain a logical transaction log of a first node; an SQL statement obtaining module 402, configured to obtain SQL statements according to the logical transaction log; and an SQL statement merging module 403, configured to merge the SQL statements that satisfy a preset condition to generate a merged SQL statement, so that a second node concurrently plays back the merged SQL statement.
In one example, the SQL statements satisfying the preset condition in the SQL statement merging module 403 are SQL statements that operate on the same primary key in the same library table.
In one example, the SQL statement merging module 403 is further configured to determine the execution time, in the logical transaction log, of the SQL statements satisfying the preset condition, and to merge the SQL statements satisfying the preset condition according to the determined execution time to obtain the merged SQL statement.
In one example, the SQL statement merging module 403 is further configured to determine a verb and field values of the merged SQL statement according to the SQL statements that satisfy the preset condition and the execution time, and to generate the merged SQL statement according to the determined verb and field values.
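A minimal sketch of such a merge is given below. The merge rules are assumptions made for illustration, since this section does not spell them out: the operations are ordered by execution time, later field values overwrite earlier ones, a trailing DELETE yields DELETE, a run consisting only of UPDATEs stays an UPDATE, and every other combination becomes a REPLACE. The `Op` class and `merge_ops` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:
    ts: int                       # execution time (position in the log)
    verb: str                     # "INSERT", "UPDATE" or "DELETE"
    table: str
    pk: int
    fields: Optional[dict] = None

def merge_ops(ops):
    """Merge operations per (table, pk) in execution-time order (assumed rules)."""
    groups = {}
    for op in sorted(ops, key=lambda o: o.ts):
        groups.setdefault((op.table, op.pk), []).append(op)

    merged = []
    for (table, pk), group in groups.items():
        fields, alive = {}, True
        for op in group:
            if op.verb == "DELETE":
                # A delete discards all field values written so far.
                fields, alive = {}, False
            else:
                # Later values overwrite earlier ones.
                fields.update(op.fields or {})
                alive = True
        if not alive:
            verb = "DELETE"
        elif all(op.verb == "UPDATE" for op in group):
            verb = "UPDATE"
        else:
            verb = "REPLACE"
        merged.append(Op(group[-1].ts, verb, table, pk, fields or None))
    return merged

log = [
    Op(1, "INSERT", "orders", 1, {"amount": 10}),
    Op(3, "UPDATE", "orders", 1, {"amount": 30}),
    Op(2, "UPDATE", "orders", 2, {"status": "x"}),
    Op(4, "DELETE", "orders", 2),
]
for op in merge_ops(log):
    print(op.verb, op.table, op.pk, op.fields)
```

Four logged operations collapse into two merged ones, one per primary key, which is the reduction in statement count described above.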
In one example, the SQL statement merging module 403 is further configured to: obtain a hash value according to the library table and the primary key value in the merged SQL statement; determine, according to the hash value, a hash bucket for storing the merged SQL statement; determine, according to the merged SQL statement in the hash bucket, a second node for playing back the merged SQL statement; and send the merged SQL statement to the second node.
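The bucket selection step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the bucket count, the CRC32 hash and the `bucket_for` and `fill_buckets` names are all hypothetical. A stable hash is used instead of Python's built-in `hash`, which is randomized per process for strings.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count, chosen only for the example

def bucket_for(table: str, pk) -> int:
    """Map a (library table, primary key value) pair to a bucket index."""
    key = f"{table}:{pk}".encode("utf-8")
    return zlib.crc32(key) % NUM_BUCKETS

def fill_buckets(statements):
    """Group merged statements, given as (table, pk, sql), by bucket index."""
    buckets = defaultdict(list)
    for table, pk, sql in statements:
        buckets[bucket_for(table, pk)].append(sql)
    return buckets

statements = [
    ("orders", pk, f"REPLACE INTO orders (id, amount) VALUES ({pk}, 0);")
    for pk in range(1, 21)
]
buckets = fill_buckets(statements)
for index in sorted(buckets):
    print(index, len(buckets[index]))
```

Because each (table, primary key) pair appears in at most one merged statement, a given pair always lands in exactly one bucket, and the buckets can be processed independently.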
In one example, the SQL statement merging module 403 is further configured to split the hash bucket file according to the determined second node to obtain SQL files, and to send each SQL file to the determined second node. The hash bucket file is a file that includes all the merged SQL statements in the hash bucket, and the merged SQL statements in one SQL file are all to be played back by the same second node.
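Splitting a hash bucket file by target node might look like the sketch below. The node names, the modulo distribution rule in `target_node` and the file naming scheme are assumptions for illustration; in a real cluster the target node would follow from the cluster's own distribution rule. File contents are kept in memory so the example stays self-contained.

```python
from collections import defaultdict

NEW_NODES = ["node_a", "node_b"]  # hypothetical second nodes

def target_node(pk: int) -> str:
    """Assumed distribution rule: primary key modulo the number of new nodes."""
    return NEW_NODES[pk % len(NEW_NODES)]

def split_bucket(bucket_id: int, entries):
    """Split one bucket, given as (pk, sql) pairs, into one SQL file per node.

    Returns a dict that maps a file name to the file content.
    """
    per_node = defaultdict(list)
    for pk, sql in entries:
        per_node[target_node(pk)].append(sql)
    return {
        f"bucket{bucket_id}_{node}.sql": "\n".join(stmts) + "\n"
        for node, stmts in per_node.items()
    }

bucket_entries = [
    (1, "REPLACE INTO orders (id, amount) VALUES (1, 30);"),
    (2, "DELETE FROM orders WHERE id = 2;"),
    (3, "REPLACE INTO orders (id, amount) VALUES (3, 7);"),
]
for name, content in split_bucket(0, bucket_entries).items():
    print(name)
    print(content)
```

Each resulting file contains only statements destined for one node, so a file can be shipped to that node and played back there without further routing.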
In one example, the concurrent playback in the SQL statement merge module 403 includes: concurrent playback between the SQL files and concurrent playback of each SQL statement in the SQL files.
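The two levels of concurrency, between SQL files and between the statements inside a file, can be sketched with nested thread pools. This is only an illustration: `FakeNode` is a hypothetical stand-in that records statements instead of executing them against a database, and the worker counts are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeNode:
    """Stand-in for a second node; records statements instead of running them."""

    def __init__(self):
        self._lock = threading.Lock()
        self.executed = []

    def execute(self, sql: str) -> bool:
        with self._lock:
            self.executed.append(sql)
        return True

def replay_file(node, statements, workers=4):
    """Play back the statements of one SQL file concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(node.execute, statements))

def replay_all(node, sql_files, workers=4):
    """Play back several SQL files concurrently; report success per file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(replay_file, node, stmts)
            for name, stmts in sql_files.items()
        }
        return {name: all(f.result()) for name, f in futures.items()}

node = FakeNode()
sql_files = {
    "bucket0_node_a.sql": ["DELETE FROM orders WHERE id = 2;"],
    "bucket1_node_a.sql": [
        "REPLACE INTO orders (id, amount) VALUES (4, 1);",
        "REPLACE INTO invoices (id, status) VALUES (4, 'paid');",
    ],
}
print(replay_all(node, sql_files))
```

The order in which statements reach the node is not deterministic here, which is acceptable only because the merged statements are independent of one another.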
It should be understood that this embodiment is an apparatus embodiment corresponding to the first embodiment, and may be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in this embodiment can also be applied to the first embodiment.
It should be noted that each module referred to in this embodiment is a logical module. In practical applications, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units are present in this embodiment.
A fourth embodiment of the invention relates to an electronic device, as shown in fig. 5, comprising at least one processor 501; and a memory 502 communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the data processing method described above.
The memory and the processor are connected by a bus. The bus may comprise any number of interconnected buses and bridges, and connects the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data used by the processor in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as those skilled in the art can understand, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that, in practice, various changes in form and detail may be made to them without departing from the spirit and scope of the invention.
Claims (10)
1. A data processing method, comprising:
acquiring a logical transaction log of a first node;
acquiring SQL statements according to the logical transaction log;
and merging the SQL statements that satisfy a preset condition to generate a merged SQL statement, for a second node to concurrently play back the merged SQL statement.
2. The data processing method according to claim 1, wherein the SQL statement that satisfies the preset condition is the SQL statement that operates on the same primary key in the same library table.
3. The data processing method according to claim 2, wherein the merging the SQL statements that satisfy the preset condition to generate a merged SQL statement comprises:
determining the execution time, in the logical transaction log, of the SQL statements that satisfy the preset condition;
and merging the SQL statements that satisfy the preset condition according to the determined execution time to obtain the merged SQL statement.
4. The data processing method according to claim 3, wherein said merging the SQL statements that satisfy the preset condition according to the determined execution time comprises:
determining a verb and field values of the merged SQL statement according to the SQL statements that satisfy the preset condition and the execution time;
and generating the merged SQL statement according to the determined verb and the field value.
5. The data processing method according to any one of claims 1 to 4, wherein after the merging the SQL statements that satisfy the preset condition and generating the merged SQL statement, the method further comprises:
obtaining a hash value according to the library table and the primary key value in the merged SQL statement;
determining a hash bucket for storing the merged SQL statement according to the hash value;
determining a second node for playing back the merged SQL statement according to the merged SQL statement in the hash bucket;
and sending the merged SQL statement to the second node.
6. The data processing method of claim 5, wherein sending the merged SQL statement to the second node comprises:
splitting the hash bucket file according to the determined second node to obtain an SQL file, and sending the SQL file to the determined second node;
the hash bucket file is a file including all merged SQL statements in the hash bucket, and the merged SQL statements in the SQL file are used for playback of the same second node.
7. The data processing method of claim 6, wherein the concurrent playback comprises: concurrent playback between the SQL files and concurrent playback of each SQL statement in the SQL files.
8. A data processing apparatus, comprising:
a log acquisition module, configured to acquire a logical transaction log of a first node;
an SQL statement acquisition module, configured to acquire SQL statements according to the logical transaction log;
and an SQL statement merging module, configured to merge the SQL statements that satisfy a preset condition to generate a merged SQL statement, so that a second node concurrently plays back the merged SQL statement.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011498751.2A CN114647659A (en) | 2020-12-17 | 2020-12-17 | Data processing method and device, electronic equipment and storage medium |
PCT/CN2021/138821 WO2022127866A1 (en) | 2020-12-17 | 2021-12-16 | Data processing method and apparatus, and electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114647659A (en) | 2022-06-21 |
Family
ID=81990726
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011498751.2A Pending CN114647659A (en) | 2020-12-17 | 2020-12-17 | Data processing method and device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114647659A (en) |
WO (1) | WO2022127866A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116644086B (en) * | 2023-05-24 | 2024-02-20 | 上海沄熹科技有限公司 | SST-based Insert SQL statement implementation method |
CN117573730B (en) * | 2024-01-16 | 2024-04-05 | 腾讯科技(深圳)有限公司 | Data processing method, apparatus, device, readable storage medium, and program product |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102346775A (en) * | 2011-09-26 | 2012-02-08 | 苏州博远容天信息科技有限公司 | Method for synchronizing multiple heterogeneous source databases based on log |
CN105955970A (en) * | 2015-11-12 | 2016-09-21 | 中国银联股份有限公司 | Log analysis-based database copying method and device |
CN107169094A (en) * | 2017-05-12 | 2017-09-15 | 北京小米移动软件有限公司 | information aggregation method and device |
CN107251023A (en) * | 2015-02-23 | 2017-10-13 | 华为技术有限公司 | A kind of blended data distribution in MPP framework |
CN109101627A (en) * | 2018-08-14 | 2018-12-28 | 交通银行股份有限公司 | heterogeneous database synchronization method and device |
CN109271450A (en) * | 2018-10-10 | 2019-01-25 | 北京百度网讯科技有限公司 | Database synchronization method, device, server and storage medium |
CN109408589A (en) * | 2018-09-14 | 2019-03-01 | 新华三大数据技术有限公司 | Method of data synchronization and device |
CN111767340A (en) * | 2020-05-29 | 2020-10-13 | 中国工商银行股份有限公司 | Data processing method, device, electronic equipment and medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1205853B1 (en) * | 2000-11-08 | 2006-11-22 | International Business Machines Corporation | Reduced lock contention in SQL transactions |
US8635187B2 (en) * | 2011-01-07 | 2014-01-21 | Symantec Corporation | Method and system of performing incremental SQL server database backups |
CN108197306B (en) * | 2018-01-30 | 2020-08-25 | 平安科技(深圳)有限公司 | SQL statement processing method and device, computer equipment and storage medium |
- 2020-12-17: CN application CN202011498751.2A, published as CN114647659A (Pending)
- 2021-12-16: PCT application PCT/CN2021/138821, published as WO2022127866A1 (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2022127866A1 (en) | 2022-06-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220621 |