Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus that enables efficient synchronization of data changes in a database, reduces the cost of updating an index, and improves performance and efficiency.
According to a first aspect, there is provided a method of synchronizing change information in a database, comprising:
receiving a first data change message from a database, the first data change message indicating a first data item in a first data table that has been changed;
updating a record corresponding to the first data item in a first mirror image table corresponding to the first data table according to the first data change message;
determining an upper level table associated with the first mirrored table, the upper level table including a foreign key associated with a primary key of the first mirrored table;
and reconnecting the upper table and the lower table associated with the upper table to generate an updated wide table, wherein the primary key of the lower table is associated with the foreign key of the upper table.
In one possible design, the step of determining an upper level table associated with the data table includes:
determining a primary key of the first mirror table; and determining a first main table as the upper table, wherein the first main table takes a main key of the first mirror image table as a foreign key.
In a further design, the determining an upper level table associated with the data table further comprises:
determining a primary key of the first primary table; and determining a second main table as the upper table, wherein the second main table takes the main key of the first main table as a foreign key.
In one possible design, the step of reconnecting the upper table and the lower table associated with the upper table includes:
determining a lower table associated with the upper table; and connecting the upper table with the lower table.
In further designs, the upper table includes a first foreign key, and the determining the lower table associated with the upper table includes: determining a first auxiliary table as the lower table, the primary key of the first auxiliary table being equal to the first foreign key.
In a further design, the first auxiliary table includes a second foreign key, and the determining the lower table associated with the upper table further includes: determining a second auxiliary table as the lower table, the primary key of the second auxiliary table being equal to the second foreign key.
In one possible design, concatenating the upper level table with the lower level table includes: and adopting one of internal connection, external connection and natural connection to connect the superior table and the inferior table.
In one design, the upper level table and the lower level table are mirror image tables of corresponding data tables in the database.
In one possible design, the method further includes: receiving a second data change message within a predetermined time interval, the second data change message indicating a second data item in a second data table that has been changed, the second data table having a same primary key as the first data table;
routing the second data change message to a thread for processing the first data change message.
In one possible design, the method further includes: and sending the updated broad table to a search engine for updating the data index.
According to a second aspect, there is provided an apparatus for synchronizing change information in a database, comprising:
a receiving unit configured to receive a first data change message from a database, the first data change message indicating a first data item in a first data table in which a change has occurred;
an updating unit configured to update a record corresponding to the first data item in a first mirror table corresponding to the first data table according to the first data change message;
an upper table determination unit configured to determine an upper table associated with the first mirror table, the upper table including a foreign key associated with a primary key of the first mirror table;
a multi-join unit configured to re-join the upper table and a lower table associated with the upper table, wherein a primary key of the lower table is associated with a foreign key of the upper table, to generate an updated wide table.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
By the method and the device provided by the embodiment of the specification, once data change occurs in the database, the modified data table is updated in the mirror space, and the associated data table is determined for reconnection. Therefore, the query pressure and the performance pressure brought by frequently querying the original data table to the original database are avoided, the updating of the wide table is only based on the data table associated with modification, the reconnection process is not limited by formats and connection logics of different databases, and the method has better universality and better performance.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. As shown in fig. 1, a large number of data tables are stored in the database storage platform 100. Generally, the database storage platform is a distributed storage system, such as HDFS, and the data tables may be stored in different physical storage spaces. Also, each data table may be located in a database of a different format, such as MySQL, Oracle, Oceanbase, etc. To deal with these data table changes, a mirror space 200 is created for the data tables in the database, in which the mirror tables of the respective data tables are stored. In one embodiment, the mirror space 200 may also be implemented using a distributed database, such as HBase (Hadoop database). The computing platform 300 maintains synchronization of the mirror table in the mirror space 200 with the data table in the storage platform 100, and performs join (join) of the data tables based on the mirror table in the mirror space 200, and provides the generated wide table to the search engine 400. Specifically, when a data table in the storage platform 100 is changed, a change message is sent. After obtaining such change information, the computing platform 300 first modifies the mirror table in the mirror space accordingly, and then diffuses the modification to the associated upper table in the mirror space, and invokes the multi-connection engine for the associated upper table. The multi-join engine may join the upper table and the lower table in a multi-dimensional and multi-level manner, thereby generating an updated wide table. The updated broad table may be provided to a search engine for updating the data index. By the mode, a large amount of queries on the original data table are avoided, and the performance is improved; and the connection process of the data table is not restricted by the format of the database where the original data table is located, so that the method has better universality. The following describes a specific implementation of the computing platform.
FIG. 2 illustrates a flow diagram of a method of synchronizing database change information, according to one embodiment. The method may be performed by any device, apparatus, or system having computing and processing capabilities, such as computing platform 300 shown in FIG. 1. As shown in fig. 2, the method comprises the steps of: step 21, receiving a data change message from a database, wherein the data change message indicates a specific data item in a specific data table which is changed; step 22, according to the data change message, updating the record corresponding to the specific data item in the specific mirror image table corresponding to the specific data table; step 23, determining an upper table associated with the specific mirror table, wherein the upper table contains a foreign key associated with a primary key of the specific mirror table; and 24, reconnecting the upper table and the lower table related to the upper table to generate an updated wide table, wherein the primary key of the lower table is related to the external key of the upper table. The execution of the above respective steps is specifically described below.
As previously described, when a data table stored in a database storage platform (e.g., storage platform 100 of FIG. 1) is changed, the database may issue a change message. Accordingly, at step 21, a data change message is received from the database. Typically, a data change message corresponds to a modified record in a data table. More specifically, in one example, the data change message may contain the following field contents: the name of the database where the change occurred, the name of the data table where the change occurred, the location of the data item where the change occurred, such as the field (column) and record (row) of the change. In other words, the data change message may indicate which data item in which data table in which database has been changed. The data table indicated by the received data change message is referred to herein as a particular data table, and the modified data item is referred to herein as a particular data item.
Upon receiving the data change message at step 21, at step 22, the record corresponding to the particular data item is updated in the particular mirror table corresponding to the particular data table according to the received data change message. In this step, the contents of the mirror table in the mirror space are kept in synchronization with the contents of the corresponding data table. Specifically, in step 22, the changed database name, the changed table name, and the changed data item position are first extracted from the data change message, a specific mirror table corresponding to the changed specific data table is determined in the mirror space according to the changed database name and the changed table name, and then the corresponding record in the specific mirror table is updated according to the changed specific data item position. Therefore, once the content of the data table in the original database is changed, the content of the mirror image table in the mirror image space is modified, and the mirror image table is kept synchronous and consistent with the original data table.
Fig. 3A and 3B show an example of mirror table modification. Fig. 3A shows a customer information table a comprising two fields, n records. Assume that the received data change message indicates that the second field (client nickname) of the second record in data table a in database AAA is changed to "alice" at step 21. Fig. 3A schematically shows this modification. Then, correspondingly, in step 22, the mirror table a' corresponding to the data table a in the mirror space is updated, and the corresponding record is modified. Fig. 3B schematically shows the updated mirror table a'. It can be seen that the mirror table A' is consistent with the original data table A.
Next, at step 23, an upper level table associated with the particular mirror table is determined, the upper level table including a foreign key associated with the primary key of the particular mirror table.
As known to those skilled in the art, each data table may be assigned a column of fields as primary keys (primary keys) and other fields as foreign keys, depending on the data structure of the data table. In general, a primary key is an attribute field that uniquely identifies a record. For example, assuming that the student score table contains three fields of a school number, a student name and a score level (five of score levels a-E, etc.), then it is possible that the student name and the student score both have equal values, such as equal first name and last name or level, and the school number can uniquely identify a student or a record, then the school number is determined to be the primary key. The other fields than the primary key are foreign keys. A data table can only have one primary key, but can have multiple foreign keys. Foreign keys may be used to associate and connect with other tables. According to the setting of the primary key and the foreign key in each data table, the association relationship between the primary table and the secondary table can be considered to exist between the data tables. Specifically, tables 1 and 2 each have their primary keys and foreign keys, and assuming that the foreign key of table 1 is the primary key of table 2, table 1 will be referred to herein as the primary table of table 2 and table 2 will be referred to herein as the secondary table of table 1. If the above student achievement table is table 1, the primary key of table 1 is the school number, and the foreign key includes the student name and the achievement level. Assuming that there is another table 2 whose primary key is the achievement level and the foreign key is the score range corresponding to the level, since the foreign key (achievement level) of table 1 is the primary key of table 2, table 1 is considered as the primary table of table 2 and table 2 is the subsidiary table of table 1.
Based on the above description of the relationship of the primary key and the foreign key, and the primary and secondary tables, at step 23, the superior table associated with the particular mirror table in step 22 may be determined, which contains the foreign key associated with the primary key of the particular mirror table. In other words, the superior table is associated with the primary key of a particular mirror table by its foreign key. Accordingly, in this step, the primary key PK of the specific mirror table may be determined first, and then the lookup and search may be performed based on the primary key PK, thereby determining the data table whose foreign key is directly or indirectly associated with the primary key PK as the upper table. In one embodiment, the association relationship between the respective tables and the key at which the association occurs may be stored in advance. In such a case, the upper table can be quickly determined by looking up such an association.
In one embodiment, the upper level table is the primary table, or direct primary table, of the particular mirror table in step 22. That is, the foreign key of the table is equal to, and thus directly associated with, the primary key of the particular mirror table.
In another embodiment, the upper level table further comprises an indirect primary table or a multi-level primary table of the specific mirror table in step 22, such as a primary table of the primary table, the foreign key of which is associated with the primary key of the specific mirror table through an intermediate table. For example, in one example, the upper level table includes a first primary table and a second primary table, the first primary table is a direct primary table of the specific mirror image table, and a primary key of the specific mirror image table is a foreign key; and the second primary table is externally keyed by the primary key of the first primary table, and is therefore a direct primary table of the first primary table, but an indirect primary table of the specific mirror table.
In one embodiment, the primary tables of all levels of the specific mirror table determined are used as the upper level tables in step 23. In another embodiment, in this step, the highest level primary table of the specific mirror table is determined as the upper level table (these two ways do not affect the generation result of the wide table).
The above-described upper table will be described with reference to specific examples. As previously described, at step 22, the mirror table A' is updated as shown in FIG. 3B. Assume the primary key of the mirrored table a' is the customer ID. Thus, at step 23, a search can be made with "client ID" to look up which tables contain "client ID" as foreign key, or which tables contain foreign keys that are indirectly linked to "client ID" through intermediate tables. FIG. 4 illustrates a schematic diagram of an upper level table according to one embodiment. In particular, FIG. 4 shows Table B, a sales order table, which contains at least 4 fields: order number, customer ID, area ID, product ID, etc., where the order number is its primary key and the other fields are foreign keys. Since the primary key "client ID" of table a 'is used as the foreign key of table B, table B is used as the upper table of table a' in step 23. More specifically, Table B is actually the direct master of Table A'. If table B has a further master table, such further master table may be considered to be indirectly associated with table a 'via table B, which may also serve as the upper table of table a'.
Upon determining the upper table as described above, the upper table and the lower table associated with the upper table, the primary key of which is associated with the foreign key of the upper table, are reconnected to generate an updated wide table at step 24.
In one embodiment, a multi-join (MultiJoin) engine is pre-trained that can join an input data table to its subordinate tables in a multi-dimensional and multi-level manner. In this manner, in step 24, the upper table specified in step 23 is input to the multi-connection engine, and the input upper table and the lower table associated with the upper table are connected by the engine, thereby generating the wide table. The following describes the execution process of the above multi-connection engine.
In one embodiment, the multi-Join engine receives the upper table determined in step 23, determines the associated lower table from the upper table, and joins (Join) the upper table with the lower table. The upper table is described previously in connection with step 23, and the lower table in step 24 has a corresponding meaning to the upper table. In particular, the primary key of a lower level table is directly or indirectly associated with the foreign key of said upper level table.
In the case of direct association, the primary key of a lower table is equal to a certain foreign key of said upper table. That is, the upper table includes a foreign key k, and if there is a certain data table (referred to herein as a first secondary table) whose primary key is equal to the foreign key k, the first secondary table is determined as a lower table.
In the case of indirect association, the primary key of the lower table is associated to a certain foreign key of the upper table through the intermediate table. For example, assuming that the above directly associated first secondary table includes a foreign key m, if there exists a certain data table (referred to herein as a second secondary table) whose primary key is equal to the foreign key m of the first secondary table, the second secondary table may serve as a direct secondary table of the first secondary table and as an indirectly associated lower-level table of the upper-level table.
Fig. 5 shows a schematic diagram of determining a lower table. As shown in fig. 5, the process of determining its associated lower table from the upper table can be represented by a directed Acyclic graph dag (directed Acyclic graph). In the example of fig. 5, it is assumed that the upper table T contains 4 foreign keys k1, k2, k3, k4, against which a query is made, respectively. For example, for foreign key k1, assuming that secondary table A1 is determined, whose primary key is equal to k1, then secondary table A1 is a lower level table of upper table T (more specifically, its direct secondary table). Next, a query is made for each foreign key of the secondary table a 1. Assuming that there is an auxiliary table a2 whose primary key is equal to the foreign key a1 of the auxiliary table a1, the auxiliary table a2 also serves as a lower table of the upper table T (more specifically, an indirect auxiliary table thereof). Similarly, such queries are made for the foreign keys k2, k3, k 4.
In one embodiment, the depth-first principle is followed in the execution of determining the lower level table. That is, for a certain foreign key of the upper table, the lower table associated with the foreign key is determined layer by layer until there is no lower table of a further level. For example, in fig. 5, for the foreign key k1, the associated lower tables a1 and a2 are determined layer by layer, and the query is not performed for the foreign key k2 until no further secondary table exists in the secondary table a 2.
In another embodiment, the breadth-first principle may also be followed in determining the execution of the lower table. For example, in fig. 5, the lower tables of the first hierarchy, i.e., the direct auxiliary tables a1, A3, a6, A8, are determined for the foreign keys k1, k2, k3 and k4, respectively, and then the query of further auxiliary tables thereof is continued for these lower tables of the first hierarchy, respectively. The scheme needs to record and cache the inquired intermediate table, and is more suitable for the condition that the association depth between the data tables is not deep enough.
More specifically, the description is made in conjunction with the upper table shown in fig. 4, i.e., data table B. Fig. 6 shows a schematic diagram of the lower table determined from the data table B. As shown, data table B is a sales order table that contains at least 4 fields: order number, customer ID, area ID, product ID, etc., where the order number is its primary key and the other fields are foreign keys.
Based on the external key "client ID" in the data table B, the secondary table a' can be determined as a lower table. It will be understood that the data table B is the upper table determined from the mirror table a ', and conversely, the lower table determined from the data table B must contain the table a'.
According to the external key of "area ID" in Table B, it can be determined that Table C is the direct auxiliary table of Table B, and is used as a lower table. Specifically, table C is a region information table, which further expands the region information with the foreign key "region ID" of table B as the primary key.
According to the external key of "product ID" in the upper table B, it can be determined that the table D is the direct auxiliary table of the table B and is used as a lower table. Specifically, table D is a product description table, which further expands the product information with the foreign key "product ID" of table B as the primary key. In addition, table D includes the foreign keys "vendor ID" and "price". The foreign key "vendor ID" of Table D is queried to determine that Table E is a secondary table to Table D. Specifically, table E is a supplier information table, which takes the foreign key "supplier ID" of table D as the primary key, further augmenting the supplier information. Thus, table E is a direct auxiliary table of table D and an indirect auxiliary table of table B, and therefore also serves as a lower table of table B.
Thus, the lower table is determined layer by layer for each external key of the upper table.
In one embodiment, each time a subordinate table is determined, the superior table is linked to the determined subordinate table. In another embodiment, all the lower tables are determined from the upper table, and the upper table and all the lower tables are concatenated.
When two data tables are connected, different connection modes can be adopted according to the data structure of the data tables, such as one of inner connection, outer connection and natural connection, wherein the outer connection comprises left connection, right connection and full connection. The connection may be performed by selecting an appropriate connection method from the above connection methods, depending on the structures of the upper table and the lower table.
By reconnecting the upper table and the associated lower table, an updated wide table can be generated. In one embodiment, the updated broad table is sent to a search engine for updating the data index.
In one embodiment, the above query determines the upper table, and the operations that connect the upper table with the associated lower table are all performed in the mirror space. In other words, the upper table and the lower table are the corresponding mirror tables of the original data tables in the database. Therefore, the query pressure and the performance pressure brought to the original database by frequently querying the original data table are avoided.
Furthermore, through steps 23 and 24 in fig. 2, all other tables having direct or indirect association with the modified data table can be quickly determined, and the associated tables can be reconnected. In this manner, the entire wide table does not have to be regenerated based on the full data table at each data modification. Moreover, the reconnection process is not limited by the format and connection logic of each database, and has better universality and better performance.
It will be appreciated that in the case of a massive database, the data may be updated very frequently, and thus, the computing platform may receive multiple data change messages directed to the same data table or an associated data table in a very short time. To avoid data conflicts, in one embodiment, modifications to the same data table or associated data tables are handled under the same process and the same thread.
Specifically, assume that a data change message is received at time t 0. For the sake of distinction, this message is referred to as a first data change message. The first data change information indicates a modification to the first data table. After receiving the message, the computing platform processes the message in a thread under a process, and executes the method shown in fig. 2. Assume that at a time T1 after T0, where the difference between T1 and T0 is less than the predetermined time interval T, the computing platform receives another data change message, referred to herein as a second data change message. The computing platform may extract the indicated changed data table, i.e., the second data table, from the second data change message and determine a primary key of the second data table. If the second data table has the same primary key as the first data table indicated by the first data change message, the second data change message is routed to the thread for processing the first data change message. Therefore, the processing time sequence of the second data change message and the first data change message is ensured, and the data conflict possibly caused by parallel execution of the second data change message and the first data change message is avoided.
In an embodiment according to another aspect, there is also provided means for synchronizing change information in a database. Fig. 7 shows a schematic block diagram of a synchronization apparatus according to an embodiment. As shown in fig. 7, an apparatus 700 for synchronizing change information in a database includes: a receiving unit 71 configured to receive a first data change message from the database, the first data change message indicating a first data item in the first data table in which a change has occurred; an updating unit 72 configured to update a record corresponding to the first data item in a first mirror table corresponding to the first data table according to the first data change message; an upper table determining unit 73 configured to determine an upper table associated with the first mirror table, the upper table including a foreign key associated with a primary key of the first mirror table; a multi-join unit 74 configured to re-join the upper table and a lower table associated with the upper table, the lower table having a primary key associated with a foreign key of the upper table, and generate an updated wide table.
In one embodiment, the upper table determining unit 73 includes: a primary key determination module 731 configured to determine a primary key of the first mirror table; a primary table determining module 732 configured to determine a first primary table as the upper-level table, the first primary table having a primary key of the first mirror table as a foreign key. In other words, the primary table determining unit determines the primary table of the first mirror table as the upper table.
In a further embodiment, the primary key determining module 731 is further configured to determine a primary key of the first primary table; the master table determination module 732 is further configured to: and determining a second main table as the upper table, wherein the second main table takes the main key of the first main table as a foreign key. In other words, the master table determining unit may also determine the master table of the first mirror table, i.e., the indirect master table, as the upper table.
In one embodiment, the multi-connection unit 74 includes: a lower table determination module 741 configured to determine a lower table associated with the upper table; a connection module 742 configured to connect the upper table with the lower table.
In one embodiment, the upper table comprises a first foreign key, and the lower table determining module 741 is configured to: determining a first auxiliary table as the lower table, the primary key of the first auxiliary table being equal to the first foreign key. In other words, the lower table determining module determines the auxiliary table of the upper table as the lower table.
In a further embodiment, the first secondary table comprises a second foreign key, the subordinate table determination module 741 being further configured to: determining a second auxiliary table as the lower table, the primary key of the second auxiliary table being equal to the second foreign key. In other words, the lower table determining module determines the auxiliary table of the upper table, i.e., the indirect auxiliary table, as the lower table.
In one embodiment, the connection module 742 is configured to: and adopting one of internal connection, external connection and natural connection to connect the superior table and the inferior table.
In one embodiment, the upper table and the lower table are mirror tables of corresponding data tables in the database.
According to an embodiment, the receiving unit 71 is further configured to: receiving a second data change message within a predetermined time interval, the second data change message indicating a second data item in a second data table that has been changed, the second data table having a same primary key as the first data table; and the apparatus 700 further comprises (not shown) a routing unit configured to route the second data change message to a thread for processing the first data change message.
According to one embodiment, the apparatus 700 further comprises a sending unit (not shown) configured to send the updated broad table to a search engine for updating the data index.
By the above means, the modified data table is updated in the mirror space and the associated data table is determined for reconnection. Therefore, the query pressure and the performance pressure brought by frequently querying the original data table to the original database are avoided, the updating of the wide table is only based on the data table associated with modification, the reconnection process is not limited by formats and connection logics of different databases, and the method has better universality and better performance.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.