CN114816247A - Logic data acquisition method and device - Google Patents

Info

Publication number
CN114816247A
Authority
CN
China
Prior art keywords
redo log
data
read
log
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210389036.8A
Other languages
Chinese (zh)
Inventor
应承峻
吴倩倩
王剑英
杨新军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202210389036.8A
Publication of CN114816247A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present specification provides a method and an apparatus for acquiring logical data, applied to a read-only node of a cloud database system. The method includes: when the read-only node plays back at least one redo log, acquiring the at least one redo log; parsing the at least one redo log to obtain the data changed by the data change instruction corresponding to the at least one redo log; obtaining index information from the at least one redo log; and combining the index information and the data changed by the data change instruction into logical data corresponding to the at least one redo log. In this way, logical data is extracted from the redo log during playback on the read-only node, the read-write node does not need to store duplicate logs, the IO cost of the read-write node is reduced, and an effect similar to enabling the binary log is achieved without enabling it.

Description

Logic data acquisition method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer applications, and in particular, to a method and an apparatus for acquiring logic data.
Background
A cloud database system that supports a single-writer, multi-reader architecture generally comprises one read-write node and a plurality of read-only nodes. The read-write node can change the data stored in the cloud database system, whereas a read-only node can only read that data. All nodes share the same storage space, which is how single-writer, multi-reader access to the relational database in the shared storage space is achieved.
The read-write node generally changes data stored in the cloud database system by executing data change transactions submitted by users. To improve processing efficiency, when executing a data change transaction the read-write node does not write the changed data directly to disk (i.e., the shared storage space); instead, it writes the data change instructions included in the transaction into a redo log in the shared storage space. Consequently, to obtain the latest version of the database, a read-only node must also synchronize the redo log written by the read-write node and play it back.
In this situation, the redo log records physical data, that is, which modification was made to which data in which physical page, so the logical meaning of the modified data cannot be obtained from the redo log alone. In other words, in the related art, the corresponding change to the logical data cannot be determined using only the redo log.
In the related art, this problem is generally solved by enabling a binary log. Both the binary log and the redo log are logs of the cloud database system, but the binary log records logical data whereas the redo log records physical data. Although enabling the binary log solves the problem, the read-write node then records two largely redundant logs and pays double the IO cost, which is inefficient.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a method and apparatus for obtaining logical data.
According to a first aspect of one or more embodiments of the present specification, a logical data obtaining method is provided, which is applied to a read-only node of a cloud database system, and includes:
under the condition that the read-only node plays back at least one redo log, acquiring the at least one redo log;
analyzing the at least one redo log to obtain data changed by the data change instruction corresponding to the at least one redo log;
obtaining index information from the at least one redo log;
and forming the index information and the data changed by the data change instruction into logic data corresponding to the at least one redo log.
According to a second aspect of one or more embodiments of the present specification, there is provided a logical data acquisition apparatus applied to a read-only node of a cloud database system, the apparatus including:
the redo log obtaining module is used for obtaining at least one redo log under the condition that the read-only node plays back the at least one redo log;
the change data acquisition module is used for analyzing the redo log and acquiring data changed by the data change instruction corresponding to the redo log;
the index information acquisition module is used for acquiring index information from the redo log;
and the logic data acquisition module is used for forming the index information and the data changed by the data change instruction into logic data corresponding to the at least one redo log.
According to a third aspect of the embodiments of the present specification, a cloud database system is provided, where the cloud database system includes one read-write node and a plurality of read-only nodes, and any read-only node executes the foregoing logical data acquisition method.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the aforementioned logical data acquisition method.
According to a fifth aspect of embodiments herein, there is provided a computer apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
the processor executes the executable instructions to implement the aforementioned logic data acquisition method.
According to a sixth aspect of embodiments herein, there is provided a computer program that, when executed, implements the aforementioned logical data acquisition method.
The present specification provides a method and an apparatus for acquiring logical data, applied to a read-only node of a cloud database system. The method includes: when the read-only node plays back at least one redo log, acquiring the at least one redo log; parsing the at least one redo log to obtain the data changed by the data change instruction corresponding to the at least one redo log; obtaining index information from the at least one redo log; and combining the index information and the data changed by the data change instruction into logical data corresponding to the at least one redo log.
In this way, logical data is extracted from the redo log during playback on the read-only node, the read-write node does not need to store duplicate logs, the IO cost of the read-write node is reduced, and an effect similar to enabling the binary log is achieved without enabling it. In addition, because the method runs during playback, it reuses the redo log that the read-only node already obtains for playback: the same redo log serves both to play back changes and to yield logical data, so the logical data is extracted at low cost. Furthermore, the read-write node does not need to record an additional binary log, which saves storage space and increases data write speed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
FIG. 1 is a flow chart illustrating a method of logical data acquisition according to an exemplary embodiment of the present description.
FIG. 2 is a block diagram of a logical data acquisition device shown in accordance with an exemplary embodiment of the present description.
Fig. 3 is a hardware configuration diagram of an electronic device in which a logical data acquisition apparatus according to an exemplary embodiment is shown.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The background on which the method provided in this specification is based will first be described.
The method provided by this specification targets a cloud database (generally a relational database) with a single-writer, multi-reader architecture. To process read and write instructions quickly, such a cloud database generally comprises one read-write node and a plurality of read-only nodes. The read-write node is the primary node and executes the data change instructions in data change transactions submitted by users, so as to change the database as the users direct; the read-only nodes execute data read instructions.
To keep writes efficient, a data change instruction is not immediately synchronized to disk when it is received; it is first written to the redo log. In other words, when the read-write node receives a data change instruction, it first writes the instruction into the redo log and then executes it, and it reports a successful write to the user as soon as the redo log has been written, which improves write efficiency (the speed of responding to the user).
In some cases, a data change instruction has been recorded only in the redo log, and the corresponding data modification has not yet been persisted to the shared storage space. In this case, to bring itself up to the latest version of the database, a read-only node needs to obtain the redo log that the read-write node wrote to the shared storage space and play it back (playback means executing, in order, the data change instructions contained in the redo log), thereby keeping the data of all nodes synchronized.
With the parts of the database system introduced, the needs that arise in the related art are described next.
The redo log is a physical log: it records the data changed by each data change instruction at the physical level. In other words, the redo log records only which position of which physical page each modification affects. For a relational database, knowing only the position within a physical page does not reveal which row and column of which table the data at that position belongs to, nor the logical meaning of those tables, rows, and columns (such data is referred to as logical data).
As a result, the read-only node cannot perceive logical modifications. A redo log made up of physical data is a physical log with only partial logical meaning: from it one can learn what was modified at a specific position, but not what that position or the modification means, so logical modifications cannot be fully perceived through the redo log alone.
To let each node perceive logical modifications, the related art may store a binary log (Binary Log) alongside the redo log. Both logs record the modifications that the read-write node makes to the database. The difference is that the binary log represents modifications as logical data (it can be regarded as the sequence of the user's SQL statements), whereas the redo log represents them as physical data. (Note that, in the related art, synchronizing data among nodes through the binary log, that is, having the read-only node play back the binary log to obtain the latest version of the database, is called native replication, while synchronizing data among nodes through the redo log is called physical replication.)
Although recording two logs lets each node perceive logical modifications to the database, the two logs hold essentially the same content (one as physical data, the other as logical data), which doubles the storage and write cost and reduces write efficiency.
To solve this problem, one might first consider keeping only the binary log, which would reduce cost while keeping a log that maintenance personnel can read more easily. However, for some database engines (such as InnoDB), the redo log is required by the engine, whereas the binary log is optional; existing database engines cannot operate without saving the redo log.
Given this, the related art naturally turns to the node that modifies the database (the read-write node) to make modifications visible to maintenance personnel, that is, it relies on the binary log recorded by that node to perceive logical modifications. (Alternatively, the structure of the indispensable redo log could be modified to carry more logical information, but doing so is complicated and error-prone.)
However, although only the read-write node has permission to modify the database, the read-only nodes play back the redo log to complete data synchronization, so modifications to the database system can also be perceived through the logical changes that occur during playback on the read-only nodes. A given read-only node may in some cases play back only part of the redo log, but because there are many read-only nodes, the data they play back collectively covers most of the data in the database.
It is therefore proposed that logical data be acquired as a by-product of playback on the read-only node.
Furthermore, the redo log is available during playback on the read-only node, and it records essentially the same content as the binary log, so the logical data corresponding to the redo log can be obtained by parsing it.
As for how to parse it, logical data should include both the logical meaning of the changed data and its content. The logical meaning can be represented by the index corresponding to the changed data, and the redo log already contains some simple index information that can be used directly as the index of the changed data. As for the content, the redo log records what content of the physical page was modified, so the changed data can be restored from the layout of data in the relational database together with the information in the redo log. This completes the acquisition of the logical data.
In other words, the present specification provides a method and an apparatus for acquiring logical data, applied to a read-only node of a cloud database system. The method includes: when the read-only node plays back at least one redo log, acquiring the at least one redo log; parsing the at least one redo log to obtain the data changed by the data change instruction corresponding to the at least one redo log; obtaining index information from the at least one redo log; and combining the index information and the data changed by the data change instruction into logical data corresponding to the at least one redo log.
In this way, logical data is extracted from the redo log during playback on the read-only node, the read-write node does not need to store duplicate logs, the IO cost of the read-write node is reduced, and an effect similar to enabling the binary log is achieved without enabling it. In addition, because the method runs during playback, it reuses the redo log that the read-only node already obtains for playback: the same redo log serves both to play back changes and to yield logical data, so the logical data is extracted at low cost. Furthermore, the read-write node does not need to record an additional binary log, which saves storage space and increases data write speed.
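The four steps of the method can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the patented implementation: real redo payloads are binary page-level changes, and the dictionary fields, the "|" separator, and the function names `parse_changed_data`, `extract_index_info`, and `to_logical_data` are hypothetical names introduced only for illustration.

```python
# Toy model only: the record fields and the "|" payload encoding are
# assumptions made for illustration, not the engine's real format.

def parse_changed_data(record, columns):
    """Step 103 (sketch): restore column values from the record's payload."""
    values = record["payload"].decode("utf-8").split("|")
    return dict(zip(columns, values))


def extract_index_info(record):
    """Step 105 (sketch): read the index information carried by the record."""
    return record["index"]


def to_logical_data(record, columns):
    """Combine index information and changed data into one logical change."""
    return {
        "type": record["type"],
        "index": extract_index_info(record),
        "data": parse_changed_data(record, columns),
    }


# A redo record obtained during playback (step 101), in the toy encoding.
record = {"type": "insert", "index": "PRIMARY", "payload": b"7|alice"}
logical = to_logical_data(record, ["id", "name"])
```

Here `logical` pairs the index information with the restored row, which is the shape of result that the final combining step describes.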
Further, on this basis, an external module can access the database through the read-only node and subscribe to the logical data acquired during playback on that node, so that the external module receives the more readable logical data and can thereby perceive modifications to the database.
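One way to picture such a subscription is a simple callback registry. The specification does not say how subscription is implemented; the class `LogicalDataFeed` and its `subscribe` and `publish` methods below are hypothetical and serve only as a minimal sketch.

```python
class LogicalDataFeed:
    """Hypothetical in-process feed of logical changes on a read-only node."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # An external module registers a callable that receives each change.
        self._subscribers.append(callback)

    def publish(self, change):
        # Called by the playback path whenever a logical change is extracted.
        for callback in self._subscribers:
            callback(change)


received = []
feed = LogicalDataFeed()
feed.subscribe(received.append)
feed.publish({"type": "insert", "data": {"id": "7"}})
```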
Next, a logical data acquisition method shown in this specification will be described in detail.
As shown in fig. 1, fig. 1 is a flow chart illustrating a logical data acquisition method according to an exemplary embodiment of the present description, including the steps of:
step 101, under the condition that the read-only node plays back at least one redo log, obtaining the at least one redo log.
It is also noted that the method is applied to a read-only node of a cloud database system.
In step 101, redo logs acquired during playback of data by the read-only node are utilized, and these redo logs are used as a basis for acquiring logical data.
The meaning of each noun in step 101 will be described in detail.
First, the format of the redo log needs to be explained. Redo logs generally come in two formats, compact and redundant. The two record essentially the same content and differ only in length; the compact format is the newer one and is shorter.
Although the redo log is a physical log, it carries some logical meaning. Generally, one redo log file contains multiple redo log records, and each record typically includes a log flag (Single Record Flag), a log type (Log Type), a tablespace ID (Space ID), a page number within the tablespace (Page Number), and the logged data (Payload).
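The record layout just described can be modeled as a small data class. The field names follow the text; the Python types, the class name `RedoRecord`, and the string labels for the log type are illustrative assumptions and say nothing about the actual on-disk encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    single_record_flag: bool  # log flag (Single Record Flag)
    log_type: str             # log type, e.g. "insert" (hypothetical label)
    space_id: int             # tablespace ID (Space ID)
    page_no: int              # page number within the tablespace
    payload: bytes            # logged data; its content depends on log_type

    def page_key(self):
        """Identify the physical page this record modifies."""
        return (self.space_id, self.page_no)


rec = RedoRecord(True, "insert", 5, 42, b"\x01\x02")
```

The pair returned by `page_key` is the information the text later uses to route a record to the thread responsible for its page.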
The log type differs from instruction to instruction and may include, for example, insert logs, delete logs, and update logs; logs of different types may be stored in the same redo log file. The logged data differs according to the log type; the specific differences are described in detail below.
The cloud database system supports a single-writer, multi-reader architecture and comprises one read-write node and a plurality of read-only nodes. To synchronize the modifications that the read-write node makes to physical pages, the read-only nodes obtain and play back the redo log to reach the latest version of the database. These processes are described above and are not repeated here.
In the related art, playback on a read-only node generally includes three stages. The first is the read (Read) stage, in which a log reading thread in the read-only node (Redo Log Async Reader; other threads may also be used, and the log reading thread is only an example) reads the redo log from the shared storage space into a memory buffer (Async Read Buffer) of the read-only node.
The first stage may be triggered when the latest version of the data to be read has not been persisted, or when a read-only node finds that the read-write node has added redo logs. For the latter, the read-only node and the read-write node can synchronize the log sequence number (LSN) through a communication thread. The LSN indicates the position up to which the latest redo log has been recorded; when the LSN of the read-write node increases, the read-write node has written new redo logs, and the read-only node can play back the newly added redo logs to synchronize its data with the read-write node.
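The LSN comparison can be sketched as follows. The function name `new_redo_range` and the use of plain integers for LSNs are assumptions for illustration; how the two nodes actually exchange LSNs is not modeled.

```python
def new_redo_range(applied_lsn, writer_lsn):
    """Return the (start, end) LSN range still to be played back, or None.

    applied_lsn: position up to which this read-only node has played back.
    writer_lsn:  latest position reported by the read-write node.
    """
    if writer_lsn > applied_lsn:
        return (applied_lsn, writer_lsn)
    return None  # nothing new has been written


pending = new_redo_range(100, 160)
idle = new_redo_range(160, 160)
```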
The second stage is the parse (Parse) stage. A coordinator thread (application Coordinator) parses each redo log. Each physical page corresponds to one log application thread (Redo Log application Worker), and the coordinator thread assigns each redo log to the corresponding log application thread according to the tablespace ID and page number contained in that redo log.
Parsing here differs in meaning from the parsing of the redo log in step 103: here it means determining the physical page that each redo log corresponds to, so that the redo log can be assigned to the corresponding log application thread. Multiple log application threads are used to improve the efficiency of log application; with only one log application thread, application would be slow.
The third stage is the apply (Apply) stage: after all redo logs obtained in the first stage have been parsed, the coordinator thread notifies all log application threads to apply the redo logs in parallel, producing the latest version of the memory pages in memory.
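The parse and apply stages can be sketched with a thread pool. This is a simplified model under stated assumptions: records are `(space_id, page_no, payload)` tuples, "applying" a record just appends its payload to that page's history, and the functions `parse_stage` and `apply_stage` are hypothetical names. The point it illustrates is that all records for one page go to the same worker, so their order is preserved while different pages are applied in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parse_stage(records, n_workers):
    """Coordinator: route each record to a worker queue by its page."""
    queues = defaultdict(list)
    for space_id, page_no, payload in records:
        worker = hash((space_id, page_no)) % n_workers
        queues[worker].append((space_id, page_no, payload))
    return queues


def apply_stage(queues):
    """Workers: apply their queues in parallel; each page has one owner."""
    def run(queue):
        local = {}
        for space_id, page_no, payload in queue:
            local.setdefault((space_id, page_no), []).append(payload)
        return local

    pages = {}
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        for local in pool.map(run, list(queues.values())):
            pages.update(local)  # no key conflicts: one worker per page
    return pages


records = [(1, 7, b"a"), (1, 8, b"b"), (1, 7, b"c")]
pages = apply_stage(parse_stage(records, n_workers=4))
```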
With the terms involved in step 101 explained, step 101 is now described in terms of when it is executed.
As mentioned above, playback includes three stages, and the redo log could in principle be obtained in any of them. However, in some cases part of a redo log's content is omitted, which makes obtaining the redo log in the third stage, the apply stage, more resource-efficient.
Specifically, when the content written by the previous redo log and the next redo log is partly the same, the next redo log omits the shared part. For example, if the previous redo log writes the string AB at some position of a physical page and the next redo log writes the string ABCD at some position of another physical page, then because AB and ABCD share the prefix "AB", the next record stores only the write of CD rather than the write of ABCD.
In this case, during the parse stage, that is, before the log application threads actually apply the logs, the coordinator thread can complete the log using the mismatch information, so the completed redo log is available in the third stage. If the redo log were obtained in another stage, it would have to be completed again after being obtained. Obtaining the redo log in the third stage therefore improves efficiency and avoids repeated work.
Moreover, obtaining the mismatch information requires first obtaining the physical page, and the physical page is generally obtained in the apply stage. If the redo log were acquired in the parse stage, the physical page would still have to be read from the shared storage space first (if it is not already in memory), which is again duplicated work. From this perspective too, acquiring the redo log in the third stage reduces repeated work.
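The completion of an abbreviated record can be illustrated with the AB/ABCD example. The function `complete_payload` and its parameters are hypothetical; in particular, treating the mismatch information as a single prefix length is an assumption made for this sketch, and fetching the previous content from the physical page is not modeled.

```python
def complete_payload(previous, mismatch_index, suffix):
    """Rebuild the full content from the shared prefix and the logged suffix.

    previous:       content written by the preceding record
    mismatch_index: number of leading bytes shared with `previous` (assumed)
    suffix:         the bytes actually stored in the abbreviated record
    """
    if mismatch_index > len(previous):
        raise ValueError("mismatch index exceeds length of previous content")
    return previous[:mismatch_index] + suffix


full = complete_payload(b"AB", 2, b"CD")
```

With the previous content `AB`, a shared prefix of two bytes, and the stored suffix `CD`, the completed content is `ABCD`.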
In other words, step 101 includes: when the read-only node plays back at least one redo log, acquiring the redo log after it has been completed.
That is, the completed redo log is obtained during the apply stage of playback, which reduces repeated work and improves efficiency. Specifically, the acquisition may take place before the apply stage starts applying the logs or after it has started; the specific timing is not limited here.
After the description of step 101 from the execution timing of step 101, step 101 will be further described from the type of redo log.
Redo logs are generated not only for data change instructions (data insertion, deletion, and update instructions) but also for transaction operations: the commit, rollback, and save point (Savepoint) operations of a transaction also generate corresponding logs. Different types of logs are processed in different ways.
For insert-type and update-type data change instructions, there are generally only two kinds of redo log, a compact-format redo log and a redundant-format redo log. The two record essentially the same content and are generated at the same time, and either can be used to extract the logical data.
For delete-type data change instructions, there are four kinds of redo log: redo logs of the delete mark type in the compact format and in the redundant format, and redo logs of the delete recycle type in the compact format and in the redundant format. The two types exist because, unlike insertion and update, deletion is not carried out immediately when the data change instruction is received. Instead, when the instruction is received, a delete mark is placed on the cluster index (i.e., the primary key index) corresponding to the data to be deleted, and a redo log of the delete mark type is recorded. Some time later, once the deletion can actually be carried out, the data corresponding to the cluster index carrying the delete mark is deleted and the space it occupied is reclaimed, at which point a redo log of the delete recycle type is recorded.
In the above case, the data is regarded as deleted once the cluster index carries a delete mark. Recording only the redo log of the delete mark type would be enough to reflect this, but a redo log must also be recorded when the data is actually deleted, that is, when the corresponding space is reclaimed; otherwise the data could not be recovered through the redo log. Both types of redo log therefore need to be recorded. Of the two, the redo log of the delete mark type is recorded at the moment the data is regarded as deleted, so it reflects the true state of the data better than the delayed log of the delete recycle type.
Therefore, to improve the timeliness and accuracy of logical data acquisition, only logs of the delete mark type are selected for parsing logical data. In other words, step 101 includes: when the type of the redo log played back by the read-only node is the delete mark type, acquiring the at least one redo log of the delete mark type, where the delete mark type indicates that the redo log was generated when the cluster index was marked with a delete mark; and when the type of the redo log played back by the read-only node is the delete recycle type, not acquiring the redo log of the delete recycle type, where the delete recycle type indicates that the redo log was generated when the data corresponding to a cluster index carrying a delete mark was deleted.
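The selection rule of step 101 can be illustrated with a minimal sketch. The names `RedoType`, `RedoRecord`, `lsn`, and `select_for_extraction` are hypothetical and are not taken from this specification or from any database engine; the sketch only assumes that each replayed record carries a type tag.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RedoType(Enum):
    # Hypothetical type tags; a real engine defines its own record types.
    INSERT = auto()
    UPDATE = auto()
    DELETE_MARK = auto()     # written when the cluster index is marked deleted
    DELETE_RECYCLE = auto()  # written later, when the space is reclaimed
    TXN_COMMIT = auto()


@dataclass(frozen=True)
class RedoRecord:
    lsn: int        # assumed ordering key for the record
    type: RedoType


def select_for_extraction(replayed):
    """Keep every replayed record except those of the delete recycle type."""
    return [r for r in replayed if r.type is not RedoType.DELETE_RECYCLE]


replayed = [
    RedoRecord(1, RedoType.INSERT),
    RedoRecord(2, RedoType.DELETE_MARK),
    RedoRecord(3, RedoType.TXN_COMMIT),
    RedoRecord(4, RedoType.DELETE_RECYCLE),
]
selected = select_for_extraction(replayed)
print([r.lsn for r in selected])
```

Only the delete recycle record is skipped in this sketch; the transaction record is kept because logs related to transaction operations are still acquired.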
How logs related to transaction operations are handled is described later and is not detailed here. Note that logs related to transaction operations still need to be acquired.
Step 103, analyzing the at least one redo log, and acquiring data changed by the data change instruction corresponding to the at least one redo log.
Step 105, acquiring index information from the at least one redo log.
Next, step 103 and step 105 will be collectively described.
Steps 103 and 105 together extract logical data from the physical redo log. Logical data comprises two parts: the location of the data (optionally together with the meaning of that location) and how the data was changed. The two parts therefore need to be extracted separately.
Note that this specification does not limit the order in which steps 103 and 105 are performed.
First, a specific implementation method of step 103 will be described.
For delete-type and insert-type redo logs: since the smallest unit operated on by a relational database is a row, an insertion or deletion generally inserts or deletes an entire row. The changed data to be extracted for a deletion or insertion is therefore the data of an entire row. Because a row comprises multiple columns, the data of each column of the row must be restored and then assembled into the data of the entire row.
For update-type redo logs, an update sometimes changes only one or several columns of a row. The changed data acquired may therefore be the data of the entire row, or only the data of the changed column or columns.
As for a specific restoration method, the redo log generally includes the length of the field in each column, non-empty information (which columns hold a value and which are empty), and so on. The data recorded in the redo log can therefore be split into multiple blocks using the recorded field lengths, and the empty columns can be filled in among these blocks using the non-empty information, so that the data of each column, and hence of the whole row, is obtained.
Note that the above shows only one way of extracting the changed data by parsing the redo log. Other methods may also be used, and this specification does not limit the method.
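The restoration described above (split the recorded bytes by field length, then fill in the empty columns) can be sketched as follows. The record layout used here is a deliberate simplification and an assumption: it is not the on-disk format of any particular engine, and the function name `rebuild_row` and its parameters are hypothetical.

```python
def rebuild_row(column_names, null_flags, field_lengths, payload):
    """Rebuild one row from a simplified redo record body.

    null_flags[i] is True when column i is empty (NULL).
    field_lengths holds the byte length of each non-empty column, in order.
    payload is the concatenation of the non-empty column values.
    """
    if len(column_names) != len(null_flags):
        raise ValueError("one null flag is required per column")
    lengths = iter(field_lengths)
    row, offset = {}, 0
    for name, is_null in zip(column_names, null_flags):
        if is_null:
            row[name] = None  # fill the empty column
            continue
        size = next(lengths, None)
        if size is None:
            raise ValueError("missing field length for column " + name)
        row[name] = payload[offset:offset + size]
        offset += size
    if offset != len(payload):
        raise ValueError("field lengths do not cover the payload")
    return row


row = rebuild_row(
    column_names=["id", "name", "note"],
    null_flags=[False, False, True],
    field_lengths=[4, 5],
    payload=(7).to_bytes(4, "big") + b"alice",
)
print(row)
```

The column values are left as raw bytes; decoding them into typed values would additionally require the column definitions of the table, which this sketch does not model.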
Next, a specific implementation of step 105 will be described.
A redo log corresponding to a data change instruction may itself contain some logical information, such as a tablespace ID and a page number. If the maintenance personnel need logical data only at the tablespace ID level, the tablespace ID can be obtained simply by parsing the redo log.
In some cases, however, the tablespace ID alone does not tell the maintenance personnel which specific table the ID represents, and in some cases they also want to know which row was modified. In such cases, the redo log must be combined with the description in the physical page.
Specifically, the tablespace ID and page number contained in the redo log determine the physical page on which the changed data is located. The index identifier (Index ID) of the row containing the changed data can be extracted from the index header (Index Header) of that physical page, and the index object and table object of the row can then be determined from the index dictionary. In this way the primary key index and the secondary index of the changed data are determined, and the complete index information of the changed data is obtained.
In other words, step 105 comprises: acquiring a physical page of the data changed by the data change instruction from the at least one redo log; determining an index identifier corresponding to the data changed by the data change instruction from an index header of the physical page, and determining index information of the data changed by the data change instruction according to the index identifier; the index information includes a primary key index and a secondary index of the data changed by the data change instruction.
The row data is captured from the primary key index, because the primary key index contains the complete row data, whereas a secondary index contains the data of only some columns.
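The lookup chain of step 105 (tablespace ID and page number, then physical page, then index header, then index dictionary) can be sketched with in-memory dictionaries. Everything named here is a hypothetical stand-in: `PAGES`, `INDEX_DICTIONARY`, `IndexInfo`, and the sample values are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexInfo:
    table: str
    primary_key_index: str
    secondary_indexes: tuple


# Hypothetical stand-ins for physical pages and the index dictionary.
PAGES = {(12, 3): {"index_header": {"index_id": 501}}}
INDEX_DICTIONARY = {
    501: IndexInfo("orders", "PRIMARY", ("idx_customer",)),
}


def lookup_index_info(redo_record, pages=PAGES, dictionary=INDEX_DICTIONARY):
    """Resolve (tablespace ID, page number) to the full index information."""
    # 1. The redo record identifies the physical page.
    page = pages[(redo_record["space_id"], redo_record["page_no"])]
    # 2. The index header of that page carries the index identifier.
    index_id = page["index_header"]["index_id"]
    # 3. The index dictionary maps the identifier to index and table objects.
    return dictionary[index_id]


info = lookup_index_info({"space_id": 12, "page_no": 3})
print(info.table, info.primary_key_index, info.secondary_indexes)
```

A caller that needs only tablespace-level information could stop after reading `space_id` from the record and skip the page and dictionary lookups.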
Step 107, composing the index information and the data changed by the data change instruction into logical data corresponding to the at least one redo log.
Through steps 103 and 105, both the location of the changed data within its table (the index information, which represents the position in the table) and how the data was changed are parsed from the redo log. Once both are obtained, together they serve as the logical data corresponding to the redo log.
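As a rough illustration of step 107, the two parts can be combined into one record. The `LogicalData` structure and its field names are assumptions chosen for this sketch; the specification does not prescribe a concrete representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalData:
    table: str       # where: table resolved from the index information
    index: str       # where: index the row was captured from
    operation: str   # how: insert, update, or delete
    changed: dict    # how: column values restored from the redo log


def compose_logical_data(index_info, operation, changed_row):
    """Combine index information with the changed data (step 107)."""
    return LogicalData(
        table=index_info["table"],
        index=index_info["primary_key_index"],
        operation=operation,
        changed=dict(changed_row),
    )


item = compose_logical_data(
    {"table": "orders", "primary_key_index": "PRIMARY"},
    "update",
    {"id": 7, "status": "shipped"},
)
print(item)
```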
As mentioned above, there are also redo logs for transaction operations: specifically, a redo log corresponding to a transaction commit, a redo log corresponding to a transaction rollback, and a redo log corresponding to setting a save point in a transaction. None of the three needs to be parsed for logical data, because they are not operations that change the database. However, these redo logs indicate which data change redo logs belong to committed transactions and which belong to transactions for which no commit instruction has yet been issued (a data change instruction may already have been written to the redo log while its transaction is still uncommitted). They therefore still need to be analyzed in order to better analyze the redo logs for data change instructions.
The redo log corresponding to a transaction commit operation generally contains the transaction number, and the redo logs for data change instructions contain the transaction number as well. Only the data change redo logs whose transaction number has a recorded commit operation need to be analyzed. The remaining redo logs belong to transactions that have not yet been committed; they are not analyzed now, but after the corresponding transaction is committed.
In other words, the above method further comprises: when the acquired at least one redo log includes a redo log corresponding to a transaction commit instruction, determining the commit transaction number included in that redo log; determining the change transaction number included in each redo log corresponding to a data change instruction among the at least one redo log; and determining the valid redo logs from the at least one redo log according to the commit transaction number and the change transaction number, and determining the logical data based on the valid redo logs.
A valid redo log is a redo log whose change transaction number equals any commit transaction number; the logical data is determined from these redo logs.
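The matching of change transaction numbers against commit transaction numbers can be sketched as a simple filter. The record shape (`kind`, `trx_id`, `op`) is hypothetical and chosen only for this illustration.

```python
def valid_redo_logs(redo_logs):
    """Return the change records whose transaction number has a commit record."""
    # Collect the commit transaction numbers seen in this batch.
    committed = {r["trx_id"] for r in redo_logs if r["kind"] == "commit"}
    # Keep only change records whose change transaction number is committed.
    return [
        r for r in redo_logs
        if r["kind"] == "change" and r["trx_id"] in committed
    ]


batch = [
    {"kind": "change", "trx_id": 10, "op": "insert"},
    {"kind": "change", "trx_id": 11, "op": "update"},
    {"kind": "commit", "trx_id": 10},
]
valid = valid_redo_logs(batch)
print(valid)
```

In this batch the record of transaction 11 is not valid yet. A fuller implementation would presumably buffer such records and analyze them once the commit record for transaction 11 is played back.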
A transaction rollback operation does not itself directly record a redo log. Instead, a compensation operation is performed by applying the rollback log (Undo Log) in reverse, and the compensation operation records corresponding redo logs. The redo logs corresponding to a transaction rollback can therefore be treated as redo logs for data change instructions and analyzed with the method described above.
For save point operations, a redo log is generated when a save point is added; this redo log carries no logical meaning and need not be analyzed. When a transaction is rolled back to a save point, a compensation operation is performed by applying the rollback log in reverse, as in a rollback operation. The compensation operation generates corresponding redo logs, which can be processed in the same way as redo logs for data change instructions.
In addition, so that maintenance personnel can see the modifications made to the database, a subscription module may subscribe to the logical data generated on each read-only node. The maintenance personnel can then learn the logical data corresponding to modifications of physical pages.
In other words, after composing the logical data, the method further comprises: sending the composed logical data to a subscription module. What is sent to the subscription module may be all the logical data corresponding to all redo logs, or all or part of the logical data corresponding to some of the redo logs. What is specifically sent is determined jointly by the range of redo logs acquired by the read-only node, the range of logical data parsed, and the range of modifications subscribed to by the subscription module.
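How the subscribed range narrows what is sent can be sketched as follows. The `SubscriptionModule` class, its `deliver` method, and the per-table subscription granularity are assumptions for illustration; the specification does not define the interface of the subscription module.

```python
class SubscriptionModule:
    """Hypothetical subscriber that records the logical data it receives."""

    def __init__(self, subscribed_tables):
        self.subscribed_tables = set(subscribed_tables)
        self.received = []

    def deliver(self, logical_data):
        self.received.append(logical_data)


def publish(logical_items, subscriber):
    """Send only the items that fall inside the subscribed range."""
    sent = 0
    for item in logical_items:
        if item["table"] in subscriber.subscribed_tables:
            subscriber.deliver(item)
            sent += 1
    return sent


subscriber = SubscriptionModule(["orders"])
sent_count = publish(
    [
        {"table": "orders", "operation": "insert", "changed": {"id": 1}},
        {"table": "audit", "operation": "insert", "changed": {"id": 9}},
    ],
    subscriber,
)
print(sent_count, subscriber.received)
```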
In the related art, a read-only node cannot perceive logical data when the binary log is not enabled. This method gives the read-only node the ability to perceive logical data, and maintenance personnel can obtain the logical data from the read-only node through the subscription module, thereby perceiving database changes.
This method breaks with the original scheme, in which logical data could be obtained only through the read-write node: logical data can now be obtained through a read-only node, the read-write node does not need to record binary logs, and the read/write pressure on the read-write node is reduced. Moreover, the scheme is carried out as a by-product of playback on the read-only node, so it is implemented at minimal cost.
Corresponding to the embodiments of the method, the present specification also provides embodiments of the apparatus and the terminal applied thereto.
As shown in fig. 2, fig. 2 is a block diagram of a logical data acquisition apparatus applied to a read-only node of a cloud database system according to an exemplary embodiment. The device comprises:
the redo log obtaining module 210 is configured to obtain at least one redo log when the read-only node plays back the at least one redo log.
The changed data obtaining module 220 is configured to analyze the at least one redo log and obtain data changed by the data change instruction corresponding to the at least one redo log.
An index information obtaining module 230, configured to obtain index information from the at least one redo log.
A logic data obtaining module 240, configured to combine the index information and the data changed by the data change instruction into logic data corresponding to the at least one redo log.
In an optional implementation manner, the redo log obtaining module 210 is specifically configured to: acquire the redo log after the read-only node has finished playing back the at least one redo log.
In an optional implementation manner, the redo log obtaining module 210 is specifically configured to: when the type of the redo log played back by the read-only node is the delete mark type, acquire the at least one redo log of the delete mark type, where the delete mark type indicates that the redo log was generated when the cluster index was marked with a delete mark; and when the type of the redo log played back by the read-only node is the delete recycle type, not acquire the redo log of the delete recycle type, where the delete recycle type indicates that the redo log was generated when the data corresponding to a cluster index carrying a delete mark was deleted.
In an optional embodiment, the index information obtaining module 230 is specifically configured to: acquire a physical page of the data changed by the data change instruction from the at least one redo log; determine an index identifier corresponding to the data changed by the data change instruction from an index header of the physical page, and determine index information of the data changed by the data change instruction according to the index identifier; the index information includes a primary key index and a secondary index of the data changed by the data change instruction.
In an optional embodiment, the apparatus further comprises: a valid redo log determination module 240 (not shown) for: determining a commit transaction number included in the redo log corresponding to the transaction commit instruction when the acquired at least one redo log includes a redo log corresponding to the transaction commit instruction; determining a change transaction number included in a redo log corresponding to the data change instruction in the at least one redo log; and determining a valid redo log from the at least one redo log according to the commit transaction number and the change transaction number, and determining logical data based on the valid redo log.
In an optional embodiment, the apparatus further comprises: a logic data sending module 250 (not shown in the figure) for sending the composed logic data to the subscription module after the logic data is composed.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
As shown in fig. 3, fig. 3 is a hardware structure diagram of a computer device in which the logic data acquisition apparatus of the embodiment is located, and the device may include: a processor 1010, a memory 1020 for storing processor-executable instructions, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and the processor executes the executable instructions to implement the aforementioned logic data obtaining method.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present specification also provide a computer-readable storage medium on which a computer program is stored, where the computer program is executed by a processor to implement the foregoing logical data acquisition method.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Furthermore, the present specification also provides a computer program that implements the aforementioned logical data acquisition method when being executed.
In addition, the present specification also provides a cloud database system, where the cloud database system includes a read-write node and a plurality of read-only nodes, and any read-only node executes the aforementioned logical data acquisition method.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (11)

1. A logical data acquisition method, applied to a read-only node of a cloud database system, comprising:
under the condition that the read-only node plays back at least one redo log, acquiring the at least one redo log;
analyzing the at least one redo log to obtain data changed by the data change instruction corresponding to the at least one redo log;
obtaining index information from the at least one redo log;
and forming the index information and the data changed by the data change instruction into logic data corresponding to the at least one redo log.
2. The method of claim 1, wherein in the case that the read-only node plays back at least one redo log, obtaining the at least one redo log comprises:
acquiring the redo log after the read-only node has finished playing back the at least one redo log.
3. The method of claim 1, wherein in the case that the read-only node plays back at least one redo log, obtaining the at least one redo log comprises:
when the type of the redo log played back by the read-only node is the delete mark type, acquiring the at least one redo log of the delete mark type; the delete mark type is used for representing that the redo log is generated when the cluster index is marked with a delete mark;
when the type of the redo log played back by the read-only node is the delete recycle type, not acquiring the redo log of the delete recycle type; the delete recycle type is used for representing that the redo log is generated when data corresponding to a cluster index carrying a delete mark is deleted.
4. The method of claim 1, the obtaining index information from the at least one redo log, comprising:
acquiring a physical page of the data changed by the data change instruction from the at least one redo log;
determining an index identifier corresponding to the data changed by the data change instruction from an index header of the physical page, and determining index information of the data changed by the data change instruction according to the index identifier; the index information includes a primary key index and a secondary index of the data changed by the data change instruction.
5. The method of claim 1, further comprising:
determining a commit transaction number included in the redo log corresponding to the transaction commit instruction under the condition that the obtained at least one redo log includes a redo log corresponding to the transaction commit instruction;
determining a change transaction number included in a redo log corresponding to the data change instruction in the at least one redo log;
and determining a valid redo log from the at least one redo log according to the commit transaction number and the change transaction number, and determining logical data based on the valid redo log.
6. The method of any of claims 1-5, after composing the logical data, the method further comprising:
and sending the composed logic data to a subscription module.
7. A logic data acquisition device applied to a read-only node of a cloud database system comprises:
the redo log obtaining module is used for obtaining at least one redo log under the condition that the read-only node plays back the at least one redo log;
the change data acquisition module is used for analyzing the redo log and acquiring data changed by the data change instruction corresponding to the redo log;
the index information acquisition module is used for acquiring index information from the redo log;
and the logic data acquisition module is used for forming the index information and the data changed by the data change instruction into logic data corresponding to the at least one redo log.
8. A cloud database system comprising one read-write node and a plurality of read-only nodes, any read-only node performing the logical data acquisition method of any one of claims 1-6.
9. A computer device, the computer device comprising:
a processor;
a memory for storing processor-executable instructions;
the processor implements the logical data acquisition method of any of claims 1-6 by executing the executable instructions.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the logical data acquisition method of any one of claims 1-6.
11. A computer program which when executed implements the logical data acquisition method of any one of claims 1-6.
CN202210389036.8A 2022-04-13 2022-04-13 Logic data acquisition method and device Pending CN114816247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210389036.8A CN114816247A (en) 2022-04-13 2022-04-13 Logic data acquisition method and device

Publications (1)

Publication Number Publication Date
CN114816247A true CN114816247A (en) 2022-07-29

Family

ID=82535809

Country Status (1)

Country Link
CN (1) CN114816247A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501736A (en) * 2023-04-12 2023-07-28 北京优炫软件股份有限公司 Control method and control system for delayed playback of database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination