WO2018137327A1 - Method for transferring data between primary and standby machines, control node, and database system - Google Patents


Info

Publication number
WO2018137327A1
WO2018137327A1 PCT/CN2017/095477 CN2017095477W WO2018137327A1 WO 2018137327 A1 WO2018137327 A1 WO 2018137327A1 CN 2017095477 W CN2017095477 W CN 2017095477W WO 2018137327 A1 WO2018137327 A1 WO 2018137327A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage unit
standby
standby machine
identifier
machine
Application number
PCT/CN2017/095477
Other languages
English (en)
French (fr)
Inventor
王伟 (Wang Wei)
李健 (Li Jian)
徐文韬 (Xu Wentao)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2018137327A1 (Chinese-language publication)
Priority to US16/522,073 (US10831612B2)



Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/14 Error detection or correction of the data by redundancy in operation
                • G06F11/1402 Saving, restoring, recovering or retrying
                  • G06F11/1446 Point-in-time backing up or restoration of persistent data
                    • G06F11/1448 Management of the data involved in backup or backup restore
                      • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
                    • G06F11/1458 Management of the backup or restore process
                      • G06F11/1464 Management of the backup or restore process for networked environments
                  • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
              • G06F11/16 Error detection or correction of the data by redundancy in hardware
                • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
                  • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
                    • G06F11/2094 Redundant storage or storage space
                  • G06F11/2097 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
          • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
            • G06F2201/80 Database-specific techniques

Definitions

  • The present invention relates to the field of shared-storage technologies and, in particular, to a method for transferring data between primary and standby machines, a control node, and a database system.
  • In a shared-storage system, the host and the standby machines share the same storage device, so all of them can use the data stored on it (for example, database data); cluster software manages how the host and the standby machines access that data.
  • The cluster system provides external services through a virtual IP address.
  • Both the host and the standby machines can access the same page on the storage device.
  • The host and the standby machines usually load the pages they need into their local cache (memory).
  • The host can write to a page, whereas a standby machine can only read it.
  • While the host provides the service, pages on the storage device can be both read and written.
  • When the host modifies a page, the related modification information is sent to the standby machines, which apply it to the corresponding pages in their memory so that those pages stay in sync with the same pages on the host.
  • Embodiments of the present invention provide a primary/standby data transfer method, a control node, and a database system that reduce the transmission of invalid (unneeded) data between the host and the standby machines, thereby reducing CPU consumption on both sides and the waste of network bandwidth.
  • According to a first aspect, an embodiment of the present invention provides a primary/standby data transfer method applied to a database system. The method includes: a control node receives an operation log sent by the host, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record is a record of a write operation performed by the host on a storage unit in the host's local cache or in the storage device; for a first standby machine, which is any one of the at least one standby machine, the control node determines a first storage unit set existing in the first standby machine and a second storage unit set corresponding to the at least one operation record; the control node generates a storage unit intersection from the storage units present in both the first and the second storage unit sets; and the control node acquires the operation records in the operation log that correspond to the storage unit intersection and sends them to the first standby machine.
  • The first aspect describes the primary/standby data transfer method from the control node side.
  • With this method, each standby machine receives only the operation records that correspond to it, which reduces invalid data transmission between the host and the standby machines and thereby reduces CPU consumption on both sides and the waste of network bandwidth.
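The filtering step of the first aspect can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `OperationRecord` type and its `unit_id`/`payload` fields are hypothetical, and the first storage unit set is passed in directly rather than looked up in a mapping table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    # Hypothetical layout: the text only requires that each record names one storage unit.
    unit_id: int      # e.g. a page number
    payload: bytes    # e.g. the redo information for that page


def records_for_standby(standby_units: set[int],
                        operation_log: list[OperationRecord]) -> list[OperationRecord]:
    """Return the records one standby needs, in original log order."""
    # Second storage unit set: every storage unit touched by the operation log.
    logged_units = {rec.unit_id for rec in operation_log}
    # Storage unit intersection: units cached on the standby AND modified by the host.
    intersection = standby_units & logged_units
    # Keep log order so the standby replays changes in the order the host made them.
    return [rec for rec in operation_log if rec.unit_id in intersection]


if __name__ == "__main__":
    log = [OperationRecord(7, b"a"), OperationRecord(3, b"b"),
           OperationRecord(7, b"c"), OperationRecord(9, b"d")]
    first_set = {3, 7, 42}  # storage units in the first standby's local cache
    print([r.unit_id for r in records_for_standby(first_set, log)])  # [7, 3, 7]
```

Filtering by the intersection yields the same records as filtering by the standby's own set; the explicit intersection is kept only to mirror the steps described above.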
  • A storage unit describes the database data in a specific storage area: it is a fixed-size storage space in which data is recorded.
  • The storage unit may be a page, or a data storage space defined in another form, such as a block or a sector.
  • The host can perform write operations on a storage unit, whereas a standby machine can only perform read operations on it.
  • When the host performs a read/write transaction, the database data in a storage unit is written and the storage unit is modified; the host then records an operation record of the modification, where each operation record corresponds to one storage unit.
  • An operation record is a record of a write operation performed by the host on a storage unit in its local cache or in the storage device.
  • When the storage unit is a page, each operation record corresponds to one page. An operation record is, for example, the redo log of a modified page, and the operation log is, for example, the redo logs of all modified pages.
  • Determining the first storage unit set existing in the first standby machine includes determining it according to a preset mapping table.
  • The mapping table is preset in the host and includes at least one entry; each entry may include the identifier of a standby machine and the identifiers of all storage units existing in that standby machine's local cache.
  • The identifier of a standby machine may be a standby number, and the identifier of a storage unit may be a storage unit number (such as a page number).
  • The address of a standby machine may be its IP or MAC address. By querying the mapping table, the host can determine which storage units exist in any standby machine in the database system and thereby determine the first storage unit set existing in that standby machine.
  • Each operation record includes the identifier of its page, so the control node can use page identifiers to determine which operation records correspond to which page and thereby determine the second storage unit set corresponding to the operation log.
  • The control node intersects the first storage unit set with the second storage unit set to obtain the storage unit intersection; every storage unit in the intersection is present in both sets.
  • In this way, the control node can determine which pages the host has updated and which of them a standby machine needs. Based on the pages existing in the standby machine, it sends the standby only the operation records that the standby requires. Because every operation record a standby receives is one it needs, transmitting unneeded operation records is avoided.
  • The standby machine updates the related pages based on the received operation records, so the same page on the host and on the standby is updated in sync, which meets user requirements in a multi-read database system.
  • In one implementation, the control node determines, according to the mapping table, a Kth storage unit set existing in a Kth standby machine, where the Kth standby machine is any standby machine other than the first standby machine and the Kth storage unit set is inconsistent with the first storage unit set. That is, the database system has at least two standby machines, and the storage units in the local caches of different standby machines are inconsistent; correspondingly, the mapping table includes at least two entries, and the storage-unit identifiers in different entries are inconsistent. Here, "inconsistent" means "not exactly the same".
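With two or more standby machines whose cached page sets differ, the control node repeats the intersection once per mapping-table entry. The sketch below assumes a hypothetical `MappingEntry` layout (standby number, address, set of page numbers) and models each operation record as a `(unit_id, redo_text)` tuple; the addresses are placeholders.

```python
from dataclasses import dataclass, field

Record = tuple[int, str]  # (unit_id, redo_text): a simplification for this sketch


@dataclass
class MappingEntry:
    standby_id: int                                   # e.g. a standby number
    address: str                                      # e.g. the standby's IP address
    unit_ids: set[int] = field(default_factory=set)   # storage units in its local cache


def dispatch(mapping_table: dict[int, MappingEntry],
             operation_log: list[Record]) -> dict[str, list[Record]]:
    """Split one operation log into per-standby batches keyed by standby address."""
    logged_units = {unit_id for unit_id, _ in operation_log}    # second storage unit set
    batches: dict[str, list[Record]] = {}
    for entry in mapping_table.values():
        wanted = entry.unit_ids & logged_units                  # per-standby intersection
        if wanted:  # a standby that needs nothing is sent nothing
            batches[entry.address] = [r for r in operation_log if r[0] in wanted]
    return batches


if __name__ == "__main__":
    table = {
        1: MappingEntry(1, "10.0.0.11", {1, 2, 3}),
        2: MappingEntry(2, "10.0.0.12", {3, 4}),   # differs from standby 1's set
    }
    log = [(1, "r1"), (4, "r4"), (3, "r3"), (8, "r8")]
    print(dispatch(table, log))
    # {'10.0.0.11': [(1, 'r1'), (3, 'r3')], '10.0.0.12': [(4, 'r4'), (3, 'r3')]}
```

The record for page 8 reaches no standby at all, because no mapping-table entry lists that page.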
  • The method further includes: the control node receives a mapping table update request sent by a second standby machine, which is one of the at least one standby machine; the request includes the identifier of the second standby machine and the identifier of the storage unit to be updated.
  • The control node queries the mapping table for the entry corresponding to the identifier of the second standby machine and adds the identifier of the storage unit to be updated to that entry.
  • The mapping table update request may be a registration request or an elimination (deletion) request.
  • For a registration request: the control node receives a registration request sent by a second standby machine, which is any one of the at least one standby machine, where the request includes the identifier of the second standby machine and the identifier of the storage unit to be registered; the control node queries the mapping table for the entry corresponding to the identifier of the second standby machine and adds the identifier of the storage unit to be registered to that entry.
  • The standby machines share the storage medium with the host.
  • A user may issue an instruction to a standby machine. After receiving it, the standby first determines whether its local cache holds the storage unit corresponding to the instruction.
  • If the storage unit is not present, the standby needs to read it from the storage device.
  • Once the standby reads the storage unit from the storage device into its local cache, the set of storage units in its local cache changes.
  • To inform the control node of this change, the standby generates a registration request and sends it to the control node; the request includes the identifier of the standby and the identifier of the storage unit to be registered (that is, the identifier of the storage unit just read in).
  • After receiving the registration request, the control node determines, according to the standby's identifier, the corresponding mapping table entry and adds the identifier of the storage unit to be registered to that entry, thereby updating the mapping table.
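The read-miss path that triggers a registration request might look like the following. The sketch is hypothetical: `ControlNode.on_register` stands in for the network message, the shared storage device is modeled as a dict, and creating a missing entry on first registration is an assumption the text does not specify.

```python
class ControlNode:
    """Keeps the mapping table: standby identifier -> identifiers of its cached units."""

    def __init__(self) -> None:
        self.mapping_table: dict[int, set[int]] = {}

    def on_register(self, standby_id: int, unit_id: int) -> None:
        # Query the entry for this standby (creating it if absent: an assumption)
        # and add the identifier of the storage unit to be registered.
        self.mapping_table.setdefault(standby_id, set()).add(unit_id)


class Standby:
    def __init__(self, standby_id: int, storage: dict[int, bytes],
                 control: ControlNode) -> None:
        self.standby_id = standby_id
        self.storage = storage            # shared storage device, modeled as a dict
        self.control = control
        self.local_cache: dict[int, bytes] = {}

    def read(self, unit_id: int) -> bytes:
        if unit_id not in self.local_cache:
            # Cache miss: read the storage unit from the shared storage device ...
            self.local_cache[unit_id] = self.storage[unit_id]
            # ... then register it so that records for this unit are forwarded from now on.
            self.control.on_register(self.standby_id, unit_id)
        return self.local_cache[unit_id]


if __name__ == "__main__":
    node = ControlNode()
    standby = Standby(1, {5: b"page-5", 6: b"page-6"}, node)
    standby.read(5)
    standby.read(5)   # cache hit: no second registration
    standby.read(6)
    print(node.mapping_table)  # {1: {5, 6}}
```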
  • The control node receives a deletion request sent by a third standby machine, which is any one of the at least one standby machine; the request includes the identifier of the third standby machine and the identifier of the storage unit to be deleted. The control node queries the mapping table for the entry corresponding to the identifier of the third standby machine and deletes the identifier of the storage unit to be deleted from that entry.
  • When the storage space of a standby machine's local cache is insufficient, the standby eliminates some of the storage units in its local cache according to a pre-configured elimination policy; here, elimination means deleting the data in a storage unit to free storage space.
  • The pre-configured elimination policy may be based on storage time.
  • For example, when the standby detects that the storage space of its local cache is insufficient (for example, the detected storage amount exceeds a preset threshold), it checks how long each storage unit has been stored in the local cache and eliminates the storage units stored longer than a preset duration.
  • The pre-configured elimination policy may also be based on priority.
  • For example, when the standby detects that the storage space of its local cache is insufficient, it checks the priority of each storage unit in the local cache and eliminates the storage units whose priority is lower than a preset level.
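The two elimination policies can be sketched as selection functions. Assumptions of this sketch: the "storage amount" is measured as a count of cached units compared against a hypothetical `max_units` threshold, each unit carries a `loaded_at` timestamp and an integer `priority`, and a larger priority means more important.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedUnit:
    unit_id: int
    loaded_at: float   # time.monotonic() value when the unit entered the local cache
    priority: int      # larger = more important (assumption of this sketch)


def evict_by_age(cache: list[CachedUnit], max_units: int, max_age_s: float,
                 now: Optional[float] = None) -> list[int]:
    """Storage-time policy: when the cache is over its threshold, eliminate
    every unit that has been stored longer than max_age_s."""
    if len(cache) <= max_units:          # storage space is still sufficient
        return []
    now = time.monotonic() if now is None else now
    return [u.unit_id for u in cache if now - u.loaded_at > max_age_s]


def evict_by_priority(cache: list[CachedUnit], max_units: int,
                      min_priority: int) -> list[int]:
    """Priority policy: when the cache is over its threshold, eliminate
    every unit whose priority is below min_priority."""
    if len(cache) <= max_units:
        return []
    return [u.unit_id for u in cache if u.priority < min_priority]


if __name__ == "__main__":
    cache = [CachedUnit(1, 0.0, 5), CachedUnit(2, 50.0, 1), CachedUnit(3, 90.0, 9)]
    print(evict_by_age(cache, max_units=2, max_age_s=30.0, now=100.0))  # [1, 2]
    print(evict_by_priority(cache, max_units=2, min_priority=5))        # [2]
```

As written, either policy can select nothing even when the cache is over its threshold (for example, if every unit is recent); the text does not say what happens then, so a real cache would need its own fallback.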
  • After eliminating storage units, the standby generates a deletion request and sends it to the control node; the request includes the identifier of the standby and the identifier of the storage unit to be deleted (that is, the identifier of the eliminated storage unit).
  • The control node determines, according to the standby's identifier, the corresponding mapping table entry and deletes the identifier of the storage unit to be deleted from that entry, thereby updating the mapping table.
  • Each standby machine can send registration or deletion requests to the control node according to its own situation, and the control node updates the mapping table accordingly, so the control node knows the latest state of the standby machines in the database system. This makes the database system more practical and reliable.
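Handling of the deletion request that follows an elimination can be sketched as below. `MappingTable`, `evict_and_notify`, and silently ignoring requests from unknown standbys are hypothetical choices made only for this illustration; the direct method call stands in for the network request.

```python
class MappingTable:
    """Standby identifier -> identifiers of the storage units in its local cache."""

    def __init__(self) -> None:
        self._entries: dict[int, set[int]] = {}

    def register(self, standby_id: int, unit_id: int) -> None:
        self._entries.setdefault(standby_id, set()).add(unit_id)

    def delete(self, standby_id: int, unit_ids: list[int]) -> None:
        entry = self._entries.get(standby_id)
        if entry is None:
            return  # unknown standby: nothing to delete (a choice of this sketch)
        entry.difference_update(unit_ids)   # drop the eliminated units from the entry

    def units_of(self, standby_id: int) -> frozenset[int]:
        return frozenset(self._entries.get(standby_id, set()))


def evict_and_notify(standby_id: int, local_cache: dict[int, bytes],
                     victims: list[int], table: MappingTable) -> None:
    """Standby side: free the eliminated units, then send the deletion request."""
    for unit_id in victims:
        local_cache.pop(unit_id, None)
    table.delete(standby_id, victims)


if __name__ == "__main__":
    table = MappingTable()
    for unit_id in (10, 11, 12):
        table.register(3, unit_id)
    cache = {10: b"x", 11: b"y", 12: b"z"}
    evict_and_notify(3, cache, [10, 12], table)
    print(sorted(cache), sorted(table.units_of(3)))  # [11] [11]
```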
  • In some embodiments, the control node consists of multiple physical servers; different physical servers are connected to different standby units, each standby unit includes one or more standby machines, and each standby machine is configured with the identifier of its corresponding physical server.
  • The host sends the operation log to each of the physical servers.
  • A standby machine sends registration requests to its physical server according to the identifier of that server.
  • A standby machine likewise sends deletion requests to its physical server according to the identifier of that server.
  • Each control node can therefore govern a different standby unit.
  • The operation log is distributed to each of the control nodes. After receiving it, each control node processes the operation log and sends every standby machine in the standby unit it governs the operation records (redo logs) that standby requires; each standby then updates the corresponding pages based on those records, so the pages involved in the operation log stay consistent between the host and the standby machines in the database system.
  • Introducing multiple control nodes reduces the CPU consumption of the host and standby machines and the waste of network resources even more effectively.
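The multi-server arrangement can be sketched with each physical server holding the mapping table for its own standby unit and each standby's member list naming its server. Server names, standby names, the module-level registries, and the in-process function calls are hypothetical stand-ins for real network endpoints.

```python
from dataclasses import dataclass, field

Record = tuple[int, str]   # (unit_id, redo_text): a simplification for this sketch


@dataclass
class ControlServer:
    name: str
    mapping_table: dict[str, set[int]] = field(default_factory=dict)  # standby -> units

    def register(self, standby: str, unit_id: int) -> None:
        self.mapping_table.setdefault(standby, set()).add(unit_id)

    def handle_log(self, log: list[Record]) -> dict[str, list[Record]]:
        """Send each standby in this server's unit only the records it needs."""
        batches = {sb: [r for r in log if r[0] in units]
                   for sb, units in self.mapping_table.items()}
        return {sb: recs for sb, recs in batches.items() if recs}


# Hypothetical deployment: two physical servers; each standby's member list
# names the one server it reports to.
SERVERS = {"cn-a": ControlServer("cn-a"), "cn-b": ControlServer("cn-b")}
MEMBER_LIST = {"standby-1": "cn-a", "standby-2": "cn-a", "standby-3": "cn-b"}


def standby_register(standby: str, unit_id: int) -> None:
    # Route the registration request by the server identifier in the member list.
    SERVERS[MEMBER_LIST[standby]].register(standby, unit_id)


def host_send_log(log: list[Record]) -> dict[str, list[Record]]:
    # The host sends the same operation log to every physical server.
    out: dict[str, list[Record]] = {}
    for server in SERVERS.values():
        out.update(server.handle_log(log))
    return out


if __name__ == "__main__":
    standby_register("standby-1", 1)
    standby_register("standby-2", 2)
    standby_register("standby-3", 1)
    print(host_send_log([(1, "r1"), (2, "r2"), (3, "r3")]))
    # {'standby-1': [(1, 'r1')], 'standby-2': [(2, 'r2')], 'standby-3': [(1, 'r1')]}
```

Each server only filters for the standby machines it governs, so the per-record filtering work is split across servers.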
  • An embodiment of the present invention provides a control node including a memory, and a processor, a transmitter, and a receiver coupled to the memory. The transmitter sends data to external devices, the receiver receives data sent from outside, the memory stores the program code implementing the method of the first aspect and related data (such as the operation log), and the processor executes the program code stored in the memory, that is, performs the method of the first aspect.
  • An embodiment of the present invention provides a control node including an obtaining unit, a processing unit, and a sending unit; these functional units perform the method of the first aspect.
  • An embodiment of the present invention provides a database system including a host, at least one standby machine, a storage device, and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device.
  • The host is configured to send an operation log to the control node, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record is a record of a write operation performed by the host on a storage unit in its local cache or in the storage device. The control node is configured to: receive the operation log; for a first standby machine, determine the first storage unit set existing in the first standby machine and the second storage unit set corresponding to the at least one operation record; generate the storage unit intersection from the storage units present in both the first and the second storage unit sets; and acquire the operation records in the operation log that correspond to the storage unit intersection and send them to the first standby machine.
  • In one implementation, the storage unit is a page.
  • The control node determining the first storage unit set existing in the first standby machine includes: determining it according to a preset mapping table.
  • The mapping table includes at least one entry; each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache, and one of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • The database system further includes a second standby machine, which is any one of the at least one standby machine. On receiving a read operation instruction, the second standby machine determines whether its local cache holds the storage unit corresponding to the instruction; if not, it reads that storage unit from the storage device and sends the control node a registration request including the identifier of the second standby machine and the identifier of that storage unit. The control node is further configured to receive the registration request, determine the entry corresponding to the identifier of the second standby machine, and add the identifier of the storage unit to that entry.
  • When the control node includes multiple physical servers, the second standby machine is configured with a member list that records the identifier of one of those physical servers, and the second standby machine sends its registration request to that physical server according to its identifier.
  • The database system further includes a third standby machine, which is any one of the at least one standby machine. When storage units in its local cache need to be eliminated, the third standby machine determines the storage unit to be deleted and sends the control node a deletion request including the identifier of the third standby machine and the identifier of the storage unit to be deleted. The control node is further configured to receive the deletion request, determine the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
  • When the control node includes multiple physical servers, the third standby machine is configured with a member list that records the identifier of one of those physical servers, and the third standby machine sends its deletion request to that physical server according to its identifier.
  • The first, second, and third standby machines may be the same standby machine or different standby machines.
  • An embodiment of the present invention provides a database system including a host, at least one standby machine, and a storage device. The host is connected to the at least one standby machine and to the storage device, the at least one standby machine is connected to the storage device, and the host and the standby machines share the data in the storage device.
  • The host is configured to: generate an operation log including at least one operation record, where each operation record is a record of a write operation performed by the host on a storage unit in its local cache or in the storage device; determine a first storage unit set corresponding to a first standby machine, which is one of the at least one standby machine, where the storage units in the first storage unit set have been read from the storage device and are stored in the first standby machine's local cache; determine a second storage unit set corresponding to the at least one operation record; and acquire the operation records in the operation log that correspond to the storage unit intersection, that is, the intersection of the storage units of the first and second storage unit sets, and send them to the first standby machine.
  • The first standby machine is configured to receive those operation records and perform the corresponding operations on the storage units in its local cache that the records indicate.
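On the standby side, applying the received records can be sketched as follows, assuming a hypothetical physical redo format `RedoRecord(unit_id, offset, data)` (page number, byte offset, bytes written). Skipping records for pages the standby no longer caches is a defensive choice of this sketch, not something the text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    unit_id: int    # page number
    offset: int     # byte offset inside the page
    data: bytes     # bytes the host wrote at that offset


def apply_records(local_cache: dict[int, bytearray], records: list[RedoRecord]) -> int:
    """Replay received records, in order, on the cached pages they name.

    Returns the number of records applied; records for uncached pages are skipped.
    """
    applied = 0
    for rec in records:
        page = local_cache.get(rec.unit_id)
        if page is None:
            continue
        end = rec.offset + len(rec.data)
        if end > len(page):
            raise ValueError(f"record overruns page {rec.unit_id}")
        page[rec.offset:end] = rec.data   # same-length slice: page size is unchanged
        applied += 1
    return applied


if __name__ == "__main__":
    cache = {4: bytearray(b"........")}
    received = [RedoRecord(4, 0, b"ab"), RedoRecord(4, 1, b"Z")]
    print(apply_records(cache, received), bytes(cache[4]))  # 2 b'aZ......'
```

Order matters: the second record overwrites a byte written by the first, which is why the records are replayed in log order.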
  • The host determining the first storage unit set existing in the first standby machine includes: determining it according to a preset mapping table.
  • The mapping table includes at least one entry; each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache, and one of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • Specifically, the mapping table may include at least two entries, and the storage-unit identifiers in different entries are inconsistent (not exactly the same).
  • In one implementation, the storage unit is a page.
  • The database system further includes a second standby machine, which is any one of the at least one standby machine. On receiving a read operation instruction, the second standby machine determines whether its local cache holds the storage unit corresponding to the instruction; if not, it reads that storage unit from the storage device and sends the host a registration request including the identifier of the second standby machine and the identifier of that storage unit. The host is further configured to receive the registration request, determine the entry corresponding to the identifier of the second standby machine, and add the identifier of the storage unit to that entry.
  • The database system further includes a third standby machine, which is any one of the at least one standby machine. When storage units in its local cache need to be eliminated, the third standby machine determines the storage unit to be deleted and sends the host a deletion request including the identifier of the third standby machine and the identifier of the storage unit to be deleted. The host is further configured to receive the deletion request, determine the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
  • The first, second, and third standby machines may be the same standby machine or different standby machines.
  • An embodiment of the present invention provides a computer-readable storage medium storing instructions (implementation code) that, when run on a computer, cause the computer to perform the method of the first aspect.
  • An embodiment of the present invention provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first aspect.
  • In embodiments of the present invention, the host modifies pages in its local cache or in the storage device and generates a corresponding operation log (for example, all the redo logs).
  • By searching the mapping table, the control node obtains the operation records that each standby machine in the database system requires (for example, the part of the redo logs it needs) and sends them to the corresponding standby machine, which then updates the storage units in its local cache.
  • A standby machine can send a registration or deletion request to the control node as the pages in its memory change, so that the control node updates the mapping table; this mapping-table update mechanism makes the database system more practical and reliable.
  • The control node sends each standby machine only the operation records it requires (for example, the part of the redo logs it needs), avoiding transmission of irrelevant operation records (for example, irrelevant redo logs) over the communication network; this reduces the volume of operation records in the network and saves network resources.
  • The host only needs to send the operation log to the control node.
  • Because every operation record a standby machine receives is one it requires, the standby never has to discard unrelated records, so the embodiments can effectively reduce the CPU consumption of the host and the standby machines.
  • FIG. 1-a is a schematic structural diagram of a prior-art database system.
  • FIG. 1-b is a schematic diagram of the redo log processing flow in a prior-art database system.
  • FIG. 2 is a schematic structural diagram of a database system according to an embodiment of the present invention.
  • FIG. 3-a is a schematic diagram of a database system according to an embodiment of the present invention.
  • FIG. 3-b is a schematic diagram of another database system according to an embodiment of the present invention.
  • FIG. 3-c is a schematic diagram of another database system according to an embodiment of the present invention.
  • FIG. 3-d is a schematic diagram of another database system according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a primary/standby data transfer method according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of another primary/standby data transfer method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a mapping table according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of redo log processing in a database system according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of another primary/standby data transfer method according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of a standby machine sending a registration request to a host according to an embodiment of the present invention.
  • FIG. 10 is a schematic flowchart of a standby machine sending a deletion request to a host according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a control node according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of another control node according to an embodiment of the present invention.
  • FIG. 1-a is a schematic structural diagram of a prior-art database system.
  • As shown in FIG. 1-a, the database system includes a host, at least one standby machine (standby machine 1 and standby machine 2 in the figure), and a storage device. The host is connected to each standby machine and to the storage device, each standby machine is also connected to the storage device, and the host and the standby machines are connected to the Internet.
  • The database instances on the host and the standby machines share the database on the same storage medium, but only the host can write and update the database; all standby machines can only read it.
  • When the host receives a write command issued by a user to execute a read/write transaction, it first determines whether the table involved in the transaction is in its local cache (such as host memory).
  • If it is not, the related pages, which record the table, are read from the storage device into the local cache.
  • The host then writes to the table (for example, inserts into or updates it), modifies the related pages in the local cache, and records redo logs of the page modifications.
  • When the transaction commits, the host sends all redo logs involved in the transaction to all standby machines in the database system.
  • When a standby machine receives a redo log, if the page involved is in its local cache (such as standby memory), the redo log is applied to that page, which is updated accordingly; if the page is not in the standby machine's local cache, the redo log is discarded.
  • Because the host and the standby machines are connected to the same storage device, a user can read some pages of the storage device into host memory through the host, and can also read other pages of the storage device into standby memory through a standby machine. As a result, the pages in host memory and in standby memory may be inconsistent.
  • The pages held by different standby machines may also be inconsistent. Therefore, when the host performs read/write services and modifies pages in host memory, it broadcasts all redo logs of the current transaction to all standby machines when the transaction commits, so that the pages in every standby machine's memory are kept up to date. Consequently, some of the redo logs a standby machine receives are irrelevant to it and must be discarded, which consumes CPU and wastes network resources.
  • FIG. 1-b is a schematic diagram of a redo log processing flow in a prior-art database system.
  • As shown in FIG. 1-b, assume that in one application scenario the host memory holds five pages (pages 1, 2, 3, 4 and 5), the memory of standby machine 1 holds three pages (pages 2, 6 and 7), and the memory of standby machine 2 holds three pages (pages 1, 3 and 6). While executing a read/write transaction, the host modifies pages 1, 2 and 3 and generates redo logs for these three pages: redo log 1, redo log 2 and redo log 3.
  • The redo logs of all three pages are distributed to both standby machine 1 and standby machine 2.
  • When standby machine 1 receives the redo logs of pages 1, 2 and 3, it finds that pages 1 and 3 are not in its memory, so it discards redo log 1 and redo log 3 and applies only redo log 2 to page 2.
  • Likewise, standby machine 2 applies redo logs 1 and 3 and discards redo log 2. Transmitting these irrelevant redo logs wastes network resources.
  • In addition, because the host sends irrelevant redo logs to standby machine 1 and standby machine 2, the host's CPU incurs extra load; and because standby machines 1 and 2 must each discard the redo logs irrelevant to them, their CPUs incur extra load as well.
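The prior-art broadcast in FIG. 1-b can be reproduced with a small simulation. This is a minimal Python sketch, assuming a redo log can be reduced to the number of the page it modifies; the function name `broadcast` and the dictionary layout are illustrative, not taken from the patent. It counts how many of the transmitted redo logs end up being discarded.

```python
def broadcast(redo_logs, standby_pages):
    """Prior-art behaviour: every standby receives every redo log.

    redo_logs: dict of redo-log name -> number of the page it modifies
    standby_pages: dict of standby name -> set of page numbers in its local cache
    Returns, per standby, the redo logs it applied and those it discarded.
    """
    result = {}
    for standby, cached in standby_pages.items():
        applied, discarded = [], []
        for log, page in redo_logs.items():
            # Apply only if the page is cached locally; otherwise discard it.
            (applied if page in cached else discarded).append(log)
        result[standby] = {"applied": applied, "discarded": discarded}
    return result


# Scenario of FIG. 1-b: the host modified pages 1, 2 and 3.
redo_logs = {"redo log 1": 1, "redo log 2": 2, "redo log 3": 3}
standby_pages = {"standby 1": {2, 6, 7}, "standby 2": {1, 3, 6}}

outcome = broadcast(redo_logs, standby_pages)
sent = len(redo_logs) * len(standby_pages)
wasted = sum(len(r["discarded"]) for r in outcome.values())
print(f"{wasted} of {sent} transmitted redo logs were discarded")
# prints "3 of 6 transmitted redo logs were discarded"
```

In this scenario half of the redo logs that cross the network are thrown away on arrival.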
  • An embodiment of the present invention provides a new database system that overcomes these defects of the prior art: it manages redo logs effectively and reduces the transmission of irrelevant redo logs between the host and the standby machines, thereby reducing CPU consumption on the host and standby machines and the waste of network resources.
  • Compared with the prior-art database system described above, the differences of this database system include the following:
  • The control node may be used to obtain an operation log. The operation log includes one or more operation records; each operation record corresponds to one storage unit and represents a write operation performed by the host on a storage unit of the storage device.
  • For example, the operation log is the collection of all redo logs and the storage unit is a page, so each operation record corresponds to one page.
  • When the host performs a read/write transaction, it modifies the related pages in its local cache and records a redo log for each page modification.
  • Each redo log is an operation record corresponding to one page. When the transaction is committed, the host sends all redo logs involved in the transaction (the operation log) to the control node.
  • After receiving the operation log, the control node manages it: it determines which pages exist in each standby machine in the database system and then sends each standby machine only the redo logs that the standby machine needs.
  • For example, when the host performs a read/write transaction that modifies pages 1, 2 and 3, it generates operation records for these three pages: redo log 1, redo log 2 and redo log 3.
  • The host sends the operation log composed of these three operation records to the control node. After receiving the operation log, the control node determines the pages present in standby machine 1 and in standby machine 2 and, based on the operation log, determines that standby machine 1 needs redo log 2 and standby machine 2 needs redo log 1 and redo log 3. The control node then sends redo log 2 to standby machine 1 and redo logs 1 and 3 to standby machine 2. Because the control node sends each standby machine only the redo logs it requires, irrelevant redo logs are not transmitted over the communication network, which reduces redo log traffic on the network and saves network resources.
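The targeted forwarding described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: it assumes the control node already knows each standby machine's cached pages, and `OperationRecord` and `route` are hypothetical names. Keeping a record only when its page is cached on the standby is equivalent to intersecting the log's pages with the standby's pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    name: str   # e.g. "redo log 1"
    page: int   # identifier of the page (storage unit) the record modifies


def route(operation_log, standby_pages):
    """Return, for each standby, only the records whose page it caches."""
    return {
        standby: [rec for rec in operation_log if rec.page in cached]
        for standby, cached in standby_pages.items()
    }


operation_log = [
    OperationRecord("redo log 1", 1),
    OperationRecord("redo log 2", 2),
    OperationRecord("redo log 3", 3),
]
standby_pages = {"standby 1": {2, 6, 7}, "standby 2": {1, 3, 6}}

plan = route(operation_log, standby_pages)
for standby, records in plan.items():
    print(standby, [r.name for r in records])
# standby 1 ['redo log 2']
# standby 2 ['redo log 1', 'redo log 3']
```

In this scenario three redo logs are transmitted instead of six, and none of them is discarded.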
  • There may be one or more control nodes.
  • Multiple control nodes can be set up in the new database system, and each control node can govern a different standby unit.
  • The operation log is distributed to each of the control nodes. After receiving the operation log, each control node manages it and sends each standby machine in the standby unit it governs the operation records (redo logs) that the standby machine needs. Each standby machine updates its corresponding pages based on those records, so that the pages involved in the operation log are consistent between the host and the standby machines in the database system.
  • Introducing multiple control nodes reduces CPU consumption on the host and standby machines and the waste of network resources even more effectively.
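With several control nodes, each node only needs the mapping entries for its own standby unit. The Python sketch below is a hypothetical illustration: `ControlNode`, `host_commit` and the third standby machine are invented for the example, and in-process calls stand in for network messages.

```python
class ControlNode:
    """Hypothetical control node (physical server) governing one standby unit."""

    def __init__(self, name, mapping_table):
        self.name = name
        # Entries only for the standby machines in this node's standby unit:
        # standby id -> set of page ids in that standby's local cache.
        self.mapping_table = mapping_table

    def handle(self, operation_log):
        """Map each governed standby to the names of the records it needs."""
        return {
            standby: [name for name, page in operation_log if page in pages]
            for standby, pages in self.mapping_table.items()
        }


def host_commit(operation_log, control_nodes):
    """The host sends the same operation log to every control node."""
    deliveries = {}
    for node in control_nodes:
        # Standby units are disjoint, so no standby appears under two nodes.
        deliveries.update(node.handle(operation_log))
    return deliveries


log = [("redo log 1", 1), ("redo log 2", 2), ("redo log 3", 3)]
node_a = ControlNode("node A", {"standby 1": {2, 6, 7}, "standby 2": {1, 3, 6}})
node_b = ControlNode("node B", {"standby 3": {3, 9}})  # hypothetical third standby
deliveries = host_commit(log, [node_a, node_b])
print(deliveries)
# {'standby 1': ['redo log 2'], 'standby 2': ['redo log 1', 'redo log 3'], 'standby 3': ['redo log 3']}
```

Each node's mapping table stays small because it covers only one standby unit, which is what allows the filtering work to be spread across physical servers.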
  • A storage unit describes database data in a specific storage area; it is a fixed-size storage space in which data is recorded.
  • The storage unit may be a page, or a data storage space defined in another form, such as a block or a sector.
  • The technical solutions of the embodiments of the present invention are described using pages as the storage unit. Implementations using other forms of storage unit (blocks, sectors, and so on) are similar to the page-based implementation and are not described one by one.
  • In a specific scenario, a storage unit may refer to a single page or to a combination of multiple pages.
  • Describing the storage unit as a page serves only to explain the technical solutions of the embodiments of the present invention and does not limit the scope of application of the present invention.
  • A database system provided by an embodiment of the present invention includes a host, at least one standby machine (multiple standby machines in the figure), a storage device, and a control node.
  • The host is connected to the control node and to the storage device, each standby machine is connected to the control node and to the storage device, and the host and the standby machines can communicate with the outside world (for example, they can connect to the Internet). Specifically:
  • The host is configured to send an operation log to the control node. The operation log includes at least one operation record; each operation record corresponds to one storage unit and represents a write operation performed by the host on a storage unit in the host's local cache or in the storage device.
  • The control node is configured to receive the operation log and, for a first standby machine: determine a first storage unit set that exists in the first standby machine; determine a second storage unit set corresponding to the at least one operation record; generate a storage unit intersection from the storage units present in both the first storage unit set and the second storage unit set; obtain the operation records in the operation log that correspond to the storage unit intersection; and send those operation records to the first standby machine. The first standby machine is any one of the at least one standby machine.
  • The first standby machine is configured to receive the corresponding operation records and apply them to the storage units in its local cache that the records indicate.
  • The control node determining the first storage unit set that exists in the first standby machine includes: determining the first storage unit set according to a preset mapping table.
  • The mapping table includes at least one entry; each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache. One of these entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • The database system further includes a second standby machine. When the second standby machine receives a read operation instruction, it determines whether the storage unit corresponding to the instruction exists in its local cache; if not, it reads that storage unit from the storage device and sends a registration request to the control node. The registration request includes the identifier of the second standby machine and the identifier of the corresponding storage unit. The second standby machine is any one of the at least one standby machine.
  • The control node is further configured to receive the registration request, locate the entry corresponding to the identifier of the second standby machine, and add the identifier of the corresponding storage unit to that entry.
  • The database system further includes a third standby machine. When a storage unit in its local cache needs to be evicted, the third standby machine determines the storage unit to be deleted and sends a deletion request to the control node. The deletion request includes the identifier of the third standby machine and the identifier of the storage unit to be deleted.
  • The third standby machine is any one of the at least one standby machine.
  • The control node is further configured to receive the deletion request, locate the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
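The registration and deletion flow keeps the mapping table in step with each standby machine's cache. The following Python sketch is an assumption-laden illustration: `MappingTable`, `Standby`, `read` and `evict` are hypothetical names, page contents are plain strings, and direct method calls stand in for the registration and deletion requests sent over the network.

```python
class MappingTable:
    """Control-node mapping table: one entry per standby machine."""

    def __init__(self):
        self.entries = {}  # standby id -> set of storage-unit (page) ids

    def add_standby(self, standby_id):
        self.entries.setdefault(standby_id, set())

    def on_register(self, standby_id, page_id):
        # Registration request: page_id was read into the standby's cache.
        self.entries[standby_id].add(page_id)

    def on_delete(self, standby_id, page_id):
        # Deletion request: page_id was evicted from the standby's cache.
        self.entries[standby_id].discard(page_id)


class Standby:
    def __init__(self, standby_id, table):
        self.id = standby_id
        self.cache = {}      # local cache: page id -> page contents
        self.table = table   # stands in for the link to the control node
        table.add_standby(standby_id)

    def read(self, page_id, storage):
        if page_id not in self.cache:
            # Cache miss: read from shared storage, then register the page.
            self.cache[page_id] = storage[page_id]
            self.table.on_register(self.id, page_id)
        return self.cache[page_id]

    def evict(self, page_id):
        if page_id in self.cache:
            del self.cache[page_id]
            self.table.on_delete(self.id, page_id)


storage = {1: "page 1 data", 6: "page 6 data", 7: "page 7 data"}
table = MappingTable()
standby_1 = Standby("standby 1", table)
for page in (1, 6, 7, 1):   # the second read of page 1 is a cache hit
    standby_1.read(page, storage)
standby_1.evict(6)
print(sorted(table.entries["standby 1"]))  # [1, 7]
```

Registering only on a cache miss and deleting only on an actual eviction means the entry always mirrors the standby's local cache.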
  • Another database system provided by an embodiment of the present invention includes a host, multiple standby units, a storage device, and a control node. The control node consists of multiple physical servers, different physical servers are connected to different standby units, and each standby unit includes one or more standby machines. The host is connected to each of the physical servers and to the storage device, and each standby machine is connected to the storage device and to its corresponding physical server. The host and the standby machines can communicate with the outside world (for example, they can connect to the Internet).
  • Each physical server has the functions described for the control node in FIG. 3-a above, except that the control node in FIG. 3-a governs all standby machines in the database system, whereas each physical server in FIG. 3-b governs only its corresponding standby unit.
  • Accordingly, the mapping table preset in the control node of FIG. 3-a holds information about all standby machines (an entry for every standby machine), whereas the mapping table preset in each physical server of FIG. 3-b holds information only about the standby machines in its standby unit (their entries).
  • The host is configured to send the operation log to each of the physical servers.
  • To establish which physical server governs each standby machine, information can be configured on each standby machine.
  • For example, each standby machine is configured with a member list that records the identifier of the physical server corresponding to the standby machine.
  • The standby machine can then send a registration request to its physical server according to that server's identifier.
  • Likewise, the standby machine can send a deletion request to its physical server according to that server's identifier.
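A member list lets each standby machine address its requests to the physical server that governs it. In this Python sketch, `MemberList`, `PhysicalServer`, the server names and the third standby machine are hypothetical; the patent does not specify the member list format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberList:
    standby_id: str
    server_id: str   # identifier of the physical server governing this standby


class PhysicalServer:
    def __init__(self, server_id):
        self.server_id = server_id
        # Entries only for the standby machines in this server's standby unit.
        self.mapping_table = {}

    def register(self, standby_id, page_id):
        self.mapping_table.setdefault(standby_id, set()).add(page_id)

    def delete(self, standby_id, page_id):
        self.mapping_table.get(standby_id, set()).discard(page_id)


def send_registration(member_list, servers, page_id):
    # Route the request using the server identifier from the member list.
    servers[member_list.server_id].register(member_list.standby_id, page_id)


def send_deletion(member_list, servers, page_id):
    servers[member_list.server_id].delete(member_list.standby_id, page_id)


servers = {sid: PhysicalServer(sid) for sid in ("server A", "server B")}
standby_1 = MemberList("standby 1", "server A")
standby_3 = MemberList("standby 3", "server B")

send_registration(standby_1, servers, 6)
send_registration(standby_3, servers, 9)
send_deletion(standby_1, servers, 6)
print(servers["server A"].mapping_table, servers["server B"].mapping_table)
# {'standby 1': set()} {'standby 3': {9}}
```

Because each request carries the standby's own identifier and is routed by the server identifier, no physical server ever sees entries for standby machines outside its unit.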
  • In the systems described above, the control node is an independent device in the database system.
  • However, the control node is not necessarily an independent device.
  • The control node may be built into the host or exist as a functional module of the host.
  • Alternatively, the control node can be built into a standby machine or exist as a functional module of a standby machine.
  • When the control node is built into the host, the database system includes a host, at least one standby machine, and a storage device. The host is connected to the at least one standby machine and to the storage device, the at least one standby machine is connected to the storage device, and the host and the standby machines share the data in the storage device. The host and the standby machines can communicate with the outside world (for example, they can connect to the Internet). Specifically:
  • The host is configured to: generate an operation log that includes at least one operation record, each representing a write operation performed by the host on a storage unit in its local cache or in the storage device; determine a first storage unit set corresponding to a first standby machine and a second storage unit set corresponding to the at least one operation record, where the first standby machine is one of the at least one standby machine and the storage units in the first storage unit set are stored in the first standby machine's local cache, having been read from the storage device; and obtain the operation records in the operation log that correspond to the storage unit intersection and send them to the first standby machine, where the storage unit intersection consists of the storage units present in both the first storage unit set and the second storage unit set.
  • The first standby machine is configured to receive the corresponding operation records and apply them to the storage units in its local cache that the records indicate.
  • The host determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry; each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache, and one entry includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • When the mapping table includes at least two entries, the sets of storage unit identifiers in different entries are not identical.
  • In this system, a standby machine sends registration requests to the host.
  • Likewise, a standby machine sends deletion requests to the host.
  • In other words, the functions of the original host and the functions of the control node can be integrated into the same server, thereby forming the host of this embodiment of the present invention.
  • Such a host can not only generate the operation log but also perform the functions of the control node described above, thereby managing the operation records in the operation log.
  • When the control node is built into a standby machine, the database system includes a host, a standby machine acting as the control node, other standby machines, and a storage device. The host is connected to the control node and to the storage device, the other standby machines are each connected to the control node and to the storage device, the control node is also connected to the storage device, and the host, the control node and the other standby machines can communicate with the outside world (for example, they can connect to the Internet).
  • That is, the functions of the original standby machine and the functions of the control node can be integrated into the same server, thereby forming the standby machine of this embodiment of the present invention. This control node can do its own work as a standby machine (for example, reading data from the storage device) and can also perform the functions of the control node described above, thereby managing the operation records in the operation log.
  • For the host, this standby machine and the other standby machines, refer to the description of FIG. 3-a; details are not repeated here.
  • An embodiment of the present invention provides a method for data transfer between a host and standby machines. The method is applied to the control node in a shared-storage database system in which only the host writes and the standby machines read, is described from the control node's side, and includes the following steps:
  • Step S401: Obtain the operation log generated by the host.
  • The control node may be an independent device (for example, a standalone server), the host, or a standby machine.
  • When the control node is an independent device: when the host performs a read/write transaction, it performs write operations on storage units in its local cache or in the storage device and generates a corresponding operation log. The operation log includes at least one operation record, and each operation record represents a write operation performed by the host on a storage unit in the host's local cache or in the storage device. The host sends the operation log to the control node, and the control node obtains it accordingly.
  • When the control node is the host, the host obtains the operation log directly once it has generated it while performing a read/write transaction.
  • When the control node is a standby machine, after the host performs a read/write transaction and generates the operation log, the host sends the operation log to that standby machine, which obtains it accordingly.
  • Step S402: Determine a first storage unit set corresponding to the first standby machine, and determine a second storage unit set corresponding to the at least one operation record.
  • When the control node is an independent device or the host, the first standby machine is any one of the at least one standby machine.
  • When the control node is a standby machine, the first standby machine is any standby machine among the at least one standby machine other than the control node.
  • The storage units in the first storage unit set are stored in the local cache of the first standby machine, having been read from the storage device.
  • Specifically, the control node determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry; each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache, and one entry includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • In the mapping table, the number of entries equals the number of standby machines, and the sets of storage unit identifiers in different entries are not identical.
  • Step S403: Obtain the operation records in the operation log that correspond to the storage unit intersection, and send them to the first standby machine.
  • The storage unit intersection consists of the storage units present in both the first storage unit set and the second storage unit set. After receiving the operation records, the first standby machine applies them to the storage units in its local cache that the records indicate.
  • In other words, every operation record received by the first standby machine can be applied to a page in its local cache, so the related pages in the local cache stay synchronized with the host.
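Steps S401 to S403 can be summarized in one function. This Python sketch assumes the mapping table is a dictionary from standby identifier to cached page identifiers and that each operation record is a `(name, page_id)` pair; the optional `control_node_id` parameter (a hypothetical name) models the case where the control node is itself a standby machine and is therefore skipped as a first standby machine. How that standby applies its own records is not modelled.

```python
def transfer(operation_log, mapping_table, control_node_id=None):
    """Control-node side of steps S401-S403.

    operation_log: list of (record_name, page_id) pairs obtained in S401
    mapping_table: standby id -> set of page ids in that standby's local cache
    control_node_id: id of the standby acting as control node, if any
    Returns standby id -> list of record names to send to it.
    """
    second_set = {page for _, page in operation_log}  # pages touched by the log
    outbound = {}
    for standby, first_set in mapping_table.items():
        if standby == control_node_id:
            continue  # a standby acting as control node is not a "first standby"
        intersection = first_set & second_set  # storage unit intersection
        outbound[standby] = [name for name, page in operation_log
                             if page in intersection]
    return outbound


log = [("redo log 1", 1), ("redo log 2", 2), ("redo log 3", 3)]
table = {"standby 0": {1, 2}, "standby 1": {2, 6, 7}, "standby 2": {1, 3, 6}}
print(transfer(log, table, control_node_id="standby 0"))
# {'standby 1': ['redo log 2'], 'standby 2': ['redo log 1', 'redo log 3']}
```

With `control_node_id=None` the same function covers the cases where the control node is an independent device or the host, since every entry in the table is then a candidate first standby machine.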
  • In this embodiment, the host modifies pages in its local cache or in the storage device and generates a corresponding operation log.
  • When the control node is an independent device or a standby machine, the host sends the operation log to the control node, and the control node obtains it accordingly.
  • By querying the mapping table, the control node can obtain the operation records required by each standby machine in the database system and send them to the corresponding standby machine, so that the standby machine can update the storage units in its local cache.
  • Because the control node sends operation records to standby machines in a targeted manner, irrelevant operation records are not transmitted over the communication network, which reduces the volume of operation records on the network and saves network resources.
  • When the control node is an independent device or a standby machine, the host only needs to send the operation log (that is, all operation records) to the control node. After processing by the control node, every operation record a standby machine receives is one it needs, so the discarding of unrelated operation records is also eliminated, and the embodiment of the present invention can effectively reduce CPU consumption on the host and standby machines.
  • An embodiment of the present invention provides another method for data transfer between a host and standby machines, described from the perspective of multiple parties.
  • The database system includes a host, at least one standby machine, a storage device, and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device.
  • The method includes:
  • Step S501: The host performs a read/write transaction.
  • The host and the standby machines are servers, and both connect to the same storage device (a shared storage medium). The storage device may be a standalone storage device, a Redundant Array of Independent Disks (RAID), or a storage device in a Storage Area Network (SAN).
  • The storage device stores a database, so the data in the storage device is database data.
  • The host processes transactions according to operation instructions from a user (or user equipment). A transaction may be a database transaction, for example, reading, editing, inserting into, or updating a table in the database.
  • After receiving an operation instruction from a user (or user equipment), the host first queries its local cache (such as host memory) to determine whether the database data indicated by the instruction exists there. If it does, the host processes that database data directly; if it does not, the host reads the database data from the storage device into its local cache, that is, it reads in a copy of the database data stored in the storage device, and then processes the database data rows. After processing the database data rows in the local cache, the host can synchronize the processed database data to the storage device.
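The cache-first lookup in step S501 is a read-through pattern. This Python sketch is hypothetical: it models a page as a dictionary of rows, and `Host`, `get_page`, `update_row` and `flush` are invented names. It shows that the storage device is read only on a cache miss and is updated only when the host synchronizes.

```python
class Host:
    """Host-side local cache in front of the shared storage device."""

    def __init__(self, storage):
        self.storage = storage   # shared storage: page id -> {row key: value}
        self.cache = {}          # local cache (e.g. host memory)
        self.storage_reads = 0

    def get_page(self, page_id):
        if page_id not in self.cache:
            # Miss: read in a copy of the page from the storage device.
            self.cache[page_id] = dict(self.storage[page_id])
            self.storage_reads += 1
        return self.cache[page_id]

    def update_row(self, page_id, key, value):
        # Modify the cached copy only; storage is untouched until flush().
        self.get_page(page_id)[key] = value

    def flush(self, page_id):
        # Synchronize the processed page back to the storage device.
        self.storage[page_id] = dict(self.cache[page_id])


storage = {1: {"row a": 1}}
host = Host(storage)
host.update_row(1, "row a", 2)
host.update_row(1, "row b", 3)   # cache hit, no second storage read
print(host.storage_reads, storage[1])   # 1 {'row a': 1}
host.flush(1)
print(storage[1])   # {'row a': 2, 'row b': 3}
```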
  • Step S502: The host generates an operation log.
  • The host can perform write operations on pages, whereas the standby machines can only perform read operations on pages.
  • When the host performs a read/write transaction, it processes data rows of the database.
  • For each page modification, the host records an operation record; each operation record corresponds to one storage unit and represents a write operation performed by the host on that storage unit.
  • The host aggregates all operation records to generate the operation log.
  • Here, an operation record is the redo log of a modified page, and the operation log is the collection of the redo logs of all modified pages.
  • Different pages can be given different identifiers; a page identifier may be, for example, a page number.
  • The operation log includes one or more operation records, and each operation record contains the identifier of its corresponding page.
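Step S502 can be illustrated with a transaction object that records one operation record per page modification and hands over the collected records at commit. This is a hypothetical Python sketch; `RedoRecord`, `Transaction` and the free-text `change` field are assumptions, since the text does not define the redo log format beyond requiring each record to carry its page identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RedoRecord:
    page_id: int   # identifier of the modified page, e.g. its page number
    change: str    # opaque description of the modification (assumed format)


@dataclass
class Transaction:
    records: List[RedoRecord] = field(default_factory=list)

    def write(self, page_id, change):
        # Each page modification produces one operation record.
        self.records.append(RedoRecord(page_id, change))

    def commit(self):
        # The operation log is the collection of all records of the transaction.
        return list(self.records)


txn = Transaction()
txn.write(1, "insert row r1")
txn.write(2, "update row r7")
txn.write(3, "insert row r9")
operation_log = txn.commit()
print([record.page_id for record in operation_log])  # [1, 2, 3]
```

Carrying the page identifier in every record is what later lets the control node work out which pages the log touches without inspecting the page contents.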
  • Step S503: The host sends the operation log to the control node.
  • In this embodiment, the control node is an independent device, for example, a standalone physical server.
  • The host and the control node can be connected through wired or wireless communication. Information about the control node (such as its address) is configured in the host, and after generating the operation log, the host sends it to the control node based on this information.
  • Step S504: The control node determines a first storage unit set corresponding to the operation log.
  • After receiving the operation log, the control node needs to determine the storage unit set corresponding to the operation log in order to manage it, that is, it needs to determine which storage units the operation log covers. The word "first" here only distinguishes this set from the storage unit set described below.
  • Specifically, the control node needs to determine which pages the operation log corresponds to. In one implementation, because the operation log includes all operation records and each operation record contains the identifier of its corresponding page, the control node can determine these pages from the page identifiers.
  • Step S505: The control node determines the second storage unit sets in the standby machines.
  • To manage the operation log, the control node also needs to determine the storage unit sets that exist in the standby machines of the database system, that is, which storage units exist in each standby machine. The word "second" here only distinguishes these sets from the storage unit set described above. Specifically, the control node needs to determine which pages exist in each standby machine.
  • A mapping table is preconfigured in the control node. The mapping table includes at least one entry, and each entry may include the identifier of a standby machine, the identifiers of all storage units existing in that standby machine's local cache, and the address of the standby machine.
  • The identifier of a standby machine may be a standby number, the identifier of a storage unit may be a storage unit number (such as a page number), and the address of a standby machine may be its IP or MAC address. By querying the mapping table, the control node can determine which storage units exist in any standby machine in the database system.
  • FIG. 6 is a simplified schematic diagram of a mapping table according to an embodiment of the present invention.
  • As shown in FIG. 6, the mapping table includes entry 1 and entry 2. Entry 1 includes the standby number <stander 1>, the page numbers <1, 6, 7>, the IP address or MAC address of standby machine 1, and so on.
  • Entry 2 includes the standby number <stander 2>, the page numbers <2, 8, 9>, the IP/MAC address of standby machine 2, and so on. Querying the mapping table shows that standby machine 1 and standby machine 2 exist in the database system.
  • It also shows that the local cache of standby machine 1 holds three pages, with page numbers 1, 6 and 7, so the corresponding storage unit set is {page 1, page 6, page 7}; and that the local cache of standby machine 2 holds three pages, with page numbers 2, 8 and 9, so the corresponding storage unit set is {page 2, page 8, page 9}.
  • The entries of the mapping table can cover the mapping information of all standby machines in the database system. Therefore, querying the mapping table shows which standby machines exist in the database system and which pages exist in each of them.
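The entries of FIG. 6 map naturally onto a small record type. In this Python sketch the `Entry` class and the helper functions are hypothetical, and the addresses are placeholders from the 192.0.2.0/24 documentation range rather than values from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Entry:
    standby_number: str
    page_numbers: FrozenSet[int]
    address: str   # IP or MAC address of the standby machine


MAPPING_TABLE: List[Entry] = [
    Entry("stander 1", frozenset({1, 6, 7}), "192.0.2.1"),  # placeholder address
    Entry("stander 2", frozenset({2, 8, 9}), "192.0.2.2"),  # placeholder address
]


def standbys(table: List[Entry]) -> List[str]:
    """Which standby machines exist in the database system."""
    return [entry.standby_number for entry in table]


def pages_of(table: List[Entry], standby_number: str) -> Set[int]:
    """Storage unit set cached by one standby machine."""
    for entry in table:
        if entry.standby_number == standby_number:
            return set(entry.page_numbers)
    raise KeyError(standby_number)


print(standbys(MAPPING_TABLE))                       # ['stander 1', 'stander 2']
print(sorted(pages_of(MAPPING_TABLE, "stander 2")))  # [2, 8, 9]
```

Keeping the address in the same entry means that, once the pages to update are known, the control node also knows where to send the corresponding records.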
  • Note that step S504 may be performed before step S505, after step S505, or at the same time as step S505.
  • Step S506 The control node generates a storage unit intersection from the storage units that exist in both the first storage unit set and the second storage unit set.
  • Through step S504 the control node determines which pages in the host have been modified, and through step S505 it determines which pages exist in each standby machine. The control node can therefore determine which pages in each standby machine need to be updated.
  • For example, the operation log includes multiple operation records, each operation record is a redo log, and each redo log includes the identifier of a page modified by the host (for example, redo log 1 includes the identifier of page 1, redo log 2 includes the identifier of page 2, and so on).
  • The control node finds that the operation log includes redo log 1, redo log 2, and redo log 3; that is, the set of modified host storage units determined by the control node is {page 1, page 2, page 3}.
  • The control node determines, according to the mapping table, that standby machine 1 and standby machine 2 exist in the database system, that the storage unit set in standby machine 1 is {page 1, page 6, page 7}, and that the storage unit set in standby machine 2 is {page 2, page 8, page 9}.
  • For standby machine 1, page 1 exists in both storage unit sets, so its storage unit intersection is {page 1}; for standby machine 2, page 2 exists in both storage unit sets, so its storage unit intersection is {page 2}.
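  • The intersection step amounts to a set intersection per standby machine. This is a minimal Python sketch using the example values above; the function name `storage_unit_intersection` is hypothetical.

```python
def storage_unit_intersection(modified_pages: set[int], standby_pages: set[int]) -> set[int]:
    """Pages modified by the host that are also cached on one standby machine."""
    return modified_pages & standby_pages


# Example values from the text: redo logs 1-3 touch pages 1, 2 and 3.
modified = {1, 2, 3}
standby_sets = {"standby 1": {1, 6, 7}, "standby 2": {2, 8, 9}}

# One intersection per standby machine listed in the mapping table.
per_standby = {
    standby_id: storage_unit_intersection(modified, pages)
    for standby_id, pages in standby_sets.items()
}
```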
  • Step S507 The control node acquires, from the operation log, the operation records corresponding to the storage unit intersection.
  • Based on the storage unit intersection, the control node may acquire the corresponding operation records from the operation log.
  • For example, each operation record is a redo log, and the operation log includes redo log 1, redo log 2, and redo log 3.
  • The storage unit intersection for standby machine 1 is {page 1}, and the storage unit intersection for standby machine 2 is {page 2}. Then for standby machine 1, the operation record acquired by the control node from the operation log is {redo log 1}; for standby machine 2, the operation record acquired by the control node from the operation log is {redo log 2}.
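  • Selecting the operation records for an intersection is a filter over the operation log by page identifier. A minimal Python sketch, assuming a hypothetical `OperationRecord` type whose `change` field stands in for the recorded write instructions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    name: str     # e.g. "redo log 1"
    page: int     # identifier of the page the record modifies
    change: str   # placeholder for the recorded write instructions


def records_for(log: list[OperationRecord], intersection: set[int]) -> list[OperationRecord]:
    """Return, in log order, the records whose page is in the storage unit intersection."""
    return [record for record in log if record.page in intersection]


operation_log = [
    OperationRecord("redo log 1", 1, "write page 1"),
    OperationRecord("redo log 2", 2, "write page 2"),
    OperationRecord("redo log 3", 3, "write page 3"),
]
```

Keeping log order matters if several records touch the same page, since they would have to be replayed in the order the host produced them.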
  • Step S508 The control node sends a corresponding operation record to the standby machine.
  • After obtaining the operation records corresponding to the storage unit intersection, the control node sends them to the standby machine.
  • The control node may send the operation records required by each standby machine to that standby machine one standby machine at a time. For example, the control node queries the entries in the mapping table one by one, determines based on the operation log whether the standby machine in each entry has a non-empty storage unit intersection, and if so, sends the operation records corresponding to that intersection to the standby machine.
  • Alternatively, the operation records may be distributed to the different standby machines separately.
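  • The entry-by-entry sending loop described above can be sketched in Python as follows. This is an illustration under assumptions: records are modeled as hypothetical `(name, page)` tuples, and `send` is a caller-supplied stand-in for the network transmission, not an interface from the source.

```python
from typing import Callable

Record = tuple[str, int]  # (record name, page identifier)


def distribute(log: list[Record],
               mapping_table: dict[str, set[int]],
               send: Callable[[str, list[Record]], None]) -> None:
    """Walk the mapping-table entries one by one and send each standby only what it needs."""
    modified = {page for _, page in log}
    for standby_id, cached_pages in mapping_table.items():
        intersection = modified & cached_pages
        if not intersection:
            continue  # this standby caches none of the modified pages
        send(standby_id, [rec for rec in log if rec[1] in intersection])
```

With the FIG. 7 values (standby machine 1 caching pages 1, 6, 7 and standby machine 2 caching pages 2, 3, 5), standby machine 1 receives only redo log 1 and standby machine 2 receives redo logs 2 and 3.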
  • Step S509 The standby machine performs the corresponding operation on the storage unit in its local cache indicated by the received operation record.
  • After receiving an operation record, the standby machine first determines whether the storage unit corresponding to the operation record exists in its local cache. If it does, the standby machine applies the operation record to that storage unit; if it does not, the standby machine discards the operation record. In the solution of the embodiment of the present invention, every operation record a standby machine receives is one it requires, so discarding of operation records can be avoided.
  • For example, operation record 1 records the set of instructions by which the host modified a table in page 1 of its local cache. After receiving operation record 1, standby machine 1 performs the corresponding operations on page 1 in its own local cache based on that instruction set, so that page 1 in the standby machine is brought to the same state as page 1 in the host, thereby synchronizing the same page between the host and the standby machine.
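  • The standby-side check (apply if the page is cached, otherwise discard) can be sketched in Python as below. The page model is an assumption made for illustration: each cached page is represented as a list of applied changes, which stands in for real redo replay.

```python
def apply_records(local_cache: dict[int, list[str]],
                  records: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Replay each (page, change) record on the cached page; return the discarded records."""
    discarded = []
    for page, change in records:
        if page in local_cache:
            # Stand-in for redo replay: append the change to the page's applied history.
            local_cache[page].append(change)
        else:
            discarded.append((page, change))  # page not cached locally: drop the record
    return discarded
```

When the control node has already filtered the records, the returned list is expected to be empty.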
  • For example, the database system to which the embodiment of the present invention is applied includes a host, a control node, standby machine 1, standby machine 2, and a storage device (not shown). Pages 1, 2, 3, 4, and 5 are in the host memory; pages 1, 6, and 7 are in the memory of standby machine 1; and pages 2, 3, and 5 are in the memory of standby machine 2.
  • After receiving the operation log, the control node determines, based on the page identifiers in the operation log, that pages 1, 2, and 3 in the host have been modified. By searching the mapping table, it determines that pages 1, 6, and 7 exist in the memory of standby machine 1, so it obtains redo log 1, which standby machine 1 requires, and sends redo log 1 to standby machine 1. It also determines that pages 2, 3, and 5 exist in the memory of standby machine 2, so it obtains redo log 2 and redo log 3, which standby machine 2 requires, from the operation log and sends them to standby machine 2.
  • After receiving redo log 1, standby machine 1 applies redo log 1 to page 1 in its memory to update page 1.
  • After receiving redo log 2 and redo log 3, standby machine 2 applies redo log 2 to page 2 in its memory to update page 2, and applies redo log 3 to page 3 in its memory to update page 3.
  • In this embodiment, the host modifies pages in its local cache or in the storage device, generates the corresponding operation log (all redo logs), and sends the operation log to the control node.
  • By searching the mapping table, the control node can obtain the operation records required by each standby machine in the database system (the required subset of the redo logs) and send them to the corresponding standby machines, so that each standby machine can update the storage units in its local cache.
  • The control node can therefore send each standby machine only the redo logs it requires, which avoids transmitting irrelevant redo logs over the communication network, effectively reduces redo log traffic in the network, and saves network resources.
  • In addition, the host only needs to send all redo logs to the control node. Because the control node filters them, every redo log a standby machine receives is one it requires, so the operation of discarding irrelevant redo logs is also eliminated.
  • Therefore, the embodiment of the present invention can effectively reduce CPU consumption on both the host and the standby machines.
  • FIG. 8 is a schematic flowchart of another method for data transmission between a host and standby machines according to an embodiment of the present invention.
  • The database system includes a host, at least one standby machine (standby machine 1 and standby machine 2 in the figure), a storage device (not shown), and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device.
  • the method includes:
  • Step S801 The standby machine reads a storage unit from the storage device into its local cache.
  • The standby machines share the storage medium (the storage device).
  • For example, a user may issue an instruction to the standby machine.
  • After receiving the instruction, the standby machine first determines whether the storage unit corresponding to the instruction exists in its local cache. If it does not, the standby machine reads the corresponding storage unit, which stores the database data the user requires, from the storage device; that is, the standby machine reads a copy of the database data stored in the storage device.
  • For example, when the user (or user equipment) issues an instruction to standby machine 1, standby machine 1 searches its memory based on the instruction to determine whether the required page is there.
  • Both standby machine 1 and standby machine 2 can determine, according to the instruction of the user (or user equipment), whether the page the user requires is cached locally, and if it is not, read the page from the storage device. Note that different standby machines complete this process independently, and no order among them is imposed.
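  • The standby machine's read path is a read-through cache over the shared storage device. A minimal Python sketch, assuming both the local cache and the storage device are modeled as hypothetical page-number-to-bytes dictionaries:

```python
def read_page(local_cache: dict[int, bytes],
              storage_device: dict[int, bytes],
              page: int) -> tuple[bytes, bool]:
    """Return the page contents and whether they had to be read from the storage device."""
    if page in local_cache:
        return local_cache[page], False       # already cached locally
    local_cache[page] = storage_device[page]  # read a copy from the shared storage device
    return local_cache[page], True
```

A `True` flag marks exactly the case in which the set of cached storage units has changed, which is the event the registration request in step S802 reports to the control node.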
  • Step S802 The standby machine sends a registration request to the control node. Accordingly, the control node receives the registration request and updates the mapping table based on it.
  • In this embodiment, the control node is an independent device, such as an independent physical server.
  • After the standby machine reads the corresponding storage unit from the storage device into its local cache, the set of storage units in its local cache changes.
  • To notify the control node of this change, the standby machine generates a registration request and sends it to the control node. The registration request includes the identifier of the standby machine and the identifier of the storage unit to be registered (that is, the identifier of the storage unit that was read in).
  • After receiving the registration request, the control node determines, according to the identifier of the standby machine, the mapping table entry corresponding to that identifier and adds the identifier of the storage unit to be registered to that entry, thereby updating the mapping table.
  • For example, after standby machine 1 reads page 8 into its memory, it generates a registration request that includes the identifier of standby machine 1 and the identifier (page number) of page 8.
  • The control node determines, according to the identifier of standby machine 1, that the entry corresponding to that identifier is entry 1, so it adds the identifier of page 8 to entry 1; that is, the page numbers in entry 1 of the mapping table are updated from <1, 6, 7> to <1, 6, 7, 8>, completing the update of the mapping table.
  • Both standby machine 1 and standby machine 2 can send a registration request to the control node after generating it, and the control node responds to each request and completes the update of the mapping table accordingly.
  • the information of the control node may be configured in the standby machine in advance.
  • For example, a member list that records the identifier of the control node is configured in the standby machine; after generating a registration request, the standby machine sends it to the control node based on that identifier.
  • When the control node includes multiple physical servers, the member list in each standby machine records the identifier of the physical server that administers that standby machine; after generating a registration request, the standby machine sends it to that physical server according to the server's identifier.
  • In another implementation, when the standby machine determines that a page needs to be read from the storage device, it may first generate the registration request and send it to the control node, which lets the control node update the mapping table before the page is read into the local cache. That is, there is no required order between step S801 and step S802.
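  • Handling a registration request on the control node reduces to adding one page identifier to the requesting standby machine's entry. A minimal Python sketch with a hypothetical `RegistrationRequest` type; it assumes the standby machine already has an entry in the mapping table (an unknown standby raises `KeyError`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationRequest:
    standby_id: str   # identifier of the standby machine
    page: int         # identifier of the storage unit being registered


def handle_registration(mapping_table: dict[str, set[int]], req: RegistrationRequest) -> None:
    """Add the registered page to the entry for the requesting standby machine.

    Assumes the standby already has an entry; an unknown standby raises KeyError.
    """
    mapping_table[req.standby_id].add(req.page)
```

Applied to the example above, registering page 8 for standby machine 1 turns entry 1 from <1, 6, 7> into <1, 6, 7, 8>.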
  • Step S803 The standby machine evicts a storage unit from its local cache.
  • When the storage space of its local cache is insufficient, the standby machine evicts some storage units from the local cache based on a pre-configured eviction policy, where eviction means deleting the data in a storage unit to free storage space.
  • For example, the pre-configured eviction policy may be based on storage time.
  • When the standby machine detects that the storage space of its local cache is insufficient (for example, the detected storage usage exceeds a preset threshold), it checks how long each storage unit has been stored in the local cache and evicts the storage units stored longer than a preset duration.
  • The pre-configured eviction policy may also be priority-based.
  • When the standby machine detects that the storage space of its local cache is insufficient, it checks the priority of each storage unit in the local cache and evicts the storage units whose priority is lower than a preset level.
  • For example, pages 2, 4, 8, and 9 are in the memory of standby machine 2. Based on the pre-configured eviction policy, standby machine 2 determines that the page to be evicted is page 4, so it removes page 4 from memory while the other pages remain unchanged.
  • Both standby machine 1 and standby machine 2 can evict pages based on the pre-configured eviction policy when memory is insufficient. Note that different standby machines complete this process independently, and no order among them is imposed.
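  • The two eviction policies can be sketched as simple selectors over the cached pages. This Python sketch is illustrative: the `CachedPage` fields, the time unit (seconds), and the priority scale (higher means more important) are assumptions, as are the example timestamps and priorities in the test, chosen so that page 4 is evicted as in the example above.

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    page: int
    loaded_at: float  # time the page entered the local cache, in seconds
    priority: int     # higher value = more important to keep


def cache_is_short(used_bytes: int, threshold_bytes: int) -> bool:
    """Storage space counts as insufficient once usage exceeds the preset threshold."""
    return used_bytes > threshold_bytes


def evict_by_age(pages: list[CachedPage], now: float, max_age: float) -> list[int]:
    """Time-based policy: pages held longer than the preset duration."""
    return [p.page for p in pages if now - p.loaded_at > max_age]


def evict_by_priority(pages: list[CachedPage], min_priority: int) -> list[int]:
    """Priority-based policy: pages whose priority is below the preset level."""
    return [p.page for p in pages if p.priority < min_priority]
```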
  • Step S804 The standby machine sends a deletion request to the control node. Accordingly, the control node receives the deletion request and updates the mapping table based on it.
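  • After the standby machine evicts a storage unit from its local cache, the set of storage units in its local cache changes.
  • The standby machine accordingly generates a deletion request and sends it to the control node, where the deletion request includes the identifier of the standby machine and the identifier of the storage unit to be deleted.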
  • After receiving the deletion request, the control node determines, according to the identifier of the standby machine, the mapping table entry corresponding to that identifier and deletes the identifier of the storage unit to be deleted from that entry, thereby updating the mapping table.
  • For example, after evicting page 4, standby machine 2 generates a deletion request that includes the identifier of standby machine 2 and the identifier (page number) of page 4.
  • After receiving the deletion request, the control node determines, according to the identifier of standby machine 2, that the corresponding entry is entry 2, so it removes the identifier of page 4 from entry 2; that is, the page numbers in entry 2 of the mapping table are updated from <2, 4, 8, 9> to <2, 8, 9>, completing the update of the mapping table.
  • Both standby machine 1 and standby machine 2 can send a deletion request to the control node after generating it, and the control node responds to each request and completes the update of the mapping table accordingly.
  • the information of the control node may be configured in the standby machine in advance.
  • For example, a member list that records the identifier of the control node is configured in the standby machine; after generating a deletion request, the standby machine sends it to the control node based on that identifier.
  • When the control node includes multiple physical servers, the member list in each standby machine records the identifier of the physical server that administers that standby machine; after generating a deletion request, the standby machine sends it to that physical server according to the server's identifier.
  • In another implementation, when the standby machine detects that memory is insufficient, it determines the page to be evicted based on the pre-configured eviction policy, first generates a deletion request, and sends it to the control node, so that the control node can update the mapping table before the page is evicted from memory. That is, there is no required order between step S803 and step S804.
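  • The deletion request is the mirror image of registration on the control node. A minimal Python sketch with a hypothetical `DeletionRequest` type; using `set.discard` (rather than `set.remove`) is a design choice made here so that a repeated request for an already-removed page is harmless, which matters when the request may be sent before the eviction actually happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionRequest:
    standby_id: str   # identifier of the standby machine
    page: int         # identifier of the storage unit being deleted


def handle_deletion(mapping_table: dict[str, set[int]], req: DeletionRequest) -> None:
    """Remove the evicted page from the requesting standby machine's entry.

    set.discard makes a repeated request for an already-removed page a no-op.
    """
    mapping_table[req.standby_id].discard(req.page)
```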
  • Step S805 The host performs a read/write transaction.
  • The host processes transactions according to an operation instruction of a user (or user equipment). A transaction may be a database transaction, for example reading, editing, inserting, or updating a table in the database.
  • After receiving the operation instruction of the user (or user equipment), the host first queries its local cache (such as the host memory) to determine whether the database data indicated by the instruction exists there. If it does, the host processes the data directly; if it does not, the host reads the data from the storage device into the local cache and then processes it.
  • Step S806 The host generates an operation log.
  • When the host writes to pages, it generates one operation record per page, where each operation record corresponds to one page and records the write operations performed on that page.
  • For example, each operation record is the redo log of one modified page.
  • The host aggregates all operation records into an operation log; that is, the operation log is the collection of the redo logs of all modified pages.
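  • Building the operation log can be sketched as grouping the host's writes by page, one record per modified page. In this Python sketch the writes are hypothetical `(page, change)` tuples and each record is just the ordered list of changes for its page.

```python
from collections import defaultdict


def build_operation_log(writes: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Group the host's writes into one operation record (redo log) per modified page."""
    log: dict[int, list[str]] = defaultdict(list)
    for page, change in writes:
        log[page].append(change)  # keep the changes for each page in write order
    return dict(log)
```

The keys of the resulting dictionary are exactly the page identifiers the control node later reads out of the log to form the set of modified storage units.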
  • Step S807 The host sends an operation log to the control node.
  • the host and the control node may be connected by using a wired or wireless connection.
  • For example, a member list that records the identifier of the control node may be pre-configured in the host; after generating the operation log, the host sends it to the control node based on that identifier. When the control node includes multiple physical servers, with different physical servers administering different standby machines, the member list in the host records the identifiers of all the physical servers, and after generating the operation log, the host sends the operation log to each physical server based on the member list.
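  • The multi-server case can be simulated in one process: the host sends the same log to every server in its member list, and each server computes intersections only for the standby machines it administers. This Python sketch is an assumption-laden illustration; the function `fan_out`, the server names, and the per-server mapping tables are hypothetical.

```python
def fan_out(log_pages: set[int],
            servers: dict[str, dict[str, set[int]]]) -> dict[str, dict[str, set[int]]]:
    """Deliver the same operation log to every physical server in the member list.

    Each server holds a mapping table only for the standby machines it administers
    and computes storage unit intersections for those standbys alone.
    """
    result: dict[str, dict[str, set[int]]] = {}
    for server_id, table in servers.items():
        result[server_id] = {
            standby_id: log_pages & pages
            for standby_id, pages in table.items()
            if log_pages & pages  # skip standbys that need nothing from this log
        }
    return result
```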
  • Step S808 The control node determines a first storage unit set corresponding to the operation log.
  • Each operation record includes the identifier of its page, so the control node can determine from these identifiers which pages the operation log corresponds to.
  • Step S809 The control node determines the second storage unit set in each standby machine.
  • For example, by querying the mapping table, the control node determines that standby machine 1 and standby machine 2 exist in the database system, and which pages exist in standby machine 1 and which pages exist in standby machine 2.
  • There is no required order between step S808 and step S809.
  • Step S810 The control node generates a storage unit intersection from the storage units that exist in both the first storage unit set and the second storage unit set.
  • Through step S808 the control node determines which pages in the host have been modified, and through step S809 it determines which pages exist in standby machine 1 and standby machine 2. The control node can therefore determine the storage unit intersection for standby machine 1 and for standby machine 2 separately.
  • Step S811 The control node acquires, from the operation log, the operation records corresponding to the storage unit intersection.
  • Based on the storage unit intersection, the control node may acquire the corresponding operation records from the operation log.
  • Step S812 The control node sends a corresponding operation record to the standby machine.
  • The control node may send the corresponding operation records to the standby machines one at a time. For example, the control node queries entry 1 in the mapping table, determines based on the operation log that standby machine 1 has a non-empty storage unit intersection, and sends the operation records corresponding to that intersection to standby machine 1; it then queries entry 2 in the mapping table, determines that standby machine 2 also has a non-empty storage unit intersection, and sends the operation records corresponding to that intersection to standby machine 2.
  • Alternatively, the control node queries the mapping table, finds that both standby machine 1 and standby machine 2 have non-empty storage unit intersections, obtains the operation records corresponding to each intersection, and distributes them to standby machine 1 and standby machine 2 through different ports.
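  • Distributing the records through different ports can be approximated with one worker per standby machine. This Python sketch uses the standard `concurrent.futures.ThreadPoolExecutor`; the `send` callable is a hypothetical stand-in for the per-port transmission, and the record lists are assumed to have been computed already.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def distribute_concurrently(per_standby: dict[str, list[str]],
                            send: Callable[[str, list[str]], None]) -> None:
    """Send each standby machine its operation records on a separate worker thread."""
    with ThreadPoolExecutor(max_workers=max(1, len(per_standby))) as pool:
        futures = [pool.submit(send, standby_id, records)
                   for standby_id, records in per_standby.items()]
        for future in futures:
            future.result()  # surface any exception raised while sending
```

Completion order across standby machines is not deterministic, so any check on what was sent should not rely on ordering.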
  • Step S813 The standby machine performs a corresponding operation on the storage unit in the local cache indicated by the corresponding operation record.
  • After receiving the operation records sent by the control node, standby machine 1 performs the corresponding operations on the storage units in its local cache indicated by those records. Likewise, after receiving the operation records sent by the control node, standby machine 2 performs the corresponding operations on the storage units in its local cache indicated by those records.
  • Steps S801 to S802 may occur before, during, or after steps S805 to S812, and steps S803 to S804 may likewise occur before, during, or after steps S805 to S812.
  • In this embodiment, when the host performs a read/write transaction, it modifies pages in its local cache or in the storage device and generates the corresponding operation log (all redo logs).
  • The host sends the operation log to the control node. By searching the mapping table, the control node obtains the operation records required by each standby machine in the database system (the required subset of the redo logs) and sends them to the corresponding standby machines.
  • In addition, each standby machine can send a registration request or a deletion request to the control node as the pages in its memory change, so that the control node keeps the mapping table up to date. Introducing this update mechanism makes the database system more practical and reliable.
  • The control node can thus send each standby machine only the redo logs it requires, which avoids transmitting irrelevant redo logs over the communication network, effectively reduces redo log traffic in the network, and saves network resources.
  • In addition, the host only needs to send all redo logs to the control node. Because the control node filters them, every redo log a standby machine receives is one it requires, so the operation of discarding irrelevant redo logs is also eliminated.
  • Therefore, the embodiment of the present invention can effectively reduce CPU consumption on both the host and the standby machines.
  • an embodiment of the present invention provides a control node 100.
  • the control node 100 includes a transmitter 1003, a receiver 1004, a memory 1002, and a processor 1001 coupled to the memory 1002.
  • The transmitter 1003, the receiver 1004, the memory 1002, and the processor 1001 may be connected by a bus or in another manner (FIG. 11 takes a bus connection as an example), where:
  • the processor 1001 may be one or more central processing units (CPUs).
  • One processor is used as an example in FIG. 10.
  • When the processor 1001 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • The memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 1002 is used to store related instructions and data, including program code that implements the functions of the control node in the embodiment of FIG. 5 or FIG. …;
  • the transmitter 1003 is configured to send data to the outside;
  • the receiver 1004 is configured to receive data from the outside;
  • the processor 1001 is configured to call the program code stored in the memory 1002, and perform the following steps:
  • acquiring, through the receiver 1004, the operation log generated by the host and saving it to the memory, where the operation log includes at least one operation record, and each operation record records a write operation performed by the host on a storage unit in the host's local cache or in the storage device;
  • sending the corresponding operation records to the first standby machine through the transmitter 1003.
  • The determining, by the processor 1001, of the first storage unit set that exists in the first standby machine includes:
  • the processor 1001 determines, according to a preset mapping table, the first storage unit set that exists in the first standby machine, where the mapping table includes at least one entry, each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache, and one of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • That the mapping table includes at least one entry includes: when there are at least two standby machines, the mapping table includes at least two entries, and the storage unit identifiers in different entries are not identical.
  • the processor is further configured to:
  • receive, through the receiver 1004, a mapping table update request sent by a second standby machine, where the mapping table update request includes the identifier of the second standby machine and the identifier of a storage unit to be updated, and the second standby machine is any one of the at least one standby machine;
  • query the mapping table for the entry corresponding to the identifier of the second standby machine, and update that entry based on the identifier of the storage unit to be updated.
  • the processor 1001 is specifically configured to:
  • receive, through the receiver 1004, a registration request sent by the second standby machine, where the registration request includes the identifier of the second standby machine and the identifier of a storage unit to be registered, and the second standby machine is any one of the at least one standby machine;
  • query the mapping table for the entry corresponding to the identifier of the second standby machine, and add the identifier of the storage unit to be registered to that entry.
  • the processor 1001 is specifically configured to:
  • receive, through the receiver 1004, a deletion request sent by a third standby machine, where the deletion request includes the identifier of the third standby machine and the identifier of a storage unit to be deleted, and the third standby machine is any one of the at least one standby machine;
  • query the mapping table for the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
  • the storage unit is a page.
  • control node may be an independent device in the database system, for example, the control node is an independent physical server.
  • the control node can also be a non-independent device.
  • the control node can be built into the host or exist as a functional module of the host (for example, the host and the control node are different virtual machines).
  • The control node can also be built into a standby machine or exist as a functional module of a standby machine (for example, some standby machines and the control node coexist in the same physical server and are connected through different I/O interfaces).
  • FIG. 12 is a schematic structural diagram of another control node according to an embodiment of the present invention.
  • The control node 110 may include an obtaining unit 1101, a processing unit 1102, and a transmitting unit 1103, where each functional unit is described as follows:
  • the obtaining unit 1101 is configured to acquire the operation log generated by the host, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record records a write operation performed by the host on a storage unit in the host's local cache or in the storage device;
  • the processing unit 1102 is configured to determine the first storage unit set corresponding to the first standby machine and the second storage unit set corresponding to the at least one operation record, where the first standby machine is one of the at least one standby machine and stores the storage units corresponding to the first storage unit set; and to acquire the operation records in the operation log that correspond to the storage unit intersection, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set;
  • the transmitting unit 1103 is configured to send the corresponding operation record to the first standby machine.
  • The processing unit 1102 being configured to determine the first storage unit set that exists in the first standby machine includes:
  • the processing unit 1102 is configured to determine, according to a preset mapping table, the first storage unit set that exists in the first standby machine, where the mapping table includes at least one entry, each entry includes the identifier of a standby machine and the identifiers of all storage units in that standby machine's local cache, and one of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
  • Where the number of standby machines is at least two, the mapping table includes at least two entries; the storage units in the local caches of different standby machines are not identical, and accordingly the storage unit identifiers in different entries are not identical.
  • the obtaining unit 1101 is further configured to receive a mapping table update request sent by the second standby machine, where the mapping table update request includes an identifier of the second standby machine and an identifier of the storage unit to be updated, where The second standby machine is one of the at least one standby machine;
  • the processing unit 1102 is further configured to query the mapping table for the entry corresponding to the identifier of the second standby machine, and update that entry based on the identifier of the storage unit to be updated.
  • the obtaining unit 1101 is further configured to receive a registration request sent by the second standby machine, where the registration request includes an identifier of the second standby machine and an identifier of a storage unit to be registered, where the second standby machine is Any one of the at least one standby machine; the processing unit 1102 is further configured to query, in the mapping table, an entry corresponding to the identifier of the second standby machine, and the storage unit to be registered The identifier is added to the entry corresponding to the identifier of the second standby machine.
  • the obtaining unit 1101 is further configured to receive the deletion request sent by the third standby machine, where the deletion request includes the identifier of the third standby machine and the identifier of the storage unit to be deleted, where the third standby machine is Any one of the at least one standby machine; the processing unit 1102 is further configured to query, in the mapping table, an entry corresponding to the identifier of the third standby machine, and corresponding to the identifier of the third standby machine The identifier of the storage unit to be deleted is deleted in the entry.
  • the storage unit is a page.
  • control node may be an independent device in the database system, for example, the control node is an independent physical server.
  • the control node can also be a non-independent device.
  • the control node can be built into the host or exist as a functional module of the host (for example, the host and the control node are different virtual machines).
  • the control node can be built into a standby machine or exist as a functional module of a standby machine (for example, a standby machine and the control node run as different virtual machines, connected through an I/O interface and coexisting on the same physical server).
  • the control node is the host itself; that is, the host has both the functions of the host in the above embodiments and the functions of the control node in the above embodiments.
  • the implementation of each functional unit of the control node 110 can be clearly understood by those skilled in the art from the foregoing method embodiments; therefore, for brevity, details are not repeated here.
  • the computer program product comprises one or more computer instructions which, when loaded and executed on a computer, produce, in whole or in part, a process or function according to an embodiment of the invention.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, microwave) manner.
  • the computer readable storage medium can be any available media that can be accessed by a computer, or can be a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (such as a floppy disk, a hard disk, a magnetic tape, etc.), an optical medium (such as a DVD, etc.), or a semiconductor medium (such as a solid state hard disk) or the like.


Abstract

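A method for transferring data between a host and standby machines, a control node, and a database system. The method includes: obtaining an operation log generated by the host, where the operation log includes at least one operation record, and each operation record is a record of a write operation performed by the host on one storage unit of its local cache or of the storage device; determining a first storage unit set corresponding to a first standby machine, and determining a second storage unit set corresponding to the at least one operation record; and acquiring, from the operation log, the operation records corresponding to a storage unit intersection and sending them to the first standby machine, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set. The method reduces the transmission of useless data between the host and the standby machines, and thereby reduces host/standby CPU consumption and wasted network resources.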

Description

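Method for transferring data between host and standby machines, control node, and database system
This application claims priority to Chinese Patent Application No. 201710057471.X, filed with the Chinese Patent Office on January 26, 2017 and entitled "Method for transferring data between host and standby machines, control node, and database system", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of shared storage technologies, and in particular, to a method for transferring data between a host and standby machines, a control node, and a database system.
Background
In a cluster system based on a shared-storage architecture, the host and the standby machines share the same storage device. Both can therefore use the data on it, which may be, for example, database data. Cluster software manages how the host and the standby machines access that data.
A one-write multi-read cluster system provides services externally through a virtual IP address, and both the host and the standby machines can access the same page on the storage device. To speed up access, the host and the standby machines usually load the pages they need into their local caches (memory). However, only the host can write to a page; standby machines can only read pages. While the host provides service, it reads and writes pages on the storage device. When the host modifies a page, it sends the related modification information to the standby machines. Each standby machine applies this information in memory, which keeps the affected pages synchronized with the host's copy of the same page.
When there are many standby machines and the host runs read-write transactions frequently, however, the host must continuously broadcast modification information to the standby machines, and the standby machines must continuously process it. This consumes a large amount of host/standby central processing unit (CPU) capacity and occupies a large amount of network bandwidth.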
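Summary
Embodiments of the present invention provide a method for transferring data between a host and standby machines, a control node, and a database system. Their purpose is to reduce the transmission of useless data between the host and the standby machines, and thereby reduce host/standby CPU consumption and wasted network bandwidth.
According to a first aspect, an embodiment of the present invention provides a method for transferring data between a host and standby machines, applied to a database system. The method includes the following steps. The control node receives an operation log sent by the host, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device. For a first standby machine, the control node determines a first storage unit set that exists in the first standby machine, and determines a second storage unit set corresponding to the at least one operation record, where the first standby machine is any one of the at least one standby machine. The control node generates a storage unit intersection from the storage units that exist in both the first storage unit set and the second storage unit set. Finally, it acquires the operation records corresponding to the storage unit intersection from the operation log and sends them to the first standby machine.
The first aspect describes the method from the control node side. With this method, each standby machine receives only the operation records relevant to it. This reduces the transmission of useless data between the host and the standby machines, and thereby reduces host/standby CPU consumption and wasted network bandwidth.
A storage unit describes the database data in a particular storage area: it is a fixed-size block of storage space in which data is recorded. With reference to the first aspect, in some possible implementations, the storage unit may be a page, or a data storage space defined in another form, such as a block or a sector.
In the database system described in the embodiments, the host can write to storage units, while standby machines can only read them. Specifically, when the host executes a read-write transaction, it writes to the database data in a storage unit, which modifies that storage unit, and it records an operation record of the modification. Each operation record corresponds to one storage unit and is a record of a write operation performed by the host on one storage unit of the local cache or the storage device. When the host finishes modifying storage units and commits the transaction, it aggregates all operation records into an operation log. In a specific implementation, the storage unit is a page and each operation record corresponds to one page. The operation record is, for example, the redo log of one modified page, and the operation log is, for example, the set of redo logs of all modified pages.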
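With reference to the first aspect, in some possible implementations, determining the first storage unit set that exists in the first standby machine includes: determining the first storage unit set according to a preset mapping table.
In a specific implementation, a mapping table is preset in the control node. The mapping table includes at least one entry, and each entry may include:
- the identifier of one standby machine, for example a standby machine number;
- the identifiers of all storage units (for example, pages) in that standby machine's local cache, for example storage unit numbers such as page numbers;
- the address of the standby machine, for example its IP/MAC address;
- and so on.
By querying the mapping table, the control node can determine which storage units exist in any standby machine in the database system, and thus the first storage unit set of that standby machine.
In a specific implementation, the operation log contains all the operation records, and each operation record contains the identifier of its page. The control node can therefore use the page identifiers to determine which pages the operation log covers, and thus the second storage unit set corresponding to the operation log.
The control node then intersects the first storage unit set with the second storage unit set to obtain the storage unit intersection. Each storage unit in the intersection exists in both sets.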
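The three steps just described can be sketched in Python as follows: look up the pages cached on a standby machine, collect the pages named by the log records, intersect the two sets, and keep the matching records. This is a minimal illustration under stated assumptions, not the patented implementation. The `OperationRecord` class, its `payload` field (a stand-in for redo content), the dictionary form of the mapping table, and the page numbers in the usage lines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    page_id: int  # identifier of the storage unit (page) the record modifies
    payload: str  # hypothetical stand-in for the redo content


def records_for_standby(log, mapping_table, standby_id):
    """Return the operation records of `log` that `standby_id` needs."""
    first_set = set(mapping_table.get(standby_id, ()))  # pages cached on the standby
    second_set = {rec.page_id for rec in log}          # pages touched by the log
    intersection = first_set & second_set
    # Keep log order so the standby replays records in the order the host wrote them.
    return [rec for rec in log if rec.page_id in intersection]


# Illustrative values only; these page numbers do not come from the document.
log = [OperationRecord(4, "redo-4"), OperationRecord(5, "redo-5"), OperationRecord(9, "redo-9")]
mapping_table = {"S1": {5, 9, 20}, "S2": {30}}
print([r.payload for r in records_for_standby(log, mapping_table, "S1")])  # ['redo-5', 'redo-9']
print(records_for_standby(log, mapping_table, "S2"))                        # []
```

In this sketch the records are filtered by page membership rather than by rebuilding them from the set. That preserves the original log order, which matters if several records touch the same page.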
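The control node keeps a mapping table, and each operation record carries a storage unit (for example, page) identifier. With both in place, the control node can efficiently determine which pages the host has updated and which pages each standby machine needs. Each standby machine is sent only the operation records that match the pages it holds. Because every record a standby machine receives is one it needs, no records have to be discarded. The standby machine updates the related pages based on these records, so the same pages on the host and the standby machines stay synchronized, which meets user requirements in a one-write multi-read database system.
In a specific implementation, the control node determines, according to the mapping table, a K-th storage unit set that exists in a K-th standby machine. Here the K-th standby machine is any standby machine other than the first, and the K-th storage unit set differs from the first storage unit set. In other words, the database system has at least two standby machines whose local caches hold different storage units. Correspondingly, the mapping table includes at least two entries, and the storage unit identifiers in different entries are not identical, where "not identical" means not completely the same.
With reference to the first aspect, in some possible implementations, the method further includes: the control node receives a mapping table update request sent by a second standby machine, where the request includes the identifier of the second standby machine and the identifier of the storage unit to be updated, and the second standby machine is one of the at least one standby machine;
the control node queries the mapping table for the entry corresponding to the identifier of the second standby machine, and updates that entry based on the identifier of the storage unit to be updated (for example, by adding that identifier to the entry).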
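Specifically, the mapping table update request may be a registration request or an eviction request (the deletion request described below).
Specifically, in some possible implementations, the control node receives a registration request sent by the second standby machine, where the registration request includes the identifier of the second standby machine and the identifier of the storage unit to be registered, and the second standby machine is any one of the at least one standby machine. The control node queries the mapping table for the entry corresponding to the identifier of the second standby machine and adds the identifier of the storage unit to be registered to that entry.
Specifically, in the embodiments, the standby machines share the storage medium, so a user who wants to view database data can send an instruction to a standby machine. On receiving the instruction, the standby machine first checks whether the corresponding storage unit exists in its local cache. If it does not, the standby machine reads that storage unit from the storage device.
Reading the storage unit from the storage device into the local cache changes the set of storage units held there. To inform the control node of this change, the standby machine generates a registration request and sends it to the control node. The registration request includes the identifier of the standby machine and the identifier of the storage unit to be registered (that is, of the storage unit just read in). After receiving the request, the control node locates the mapping table entry for the standby machine's identifier and adds the identifier of the storage unit to be registered to that entry, which updates the mapping table.
Specifically, in some possible implementations, the control node receives a deletion request sent by a third standby machine, where the deletion request includes the identifier of the third standby machine and the identifier of the storage unit to be deleted, and the third standby machine is any one of the at least one standby machine. The control node queries the mapping table for the entry corresponding to the identifier of the third standby machine and deletes the identifier of the storage unit to be deleted from that entry.
Specifically, in the embodiments, when a standby machine's local cache runs short of storage space, the standby machine evicts some storage units from the local cache according to a preconfigured eviction policy. Eviction means deleting the data in a storage unit to free storage space. The policy may be one of the following:
- Time-based. In one specific implementation, when the standby machine detects that its local cache is short of space (for example, the amount stored exceeds a preset threshold), it checks how long each storage unit has been held in the local cache and evicts those held longer than a preset duration.
- Priority-based. In another specific implementation, when the standby machine detects that its local cache is short of space, it checks the priority of each storage unit in the local cache and evicts those whose priority is lower than a preset level.
Evicting storage units from the local cache changes the set of storage units held there. To inform the control node of this change, the standby machine generates a deletion request and sends it to the control node. The deletion request includes the identifier of the standby machine and the identifier of the storage unit to be deleted (that is, of the evicted storage unit). After receiving the request, the control node locates the mapping table entry for the standby machine's identifier and deletes the identifier of the storage unit to be deleted from that entry, which updates the mapping table.
This update mechanism lets each standby machine report registration and deletion requests to the control node as its own state changes, and the control node updates the mapping table accordingly. The control node therefore knows the latest state of every standby machine, which makes the database system more practical and reliable.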
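With reference to the first aspect, in some possible implementations, the control node consists of multiple physical servers, and different physical servers are connected to different standby groups. Each standby group includes one or more standby machines. Each standby machine is configured with a member list that records the identifier of the physical server corresponding to the standby group to which it belongs;
In this implementation, the host sends the operation log to each of the multiple physical servers. A standby machine sends its registration requests, and likewise its deletion requests, to the physical server identified in its member list.
In a specific embodiment, multiple control nodes (that is, multiple physical servers) may be deployed in the new database system, each managing a different standby group. This suits cases such as frequent read-write transactions on the host generating a large volume of redo logs, or a very large number of standby machines. When the host commits a read-write transaction, it distributes the operation log to each control node. Each control node manages the log it receives and sends every standby machine in its group the operation records (redo logs) that standby machine needs. Each standby machine then updates the corresponding pages, so the pages covered by the operation log stay consistent between the host and the standby machines. Introducing multiple control nodes reduces host/standby CPU consumption and wasted network resources even more effectively.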
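A minimal sketch of the multi-server arrangement follows. Each physical server keeps a mapping table for its own standby group. The host's member list names every server, and each standby machine's member list names only its group's server. The class, the server and standby names, and the in-memory `delivered` dictionary (standing in for a network unicast) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlServer:
    """One physical server acting as control node for one standby group (hypothetical model)."""
    server_id: str
    mapping_table: dict = field(default_factory=dict)  # standby id -> set of cached page ids
    delivered: dict = field(default_factory=dict)      # standby id -> records sent (stands in for unicast)

    def register(self, standby_id, page_id):
        self.mapping_table.setdefault(standby_id, set()).add(page_id)

    def handle_log(self, log):
        log_pages = {page for page, _ in log}
        for standby_id, cached in self.mapping_table.items():
            wanted = cached & log_pages  # storage unit intersection for this standby
            if wanted:
                records = [(page, redo) for page, redo in log if page in wanted]
                self.delivered.setdefault(standby_id, []).extend(records)


servers = {sid: ControlServer(sid) for sid in ("server-A", "server-B")}
host_member_list = list(servers)                                              # host knows every server
standby_member_list = {"S1": "server-A", "S2": "server-A", "S3": "server-B"}  # standby -> its group's server


def standby_register(standby_id, page_id):
    # A standby sends its registration only to the server named in its member list.
    servers[standby_member_list[standby_id]].register(standby_id, page_id)


def host_commit(log):
    # The host sends the whole operation log to every physical server.
    for server_id in host_member_list:
        servers[server_id].handle_log(log)


standby_register("S1", 1)
standby_register("S2", 2)
standby_register("S3", 2)
standby_register("S3", 3)
host_commit([(1, "redo1"), (2, "redo2")])
print(servers["server-A"].delivered)  # {'S1': [(1, 'redo1')], 'S2': [(2, 'redo2')]}
print(servers["server-B"].delivered)  # {'S3': [(2, 'redo2')]}
```

In this model, the host's outgoing traffic grows with the number of servers rather than the number of standby machines. That is one way to read the claimed benefit of deploying several control nodes.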
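According to a second aspect, an embodiment of the present invention provides a control node. It includes a memory, together with a processor, a transmitter, and a receiver that are coupled to the memory. The transmitter sends data to external devices, and the receiver receives data sent from external devices. The memory stores the implementation code of the method described in the first aspect and related data (such as the operation log). The processor executes the program code stored in the memory, that is, it performs the method described in the first aspect.
According to a third aspect, an embodiment of the present invention provides another control node, including an obtaining unit, a processing unit, and a transmitting unit, which are functional units configured to perform the method described in the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a database system, including one host, at least one standby machine, a storage device, and a control node. The host is connected to both the control node and the storage device, and each standby machine is connected to both the control node and the storage device.
- The host is configured to send an operation log to the control node, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device.
- The control node is configured to: receive the operation log; for a first standby machine, determine a first storage unit set that exists in the first standby machine and a second storage unit set corresponding to the at least one operation record; generate a storage unit intersection from the storage units that exist in both sets; and acquire, from the operation log, the operation records corresponding to the storage unit intersection and send them to the first standby machine, where the first standby machine is any one of the at least one standby machine.
- The first standby machine is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by those records.
With reference to the fourth aspect, in some possible implementations, the storage unit is a page.
With reference to the fourth aspect, in some possible implementations, the determining, by the control node, of the first storage unit set that exists in the first standby machine includes:
the control node determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry, and each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache. One of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
With reference to the fourth aspect, in some possible implementations, the database system further includes a second standby machine, which is any one of the at least one standby machine. When the second standby machine receives a read instruction, it determines whether the corresponding storage unit exists in its local cache. If not, it reads that storage unit from the storage device and sends a registration request to the control node that includes the identifier of the second standby machine and the identifier of that storage unit. The control node is further configured to receive the registration request, determine the entry corresponding to the identifier of the second standby machine, and add the identifier of that storage unit to the entry.
With reference to the fourth aspect, in some possible implementations, the control node includes multiple physical servers, and the second standby machine is configured with a member list that records the identifier of one of those servers. The second standby machine sends the registration request to the physical server identified in that list.
With reference to the fourth aspect, in some possible implementations, the database system further includes a third standby machine, which is any one of the at least one standby machine. When storage units in its local cache need to be evicted, the third standby machine determines the storage unit to be deleted and sends a deletion request to the control node that includes the identifier of the third standby machine and the identifier of the storage unit to be deleted. The control node is further configured to receive the deletion request, determine the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
With reference to the fourth aspect, in some possible implementations, the control node includes multiple physical servers, and the third standby machine is configured with a member list that records the identifier of one of those servers. The third standby machine sends the deletion request to the physical server identified in that list.
It should be noted that the first, second, and third standby machines may be the same standby machine or different standby machines.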
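According to a fifth aspect, an embodiment of the present invention provides a database system, including one host, at least one standby machine, and a storage device. The host is connected to the at least one standby machine and to the storage device, the at least one standby machine is connected to the storage device, and the host and the standby machines share the data in the storage device, where
the host is configured to:
- generate an operation log, where the operation log includes at least one operation record, and each operation record is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device;
- determine a first storage unit set corresponding to a first standby machine and a second storage unit set corresponding to the at least one operation record, where the first standby machine is one of the at least one standby machine, and its local cache holds the storage units corresponding to the first storage unit set, which were read in from the storage device; and
- acquire, from the operation log, the operation records corresponding to a storage unit intersection and send them to the first standby machine, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set;
the first standby machine is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by those records.
With reference to the fifth aspect, in some possible implementations, the determining, by the host, of the first storage unit set that exists in the first standby machine includes:
the host determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry, and each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache. One of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
With reference to the fifth aspect, in some possible implementations, the mapping table including at least one entry includes:
when there are at least two standby machines and the storage units in their local caches are not identical, the mapping table includes at least two entries, and the storage unit identifiers in different entries are not identical.
With reference to the fifth aspect, in some possible implementations, the storage unit is a page.
With reference to the fifth aspect, in some possible implementations, the database system further includes a second standby machine, which is any one of the at least one standby machine. When the second standby machine receives a read instruction, it determines whether the corresponding storage unit exists in its local cache. If not, it reads that storage unit from the storage device and sends a registration request to the host that includes the identifier of the second standby machine and the identifier of that storage unit. The host is further configured to receive the registration request, determine the entry corresponding to the identifier of the second standby machine, and add the identifier of that storage unit to the entry.
With reference to the fifth aspect, in some possible implementations, the database system further includes a third standby machine, which is any one of the at least one standby machine. When storage units in its local cache need to be evicted, the third standby machine determines the storage unit to be deleted and sends a deletion request to the host that includes the identifier of the third standby machine and the identifier of the storage unit to be deleted. The host is further configured to receive the deletion request, determine the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.
It should be noted that the first, second, and third standby machines may be the same standby machine or different standby machines.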
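According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions (implementation code) that, when run on a computer, cause the computer to perform the method according to the first aspect.
According to a seventh aspect, an embodiment of the present invention provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method according to the first aspect.
With the solutions of the embodiments, the database system works as follows. When the host executes a read-write transaction, it modifies pages in its local cache or in the storage device, generates the corresponding operation log (for example, all the redo logs), and sends the log to the control node. By looking up the mapping table, the control node obtains the operation records each standby machine needs (for example, the relevant subset of redo logs) and sends them to that standby machine, which updates the storage units in its local cache. Meanwhile, each standby machine sends registration or deletion requests to the control node as the pages in its memory change, so the control node keeps the mapping table up to date; this update mechanism makes the database system more practical and reliable.
The embodiments bring two main savings. First, the control node sends each standby machine only the operation records it needs, so irrelevant records (for example, irrelevant redo logs) are not transmitted, fewer records cross the network, and network resources are saved. Second, the host only needs to send the operation log to the control node, and every record a standby machine receives is one it needs, so no standby machine has to discard irrelevant records. Together these effectively reduce host/standby CPU consumption.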
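Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following describes the accompanying drawings required for the embodiments or the prior art.
FIG. 1-a is a schematic structural diagram of a database system in the prior art;
FIG. 1-b is a schematic flowchart of redo log processing in a database system in the prior art;
FIG. 2 is a schematic structural diagram of a database system according to an embodiment of the present invention;
FIG. 3-a is a schematic diagram of a database system according to an embodiment of the present invention;
FIG. 3-b is a schematic diagram of another database system according to an embodiment of the present invention;
FIG. 3-c is a schematic diagram of yet another database system according to an embodiment of the present invention;
FIG. 3-d is a schematic diagram of still another database system according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method for transferring data between a host and standby machines according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of another method for transferring data between a host and standby machines according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a mapping table according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of redo log processing in a database system according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart of yet another method for transferring data between a host and standby machines according to an embodiment of the present invention;
FIG. 9 is a schematic flowchart of a standby machine sending a registration request to the host according to an embodiment of the present invention;
FIG. 10 is a schematic flowchart of a standby machine sending a deletion request to the host according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a control node according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another control node according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention with reference to the accompanying drawings.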
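First, a prior-art cluster system based on a one-write multi-read shared-storage architecture is introduced. FIG. 1-a is a schematic structural diagram of such a prior-art database system. As shown in FIG. 1-a, the database system includes one host, at least one standby machine (standby 1 and standby 2 in the figure), and a storage device. The host is connected to the standby machines and to the storage device, the standby machines are also connected to the storage device, and the host and the standby machines can connect to the Internet.
In this database system, the database instances of the host and the standby machines share the same database on one storage medium, but only the host can write and update the database; every standby machine can only read it. The host processes a write instruction from a user as follows:
1. It first checks whether the database tables involved in the read-write transaction are in its local cache (for example, host memory).
2. If a table is not in the local cache, the host reads the related pages that record the table from the storage device into the local cache.
3. The host then writes to the table (for example, with insert and update operations), modifies the related pages in the local cache, and records redo logs for the page modifications.
4. When the transaction commits, the host sends all redo logs of the transaction to every standby machine in the database system.
On receiving a redo log, a standby machine applies it to the corresponding page if that page is in its local cache (for example, standby memory), which updates the page. If the page is not in its local cache, the standby machine discards the redo log.
In this prior-art system, the host and the standby machines are connected to the same storage device, yet a user can have the host read some pages into host memory and have a standby machine read other pages into standby memory. As a result, the pages in host memory and standby memory may differ, and different standby machines may hold different pages. To keep every standby machine's pages up to date, the host therefore broadcasts all redo logs of the current transaction to all standby machines whenever it commits a read-write transaction that modified pages in host memory. Part of the redo logs each standby machine receives are thus irrelevant to it and must be discarded, which wastes CPU and network resources.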
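For example, FIG. 1-b is a schematic flowchart of redo log processing in a prior-art database system. Suppose that in some application scenario the memory contents are as follows:
- host memory holds five pages, pages 1 to 5;
- standby 1 memory holds pages 2, 6, and 7;
- standby 2 memory holds pages 1, 3, and 6.
While executing a read-write transaction, the host modifies pages 1, 2, and 3 and generates redo log 1, redo log 2, and redo log 3 for them. At commit time, the host distributes these three redo logs to both standby 1 and standby 2.
When standby 1 receives the redo logs for pages 1, 2, and 3, it finds that pages 1 and 3 are not in its memory, so it discards redo logs 1 and 3 and applies only redo log 2 to page 2.
When standby 2 receives the redo logs for pages 1, 2, and 3, it finds that page 2 is not in its memory, so it discards redo log 2, applies redo log 1 to page 1, and applies redo log 3 to page 3.
In this process, redo logs 1 and 3 are irrelevant to standby 1 and redo log 2 is irrelevant to standby 2, so broadcasting them wastes network resources. Sending these irrelevant redo logs to standby 1 and standby 2 also costs the host extra CPU, and discarding them costs standby 1 and standby 2 extra CPU as well. When frequent read-write transactions on the host generate a large volume of redo logs, or when there are very many standby machines, both the host and the standby machines must handle a large number of irrelevant redo logs. This wastes substantial network resources and host/standby CPU.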
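The embodiments of the present invention provide a new database system that overcomes these drawbacks of the prior art. It manages redo logs effectively and reduces the transmission of irrelevant redo logs between the host and the standby machines, thereby reducing host/standby CPU consumption and wasted network resources. Referring to FIG. 2, this database system differs from the prior-art database system described above as follows:
(1) A control node is provided in the database system to obtain the operation log. The operation log includes one or more operation records; each corresponds to one storage unit and is a record of a write operation performed by the host on one storage unit of the storage device. In a specific embodiment, the operation log is the set of all redo logs and the storage unit is a page, so each operation record is the redo log of one page. The host executes a read-write transaction, modifies the related pages in its local cache, and records a redo log for each page modification. When the transaction commits, the host sends all redo logs of the transaction (the operation log) to the control node. The control node manages the received log, determines which pages exist on each standby machine in the database system, and then sends each standby machine only the redo logs (operation records) it needs.
Take the scenario of FIG. 1-b as an example: host memory holds pages 1 to 5, standby 1 memory holds pages 2, 6, and 7, and standby 2 memory holds pages 1, 3, and 6. The host modifies pages 1, 2, and 3 and generates the operation records redo log 1, redo log 2, and redo log 3. At commit time, it sends the operation log formed by these three records to the control node. The control node determines the pages that exist on standby 1 and on standby 2 and concludes from the operation log that standby 1 needs redo log 2 while standby 2 needs redo logs 1 and 3. It then sends redo log 2 to standby 1 and redo logs 1 and 3 to standby 2. Because the control node sends each standby machine only the redo logs it needs, irrelevant redo logs are not transmitted over the network, fewer redo logs cross the network, and network resources are saved.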
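The FIG. 1-b scenario can be replayed in a few lines of Python to count messages under both schemes. This is a toy model: it assumes one message per redo log per receiving standby machine, and the function names are hypothetical.

```python
host_log = {1: "redo log 1", 2: "redo log 2", 3: "redo log 3"}  # page id -> redo log (FIG. 1-b)
standby_pages = {"standby 1": {2, 6, 7}, "standby 2": {1, 3, 6}}


def broadcast(log, standbys):
    """Prior art: every standby receives every record and discards irrelevant ones."""
    sent = sum(len(log) for _ in standbys)
    discarded = sum(1 for pages in standbys.values() for page in log if page not in pages)
    return sent, discarded


def targeted(log, standbys):
    """Control node: each standby receives only records for pages it caches."""
    plan = {sid: [log[p] for p in sorted(pages & set(log))] for sid, pages in standbys.items()}
    return plan, sum(len(recs) for recs in plan.values())


print(broadcast(host_log, standby_pages))  # (6, 3)
plan, sent = targeted(host_log, standby_pages)
print(plan)  # {'standby 1': ['redo log 2'], 'standby 2': ['redo log 1', 'redo log 3']}
print(sent)  # 3
```

In this example, broadcasting sends six redo logs, of which three are discarded. Targeted delivery sends three, none of which is discarded.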
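(2) Corresponding policies are configured on the host and the standby machines. When the standby machines need to update their copies of pages, the host no longer broadcasts the operation log to them; it sends the operation log only to the control node, which effectively reduces host CPU consumption. The control node unicasts to each standby machine the operation records (redo logs) it needs. The standby machine applies these records to the corresponding pages, which updates the pages and keeps them synchronized with the host. Because standby machines no longer receive irrelevant redo logs, they no longer need to discard them, which effectively reduces standby CPU consumption.
In specific embodiments, there may be one or more control nodes. For example, multiple control nodes may be deployed in the new database system when frequent read-write transactions on the host generate a large volume of redo logs or when there are very many standby machines, with each control node managing a different standby group. When the host commits a read-write transaction, it distributes the operation log to each control node. Each control node manages the log it receives and sends every standby machine in its group the operation records (redo logs) that standby machine needs. Each standby machine then updates the corresponding pages, so the pages covered by the operation log stay consistent between the host and the standby machines. Introducing multiple control nodes reduces host/standby CPU consumption and wasted network resources even more effectively.
It should be noted that, in the database system described in the embodiments, a storage unit describes the database data in a particular storage area: it is a fixed-size block of storage space in which data is recorded. In specific implementations, the storage unit may be a page, or a data storage space defined in another form, such as a block or a sector. For ease of description, the following embodiments use pages as the storage unit; implementations based on other definitions (blocks, sectors, and so on) are similar and are not described one by one. A storage unit may be one page or, in specific scenarios, a combination of multiple pages. Describing storage units as pages serves only to explain the technical solutions of the embodiments and shall not be construed as limiting the scope of the present invention.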
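Referring to FIG. 3-a, in one application scenario, a database system provided in an embodiment of the present invention includes one host, at least one standby machine (multiple standby machines in the figure), a storage device, and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device. The host and the standby machines can communicate externally (for example, connect to the Internet), where:
the host is configured to send an operation log to the control node, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device;
the control node is configured to: receive the operation log; for a first standby machine, determine a first storage unit set that exists in the first standby machine and a second storage unit set corresponding to the at least one operation record; generate a storage unit intersection from the storage units that exist in both sets; and acquire, from the operation log, the operation records corresponding to the storage unit intersection and send them to the first standby machine, where the first standby machine is any one of the at least one standby machine;
the first standby machine is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by those records.
In a specific implementation, the determining, by the control node, of the first storage unit set that exists in the first standby machine includes:
the control node determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry, and each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache. One of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
In a specific embodiment, the database system further includes a second standby machine, which is any one of the at least one standby machine. When the second standby machine receives a read instruction, it determines whether the corresponding storage unit exists in its local cache. If not, it reads that storage unit from the storage device and sends a registration request to the control node that includes the identifier of the second standby machine and the identifier of that storage unit;
the control node is further configured to receive the registration request, determine the entry corresponding to the identifier of the second standby machine, and add the identifier of that storage unit to the entry.
In a specific embodiment, the database system further includes a third standby machine, which is any one of the at least one standby machine. When storage units in its local cache need to be evicted, the third standby machine determines the storage unit to be deleted and sends a deletion request to the control node that includes the identifier of the third standby machine and the identifier of the storage unit to be deleted;
the control node is further configured to receive the deletion request, determine the entry corresponding to the identifier of the third standby machine, and delete the identifier of the storage unit to be deleted from that entry.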
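Referring to FIG. 3-b, in another application scenario, a database system provided in an embodiment of the present invention includes one host, multiple standby groups, a storage device, and a control node. The control node consists of multiple physical servers, different physical servers are connected to different standby groups, and each standby group includes one or more standby machines. The host is connected to the multiple physical servers and to the storage device, and each standby machine is connected to the storage device and to its corresponding physical server. The host and the standby machines can communicate externally (for example, connect to the Internet).
That is, each physical server can provide the functions described for the control node in FIG. 3-a. The difference is scope: the control node in FIG. 3-a manages all standby machines in the database system, whereas each physical server in FIG. 3-b manages one corresponding standby group. In a specific embodiment, the mapping table preset in the control node of FIG. 3-a has an entry for every standby machine, whereas the mapping table preset in each physical server of FIG. 3-b has entries only for the standby machines in its own group.
In this implementation, the host is configured to send the operation log to each of the multiple physical servers.
For each physical server to manage its standby group, each standby machine is configured accordingly. In a specific implementation, each standby machine is configured with a member list that records the identifier of the physical server corresponding to its standby group;
In a specific implementation, a standby machine can send registration requests to the physical server identified in its member list.
In a specific implementation, a standby machine can send deletion requests to the physical server identified in its member list.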
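In both example database systems above, the control node is an independent device in the database system. In specific implementations of the embodiments, however, the control node need not be an independent device. In one application scenario, the control node may be built into the host or exist as a functional module of the host. In another application scenario, the control node may be built into a standby machine or exist as a functional module of a standby machine.
For example, FIG. 3-c shows an application scenario in which the control node is built into the host or exists as a functional module of the host. The database system includes one host, at least one standby machine, and a storage device. The host is connected to the at least one standby machine and to the storage device, the at least one standby machine is connected to the storage device, and the host and the standby machines share the data in the storage device and can communicate externally (for example, connect to the Internet), where:
the host is configured to:
- generate an operation log, where the operation log includes at least one operation record, and each operation record is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device;
- determine a first storage unit set corresponding to a first standby machine and a second storage unit set corresponding to the at least one operation record, where the first standby machine is one of the at least one standby machine, and its local cache holds the storage units corresponding to the first storage unit set, which were read in from the storage device; and
- acquire, from the operation log, the operation records corresponding to a storage unit intersection and send them to the first standby machine, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set;
the first standby machine is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by those records.
In a specific implementation, the host determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry, and each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache. One of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
In a specific implementation, when there are at least two standby machines and the storage units in their local caches are not identical, the mapping table includes at least two entries, and the storage unit identifiers in different entries are not identical.
In a specific implementation, a standby machine sends registration requests to the host.
In a specific implementation, a standby machine sends deletion requests to the host.
That is, the functions of the original host and of the control node can be integrated into one server to form the host of this embodiment. This host generates the operation log and also performs the functions of the control node described above, thereby managing the operation records in the operation log.
As another example, FIG. 3-d shows an application scenario in which the control node is built into a standby machine or exists as a functional module of a standby machine. The database system includes one host, a standby machine acting as the control node, other standby machines, and a storage device. The host is connected to the control node and to the storage device, the other standby machines are connected to the control node and to the storage device, and the control node is connected to the storage device. The host, the control node, and the other standby machines can communicate externally (for example, connect to the Internet).
That is, the functions of the original standby machine and of the control node can be integrated into one server to form a standby machine of this embodiment. This control node does its own work as a standby machine (for example, reading pages from the storage device) and also performs the functions of the control node described above, thereby managing the operation records in the operation log. For specific implementations of the host, the standby machine acting as the control node, and the other standby machines, refer to the description of FIG. 3-a; details are not repeated here.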
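Based on the new database system, an embodiment of the present invention provides a method for transferring data between a host and standby machines, applied to the control node in a one-write multi-read shared-storage database system and described from the control node side. The method includes:
Step S401: Obtain the operation log generated by the host.
In specific application scenarios, the control node may be an independent device (for example, an independent server), the host, or one of the standby machines.
When the control node is an independent device, the host, while executing a read-write transaction, writes to storage units of its local cache or the storage device and generates the corresponding operation log. The log includes at least one operation record, each of which is a record of a write operation performed by the host on one storage unit in the host's local cache or in the storage device. The host sends the operation log to the control node, and the control node thus obtains it.
When the control node is the host, the host obtains the operation log directly once it has generated the log during a read-write transaction.
When the control node is a standby machine, the host, after generating the operation log during a read-write transaction, sends it to that standby machine, which thus obtains it.
Step S402: Determine a first storage unit set corresponding to a first standby machine, and determine a second storage unit set corresponding to the at least one operation record.
When the control node is an independent device or the host, the first standby machine is any one of the at least one standby machine. When the control node is a standby machine, the first standby machine is any standby machine other than the control node. The first standby machine holds the storage units corresponding to the first storage unit set, which were read in from the storage device.
In a specific embodiment, the control node determines the first storage unit set that exists in the first standby machine according to a preset mapping table. The mapping table includes at least one entry, and each entry includes the identifier of one standby machine and the identifiers of all storage units in that standby machine's local cache. One of the entries includes the identifier of the first standby machine and the identifiers of all storage units in the first standby machine's local cache.
In a specific embodiment, consider the case where the control node is an independent device or the host, and there are at least two standby machines whose local caches hold different storage units. In this case the mapping table includes at least two entries, one per standby machine, and the storage unit identifiers in different entries are not identical.
Step S403: Acquire, from the operation log, the operation records corresponding to the storage unit intersection, and send them to the first standby machine.
The storage unit intersection is the intersection of the first storage unit set and the second storage unit set. After receiving the operation records, the first standby machine performs the corresponding operations on the storage units in its local cache indicated by the records. Every operation record the first standby machine receives can therefore be applied to its local cache, which keeps the related pages in the local cache synchronized with the host.
With the solution of this embodiment, the host executes a read-write transaction, modifies pages in its local cache or in the storage device, and generates the corresponding operation log. When the control node is an independent device or a standby machine, the host sends the operation log to the control node; when the control node is the host itself, it obtains the operation log directly. The control node then looks up the mapping table to obtain the operation records each standby machine in the database system needs and sends them to the corresponding standby machine, which updates the storage units in its local cache.
This targeted delivery avoids transmitting irrelevant records over the network, so fewer records cross the network and network resources are saved. When the control node is an independent device or a standby machine, the host only needs to send the operation log (that is, all operation records) to the control node. After processing by the control node, every record a standby machine receives is one it needs, so standby machines no longer discard irrelevant records. The embodiment can therefore effectively reduce host/standby CPU consumption.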
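Based on the new database system described above, an embodiment of the present invention provides a method for transferring data between a host and standby machines, described from multiple sides. In this embodiment, the database system includes one host, at least one standby machine, a storage device, and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device. Referring to FIG. 5, the method includes:
Step S501: The host executes a read-write transaction.
In the database system of this embodiment, the host and the standby machines are servers, and both are connected to the storage device (a shared storage medium). The storage device may be an independent storage, a redundant array of independent disks (RAID), or a storage device in a storage area network (SAN). Through the shared storage medium, both the host and the standby machines can use the data in the storage device. For example, if the storage device stores a database, its data is the database data.
The host processes transactions according to operation instructions from a user (or user equipment). The transactions may be database transactions, for example reading, editing, inserting into, or updating tables in the database. After receiving an operation instruction, the host first queries its local cache (for example, host memory) to determine whether the database data indicated by the instruction is present. If it is, the host processes that data directly. If not, the host reads the data from the storage device into the local cache (that is, it reads a copy of the database data stored on the storage device) and then processes it. After processing the database data in the local cache, the host may synchronize the processed data to the storage device.
Step S502: The host generates an operation log.
In the database system of this embodiment, the host can write to pages, whereas standby machines can only read pages. When the host processes database data in a read-write transaction and a page is modified, the host records an operation record of the page modification. Each operation record corresponds to one storage unit and is a record of a write operation performed by the host on one page of the local cache or the storage device. When the host finishes modifying pages and commits the transaction, it aggregates all operation records into an operation log. In a specific implementation, each operation record is the redo log of one modified page, and the operation log is the set of redo logs of all modified pages.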
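The host-side behavior of steps S501 and S502 can be modelled as follows: read through the local cache, write the page, emit one operation record per page write, and hand over the accumulated records as the operation log at commit. This is a toy under clear assumptions. The `Host` class is hypothetical, shared storage is a plain dictionary, and each record is a human-readable string standing in for a real redo log entry.

```python
class Host:
    """Toy host: a page cache in front of shared storage plus per-transaction redo records."""

    def __init__(self, storage):
        self.storage = storage  # shared storage device: page id -> page content
        self.cache = {}         # host local cache (memory)
        self.pending = []       # operation records of the open transaction

    def write(self, page_id, new_value):
        if page_id not in self.cache:              # cache miss: read a copy from shared storage
            self.cache[page_id] = self.storage[page_id]
        self.cache[page_id] = new_value            # modify the page in the local cache
        # One operation record per page write, tagged with the page identifier.
        self.pending.append((page_id, f"page {page_id} := {new_value!r}"))

    def commit(self):
        """At commit, hand over all pending records as the operation log."""
        log, self.pending = self.pending, []
        return log

    def flush(self):
        """Optionally write the cached pages back to shared storage."""
        self.storage.update(self.cache)


host = Host(storage={1: "a", 2: "b", 3: "c", 4: "d"})
host.write(1, "A")
host.write(2, "B")
host.write(3, "C")
log = host.commit()
print([page for page, _ in log])  # [1, 2, 3]
```

Because every record carries its page identifier, a receiver can derive the set of modified pages from the log alone. The next steps rely on exactly that property.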
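In a specific implementation, each page can be assigned an identifier, for example a page number. The operation log includes one or more operation records, and each operation record contains the identifier of its page.
Step S503: The host sends the operation log to the control node.
After the operation log is generated, the host sends it to the control node. In this embodiment, the control node is an independent device, for example an independent physical server. In a specific implementation, the host and the control node communicate over a wired or wireless connection. Information about the control node (for example, its address) is configured on the host, and after generating the operation log, the host sends it to the control node based on that information.
Step S504: The control node determines the first storage unit set corresponding to the operation log.
To manage the operation log, the control node first needs to determine which storage units the log covers, that is, which pages. This set is called the first storage unit set here to distinguish it from the storage unit set below. (This embodiment numbers the two sets in the reverse order from the Summary, where the first set belongs to the standby machine.) In a specific implementation, the operation log contains all the operation records and each record contains the identifier of its page, so the control node can determine the covered pages from those identifiers.
Step S505: The control node determines the second storage unit set on each standby machine.
To manage the operation log, the control node also needs to determine the storage unit set on each standby machine in the database system, that is, which pages exist on each standby machine. This set is called the second storage unit set here to distinguish it from the set above.
In a specific implementation, a mapping table is preset in the control node. The mapping table includes at least one entry, and each entry may include:
- the identifier of one standby machine, for example a standby machine number;
- the identifiers of all storage units in that standby machine's local cache, for example storage unit numbers such as page numbers;
- the address of the standby machine, for example its IP/MAC address;
- and so on.
By querying the mapping table, the control node can determine which storage units exist in any standby machine in the database system.
For example, FIG. 6 is a simplified schematic diagram of a mapping table according to an embodiment of the present invention. The mapping table includes entry 1, entry 2, and so on:
- Entry 1 includes the standby number <standby 1>, page numbers <1, 6, 7>, the IP/MAC address of standby 1, and so on.
- Entry 2 includes the standby number <standby 2>, page numbers <2, 8, 9>, the IP/MAC address of standby 2, and so on.
By querying the mapping table, the control node can determine that standby 1, standby 2, and so on exist in the database system. It can also determine that standby 1's local cache holds pages 1, 6, and 7, that is, the storage unit set {page 1, page 6, page 7}, and that standby 2's local cache holds pages 2, 8, and 9, that is, {page 2, page 8, page 9}. Because the mapping table's entries can cover all standby machines in the database system, querying it tells the control node which standby machines exist and which pages each one holds.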
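The FIG. 6 mapping table can be represented as a list of entries, one per standby machine. The sketch below is a hypothetical in-memory layout, not a format the document specifies. The IP addresses are placeholders taken from the RFC 5737 documentation range, since the figure does not give real values.

```python
from dataclasses import dataclass, field


@dataclass
class MappingEntry:
    standby_id: str                             # standby number
    page_ids: set = field(default_factory=set)  # pages currently in the standby's local cache
    address: str = ""                           # IP/MAC address of the standby (placeholder below)


# FIG. 6 contents; the addresses are placeholders, not values from the document.
mapping_table = [
    MappingEntry("standby 1", {1, 6, 7}, "192.0.2.11"),
    MappingEntry("standby 2", {2, 8, 9}, "192.0.2.12"),
]


def standby_ids(table):
    return [entry.standby_id for entry in table]


def pages_on(table, standby_id):
    """Return (a copy of) the storage unit set cached on `standby_id`."""
    for entry in table:
        if entry.standby_id == standby_id:
            return set(entry.page_ids)
    raise KeyError(standby_id)


print(standby_ids(mapping_table))                    # ['standby 1', 'standby 2']
print(sorted(pages_on(mapping_table, "standby 1")))  # [1, 6, 7]
```

A list mirrors the figure's row layout, but every lookup scans it linearly. With many standby machines, a dictionary keyed by standby identifier would give constant-time lookup of an entry.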
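Note that steps S504 and S505 need not run in a fixed order: in specific embodiments, step S504 may be performed before, after, or at the same time as step S505. The description above shall not be construed as limiting the present invention.
Step S506: The control node generates a storage unit intersection from the storage units that exist in both the first storage unit set and the second storage unit set.
Specifically, step S504 tells the control node which pages on the host were modified, and step S505 tells it which pages exist on each standby machine, so it can determine which pages on the standby machines need updating. For example, in a specific implementation the operation log includes multiple operation records, and each operation record is a redo log that includes the identifier of the modified page on the host. For example, redo log 1 includes the identifier of page 1, and redo log 2 includes the identifier of page 2. Suppose the control node receives an operation log containing redo logs 1, 2, and 3. It determines that the set of modified storage units on the host is {page 1, page 2, page 3}. From the mapping table, the control node determines that the database system includes standby 1, holding {page 1, page 6, page 7}, and standby 2, holding {page 2, page 8, page 9}. For standby 1, only page 1 is in both sets, so its storage unit intersection is {page 1}. For standby 2, only page 2 is in both sets, so its storage unit intersection is {page 2}.
Step S507: The control node acquires, from the operation log, the operation records corresponding to the storage unit intersection.
After determining the storage unit intersection, the control node acquires the corresponding operation records from the operation log. For example, in a specific implementation each operation record is a redo log, and the operation log includes redo logs 1, 2, and 3. The intersection for standby 1 is {page 1} and the intersection for standby 2 is {page 2}. The control node therefore acquires {redo log 1} for standby 1 and {redo log 2} for standby 2.
Step S508: The control node sends the corresponding operation records to the standby machines.
After obtaining the operation records corresponding to each storage unit intersection, the control node sends them to the standby machines. When there are multiple standby machines, there are two specific implementations:
- Sequential. The control node sends each standby machine its records in turn. For example, it walks through the entries of the mapping table one by one and uses the operation log to determine whether the standby machine in each entry has a non-empty storage unit intersection; if so, it sends that standby machine the corresponding operation records.
- Simultaneous. The control node first obtains the records needed by every standby machine and then distributes them to all the standby machines at the same time.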
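The sequential variant of step S508 can be sketched as a loop over the mapping table. It uses the FIG. 6 contents and the three-record log from the steps S506 and S507 example. The `dispatch` and `send` functions are hypothetical, `send` only records what would be unicast, and "standby 3" is an extra, invented entry that shows a standby machine with an empty intersection being skipped.

```python
def dispatch(log, mapping_table, send):
    """Walk the mapping table entry by entry and send each standby only its records."""
    log_pages = {page for page, _ in log}
    for standby_id, cached_pages in mapping_table.items():
        intersection = cached_pages & log_pages
        if intersection:  # skip standbys that cache none of the modified pages
            send(standby_id, [redo for page, redo in log if page in intersection])


log = [(1, "redo log 1"), (2, "redo log 2"), (3, "redo log 3")]
mapping_table = {"standby 1": {1, 6, 7}, "standby 2": {2, 8, 9}, "standby 3": {4, 5}}
delivered = {}


def send(standby_id, records):
    delivered[standby_id] = records  # stand-in for a unicast to the standby's address


dispatch(log, mapping_table, send)
print(delivered)  # {'standby 1': ['redo log 1'], 'standby 2': ['redo log 2']}
```

For the simultaneous variant, one option under the same assumptions is to compute every standby's record list first and then submit the `send` calls to a `concurrent.futures.ThreadPoolExecutor`.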
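Step S509: Each standby machine performs the corresponding operations on the storage units in its local cache indicated by the received operation records.
After receiving an operation record, the standby machine first checks whether the corresponding storage unit exists in its local cache. If it does, the standby machine applies the record to that storage unit; if not, it discards the record. In the solution of this embodiment, every record a standby machine receives is one it needs, so records do not need to be discarded.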
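The standby-side check in step S509 can be sketched as follows. To keep the example short, a record is modelled as a (page identifier, new page content) pair. That simplification is an assumption: a real redo log describes a change to replay, not a full page image. The `Standby` class is hypothetical.

```python
class Standby:
    """Toy standby: applies received records to cached pages, discards the rest."""

    def __init__(self, cache):
        self.cache = dict(cache)  # local cache: page id -> page content
        self.discarded = 0

    def receive(self, records):
        for page_id, new_value in records:
            if page_id in self.cache:
                self.cache[page_id] = new_value  # replay the record on the cached page
            else:
                self.discarded += 1              # page not cached: drop the record


standby1 = Standby({1: "old", 6: "p6", 7: "p7"})
standby1.receive([(1, "new")])
print(standby1.cache[1], standby1.discarded)  # new 0
```

With the control node filtering records, the discard branch should not fire in normal operation. Keeping the check still guards against, for example, a page evicted while its record was in flight; that timing case is an assumption, not a case the text covers.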
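For example, suppose operation record 1 records the set of instructions by which the host modified data in a table on page 1 in its local cache. After receiving operation record 1, standby 1 performs the corresponding operations on page 1 in its own local cache based on that instruction set. This brings standby 1's page 1 to the same state as the host's page 1, so the same page is synchronized on the host and the standby.
To make these steps concrete, consider the scenario shown in FIG. 7. The database system includes the host, the control node, standby 1, standby 2, and the storage device (not shown). The initial memory contents are as follows:
- host memory holds pages 1, 2, 3, 4, and 5;
- standby 1 memory holds pages 1, 6, and 7;
- standby 2 memory holds pages 2, 3, and 5.
The scenario then proceeds as follows:
1. During a read-write transaction, the host modifies pages 1, 2, and 3, generates the operation log (containing redo logs 1, 2, and 3), and sends it to the control node.
2. Based on the page identifiers in the operation log, the control node determines that pages 1, 2, and 3 on the host were modified.
3. Looking up the mapping table, the control node finds that standby 1 holds pages 1, 6, and 7, so it takes redo log 1 from the operation log and sends it to standby 1.
4. It also finds that standby 2 holds pages 2, 3, and 5, so it takes redo logs 2 and 3 and sends them to standby 2.
5. Standby 1 applies redo log 1 to page 1 in its memory, updating page 1.
6. Standby 2 applies redo log 2 to page 2 and redo log 3 to page 3 in its memory, updating both pages.
With the solution of this embodiment, the host executes a read-write transaction, modifies pages in its local cache or in the storage device, generates the corresponding operation log (all redo logs), and sends it to the control node. By looking up the mapping table, the control node obtains the operation records each standby machine needs (the relevant subset of redo logs) and sends them to the corresponding standby machine, which updates the storage units in its local cache. This targeted delivery avoids transmitting irrelevant redo logs over the network, so fewer redo logs cross the network and network resources are saved. The host only needs to send all redo logs to the control node, and after the control node's processing every redo log a standby machine receives is one it needs, so standby machines no longer discard irrelevant redo logs. The embodiment can therefore effectively reduce host/standby CPU consumption.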
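FIG. 8 is a schematic flowchart of another method for transferring data between a host and standby machines according to an embodiment of the present invention. In this embodiment, the database system includes one host, at least one standby machine (standby 1 and standby 2 in the figure), a storage device (not shown), and a control node. The host is connected to the control node and to the storage device, and each standby machine is connected to the control node and to the storage device. As shown in FIG. 8, the method includes:
Step S801: A standby machine reads a storage unit from the storage device into its local cache.
In this embodiment, the standby machines share the storage medium, so a user (or user equipment) who wants to view database data can send an instruction to a standby machine. On receiving the instruction, the standby machine first checks whether the corresponding storage unit exists in its local cache. If it does not, the standby machine reads that storage unit, which holds the database data the user needs, from the storage device; in other words, it reads a copy of the database data stored on the storage device. In a specific embodiment, shown in FIG. 9, the user (or user equipment) sends standby 1 an instruction to view the data in page 8. Standby 1 searches its memory and finds only pages 1, 6, and 7, not page 8. It therefore accesses the storage device and reads page 8 into its memory, after which the user (or user equipment) can view the data in page 8 normally.
When the database system has multiple standby machines, for example standby 1 and standby 2, each follows the instruction of the user (or user equipment): it checks whether the required page is in its local cache and, if not, reads the page from the storage device. The standby machines perform this process independently, in no particular order.
Step S802: The standby machine sends a registration request to the control node; correspondingly, the control node receives the registration request and updates the mapping table based on it.
In this embodiment, the control node is an independent device, for example an independent physical server.
Reading the storage unit from the storage device into the local cache changes the set of storage units held there. To inform the control node of this change, the standby machine generates a registration request and sends it to the control node. The registration request includes the identifier of the standby machine and the identifier of the storage unit to be registered (that is, of the storage unit just read in). After receiving the request, the control node locates the mapping table entry for the standby machine's identifier and adds the identifier of the storage unit to be registered to that entry, which updates the mapping table.
In a specific embodiment, shown in FIG. 9, after reading page 8 from the storage device, standby 1 generates a registration request containing the identifier of standby 1 and the identifier (page number) of page 8. After receiving the registration request, the control node uses the identifier of standby 1 to find the corresponding entry, entry 1, and adds the identifier of page 8 to it. The page numbers in entry 1 of the mapping table thus change from <1, 6, 7> to <1, 6, 7, 8>, and the mapping table is updated.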
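The FIG. 9 registration flow (a read miss on standby 1, a copy from shared storage, then a registration request) can be modelled in a few lines. The shared storage dictionary, the `Standby` class, and the use of a callback in place of a network request are assumptions made for the sketch.

```python
storage = {p: f"contents of page {p}" for p in range(1, 11)}  # shared storage stand-in
mapping_table = {"standby 1": {1, 6, 7}}                      # control node state before FIG. 9


def handle_register(table, standby_id, page_id):
    """Control node: add the registered page to the standby's entry."""
    table.setdefault(standby_id, set()).add(page_id)


class Standby:
    def __init__(self, standby_id, cached_pages, send_register):
        self.standby_id = standby_id
        self.cache = {p: storage[p] for p in cached_pages}
        self.send_register = send_register  # stand-in for the request to the control node

    def read(self, page_id):
        if page_id not in self.cache:               # local cache miss
            self.cache[page_id] = storage[page_id]  # read a copy from shared storage
            self.send_register(self.standby_id, page_id)
        return self.cache[page_id]


standby1 = Standby("standby 1", {1, 6, 7}, lambda sid, pid: handle_register(mapping_table, sid, pid))
standby1.read(8)
print(sorted(mapping_table["standby 1"]))  # [1, 6, 7, 8]
```

Only a cache miss triggers a registration, so repeated reads of cached pages generate no traffic to the control node.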
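When the database system has multiple standby machines, for example standby 1 and standby 2, each sends its registration requests to the control node after generating them, and the control node responds to each request and updates the mapping table.
In a specific implementation of the present invention, information about the control node can be preconfigured on each standby machine. Specifically, a member list recording the identifier of the control node is configured on the standby machine, and after generating a registration request, the standby machine sends it to the control node identified in the list. When the control node includes multiple physical servers, different physical servers manage different standby groups. In that case the member list on a standby machine records the identifier of the physical server corresponding to its standby group, and the standby machine sends registration requests to that physical server.
Note that in another specific implementation of the present invention, the order can be reversed. When a standby machine starts a read operation and determines that the corresponding page is not in memory, it first generates a registration request and sends it to the control node so that the mapping table is updated, and only then reads the page from the storage device into its local cache. That is, steps S801 and S802 need not run in a fixed order.
Step S803: A standby machine evicts a storage unit from its local cache.
In this embodiment, when a standby machine's local cache runs short of storage space, the standby machine evicts some storage units from the local cache according to a preconfigured eviction policy. Eviction means deleting the data in a storage unit to free storage space. The policy may be one of the following:
- Time-based. In one specific implementation, when the standby machine detects that its local cache is short of space (for example, the amount stored exceeds a preset threshold), it checks how long each storage unit has been held in the local cache and evicts those held longer than a preset duration.
- Priority-based. In another specific implementation, when the standby machine detects that its local cache is short of space, it checks the priority of each storage unit in the local cache and evicts those whose priority is lower than a preset level.
For example, in FIG. 10, standby 2 memory holds pages 2, 4, 8, and 9. When standby 2 runs short of memory, it determines from the preconfigured eviction policy that page 4 should be evicted. It evicts page 4 from memory and leaves the other pages unchanged.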
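The two eviction policies described above can be sketched as follows. The document states no concrete page sizes, thresholds, load times, or priorities, so every number below is invented. The numbers are chosen so that page 4 is selected, matching the FIG. 10 outcome. All function names are hypothetical.

```python
def over_threshold(used_bytes, threshold_bytes):
    """The cache is 'short of space' once usage exceeds a preset threshold."""
    return used_bytes > threshold_bytes


def victims_by_age(load_time, now, max_age):
    """Time-based policy: pages held longer than max_age."""
    return {page for page, t in load_time.items() if now - t > max_age}


def victims_by_priority(priority, min_level):
    """Priority-based policy: pages whose priority is below min_level."""
    return {page for page, level in priority.items() if level < min_level}


def evict(cache, victims):
    for page in victims:
        cache.pop(page, None)
    return victims


# FIG. 10: standby 2 caches pages 2, 4, 8, 9. Sizes, thresholds and times are invented.
cache = {2: "p2", 4: "p4", 8: "p8", 9: "p9"}
load_time = {2: 95.0, 4: 10.0, 8: 90.0, 9: 99.0}  # seconds
evicted = set()
if over_threshold(used_bytes=4 * 8192, threshold_bytes=3 * 8192):
    evicted = evict(cache, victims_by_age(load_time, now=100.0, max_age=60.0))
print(evicted, sorted(cache))  # {4} [2, 8, 9]
```

Swapping `victims_by_age` for `victims_by_priority` (for example, with priorities `{2: 5, 4: 1, 8: 3, 9: 4}` and `min_level=2`) selects the same page under the priority-based policy.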
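When the database system has multiple standby machines, for example standby 1 and standby 2, each can evict pages according to the preconfigured eviction policy when it runs short of memory. The standby machines perform this process independently, in no particular order.
Step S804: The standby machine sends a deletion request to the control node; correspondingly, the control node receives the deletion request and updates the mapping table based on it.
Evicting storage units from the local cache changes the set of storage units held there. To inform the control node of this change, the standby machine generates a deletion request and sends it to the control node. The deletion request includes the identifier of the standby machine and the identifier of the storage unit to be deleted (that is, of the evicted storage unit). After receiving the request, the control node locates the mapping table entry for the standby machine's identifier and deletes the identifier of the storage unit to be deleted from that entry, which updates the mapping table.
In a specific embodiment, shown in FIG. 10, after evicting page 4 from memory, standby 2 generates a deletion request containing the identifier of standby 2 and the identifier (page number) of page 4. After receiving the deletion request, the control node uses the identifier of standby 2 to find the corresponding entry, entry 2, and deletes the identifier of page 4 from it. The page numbers in entry 2 of the mapping table thus change from <2, 4, 8, 9> to <2, 8, 9>, and the mapping table is updated.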
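The FIG. 10 deletion flow can be sketched in the same style as the registration example: the standby machine evicts page 4 and then reports it, and the control node removes the page from entry 2. The class, the callback standing in for the network request, and the evict-then-report ordering are illustrative choices; the document permits either order.

```python
mapping_table = {"standby 2": {2, 4, 8, 9}}  # control node state before FIG. 10


def handle_delete(table, standby_id, page_id):
    """Control node: remove the evicted page from the standby's entry (no-op if absent)."""
    entry = table.get(standby_id)
    if entry is not None:
        entry.discard(page_id)


class Standby:
    def __init__(self, standby_id, cache, send_delete):
        self.standby_id = standby_id
        self.cache = dict(cache)
        self.send_delete = send_delete  # stand-in for the request to the control node

    def evict(self, page_id):
        self.cache.pop(page_id, None)               # free the space in the local cache
        self.send_delete(self.standby_id, page_id)  # then report the eviction


standby2 = Standby("standby 2", {p: f"p{p}" for p in (2, 4, 8, 9)},
                   lambda sid, pid: handle_delete(mapping_table, sid, pid))
standby2.evict(4)
print(sorted(mapping_table["standby 2"]))  # [2, 8, 9]
```

Using `set.discard` rather than `set.remove` makes a duplicate or late deletion request harmless, which is a defensive choice in this sketch.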
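When the database system has multiple standby machines, for example standby 1 and standby 2, each sends its deletion requests to the control node after generating them, and the control node responds to each request and updates the mapping table.
In a specific implementation of the present invention, information about the control node can be preconfigured on each standby machine. Specifically, a member list recording the identifier of the control node is configured on the standby machine, and after generating a deletion request, the standby machine sends it to the control node identified in the list. When the control node includes multiple physical servers, different physical servers manage different standby groups. In that case the member list on a standby machine records the identifier of the physical server corresponding to its standby group, and the standby machine sends deletion requests to that physical server.
Note that in another specific implementation of the present invention, the order can be reversed. When a standby machine detects insufficient memory and has chosen a page to evict according to the preconfigured eviction policy, it first generates a deletion request and sends it to the control node so that the mapping table is updated, and only then evicts the page from memory. That is, steps S803 and S804 need not run in a fixed order.
Likewise, steps S801 to S802 and steps S803 to S804 need not run in a fixed order relative to each other.
Step S805: The host executes a read-write transaction.
The host processes transactions according to operation instructions from a user (or user equipment). The transactions may be database transactions, for example reading, editing, inserting into, or updating tables in the database. After receiving an operation instruction, the host first queries its local cache (for example, host memory) to determine whether the database data indicated by the instruction is present. If it is, the host processes that data directly; if not, the host reads the data from the storage device into the local cache and then processes it.
Step S806: The host generates an operation log.
When the host modifies pages, it generates one operation record per page; each operation record corresponds to one page and is a record of a write operation on that page. In a specific implementation, each operation record is the redo log of one modified page. When the host finishes modifying the pages and commits the transaction, it aggregates all operation records into the operation log, which is therefore the set of redo logs of all modified pages.
Step S807: The host sends the operation log to the control node.
The host and the control node communicate over a wired or wireless connection. In a specific implementation, a member list recording the identifier of the control node can be preconfigured on the host, and after generating the operation log, the host sends it to the control node identified in the list. When the control node includes multiple physical servers, different physical servers manage different standby groups. In that case the member list on the host records the identifiers of all the physical servers, and after generating the operation log, the host sends it to each physical server according to the member list.
Step S808: The control node determines the first storage unit set corresponding to the operation log.
In a specific implementation, the operation log contains all the operation records and each record contains the identifier of its page, so the control node can determine from those identifiers which pages the operation log covers.
Step S809: The control node determines the second storage unit set on each standby machine.
In a specific implementation, by querying the mapping table, the control node determines that the database system includes standby 1 and standby 2 and which pages exist on each of them.
Steps S808 and S809 need not run in a fixed order.
Step S810: The control node generates a storage unit intersection from the storage units that exist in both the first storage unit set and the second storage unit set.
Specifically, step S808 tells the control node which pages on the host were modified, and step S809 tells it which pages exist on standby 1 and on standby 2, so it can determine the storage unit intersection for each of them.
Step S811: The control node acquires, from the operation log, the operation records corresponding to each storage unit intersection.
After determining the storage unit intersections for standby 1 and standby 2, the control node acquires from the operation log the operation records corresponding to each intersection.
Step S812: The control node sends the corresponding operation records to the standby machines.
In one specific implementation, the control node sends the corresponding operation records to the standby machines in turn. For example, the control node queries entry 1 of the mapping table, determines from the operation log that standby 1 has a non-empty storage unit intersection, and sends standby 1 the corresponding operation records. It then queries entry 2, determines that standby 2 has a non-empty storage unit intersection, and sends standby 2 the corresponding operation records.
In another specific implementation, the control node queries the mapping table and finds that both standby 1 and standby 2 have non-empty storage unit intersections. It obtains the corresponding operation records for each and distributes them to standby 1 and standby 2 through different ports.
Step S813: Each standby machine performs the corresponding operations on the storage units in its local cache indicated by the received operation records.
After receiving the operation records sent by the control node, standby 1 performs the corresponding operations on the storage units in its local cache indicated by those records; standby 2 does the same with the records it receives.
For specific implementations of steps S805 to S813, refer to the descriptions of the embodiments of FIG. 5 to FIG. 7.
It should also be noted that, in specific application scenarios, steps S801 to S802 may occur before, during, or after steps S805 to S812, and so may steps S803 to S804. The description above merely illustrates one application scenario of the embodiments and shall not be construed as limiting the present invention.
With the solution of this embodiment, the database system works as follows. When the host executes a read-write transaction, it modifies pages in its local cache or in the storage device, generates the corresponding operation log (all redo logs), and sends it to the control node. By looking up the mapping table, the control node obtains the operation records each standby machine needs (the relevant subset of redo logs) and sends them to the corresponding standby machine, which updates the storage units in its local cache. Meanwhile, each standby machine sends registration or deletion requests to the control node as the pages in its memory change, so the control node keeps the mapping table up to date; this update mechanism makes the database system more practical and reliable. Targeted delivery of redo logs avoids transmitting irrelevant redo logs over the network, so fewer redo logs cross the network and network resources are saved. The host only needs to send all redo logs to the control node, and after the control node's processing every redo log a standby machine receives is one it needs, so standby machines no longer discard irrelevant redo logs. The embodiment can therefore effectively reduce host/standby CPU consumption.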
基于同一发明构思,本发明实施例提供一种控制节点100,请参见图11,控制节点100包括:发射器1003、接收器1004、存储器1002和与存储器1002耦合的处理器1001。发射器1003、接收器1004、存储器1002和处理器1001可通过总线或者其它方式连接(图11中以通过总线连接为例)。其中:
处理器1001,可以是一个或多个中央处理器(Central Processing Unit,CPU),图10中以一个处理器为例,在处理器401是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
存储器1002,包括但不限于是随机存储记忆体(Random Access Memory,RAM)、只读存储器(Read-Only Memory,ROM)、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、或便携式只读存储器(Compact Disc Read-Only Memory,CD-ROM),该存储器1002用于相关指令及数据,还用于存储程序代码,所述程序代码具体用于实现图5或图8实施例中的所述控制节点的功能;
发射器1003用于向外部发送数据;
接收器1004用于从外部接收数据;
具体的,处理器1001用于调用存储器1002中存储的程序代码,并执行以下步骤:
利用所述接收器1004获取所述主机产生的操作日志,并保存到所述存储器中,其中,所述操作日志包括至少一个操作记录,每个操作记录表示所述主机对主机的本地缓存或者所述存储设备中的一个存储单元进行写操作的记录;
确定第一备机对应的第一存储单元集合,以及确定所述至少一个操作记录对应的第二存储单元集合,所述第一备机为所述至少一个备机中的一个备机,所述第一备机中保存有第一存储单元集合对应的存储单元;
在所述操作日志中获取与存储单元交集对应的操作记录,所述存储单元交集为所述第一存储单元集合和所述第二存储单元集合的存储单元的交集;
利用所述发射器1003将所述对应的操作记录发送给所述第一备机。
具体的,所述处理器1001执行确定所述第一备机中存在的第一存储单元集合包括:
所述处理器1001执行根据预设的映射表确定所述第一备机中存在的第一存储单元集合,其中,所述映射表包括至少一个表项,每个表项包括一个备机的标识和该备机的本地缓存对应的所有存储单元的标识,其中一个表项包括所述第一备机的标识和所述第一备机的本地缓存对应的所有存储单元的标识。
可选的,所述映射表包括至少一个表项,包括:
在所述备机的数量为至少两个、且不同备机的本地缓存对应的所有存储单元不一致的情况下,所述映射表包括至少两个表项,不同表项对应的所有存储单元的标识不一致。
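Optionally, the processor is further configured to:
obtain, by using the receiver, a mapping table update request sent by a second standby, where the mapping table update request includes the identifier of the second standby and the identifier of a storage unit to be updated, and the second standby is any one of the at least one standby;
look up, in the mapping table, the entry corresponding to the identifier of the second standby, and update that entry based on the identifier of the storage unit to be updated.
In a specific embodiment, the processor 1001 is specifically configured to:
receive, by using the receiver 1004, a registration request sent by the second standby, where the registration request includes the identifier of the second standby and the identifier of a storage unit to be registered, and the second standby is any one of the at least one standby;
look up, in the mapping table, the entry corresponding to the identifier of the second standby, and add the identifier of the storage unit to be registered to that entry.
In a specific embodiment, the processor 1001 is specifically configured to:
receive, by using the receiver 1004, a deletion request sent by a third standby, where the deletion request includes the identifier of the third standby and the identifier of a storage unit to be deleted, and the third standby is any one of the at least one standby;
look up, in the mapping table, the entry corresponding to the identifier of the third standby, and delete the identifier of the storage unit to be deleted from that entry.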
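The update, registration, and deletion requests above can be handled by one routine, as in this Python sketch. `UpdateRequest`, its `kind` field, and `apply_update` are hypothetical; creating an entry for a standby that is not yet in the table and silently ignoring deletion of an unregistered page are assumptions the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical mapping table: standby identifier -> cached page identifiers.
MappingTable = Dict[str, Set[int]]


@dataclass(frozen=True)
class UpdateRequest:
    """Hypothetical mapping-table update request sent by a standby."""
    kind: str        # "register" or "delete" (assumed encoding)
    standby_id: str
    page_id: int


def apply_update(table: MappingTable, req: UpdateRequest) -> None:
    # Find (or, as an assumption, create) the entry for the requesting standby.
    entry = table.setdefault(req.standby_id, set())
    if req.kind == "register":
        entry.add(req.page_id)        # page was read into the standby's cache
    elif req.kind == "delete":
        entry.discard(req.page_id)    # page was evicted; tolerate repeats
    else:
        raise ValueError(f"unknown request kind: {req.kind!r}")


if __name__ == "__main__":
    table: MappingTable = {"standby-2": {10}}
    apply_update(table, UpdateRequest("register", "standby-2", 11))
    apply_update(table, UpdateRequest("delete", "standby-2", 10))
    print(table)
```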
Specifically, the storage unit is a page.
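It should be noted that, in the embodiments of the present invention, the control node may be an independent device in the database system; for example, the control node is an independent physical server. The control node may also be a non-independent device. In that case, in one application scenario, the control node may be built into the primary or exist as a functional module of the primary (for example, the primary and the control node run as different virtual machines on the same physical server, connected through an I/O interface); in another application scenario, the control node may be built into a standby or exist as a functional module of a standby (for example, the standby and the control node run as different virtual machines on the same physical server, connected through an I/O interface).
It should also be noted that, for the steps performed by the processor 1001 and other technical features related to the processor 1001, reference may be made to the related content about the control node in the method embodiment of FIG. 5 or FIG. 8; details are not repeated here.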
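Based on the same inventive concept, an embodiment of the present invention provides a control node 110. Referring to FIG. 12, which is a schematic structural diagram of another control node according to an embodiment of the present invention, the control node 110 may include an obtaining unit 1101, a processing unit 1102, and a transmitting unit 1103. The functional units are described as follows:
The obtaining unit 1101 is configured to obtain the operation log generated by the primary, where the operation log includes at least one operation record, each operation record corresponds to one storage unit, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device;
The processing unit 1102 is configured to determine a first storage unit set corresponding to a first standby and a second storage unit set corresponding to the at least one operation record, where the first standby is one of the at least one standby and holds the storage units corresponding to the first storage unit set; and is further configured to obtain, from the operation log, the operation records corresponding to a storage unit intersection, where the storage unit intersection is the intersection of the storage units in the first storage unit set and the second storage unit set;
The transmitting unit 1103 is configured to send the corresponding operation records to the first standby.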
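The division into an obtaining unit, a processing unit, and a transmitting unit could be mirrored in code as below. This Python sketch is hypothetical throughout: the class and method names, the `send` callback standing in for the network, and clearing the buffered log after each transmission round are all assumptions. It applies the per-standby filtering to every standby in the mapping table, so a standby none of whose cached pages were written receives nothing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class OperationRecord:
    page_id: int   # storage unit written by the primary
    payload: str   # opaque redo content (hypothetical)


@dataclass
class ControlNode:
    """Hypothetical control node split into the three functional units."""
    mapping_table: Dict[str, Set[int]]
    send: Callable[[str, List[OperationRecord]], None]  # stands in for the network
    log: List[OperationRecord] = field(default_factory=list)

    def obtain(self, records: List[OperationRecord]) -> None:
        """Obtaining unit: buffer the operation log received from the primary."""
        self.log.extend(records)

    def process(self, standby_id: str) -> List[OperationRecord]:
        """Processing unit: keep records for pages in the standby's entry."""
        first_set = self.mapping_table.get(standby_id, set())
        second_set = {rec.page_id for rec in self.log}
        wanted = first_set & second_set
        return [rec for rec in self.log if rec.page_id in wanted]

    def transmit(self) -> None:
        """Transmitting unit: send each standby only its subset, then clear."""
        for standby_id in self.mapping_table:
            subset = self.process(standby_id)
            if subset:  # an empty intersection means nothing is sent
                self.send(standby_id, subset)
        self.log.clear()


if __name__ == "__main__":
    outbox: Dict[str, List[OperationRecord]] = {}
    node = ControlNode(
        mapping_table={"A": {1, 2}, "B": {3}, "C": {9}},
        send=lambda sid, recs: outbox.setdefault(sid, []).extend(recs),
    )
    node.obtain([OperationRecord(1, "w1"), OperationRecord(3, "w2"),
                 OperationRecord(2, "w3")])
    node.transmit()
    for sid, recs in outbox.items():
        print(sid, [r.payload for r in recs])
```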
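That the processing unit 1102 is configured to determine the first storage unit set present in the first standby includes:
the processing unit 1102 is configured to determine, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
When there are at least two standbys and the storage units corresponding to the local caches of different standbys are not the same, the mapping table includes at least two entries, and the storage unit identifiers in different entries are not the same.
The obtaining unit 1101 is further configured to receive a mapping table update request sent by a second standby, where the mapping table update request includes the identifier of the second standby and the identifier of a storage unit to be updated, and the second standby is one of the at least one standby;
the processing unit 1102 is further configured to look up, in the mapping table, the entry corresponding to the identifier of the second standby, and update that entry based on the identifier of the storage unit to be updated.
Specifically, the obtaining unit 1101 is further configured to receive a registration request sent by the second standby, where the registration request includes the identifier of the second standby and the identifier of a storage unit to be registered, and the second standby is any one of the at least one standby; the processing unit 1102 is further configured to look up, in the mapping table, the entry corresponding to the identifier of the second standby, and add the identifier of the storage unit to be registered to that entry.
Specifically, the obtaining unit 1101 is further configured to receive a deletion request sent by a third standby, where the deletion request includes the identifier of the third standby and the identifier of a storage unit to be deleted, and the third standby is any one of the at least one standby; the processing unit 1102 is further configured to look up, in the mapping table, the entry corresponding to the identifier of the third standby, and delete the identifier of the storage unit to be deleted from that entry.
Specifically, the storage unit is a page.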
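It should be noted that, in the embodiments of the present invention, the control node may be an independent device in the database system; for example, the control node is an independent physical server. The control node may also be a non-independent device. In that case, in one application scenario, the control node may be built into the primary or exist as a functional module of the primary (for example, the primary and the control node run as different virtual machines on the same physical server, connected through an I/O interface); in another application scenario, the control node may be built into a standby or exist as a functional module of a standby (for example, the standby and the control node run as different virtual machines on the same physical server, connected through an I/O interface); in yet another application scenario, the control node is the primary itself, that is, in addition to the functions of the primary in the foregoing embodiments, the primary also has the functions of the control node in the foregoing embodiments.
It should be noted that, from the detailed description of the embodiment of FIG. 5 or FIG. 8, a person skilled in the art can clearly understand how each functional unit of the control node 110 is implemented; for brevity, details are not repeated here.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are completely or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
In the foregoing embodiments, the description of each embodiment has its own focus. For a part not described in detail in one embodiment, reference may be made to the related descriptions in other embodiments.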

Claims (23)

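  1. A primary-standby data transmission method, wherein the method is applied to a control node in a database system, the database system further includes one primary, the control node, at least one standby, and a storage device, the primary and the standby share data in the storage device, and the method includes:
    obtaining an operation log generated by the primary, where the operation log includes at least one operation record, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device;
    determining a first storage unit set corresponding to a first standby, and determining a second storage unit set, where the first standby is one of the at least one standby, the first standby holds the storage units corresponding to the first storage unit set, the corresponding storage units are read in from the storage device, and the second storage unit set includes at least one storage unit corresponding to the at least one operation record;
    obtaining, from the operation log, the operation records corresponding to a storage unit intersection, and sending the corresponding operation records to the first standby, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set.
  2. The method according to claim 1, wherein the determining a first storage unit set present in the first standby includes:
    determining, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
  3. The method according to claim 2, wherein the method further includes:
    receiving a mapping table update request sent by a second standby, where the mapping table update request includes the identifier of the second standby and the identifier of a storage unit to be updated, and the second standby is one of the at least one standby;
    looking up, in the mapping table, the entry corresponding to the identifier of the second standby, and updating that entry based on the identifier of the storage unit to be updated.
  4. The method according to claim 2 or 3, wherein the method further includes:
    determining, according to the mapping table, a third storage unit set present in a third standby, where the third standby is a standby other than the first standby among the at least one standby, and the third storage unit set is different from the first storage unit set.
  5. The method according to any one of claims 1 to 4, wherein the storage unit is a page.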
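  6. A control node, including a memory, a processor, a transmitter, and a receiver, where the memory is configured to store data and instructions, and the processor is configured to invoke the instructions stored in the memory to perform the following steps:
    obtaining, by using the receiver, an operation log generated by the primary, where the operation log includes at least one operation record, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device;
    determining a first storage unit set corresponding to a first standby, and determining a second storage unit set, where the first standby is one of the at least one standby, the first standby holds the storage units corresponding to the first storage unit set, and the second storage unit set includes at least one storage unit corresponding to the at least one operation record;
    obtaining, from the operation log, the operation records corresponding to a storage unit intersection, where the storage unit intersection is the intersection of the first storage unit set and the second storage unit set;
    sending, by using the transmitter, the corresponding operation records to the first standby.
  7. The control node according to claim 6, wherein that the processor determines the first storage unit set present in the first standby includes:
    the processor determines, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
  8. The control node according to claim 7, wherein the processor is further configured to:
    obtain, by using the receiver, a mapping table update request sent by a second standby, where the mapping table update request includes the identifier of the second standby and the identifier of a storage unit to be updated, and the second standby is any one of the at least one standby;
    look up, in the mapping table, the entry corresponding to the identifier of the second standby, and update that entry based on the identifier of the storage unit to be updated.
  9. The control node according to claim 7 or 8, wherein the processor is further configured to:
    determine, according to the mapping table, a third storage unit set present in a third standby, where the third standby is a standby other than the first standby among the at least one standby, and the third storage unit set is different from the first storage unit set.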
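  10. A control node, including:
    an obtaining unit, configured to obtain an operation log generated by the primary, where the operation log includes at least one operation record, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device;
    a processing unit, configured to determine a first storage unit set corresponding to a first standby and to determine a second storage unit set, where the first standby is one of the at least one standby, the first standby holds the storage units corresponding to the first storage unit set, and the second storage unit set includes at least one storage unit corresponding to the at least one operation record; and further configured to obtain, from the operation log, the operation records corresponding to a storage unit intersection, where the storage unit intersection is the intersection of the storage units in the first storage unit set and the second storage unit set;
    a transmitting unit, configured to send the corresponding operation records to the first standby.
  11. The control node according to claim 10, wherein that the processing unit is configured to determine the first storage unit set present in the first standby is specifically:
    the processing unit is configured to determine, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
  12. The control node according to claim 11, wherein
    the obtaining unit is further configured to receive a mapping table update request sent by a second standby, where the mapping table update request includes the identifier of the second standby and the identifier of a storage unit to be updated, and the second standby is one of the at least one standby;
    the processing unit is further configured to look up, in the mapping table, the entry corresponding to the identifier of the second standby, and update that entry based on the identifier of the storage unit to be updated.
  13. The control node according to claim 11 or 12, wherein the processing unit is further configured to:
    determine, according to the mapping table, a third storage unit set present in a third standby, where the third standby is a standby other than the first standby among the at least one standby, and the third storage unit set is different from the first storage unit set.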
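  14. A database system, wherein the database system includes one primary, at least one standby, a storage device, and a control node, and the primary and the standby share data in the storage device, where
    the primary is configured to send an operation log to the control node, where the operation log includes at least one operation record, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device;
    the control node is configured to: receive the operation log; determine a first storage unit set corresponding to a first standby, and determine a second storage unit set, where the first standby is one of the at least one standby, the first standby holds the storage units corresponding to the first storage unit set, the corresponding storage units are read in from the storage device, and the second storage unit set includes at least one storage unit corresponding to the at least one operation record; and obtain, from the operation log, the operation records corresponding to a storage unit intersection, and send the corresponding operation records to the first standby, where the storage unit intersection is the intersection of the storage units in the first storage unit set and the second storage unit set;
    the first standby is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by the corresponding operation records.
  15. The database system according to claim 14, wherein that the control node determines the first storage unit set present in the first standby includes:
    the control node determines, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
  16. The database system according to claim 15, wherein the database system further includes a second standby,
    the second standby is configured to: when receiving a read operation instruction, determine whether the storage unit corresponding to the read operation instruction exists in its local cache; and if it does not, read in the corresponding storage unit from the storage device and send a registration request to the control node, where the registration request includes the identifier of the second standby and the identifier of the corresponding storage unit, and the second standby is one of the at least one standby;
    the control node is further configured to receive the registration request, determine, according to the identifier of the second standby, the entry corresponding to that identifier, and add the identifier of the corresponding storage unit to that entry.
  17. The database system according to claim 16, wherein the control node includes a plurality of physical servers, the second standby is configured with a member list, and the member list records the identifier of one of the plurality of physical servers;
    that the second standby sends a registration request to the control node is specifically:
    the second standby sends the registration request to that physical server according to the identifier of the physical server.
  18. The database system according to any one of claims 15 to 17, wherein the database system further includes a third standby,
    the third standby is configured to: when a storage unit in its local cache needs to be deleted, determine the storage unit to be deleted according to the storage time or priority of the storage units in the local cache, and send a deletion request to the control node, where the deletion request includes the identifier of the third standby and the identifier of the storage unit to be deleted, and the third standby is one of the at least one standby;
    the control node is further configured to receive the deletion request, determine, according to the identifier of the third standby, the entry corresponding to that identifier, and delete the identifier of the storage unit to be deleted from that entry.
  19. The database system according to any one of claims 15 to 18, wherein the control node is further configured to:
    determine, according to the mapping table, a third storage unit set present in a fourth standby, where the fourth standby is a standby other than the first standby among the at least one standby, and the third storage unit set is different from the first storage unit set.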
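  20. A database system, wherein the database system includes one primary, at least one standby, and a storage device, and the primary and the standby share data in the storage device, where
    the primary is configured to generate an operation log, where the operation log includes at least one operation record, and each operation record records a write operation performed by the primary on one storage unit in the primary's local cache or in the storage device; and is further configured to determine a first storage unit set corresponding to a first standby, and determine a second storage unit set, where the first standby is one of the at least one standby, the first standby holds the storage units corresponding to the first storage unit set, the corresponding storage units are read in from the storage device, and the second storage unit set includes at least one storage unit corresponding to the at least one operation record; and to obtain, from the operation log, the operation records corresponding to a storage unit intersection, and send the corresponding operation records to the first standby, where the storage unit intersection is the intersection of the storage units in the first storage unit set and the second storage unit set;
    the first standby is configured to receive the corresponding operation records and perform the corresponding operations on the storage units in its local cache indicated by the corresponding operation records.
  21. The database system according to claim 20, wherein that the primary determines the first storage unit set present in the first standby includes:
    the primary determines, according to a preset mapping table, the first storage unit set present in the first standby, where the mapping table includes at least one entry, each entry includes the identifier of one standby and the identifiers of all storage units corresponding to that standby's local cache, and one of the entries includes the identifier of the first standby and the identifiers of all storage units corresponding to the first standby's local cache.
  22. The database system according to claim 21, wherein the primary is further configured to:
    determine, according to the mapping table, a third storage unit set present in a second standby, where the second standby is a standby other than the first standby among the at least one standby, and the third storage unit set is different from the first storage unit set.
  23. A computer-readable storage medium, including instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.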
PCT/CN2017/095477 2017-01-26 2017-08-01 Primary-standby data transmission method, control node, and database system WO2018137327A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/522,073 US10831612B2 (en) 2017-01-26 2019-07-25 Primary node-standby node data transmission method, control node, and database system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710057471.XA CN108363641B (zh) 2017-01-26 Primary-standby data transmission method, control node, and database system
CN201710057471.X 2017-01-26

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/522,073 Continuation US10831612B2 (en) 2017-01-26 2019-07-25 Primary node-standby node data transmission method, control node, and database system

Publications (1)

Publication Number Publication Date
WO2018137327A1 (zh) 2018-08-02

Family

ID=62979004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/095477 WO2018137327A1 (zh) 2017-01-26 2017-08-01 Primary-standby data transmission method, control node, and database system

Country Status (3)

Country Link
US (1) US10831612B2 (zh)
CN (1) CN108363641B (zh)
WO (1) WO2018137327A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065327A1 (en) * 2017-08-31 2019-02-28 Nicira, Inc. Efficient versioned object management
US11567923B2 (en) 2019-06-05 2023-01-31 Oracle International Corporation Application driven data change conflict handling system
US11645265B2 (en) * 2019-11-04 2023-05-09 Oracle International Corporation Model for handling object-level database transactions in scalable computing applications
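CN111400100B (zh) * 2020-03-16 2021-05-28 杭州坤泽实业股份有限公司 Management method and system for distributed software backup
CN111651526B (zh) * 2020-08-04 2020-11-13 北京和利时系统工程有限公司 Data synchronization method for redundant front-end processors, front-end processor, and processing system
CN112084261B (zh) * 2020-09-03 2024-05-10 上海达梦数据库有限公司 Data synchronization method and system, node, and storage medium
US11972117B2 (en) * 2021-07-19 2024-04-30 EMC IP Holding Company LLC Selecting surviving storage node based on environmental conditions
CN113434476B (zh) * 2021-08-26 2022-03-01 阿里云计算有限公司 Data synchronization method, apparatus, device, system, storage medium, and program product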

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
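CN101446972A (zh) * 2008-12-12 2009-06-03 中兴通讯股份有限公司 Dynamic data synchronization method and system
CN102081611A (zh) * 2009-11-26 2011-06-01 中兴通讯股份有限公司 Method and apparatus for implementing database synchronization between primary and standby network management systems
CN102708150A (zh) * 2012-04-12 2012-10-03 华为技术有限公司 Method, apparatus, and system for asynchronous data replication
CN105045877A (zh) * 2015-07-20 2015-11-11 深圳市深信服电子科技有限公司 Sharded storage method and apparatus for database data, and data query method and apparatus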
US20150347237A1 (en) * 2013-06-24 2015-12-03 Sap Se N to m host system copy

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555404A (en) * 1992-03-17 1996-09-10 Telenor As Continuously available database server having multiple groups of nodes with minimum intersecting sets of database fragment replicas
US8145686B2 (en) * 2005-05-06 2012-03-27 Microsoft Corporation Maintenance of link level consistency between database and file system
US20070220059A1 (en) * 2006-03-20 2007-09-20 Manyi Lu Data processing node
US8145838B1 (en) * 2009-03-10 2012-03-27 Netapp, Inc. Processing and distributing write logs of nodes of a cluster storage system
US8327186B2 (en) * 2009-03-10 2012-12-04 Netapp, Inc. Takeover of a failed node of a cluster storage system on a per aggregate basis
US8108343B2 (en) * 2009-04-23 2012-01-31 Microsoft Corporation De-duplication and completeness in multi-log based replication
US8069366B1 (en) * 2009-04-29 2011-11-29 Netapp, Inc. Global write-log device for managing write logs of nodes of a cluster storage system
US9805108B2 (en) * 2010-12-23 2017-10-31 Mongodb, Inc. Large distributed database clustering systems and methods
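CN103019875B (zh) * 2012-12-19 2015-12-09 北京世纪家天下科技发展有限公司 Method and apparatus for converting a database to a dual-master configuration
US10747746B2 (en) * 2013-04-30 2020-08-18 Amazon Technologies, Inc. Efficient read replicas
CN103365746B (zh) * 2013-07-03 2016-12-28 华为技术有限公司 Synchronization method, device, and system
CN104346373B (zh) * 2013-07-31 2017-12-15 华为技术有限公司 Partitioned log queue synchronization management method and device
CN104021132B (zh) * 2013-12-08 2017-08-22 郑州正信科技发展股份有限公司 Method and system for consistency checking and backup of primary and standby database data
CN103729442B (zh) * 2013-12-30 2017-11-24 华为技术有限公司 Method for recording transaction logs, and database engine
US9785510B1 (en) * 2014-05-09 2017-10-10 Amazon Technologies, Inc. Variable data replication for storage implementing data backup
CN105354046B (zh) * 2015-09-15 2019-03-26 深信服科技股份有限公司 Shared-disk-based database update processing method and system
CN105930500A (zh) * 2016-05-06 2016-09-07 华为技术有限公司 Method for transaction recovery in a database system, and database management system
CN106354865B (zh) * 2016-09-09 2020-05-15 北京奇虎科技有限公司 Method, apparatus, and system for synchronizing master and slave databases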
US11347774B2 (en) * 2017-08-01 2022-05-31 Salesforce.Com, Inc. High availability database through distributed store

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3961400A4 (en) * 2019-05-13 2022-07-06 Huawei Technologies Co., Ltd. METHOD FOR REPAIRING DATABASE SYSTEM FAILURES, DATABASE SYSTEM AND COMPUTER DEVICE
US11829260B2 (en) 2019-05-13 2023-11-28 Huawei Technologies Co., Ltd. Fault repair method for database system, database system, and computing device

Also Published As

Publication number Publication date
US10831612B2 (en) 2020-11-10
CN108363641A (zh) 2018-08-03
CN108363641B (zh) 2022-01-14
US20190347167A1 (en) 2019-11-14

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17893993

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17893993

Country of ref document: EP

Kind code of ref document: A1