WO2021143039A1 - Method for determining a data rollback period in a distributed storage system - Google Patents
Method for determining a data rollback period in a distributed storage system
- Publication number
- WO2021143039A1 (application PCT/CN2020/095846)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- monitoring
- time
- node
- master node
- persistence operation
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0653—Monitoring storage devices or systems
Definitions
- This application relates to the field of data processing, and in particular to a method, device, machine-readable medium, and system for determining a data rollback period in a distributed storage system.
- In a distributed storage system, physical storage resources are managed by a storage system based on a client/server model and are connected to nodes through a computer network, which effectively addresses the problem of data storage and management.
- A storage system fixed at a single location is extended to any number of locations/storage systems, and a large number of nodes form a storage system network.
- The nodes can be distributed across different locations and communicate and transfer data with one another over the network.
- Users do not need to care which node data is stored on or obtained from; they manage and store data in the system as they would in a local file system.
- the embodiment of the present application provides a method for determining a data rollback period in a distributed storage system, including:
- periodically monitoring the persistence operation of the system and generating monitoring records, where the system includes multiple nodes;
- the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, where the election time records the time when the node was elected as the master node;
- comparing the node names in two monitoring records that are adjacent in monitoring time;
- when the node names in the two adjacent monitoring records are different, taking the time period between the monitoring time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period;
- when the node names in the two adjacent monitoring records are the same and the election times are different, taking the time period between the election time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period.
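The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `MonitoringRecord` class, its field names, and the `rollback_period` function are hypothetical names chosen here, not identifiers from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class MonitoringRecord:
    node_name: str             # name of the master node performing persistence
    completion_time: datetime  # completion time of the persistence operation
    monitoring_time: datetime  # time at which this record was generated
    election_time: datetime    # time at which the node was elected master


def rollback_period(earlier: MonitoringRecord,
                    later: MonitoringRecord) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the rollback period, or None if no master change is seen."""
    if later.node_name != earlier.node_name:
        # A different node is now master: the period ends at the later monitoring time.
        return earlier.completion_time, later.monitoring_time
    if later.election_time != earlier.election_time:
        # Same node name but a new election: the period ends at the later election time.
        return earlier.completion_time, later.election_time
    return None
```

In both cases the period starts at the completion time stored in the earlier record; only the end point depends on which condition holds.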
- the completion time of the persistence operation is the minimum value among the completion times of the persistence operation of each node in the system.
- the completion time of the persistence operation is the median of the completion time of the persistence operation of each node in the system.
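As a rough illustration of the two options, the sketch below reduces the per-node completion times to a single value. The function name `reported_completion_time` and the `mode` parameter are assumptions made for this example. For an even number of nodes the sketch takes the lower of the two middle values, which is also an assumption, since the text does not say how ties are resolved.

```python
from datetime import datetime
from typing import Iterable


def reported_completion_time(node_times: Iterable[datetime], mode: str = "min") -> datetime:
    """Reduce the per-node completion times to one value ("min" or "median")."""
    times = sorted(node_times)
    if not times:
        raise ValueError("at least one completion time is required")
    if mode == "min":
        return times[0]
    if mode == "median":
        # Lower median: picks an existing timestamp instead of averaging two.
        return times[(len(times) - 1) // 2]
    raise ValueError(f"unknown mode: {mode}")
```

Taking the minimum is the more conservative choice, because the resulting rollback period then starts at the earliest point any node had finished persisting.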
- the multiple nodes include a master node and at least one slave node, and in the case of a failure of the master node, one of the slave nodes is converted to a new master node.
- the method also includes:
- when the master node fails, the slave node with the latest completion time of the persistence operation (the largest completion-time value) is selected from the slave nodes as the new master node.
- the method also includes:
- the persistence operation of the system is periodically monitored based on an adjustable time threshold, and the interval of the time threshold is configured in minutes or seconds.
- the embodiment of the present application also provides an apparatus for determining a data rollback period in a distributed storage system, including:
- a monitoring module, used to periodically monitor the persistence operation of the system and generate monitoring records;
- the system includes multiple nodes;
- the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, where the election time records the time when the node was elected as the master node;
- a comparison module, used to compare the node names in two monitoring records that are adjacent in monitoring time;
- an acquisition module, used to take, when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period;
- the acquisition module is also used to take, when the node names in the two adjacent monitoring records are the same and the election times are different, the time period between the election time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period.
- the apparatus also includes a configuration module:
- the configuration module is used to configure the completion time of the persistence operation as the minimum value among the completion times of the persistence operation of each node in the system.
- the configuration module is also used to configure the completion time of the persistence operation as the median of the completion times of the persistence operation of each node in the system.
- the configuration module is also used to configure the multiple nodes as a master node and at least one slave node; if the master node fails, one of the slave nodes becomes the new master node.
- the configuration module is also used to select, when the master node fails, the slave node with the latest completion time of the persistence operation (the largest completion-time value) from the slave nodes as the new master node.
- the configuration module is further configured to periodically monitor the persistence operation of the system based on an adjustable time threshold, and the interval of the time threshold is configured in minutes or seconds.
- the present application also provides a machine-readable medium on which instructions are stored.
- when the instructions are executed on a machine, the machine performs the above-mentioned method for determining a data rollback period in a distributed storage system.
- the embodiment of the present application also provides a system, including:
- a memory, used to store instructions to be executed by one or more processors of the system;
- a processor, which is one of the processors of the system and is used to execute the above-mentioned method for determining the data rollback period in the distributed storage system.
- the method, device, machine-readable medium and system provided by the present invention can, without manual intervention, automatically discover a data rollback and accurately determine the time span of the lost data, which is useful for subsequent handling of the data loss.
- Fig. 1 shows a schematic flowchart of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
- Fig. 2 shows a schematic structural diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
- Fig. 3 shows a schematic structural diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
- Fig. 4 shows a schematic structural diagram of an apparatus for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
- Fig. 5 shows a block diagram of a system according to some embodiments of the present application.
- Fig. 6 shows a block diagram of a system on chip (SoC) according to some embodiments of the present application.
- the term "module" can refer to or include an application specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory that executes one or more software or firmware programs, combinational logic circuits, and/or other suitable hardware components that provide the described functions, or may be part of these hardware components.
- the processor may be a microprocessor, a digital signal processor, a microcontroller, etc., and/or any combination thereof.
- the processor may be a single-core processor, a multi-core processor, etc., and/or any combination thereof.
- the embodiment of the present invention uses a persistence mechanism, which is a mechanism for converting data between a persistent state and a transient state. In plain terms, it turns transient data, such as cached data, into persistent data. Persistent data obtained through the persistence mechanism can be stored permanently in a storage device; even if the storage device goes down, the persistent data is not lost as long as it is not damaged.
- a method, a device, a machine-readable medium, and a system for determining a data rollback period in a distributed storage system are disclosed.
- the nodes referred to in the embodiments of the present invention are used to persist output data. For example, if node a is a node that performs data persistence, then after node a has executed, the data at node a is persisted. Generally, persistent data can be held in memory first and then persisted to a storage medium, or it can be stored directly in the file system.
- a node may persist cached data into persistent data based on the persistence mechanism, for example by recording the write operations performed on the cached data in the node into a persistence file.
- the above-mentioned write operations include data addition operations, deletion operations, and modification operations.
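One common way to persist write operations is an append-only log that can be replayed to rebuild the cached data. The sketch below illustrates that idea under stated assumptions: the JSON line format, the function names `record_write` and `replay`, and the use of an in-memory list in place of a persistence file are all hypothetical and are not specified by the application.

```python
import json
from typing import Dict, List


def record_write(log: List[str], op: str, key: str, value=None) -> None:
    """Append one write operation (add, delete, or modify) to the persistence log."""
    if op not in ("add", "delete", "modify"):
        raise ValueError(f"unsupported write operation: {op}")
    log.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log: List[str]) -> Dict[str, object]:
    """Rebuild the cached data by replaying the logged write operations in order."""
    data: Dict[str, object] = {}
    for line in log:
        entry = json.loads(line)
        if entry["op"] == "delete":
            data.pop(entry["key"], None)
        else:
            # "add" and "modify" both set the key to the logged value.
            data[entry["key"]] = entry["value"]
    return data
```

Each list element stands in for one line of the persistence file; replaying the lines in order reproduces the state the cache had when the last operation was logged.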
- Figure 1 shows a flow chart of the method.
- the method according to the embodiment of the present invention is described in detail below and includes:
- 101: periodically monitor the persistence operation of the system and generate monitoring records.
- the system includes multiple nodes.
- the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time;
- the system can, for example, preset a timer or a timing threshold (time threshold); the timer periodically starts a monitoring program/monitoring task that acquires and saves the information of at least one node.
- the monitoring program/monitoring task communicates with a node through the node's access interface and obtains the information stored in the node.
- the monitoring program/monitoring task can be a cyclically executed program that, at the interval given by the timing threshold, periodically acquires and saves the information of at least one node.
- the timer or timing threshold interval can be configured in minutes or seconds; that is, a monitoring program/monitoring task can be started regularly at intervals of minutes or seconds.
- the monitoring program/monitoring task executed by the monitoring terminal obtains the monitoring information (node information) of a node by monitoring the node's communication port.
- the communication between the monitoring terminal and the node can be realized using, but not limited to, general message middleware.
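A cyclically executed monitoring task with a configurable interval might look like the following sketch. The function `run_monitor` and its parameters are hypothetical; in particular, `fetch_node_info` stands in for whatever access interface the node exposes, and the `sleep` and `now` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from datetime import datetime
from typing import Callable, Dict, List


def run_monitor(fetch_node_info: Callable[[], Dict[str, object]],
                interval_seconds: float,
                rounds: int,
                sleep: Callable[[float], None] = time.sleep,
                now: Callable[[], datetime] = datetime.now) -> List[Dict[str, object]]:
    """Collect `rounds` monitoring records, one every `interval_seconds`."""
    records: List[Dict[str, object]] = []
    for i in range(rounds):
        info = dict(fetch_node_info())     # information read from the node
        info["monitoring_time"] = now()    # execution time of this monitoring task
        records.append(info)
        if i < rounds - 1:
            sleep(interval_seconds)        # wait for the configured time threshold
    return records
```

Passing `interval_seconds=60` corresponds to a one-minute threshold; a smaller value gives second-level granularity at the cost of more frequent node access.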
- the information of the node may include, but is not limited to, the name of the node, IP address, master node information, slave node information, and the completion time of the persistence operation.
- the nodes in the system here can be configured to include at least one master node and at least one slave node corresponding to the master node.
- the master node can, for example, divide a large data persistence task into multiple small data persistence tasks.
- the small tasks are distributed to at least one slave node, and each node records the completion time of the data persistence operation it executes.
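The splitting and distribution step can be illustrated as follows. The text does not say how tasks are sized or assigned, so the fixed chunk size and the round-robin assignment used here are assumptions, as are the function names `split_task` and `distribute`.

```python
from typing import Dict, List, Sequence


def split_task(items: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Divide one large persistence task into smaller tasks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def distribute(chunks: List[List[str]],
               slave_names: Sequence[str]) -> Dict[str, List[List[str]]]:
    """Assign the small tasks to slave nodes in round-robin order (an assumed policy)."""
    if not slave_names:
        raise ValueError("at least one slave node is required")
    assignment: Dict[str, List[List[str]]] = {name: [] for name in slave_names}
    for index, chunk in enumerate(chunks):
        assignment[slave_names[index % len(slave_names)]].append(chunk)
    return assignment
```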
- the monitoring program/monitoring task records the execution time of the monitoring program/monitoring task, that is, the monitoring time; the corresponding relationship between the execution time and the information of the aforementioned node is saved as the monitoring record.
- the saved data includes but is not limited to the form of data key-value pairs, that is, the saved data has the form of key-value, where key is a data key, value is a data value, and the data value can be data in various forms.
- the saved data can be easily managed in the form of data key-value pairs.
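A monitoring record saved as a key-value pair could be handled as in the sketch below. Using the monitoring time as the key and JSON-encoded node information as the value is an assumption made for illustration, and a plain dictionary stands in for whatever key-value store is actually used.

```python
import json
from typing import Dict


def save_record(store: Dict[str, str], monitoring_time: str,
                node_info: Dict[str, str]) -> None:
    """Save one monitoring record: key = monitoring time, value = JSON node information."""
    store[monitoring_time] = json.dumps(node_info, sort_keys=True)


def load_record(store: Dict[str, str], monitoring_time: str) -> Dict[str, str]:
    """Read back the node information saved for a given monitoring time."""
    return json.loads(store[monitoring_time])
```

With ISO 8601 timestamps as keys, sorting the keys as strings also sorts the records by monitoring time, which makes adjacent records easy to find.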
- the two monitoring records can be the first monitoring record and the second monitoring record.
- the monitoring time of the second monitoring record is later than the monitoring time of the first monitoring record.
- the difference between the monitoring time contained in the second monitoring record and the completion time of the node's persistence operation contained in the first monitoring record is obtained; that is, the time period between the monitoring time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier one is taken as the data rollback period.
- the above monitoring records record the information of the current master node in the node cluster.
- at a certain moment, if the master node fails, a slave node takes over the task of the master node based on the node cluster recovery mechanism and continues the operation of persisting data.
- at this point, the node information in the two adjacent monitoring records is different.
- the monitoring time saved in the later of the two adjacent monitoring records (the second monitoring record) is later than that of the first monitoring record.
- this monitoring time can be used as the time point at which the data processing system discovers the failure of the master node, while the completion time of the persistence operation saved in the earlier of the two adjacent monitoring records (the first monitoring record) can be used as the starting time point of the failure of the master node.
- the difference between the starting time point of the failure and the time point at which the failure is discovered is calculated, and the time period between these two time points is confirmed as the data rollback period.
- the system thus discovers that the master node changed during this time period and that the persistent data in the node cluster needs to be rolled back over it.
- when the node names in two adjacent monitoring records are the same but the election times are different, the time period between the election time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record is acquired as the data rollback period.
- the above monitoring records record the information of the current master node in the node cluster.
- at a certain moment, if the master node fails, a slave node takes over the task of the master node and continues persisting data.
- after the former master node is repaired and the slave node fails, the former master node takes over the identity of the master node again and continues the operation of persisting data.
- the monitoring program monitors these two nodes and generates two adjacent monitoring records.
- the node names contained in the two adjacent monitoring records are the same, but the monitoring time of the later record (the second monitoring record) is later than that of the earlier record (the first monitoring record).
- the election time saved in the second monitoring record can be used as the end time point, at which the data processing system detects the failure of the master node.
- the completion time of the persistence operation saved in the first monitoring record can be used as the start time point when the master node fails.
- the difference between this completion time and the election time in the second monitoring record is calculated, and the time period between these two time points is confirmed as the data rollback period for the data in the node cluster. Within this period, the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
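The two cases can be summarized compactly. The notation below is introduced here for illustration and is not the application's own: for two adjacent monitoring records, with index 1 for the earlier record and index 2 for the later one, let $n_i$ be the node name, $c_i$ the completion time of the persistence operation, $m_i$ the monitoring time, and $e_i$ the election time. The data rollback period $P$ is then:

```latex
P =
\begin{cases}
[\,c_1,\; m_2\,] & \text{if } n_1 \neq n_2, \\
[\,c_1,\; e_2\,] & \text{if } n_1 = n_2 \text{ and } e_1 \neq e_2, \\
\varnothing      & \text{otherwise.}
\end{cases}
```

For example, with $c_1 = 12{:}00$ and $m_2 = 12{:}06$ and different node names, the rollback period covers the six minutes from 12:00 to 12:06.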
- the information of the master node and the completion times of the persistence operation recorded by the master node and by the slave nodes corresponding to the master node are periodically obtained; the minimum value among the completion times of the persistence operation of the slave nodes corresponding to the master node is obtained; and the correspondence between the information of the master node and this minimum value is stored.
- the multiple nodes include a master node and at least one slave node.
- when the master node fails, the slave node with the latest completion time of the persistence operation (the largest completion-time value) is selected as the successor from the slave nodes corresponding to the master node.
- the completion time of the persistence operation is the median of the completion times of the persistence operation of each node in the system.
- obtaining the median of the completion times of the persistence operation of the slave nodes corresponding to the master node includes: the master node and the slave nodes all perform the operation of persisting data; when the master node fails at a certain moment, the slave node with the largest completion-time value among the slave nodes corresponding to the master node is selected and takes over the task of the master node to continue persisting data.
- the completion time of the persistence operation contained in the node information of the new master node is set to the median of the completion times of the persistence operation of all slave nodes corresponding to the failed master node.
- storing the correspondence between the information of the successor master node and the median includes: the above maintenance program can periodically obtain and save the correspondence between the node information of at least one new master node and the median of the data persistence time.
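The successor selection in this embodiment can be sketched as below. The function name `elect_successor` is hypothetical. How ties between equal completion times are broken, and how the median is defined for an even number of slave nodes, are not stated in the text; the sketch keeps the first maximum and uses the lower median as assumptions.

```python
from datetime import datetime
from typing import Dict, Tuple


def elect_successor(slave_completion: Dict[str, datetime]) -> Tuple[str, datetime]:
    """Pick the slave with the largest completion time and report the median completion time."""
    if not slave_completion:
        raise ValueError("at least one slave node is required")
    # The successor is the slave whose persistence operation completed latest.
    successor = max(slave_completion, key=slave_completion.get)
    ordered = sorted(slave_completion.values())
    # Lower median: picks an existing timestamp instead of averaging two.
    median_time = ordered[(len(ordered) - 1) // 2]
    return successor, median_time
```

The returned pair corresponds to the stored correspondence between the new master node and the median completion time of the slave nodes of the failed master.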
- both the master node and the slave node perform the operation of persisting data.
- for example, when the master node fails, the slave node takes over the task of the master node based on the node cluster recovery mechanism and continues persisting data. As another example, after a slave node fails, a new node is created to replace it, and the newly created node then performs the data persistence operation in its place.
- the node cluster recovery mechanism mentioned above is established between the master node and its corresponding slave node in the node cluster system. It gives the master node and its corresponding slave node a recovery takeover relationship, which safeguards the data that the node cluster persists concurrently.
- the slave node can, for example, synchronize local data, or update data that differs from the master node's, according to the update operations of the master node. This ensures that the master node and the slave node either update the same data or concurrently update different data; in addition, when a new master node is created or restarted, the slave node can take over from the master node to handle the next persistence operation.
- the monitoring system can serve as a device that controls the master node and the slave node, and it can monitor in real time whether the master node or the slave node is faulty.
- the technology used by the monitoring system for real-time monitoring of node failures can follow existing technology and is not described in detail here.
- each node in every node cluster carries, among other things, a node configuration file.
- the node configuration file contains the IP address (Internet Protocol address) of each node in the node cluster and the identification information used to identify each node, that is, the above-mentioned universally unique identifier (UUID).
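As a minimal sketch of what one entry of such a configuration file could hold, the example below validates an IP address and a UUID using the Python standard library. The `NodeConfig` class, the `parse_node_config` function, and the `"uuid"` and `"ip"` keys are hypothetical; the application does not describe the file format.

```python
import ipaddress
import uuid
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class NodeConfig:
    node_uuid: uuid.UUID                                        # identifies the node
    ip: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]     # address of the node


def parse_node_config(entry: Dict[str, str]) -> NodeConfig:
    """Validate one configuration entry; raises ValueError on a malformed UUID or IP."""
    return NodeConfig(node_uuid=uuid.UUID(entry["uuid"]),
                      ip=ipaddress.ip_address(entry["ip"]))
```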
- a non-faulty slave node is selected from the slave nodes corresponding to the master node as the successor master node; the completion times of the persistence operation recorded by the slave nodes corresponding to the master node are obtained; and the minimum value of these completion times is obtained.
- obtaining the completion times of the persistence operation recorded by the slave nodes corresponding to the master node and obtaining their minimum value includes: both the master node and the slave nodes perform the operation of persisting data.
- when the master node fails, the slave node with the smallest completion-time value among the slave nodes corresponding to the master node is selected and takes over the task of the master node to continue persisting data.
- the completion time of the persistence operation contained in the node information of the new master node is set to the minimum value of the completion times of the persistence operation.
- the above maintenance program can periodically obtain and save the correspondence between the node information of at least one new master node and the minimum value of the data persistence time.
- the method of the present invention is also applicable when both the master node and the slave node corresponding to the master node fail. The two may fail at the same time. Alternatively, the master node may fail first, and the slave node that replaced the original master node may fail before the new slave node completes data synchronization. Or the slave node may fail first, and the master node may fail before the new slave node completes data synchronization. In each of these cases, both the master node and the slave node corresponding to the master node have failed.
- the monitoring module 201 is used to periodically monitor the persistence operation of the system and generate monitoring records.
- the system includes multiple nodes.
- the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time.
- the monitoring module 201 is used to start a preset timed monitoring task.
- the monitoring task monitors the persistence operation of the system at the preset time interval and generates monitoring records; that is, the monitoring module 201 obtains the monitoring information (node information) of a node by monitoring the node's communication port through the monitoring program/monitoring task executed by the monitoring terminal.
- the communication between the monitoring terminal and the node can be realized using, but not limited to, general message middleware.
- the monitoring program/monitoring task executed by the monitoring terminal can be driven, for example, by a preset timer or time interval threshold.
- based on the time interval threshold, the timer periodically starts a monitoring program/monitoring task to obtain information about at least one node.
- the monitoring program/monitoring task communicates with a node through the node's access interface and obtains the information stored in the node.
- the monitoring program can be a cyclically executed program that, at the interval given by the timing threshold, periodically acquires and saves the information of at least one node.
- the monitoring program/monitoring task records its execution time, that is, the monitoring time; the correspondence between the execution time and the information of the aforementioned node is saved as the monitoring record.
- the information of the node may include, but is not limited to, the name of the node, IP address, master node information, slave node information, and the completion time of the persistence operation.
- the comparison module 202 is used to compare the node names in the two adjacent monitoring records in the monitoring time.
- the obtaining module 203 is used to obtain the monitoring time and the monitoring time in the monitoring record with the later monitoring time among the two adjacent monitoring records when the node names in the two adjacent monitoring records are different
- the time period between the completion time of the persistence operation in the earlier monitoring record is used as the data rollback period.
- the comparison module 202 compares two adjacent monitoring records.
- the two monitoring records may be the first monitoring record and the second monitoring record.
- the monitoring time of the second monitoring record is later than the monitoring time of the first monitoring record.
- The obtaining module 203 obtains the difference between the monitoring time contained in the second monitoring record and the completion time of the persistence operation of the node contained in the first monitoring record, that is, the time period between the monitoring time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time, as the data rollback period.
- the above monitoring records record the information of the current master node in the node cluster.
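The comparison of two adjacent records when the master node name has changed can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Record` type, its field names, and the function name are hypothetical, and the rollback period is represented as a (start, end) pair.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Tuple


class Record(NamedTuple):
    node_name: str              # master node name seen by the monitor
    completion_time: datetime   # completion time of the persistence operation
    monitoring_time: datetime   # time the record was taken


def rollback_period_on_name_change(
    first: Record, second: Record
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the rollback period, or None if the name is unchanged."""
    if second.monitoring_time <= first.monitoring_time:
        raise ValueError("second record must be monitored later than the first")
    if first.node_name == second.node_name:
        return None
    # Start: last persistence completion known from the earlier record.
    # End: monitoring time of the later record, when the change was observed.
    return (first.completion_time, second.monitoring_time)
```

With a first record completed at 10:00:00 and a second record monitored at 10:01:30 under a different node name, the period spans those two instants, i.e. 90 seconds.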
- The comparison module 202 compares two adjacent monitoring records. If the master node fails at some moment, a slave node takes over the task of the master node, based on the node cluster recovery mechanism, and continues the operation of persisting data. At this time, the node information contained in the two adjacent monitoring records has the same name, but the monitoring time of the latter record (the second monitoring record) is later than that of the former record (the first monitoring record), and the election time saved in the second monitoring record can be used as the end time point, at which the data processing system finds the failure of the master node.
- At the same time, the completion time of the persistence operation saved in the earlier of the two adjacent monitoring records (the first monitoring record) can be used as the starting time point of the master node failure.
- The acquisition module 203 calculates the difference between the starting time point of the master node failure and the time point at which the failure was found, and the time period between these two time points is confirmed as the data rollback period for rolling back the node cluster data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
- the system discovers that the master node has changed during the time period, and the persistent data in the node cluster needs to be rolled back.
- the difference calculation is performed at the time point of the failure in the master node to determine the data rollback period for the rollback operation.
- The acquisition module is also used to obtain, when the node names in two adjacent monitoring records are the same and the election times of the monitoring records are different, the time period between the election times in the two adjacent monitoring records, as the data rollback period.
- the above monitoring records record the information of the current master node in the node cluster.
- the slave node will take over the task of the master node and continue to perform the operation of persistent data.
- the former master node again takes over the identity of the master node to continue the operation of persistent data.
- The node information in the two adjacent monitoring records is the same, and the election time saved in the latter of the two adjacent monitoring records (the second monitoring record, whose monitoring time is later than that of the first monitoring record) can be used as the time point at which the data processing system detects the failure of the master node.
- the election time saved in the previous monitoring record (the first monitoring record) of the two adjacent monitoring records can be used as the starting point for the failure of the master node.
- The acquisition module calculates the difference between the election times in the two records, and the time period between these two time points is confirmed as the data rollback period for rolling back the node cluster data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
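The case in which the node name is unchanged but the election time differs can be sketched in the same way. The Python example below follows the description in this passage (the period runs between the two election times); the `Record` type, its fields, and the function name are assumptions made for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Tuple


class Record(NamedTuple):
    node_name: str             # master node name seen by the monitor
    election_time: datetime    # time this node was elected master
    monitoring_time: datetime  # time the record was taken


def rollback_period_on_reelection(
    first: Record, second: Record
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) when the same node was re-elected between two records."""
    if first.node_name != second.node_name:
        return None  # the name-change case is handled separately
    if first.election_time == second.election_time:
        return None  # same master, same term: nothing to roll back
    # Start: election time in the earlier record; end: election time in the later one.
    return (first.election_time, second.election_time)
```

A differing election time under the same name is what reveals that the master role was lost and regained between two monitoring samples, even though a name comparison alone would see no change.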
- The configuration module 204 can also configure the system so that its multiple nodes include a master node and at least one slave node and, if the master node fails, one of the slave nodes is converted into the new master node.
- The master node can, but is not limited to, divide a large data persistence task into multiple small data persistence tasks and distribute them to at least one slave node, and each node records the completion time of the data persistence operation it executes.
- The configuration module 204 is used to select, when the master node fails, the slave node with the largest completion time of the persistence operation from the slave nodes as the new master node.
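Selecting the new master node by largest completion time can be sketched as below. This is a hedged Python illustration: the mapping from slave node name to completion time and the function name `elect_new_master` are assumptions, and tie-breaking between equal completion times is not specified in the text.

```python
from datetime import datetime
from typing import Dict


def elect_new_master(slave_completion_times: Dict[str, datetime]) -> str:
    """Pick the slave whose persistence operation completed most recently."""
    if not slave_completion_times:
        raise ValueError("no slave node is available to take over")
    # The largest completion time corresponds to the most up-to-date slave.
    return max(slave_completion_times, key=lambda name: slave_completion_times[name])
```

Choosing the slave with the largest completion time favors the node whose persisted data is most recent, which keeps the amount of data to roll back small.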
- the configuration module 204 is also configured to configure the completion time of the persistence operation as the minimum value among the completion times of the persistence operation of each node in the system.
- The configuration module 204 regularly obtains the information of the master node and the completion times of the persistence operation recorded by the master node and by the slave nodes corresponding to the master node; obtains the minimum value among the completion times of the persistence operation of the slave nodes corresponding to the master node; and stores the correspondence between the information of the master node and that minimum value.
- the configuration module 204 is configured to configure the completion time of the persistence operation as the median of the completion time of the persistence operation of each node in the system.
- The configuration module 204 obtains the median of the completion times of the persistence operation of the slave nodes corresponding to the master node as follows: the master node and the slave nodes all execute the operation of persisting data.
- If the master node fails, the slave node with the largest completion time of the persistence operation among the slave nodes corresponding to the master node is selected to take over the task of the master node and continue the operation of persisting data.
- In the node information of the new master node, the completion time of the persistence operation is set to the median of the completion times of the persistence operation of all slave nodes corresponding to the failed master node.
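The two ways of deriving the recorded completion time, minimum or median over the nodes, can be sketched as follows. This Python example is an illustration under assumptions: the function name and the `mode` parameter are hypothetical, and for an even number of nodes the sketch picks the lower of the two middle values (via `statistics.median_low`), a choice the text does not specify.

```python
from datetime import datetime
from statistics import median_low
from typing import Iterable


def recorded_completion_time(times: Iterable[datetime], mode: str = "min") -> datetime:
    """Aggregate per-node completion times into the value stored in the record."""
    values = list(times)
    if not values:
        raise ValueError("at least one completion time is required")
    if mode == "min":
        # Most conservative choice: no node has persisted less than this.
        return min(values)
    if mode == "median":
        # median_low only sorts and indexes, so it works on datetime values.
        return median_low(values)
    raise ValueError(f"unknown mode: {mode!r}")
```

The minimum yields the longest, safest rollback period, while the median trades some of that safety for a shorter period.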
- The configuration module also periodically monitors the persistence operation of the system based on an adjustable time threshold, and the interval of the time threshold is configured in units of either minutes or seconds.
- the configuration module here can, but is not limited to, preset a timer or a timing threshold (time threshold), and the timer can periodically start a monitoring program/monitoring task to acquire and save information about at least one node.
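A timer driven by an adjustable threshold in minutes or seconds can be sketched as below. This Python example is a simplified illustration: the function names, the `rounds` limit, and the injectable `sleep` callable are assumptions added so the loop terminates and can be exercised without real waiting.

```python
import time
from typing import Callable

_UNIT_SECONDS = {"seconds": 1, "minutes": 60}


def interval_seconds(value: int, unit: str) -> int:
    """Convert the configured time threshold into seconds."""
    if unit not in _UNIT_SECONDS:
        raise ValueError("unit must be 'seconds' or 'minutes'")
    if value <= 0:
        raise ValueError("the time threshold must be positive")
    return value * _UNIT_SECONDS[unit]


def run_monitor(task: Callable[[], None], value: int, unit: str, rounds: int,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `task` `rounds` times, waiting one threshold interval between runs."""
    delay = interval_seconds(value, unit)
    for i in range(rounds):
        task()                  # acquire and save node information
        if i < rounds - 1:
            sleep(delay)        # wait until the next monitoring cycle
```

For example, `run_monitor(task, 2, "minutes", 3)` runs the task three times with 120-second pauses between runs.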
- The present application also provides a machine-readable medium with instructions stored on it.
- When executed on a machine, the instructions cause the machine to execute a method for determining the data rollback period in the distributed storage system, including: monitoring the persistence operation of the system and generating monitoring records.
- The system includes multiple nodes.
- The monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; the node names in two monitoring records that are adjacent in monitoring time are compared; and when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time is obtained as the data rollback period.
- the embodiment of the present application also provides a system, including:
- A memory, used to store instructions executed by one or more processors of the system; and
- a processor, which is one of the processors of the system and is used to execute a method for determining the data rollback period in the distributed storage system, including periodically monitoring the persistence operation of the system and generating monitoring records.
- The system includes multiple nodes, and the monitoring record includes the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; the node names in two monitoring records that are adjacent in monitoring time are compared; and when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time is obtained as the data rollback period.
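Applied to a whole series of monitoring records, the method summarized above amounts to a scan over adjacent pairs. The Python sketch below is illustrative only: the `Record` type, its field names, and the function name are assumptions, and only the name-change case described in this passage is covered.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    node_name: str              # master node name seen by the monitor
    completion_time: datetime   # completion time of the persistence operation
    monitoring_time: datetime   # time the record was taken


def rollback_periods(records: Iterable[Record]) -> List[Tuple[datetime, datetime]]:
    """Scan records in monitoring-time order and collect every rollback period."""
    ordered = sorted(records, key=lambda r: r.monitoring_time)
    periods = []
    # zip pairs each record with its successor in monitoring time.
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier.node_name != later.node_name:
            periods.append((earlier.completion_time, later.monitoring_time))
    return periods
```

Sorting first means the records may be supplied in any order, for example when they are read back from storage.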
- the present application also provides a machine-readable medium with instructions stored on the machine-readable medium.
- When executed on a machine, the instructions cause the machine to execute the above-mentioned method for determining a data rollback period in a distributed storage system.
- the embodiment of the present application also provides a system, including:
- A memory, used to store instructions executed by one or more processors of the system; and
- a processor, which is one of the processors of the system and is used to execute the above-mentioned method for determining the data rollback period in the distributed storage system.
- FIG. 5 shows a block diagram of a system 500 according to an embodiment of the present application.
- Figure 5 schematically illustrates an example system 500 according to various embodiments.
- The system 500 may include one or more processors 504, a system control logic 508 connected to at least one of the processors 504, a system memory 512 connected to the system control logic 508, a non-volatile memory (NVM) 516 connected to the system control logic 508, and a network interface 520 connected to the system control logic 508.
- The processor 504 may include one or more single-core or multi-core processors. In some embodiments, the processor 504 may include any combination of a general-purpose processor and a special-purpose processor (for example, a graphics processor, an application processor, a baseband processor, etc.). In an embodiment in which the system 500 adopts an eNB (Evolved Node B, enhanced base station) 101 or a RAN (Radio Access Network) controller 102, the processor 504 may be configured to execute various conforming embodiments, for example, one or more of the embodiments shown in FIG. 1.
- system control logic 508 may include any suitable interface controller to provide any suitable interface to at least one of the processors 504 and/or any suitable device or component in communication with the system control logic 508.
- system control logic 508 may include one or more memory controllers to provide an interface to the system memory 512.
- the system memory 512 can be used to load and store data and/or instructions.
- the memory 512 of the system 500 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
- the NVM/memory 516 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
- The NVM/memory 516 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
- the NVM/memory 516 may include a part of the storage resources on the device where the system 500 is installed, or it may be accessed by the device, but not necessarily a part of the device.
- The NVM/memory 516 can be accessed over the network via the network interface 520.
- The system memory 512 and the NVM/memory 516 may respectively include a temporary copy and a permanent copy of the instructions 524.
- the instructions 524 may include instructions that when executed by at least one of the processors 504 cause the system 500 to implement the method shown in FIG. 1.
- the instructions 524, hardware, firmware, and/or software components thereof may additionally/alternatively be placed in the system control logic 508, the network interface 520, and/or the processor 504.
- the network interface 520 may include a transceiver for providing a radio interface for the system 500 to communicate with any other suitable devices (such as a front-end module, an antenna, etc.) through one or more networks.
- the network interface 520 may be integrated with other components of the system 500.
- the network interface 520 may be integrated in at least one of the processor 504, the system memory 512, the NVM/memory 516, and a firmware device (not shown) with instructions.
- the system 500 implements the method shown in FIG. 1.
- the network interface 520 may further include any suitable hardware and/or firmware to provide a multiple input multiple output radio interface.
- the network interface 520 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
- At least one of the processors 504 may be packaged with the logic of one or more controllers for the system control logic 508 to form a system in package (SiP). In one embodiment, at least one of the processors 504 may be integrated on the same die with the logic of one or more controllers for the system control logic 508 to form a system on chip (SoC).
- the system 500 may further include: an input/output (I/O) device 532.
- the I/O device 532 may include a user interface to enable a user to interact with the system 500; the design of the peripheral component interface enables the peripheral component to also interact with the system 500.
- the system 500 further includes a sensor for determining at least one of environmental conditions and location information related to the system 500.
- the user interface may include, but is not limited to, a display (e.g., liquid crystal display, touch screen display, etc.), speakers, microphones, one or more cameras (e.g., still image cameras and/or video cameras), flashlights (e.g., LED flash) and keyboard.
- the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
- The sensor may include, but is not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit.
- The positioning unit may also be part of or interact with the network interface 520 to communicate with components of the positioning network (e.g., global positioning system (GPS) satellites).
- FIG. 6 shows a block diagram of a SoC (System on Chip) 600.
- The SoC 600 includes: an interconnection unit 650, which is coupled to an application processor 615; a system agent unit 670; a bus controller unit 680; an integrated memory controller unit 640; a set of one or more coprocessors 620, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit 630; and a direct memory access (DMA) unit 660.
- the coprocessor 620 includes a dedicated processor, such as, for example, a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, or an embedded processor, or the like.
- the various embodiments of the mechanism disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
- the embodiments of the present application can be implemented as a computer program or program code executed on a programmable system.
- The programmable system includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
- Program codes can be applied to input instructions to perform the functions described in this application and generate output information.
- the output information can be applied to one or more output devices in a known manner.
- a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
- the program code can be implemented in a high-level programming language or an object-oriented programming language to communicate with the processing system.
- assembly language or machine language can also be used to implement the program code.
- the mechanism described in this application is not limited to the scope of any particular programming language. In either case, the language can be a compiled language or an interpreted language.
- the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
- The disclosed embodiments can also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which can be read and executed by one or more processors.
- the instructions can be distributed through a network or through other computer-readable media.
- A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (for example, a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (for example, carrier waves, infrared signals, digital signals, etc.). Therefore, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (for example, a computer).
- each unit/module mentioned in each device embodiment of this application is a logical unit/module.
- A logical unit/module can be a physical unit/module, can be a part of a physical unit/module, or can be realized by a combination of multiple physical units/modules.
- the physical realization of these logical units/modules is not the most important.
- the combination of the functions implemented by these logical units/modules is the solution to this application.
- The above-mentioned device embodiments of this application do not introduce units/modules that are not closely related to solving the technical problems proposed by this application. This does not mean that no other units/modules exist in the above-mentioned device embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Debugging And Monitoring (AREA)
- Retry When Errors Occur (AREA)
Claims (14)
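- A method for determining a data rollback period in a distributed storage system, characterized by comprising: periodically monitoring a persistence operation of the system and generating monitoring records, wherein the system comprises a plurality of nodes, and each monitoring record comprises the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, a monitoring time, and an election time of the master node, the election time recording the time at which the node was elected as the master node; comparing the node names in two monitoring records that are adjacent in monitoring time; when the node names in the two adjacent monitoring records are different, obtaining the time period between the monitoring time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time, as the data rollback period; and when the node names in the two adjacent monitoring records are the same and the election times of the monitoring records are different, obtaining the time period between the election time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time, as the data rollback period.
- The method according to claim 1, characterized by further comprising: the completion time of the persistence operation is the minimum of the completion times of the persistence operations of the respective nodes in the system.
- The method according to claim 2, characterized by further comprising: the completion time of the persistence operation is the median of the completion times of the persistence operations of the respective nodes in the system.
- The method according to claim 1, characterized by further comprising: the plurality of nodes comprises one master node and at least one slave node, and when the master node fails, one of the slave nodes is converted into a new master node.
- The method according to claim 4, characterized by further comprising: when the master node fails, selecting, from the slave nodes, the slave node with the largest completion time of the persistence operation as the new master node.
- The method according to claim 4, characterized by further comprising: periodically monitoring the persistence operation of the system based on an adjustable time threshold, wherein the interval of the time threshold is configured in units of either minutes or seconds.
- An apparatus for determining a data rollback period in a distributed storage system, characterized by comprising: a monitoring module, used to periodically monitor a persistence operation of the system and generate monitoring records, wherein the system comprises a plurality of nodes, and each monitoring record comprises the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, a monitoring time, and an election time of the master node, the election time recording the time at which the node was elected as the master node; a comparison module, used to compare the node names in two monitoring records that are adjacent in monitoring time; and an obtaining module, used to obtain, when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time, as the data rollback period; wherein the obtaining module is further used to obtain, when the node names in the two adjacent monitoring records are the same and the election times of the monitoring records are different, the time period between the election time in the monitoring record with the later monitoring time and the completion time of the persistence operation in the monitoring record with the earlier monitoring time, as the data rollback period.
- The apparatus according to claim 7, characterized by further comprising: a configuration module, used to configure the completion time of the persistence operation as the minimum of the completion times of the persistence operations of the respective nodes in the system.
- The apparatus according to claim 8, characterized by further comprising: the configuration module is used to configure the completion time of the persistence operation as the median of the completion times of the persistence operations of the respective nodes in the system.
- The apparatus according to claim 8, characterized by further comprising: the configuration module is used to configure the plurality of nodes to comprise one master node and at least one slave node, and when the master node fails, one of the slave nodes is converted into a new master node.
- The apparatus according to claim 10, characterized by further comprising: the configuration module is used to select, when the master node fails, the slave node with the largest completion time of the persistence operation from the slave nodes as the new master node.
- The apparatus according to claim 8, characterized by further comprising: the configuration module is used to configure periodic monitoring of the persistence operation of the system based on an adjustable time threshold, wherein the interval of the time threshold is configured in units of either minutes or seconds.
- A machine-readable medium, characterized in that instructions are stored on the machine-readable medium, and the instructions, when executed on a machine, cause the machine to execute the method for determining a data rollback period in a distributed storage system according to any one of claims 1 to 6.
- A system, characterized by comprising: a memory, used to store instructions executed by one or more processors of the system; and a processor, which is one of the processors of the system and is used to execute the method for determining a data rollback period in a distributed storage system according to any one of claims 1 to 6.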
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010031344.4A CN111208949B (zh) | 2020-01-13 | 2020-01-13 | Method for determining a data rollback period in a distributed storage system |
CN202010031344.4 | 2020-01-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143039A1 (zh) | 2021-07-22 |
Family
ID=70790094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/095846 WO2021143039A1 (zh) | 2020-01-13 | 2020-06-12 | Method for determining a data rollback period in a distributed storage system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111208949B (zh) |
WO (1) | WO2021143039A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111208949B (zh) * | 2020-01-13 | 2020-12-25 | 上海依图网络科技有限公司 | Method for determining a data rollback period in a distributed storage system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462432A (zh) * | 2014-12-15 | 2015-03-25 | 成都英力拓信息技术有限公司 | Adaptive distributed computing method |
CN105468297A (zh) * | 2015-11-18 | 2016-04-06 | 临沂大学 | Method for fast data synchronization between master and slave devices in a cloud storage system |
CN107168642A (zh) * | 2017-03-30 | 2017-09-15 | 北京奇艺世纪科技有限公司 | Data storage method and system |
CN107577717A (zh) * | 2017-08-09 | 2018-01-12 | 阿里巴巴集团控股有限公司 | Processing method, apparatus, and server for ensuring data consistency |
CN108984779A (zh) * | 2018-07-25 | 2018-12-11 | 郑州云海信息技术有限公司 | Metadata processing method, apparatus, and device for snapshot rollback in a distributed file system |
US10235066B1 (en) * | 2017-04-27 | 2019-03-19 | EMC IP Holding Company LLC | Journal destage relay for online system checkpoint creation |
WO2019089599A1 (en) * | 2017-10-31 | 2019-05-09 | Ab Initio Technology Llc | Managing a computing cluster using durability level indicators |
CN111208949A (zh) * | 2020-01-13 | 2020-05-29 | 上海依图网络科技有限公司 | Method for determining a data rollback period in a distributed storage system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10324637B1 (en) * | 2016-12-13 | 2019-06-18 | EMC IP Holding Company LLC | Dual-splitter for high performance replication |
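CN106814972B (zh) * | 2016-12-22 | 2018-04-17 | 北京华云网际科技有限公司 | Rollback method and apparatus for snapshot nodes in distributed block storage |
CN108874552B (zh) * | 2018-06-28 | 2021-09-21 | 杭州云毅网络科技有限公司 | Distributed lock execution method, apparatus, and system, application server, and storage medium |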
-
2020
- 2020-01-13 CN CN202010031344.4A patent/CN111208949B/zh active Active
- 2020-06-12 WO PCT/CN2020/095846 patent/WO2021143039A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111208949B (zh) | 2020-12-25 |
CN111208949A (zh) | 2020-05-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20914187 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20914187 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.03.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20914187 Country of ref document: EP Kind code of ref document: A1 |