WO2021143039A1 - Method for determining a data rollback period in a distributed storage system - Google Patents

Method for determining a data rollback period in a distributed storage system

Info

Publication number
WO2021143039A1
WO2021143039A1 · PCT/CN2020/095846 · CN2020095846W
Authority
WO
WIPO (PCT)
Prior art keywords
monitoring
time
node
master node
persistence operation
Prior art date
Application number
PCT/CN2020/095846
Other languages
English (en)
French (fr)
Inventor
刘明伟
吴永军
江旭楷
陈萌辉
Original Assignee
上海依图网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 上海依图网络科技有限公司
Publication of WO2021143039A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems

Definitions

  • This application relates to the field of data processing, and in particular to a method, device, machine-readable medium, and system for determining a data rollback period in a distributed storage system.
  • A distributed storage system manages physical storage resources through a storage system based on a client/server model: nodes are connected through a computer network, which effectively solves the problems of data storage and management.
  • A storage system fixed at a single location is thereby expanded to any number of locations/storage systems, with a large number of nodes forming a storage-system network.
  • The nodes can be distributed at different locations, and communication and data transmission between nodes take place over the network.
  • Users do not need to care about which node data is stored on or which node it is obtained from; they only need to manage and store data in the system as they would in a local file system.
  • the embodiment of the present application provides a method for determining a data rollback period in a distributed storage system, including:
  • the system includes multiple nodes.
  • the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, where the election time records when the node was elected as the master node;
  • if the node names in two adjacent monitoring records are different, the time period between the monitoring time in the later of the two monitoring records and the completion time of the persistence operation in the earlier monitoring record is taken as the data rollback period;
  • if the node names in two adjacent monitoring records are the same but their election times differ, the time period between the election time in the later of the two monitoring records and the completion time of the persistence operation in the earlier monitoring record is taken as the data rollback period.
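The two rules above can be sketched as a small function; the record fields and function names below are illustrative assumptions, not taken from the patent itself.

```python
from typing import NamedTuple, Optional, Tuple

class MonitoringRecord(NamedTuple):
    """Illustrative monitoring record; field names are assumptions."""
    node_name: str        # master node that performed the persistence operation
    persist_done: float   # completion time of the persistence operation
    monitor_time: float   # time at which the monitoring task ran
    election_time: float  # time the node was elected master

def rollback_period(earlier: MonitoringRecord,
                    later: MonitoringRecord) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the data rollback period, or None if no rollback."""
    if earlier.node_name != later.node_name:
        # Case 1: master changed -> from the earlier record's persistence
        # completion time to the later record's monitoring time.
        return (earlier.persist_done, later.monitor_time)
    if earlier.election_time != later.election_time:
        # Case 2: same name but re-elected (e.g. the old master was restored)
        # -> from the earlier completion time to the later election time.
        return (earlier.persist_done, later.election_time)
    return None  # same master, same election: no rollback detected

r1 = MonitoringRecord("node-a", 100.0, 110.0, 50.0)
r2 = MonitoringRecord("node-b", 105.0, 120.0, 112.0)
print(rollback_period(r1, r2))  # (100.0, 120.0): the master changed
```

With adjacent records from the same master but a new election time, the same function returns the election-time-based period instead.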
  • the completion time of the persistence operation is the minimum value among the completion times of the persistence operation of each node in the system.
  • the completion time of the persistence operation is the median of the completion time of the persistence operation of each node in the system.
  • the multiple nodes include a master node and at least one slave node, and in the case of a failure of the master node, one of the slave nodes is converted to a new master node.
  • it also includes:
  • the slave node with the latest persistence-operation completion time is selected from the slave nodes as the new master node.
  • it also includes:
  • the persistence operation of the system is periodically monitored based on an adjustable time threshold, and the interval of the time threshold can be configured in minutes or seconds.
  • the embodiment of the present application also provides an apparatus for determining a data rollback period in a distributed storage system, including:
  • a monitoring module, used to periodically monitor the persistence operation of the system and generate monitoring records.
  • the system includes multiple nodes.
  • the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, where the election time records when the node was elected as the master node;
  • a comparison module, used to compare the node names in two monitoring records that are adjacent in monitoring time;
  • an acquiring module, used to obtain, when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period;
  • the acquiring module is also used to obtain, when the node names in the two adjacent monitoring records are the same but their election times differ, the time period between the election time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period.
  • it also includes:
  • the configuration module is used to configure the completion time of the persistence operation as the minimum value among the completion times of the persistence operation of each node in the system.
  • the configuration module is also used to configure the completion time of the persistence operation as the median of the completion time of the persistence operation of each node in the system.
  • the configuration module is also used to configure multiple nodes as a master node and at least one slave node, and in the case of a failure of the master node, one of the slave nodes is converted to a new master node.
  • the configuration module is also used to select, when the master node fails, the slave node with the latest persistence-operation completion time from the slave nodes as the new master node.
  • the configuration module is further configured to periodically monitor the persistence operation of the system based on an adjustable time threshold, whose interval can be configured in minutes or seconds.
  • the present application also provides a machine-readable medium storing instructions that, when executed by a machine, cause the machine to perform the above-mentioned method for determining a data rollback period in a distributed storage system.
  • the embodiment of the present application also provides a system, including:
  • a memory, used to store instructions executed by one or more processors of the system; and
  • a processor, which is one of the processors of the system and is used to execute the above-mentioned method for determining the data rollback period in the distributed storage system.
  • The present invention also provides a method, device, machine-readable medium, and system for determining the data rollback period in a distributed storage system that, without manual intervention, can automatically detect a data rollback and accurately obtain the period in which data was lost, which is very useful for further data-loss handling.
  • Fig. 1 shows a schematic flowchart of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
  • Fig. 2 shows a schematic structural diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
  • Fig. 3 shows a schematic structural diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
  • Fig. 4 shows a schematic structural diagram of an apparatus for determining a data rollback period in a distributed storage system according to some embodiments of the present application.
  • Fig. 5 shows a block diagram of a system according to some embodiments of the present application.
  • Fig. 6 shows a block diagram of a system on chip (SoC) according to some embodiments of the present application.
  • A module can refer to or include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) that executes one or more software or firmware programs, and/or memory, combinational logic circuits, and/or other suitable hardware components that provide the described functions, or it may be part of these hardware components.
  • the processor may be a microprocessor, a digital signal processor, a microcontroller, etc., and/or any combination thereof.
  • the processor may be a single-core processor, a multi-core processor, etc., and/or any combination thereof.
  • The embodiment of the present invention uses a persistence mechanism, that is, a mechanism for converting data between a persistent state and a transient state. In layman's terms, it persists transient data, such as cached data, into persistent data. Persistent data obtained through the persistence mechanism can be permanently stored in a storage device; even if the storage device goes down, as long as the persistent data is not damaged, it will not be lost.
  • a method, a device, a machine-readable medium, and a system for determining a data rollback period in a distributed storage system are disclosed.
  • The nodes indicated in the embodiments of the present invention are used to persist output data. For example, assuming node A is a node that performs data persistence, after node A executes, the data on node A is persisted. Generally, persistent data can be stored in memory and then persisted to a storage medium, or it can be stored directly in the file system.
  • A node may persist cached data into persistent data based on the persistence mechanism, for example by recording the write operations on the node's cached data into a persistent file.
  • the above-mentioned write operations include data addition operations, deletion operations, and modification operations.
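Recording add, delete, and modify operations into a persistent file is often realized as an append-only log. The sketch below is a hypothetical illustration under our own file format and helper names, not the patent's implementation.

```python
import json
import os
import tempfile

def append_write_op(log_path: str, op: str, key: str, value=None) -> None:
    """Append one write operation (add/delete/modify) as a JSON line and
    fsync, so the cached write survives a crash of the node."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")
        f.flush()
        os.fsync(f.fileno())

def replay(log_path: str) -> dict:
    """Rebuild the cached state by replaying the persistent log in order."""
    state = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["op"] == "delete":
                state.pop(rec["key"], None)
            else:  # "add" or "modify"
                state[rec["key"]] = rec["value"]
    return state

log = os.path.join(tempfile.mkdtemp(), "ops.log")
append_write_op(log, "add", "k1", 1)
append_write_op(log, "modify", "k1", 2)
append_write_op(log, "add", "k2", 9)
append_write_op(log, "delete", "k2")
print(replay(log))  # {'k1': 2}
```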
  • Figure 1 shows a flow chart of the method.
  • The method according to the embodiment of the present invention is described in detail below:
  • Step 101: Periodically monitor the persistence operation of the system and generate monitoring records.
  • the system includes multiple nodes.
  • the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time;
  • the system here can, but is not limited to, preset a timer or a timing threshold (time threshold), and the timer can periodically start a monitoring program/monitoring task to acquire and save information of at least one node.
  • the monitoring program/monitoring task can communicate with the node by accessing the access interface of the above-mentioned node, and obtain the information stored in the node.
  • the monitoring program/monitoring task can be a cyclically executed program. Based on the above timing threshold interval, the information of at least one node is periodically acquired and saved.
  • the timer or timing threshold interval can be configured in minutes or seconds. That is to say, a monitoring program/monitoring task can be started regularly at intervals of minutes or seconds.
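A minute- or second-granularity monitoring timer of this kind might be sketched as follows; the function names and the demo stop condition are illustrative assumptions.

```python
import threading

def start_periodic_monitor(interval_s: float, collect, max_runs: int):
    """Run collect() every interval_s seconds (the adjustable time threshold)
    and append each result as a monitoring record; stops after max_runs
    so the demo terminates."""
    records = []
    done = threading.Event()

    def tick(run: int) -> None:
        records.append(collect())  # one monitoring record per period
        if run + 1 >= max_runs:
            done.set()
            return
        t = threading.Timer(interval_s, tick, args=(run + 1,))
        t.daemon = True
        t.start()

    tick(0)
    return records, done

# Demo: a 10 ms interval standing in for a second-level threshold, three runs.
records, done = start_periodic_monitor(0.01, lambda: {"node": "node-a"},
                                       max_runs=3)
done.wait(timeout=2)
print(len(records))  # 3
```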
  • the monitoring information (node information) of the node is obtained by monitoring the communication port of the node through the monitoring program/monitoring task executed by the monitoring terminal.
  • the communication between the monitoring terminal and the node can be realized by using, but not limited to, general message middleware, and the monitoring information (node information) of the node can be obtained by monitoring the communication port.
  • The monitoring program/monitoring task executed by the monitoring terminal may be driven, but is not limited to, by a preset timer or timing threshold; the timer can periodically start a monitoring program/monitoring task to obtain the information of at least one node.
  • the information of the node may include, but is not limited to, the node name, IP address, master node information, slave node information, and the completion time of the persistence operation.
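The node information listed above might be represented as a simple structure; the field names here are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeInfo:
    """Illustrative node information as gathered by the monitoring task;
    the fields follow the list in the text above."""
    name: str
    ip: str
    is_master: bool
    master_name: Optional[str]  # for a slave: the master it follows
    persist_done: float         # completion time of the last persistence op

info = NodeInfo(name="node-a", ip="10.0.0.1", is_master=True,
                master_name=None, persist_done=1700000000.0)
print(info.name, info.is_master)  # node-a True
```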
  • the nodes in the system here can be configured to include at least one master node and at least one slave node corresponding to the master node.
  • The master node can, but is not limited to, divide a large data-persistence task into multiple small data-persistence tasks and distribute them to at least one slave node; each node records the completion time of the data-persistence operation it executes.
  • the monitoring program/monitoring task records the execution time of the monitoring program/monitoring task, that is, the monitoring time; the corresponding relationship between the execution time and the information of the aforementioned node is saved as the monitoring record.
  • The saved data may take, but is not limited to, the form of key-value pairs, where the key is a data key and the value is a data value that can hold data in various forms.
  • the saved data can be easily managed in the form of data key-value pairs.
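A minimal key-value layout for the saved monitoring records might look like this; the key format is an assumption for illustration.

```python
import json

# Illustrative key-value store of monitoring records: the key identifies the
# monitoring run (here by its monitoring time), the value holds the record.
store = {}

def save_record(monitor_time: float, record: dict) -> None:
    store[f"monitor:{monitor_time}"] = json.dumps(record)

def load_record(monitor_time: float) -> dict:
    return json.loads(store[f"monitor:{monitor_time}"])

save_record(110.0, {"node": "node-a", "persist_done": 100.0, "election": 50.0})
print(load_record(110.0)["node"])  # node-a
```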
  • the two monitoring records can be the first monitoring record and the second monitoring record.
  • the monitoring time of the second monitoring record is later than the monitoring time of the first monitoring record.
  • The difference between the monitoring time contained in the second monitoring record and the completion time of the node's persistence operation contained in the first monitoring record, that is, the time period between the monitoring time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier monitoring record, is taken as the data rollback period.
  • the above monitoring records record the information of the current master node in the node cluster.
  • If the master node fails, a slave node takes over the master node's task based on the node-cluster recovery mechanism and continues to perform the persistence operation.
  • the node information in the two adjacent monitoring records will be different.
  • The monitoring time saved in the later of the two adjacent monitoring records (the second monitoring record) is later than that of the first monitoring record and can be used as the time point at which the data processing system discovered the master node failure, while the completion time of the persistence operation in the node information saved in the earlier monitoring record (the first monitoring record) can be used as the starting time point of the master node failure.
  • The difference between the starting time point of the master node failure and the time point at which the failure was discovered is calculated, and the time period between these two time points is confirmed as the data rollback period.
  • the system discovers that the master node has changed during the time period, and the persistent data in the node cluster needs to be rolled back.
  • A difference calculation over the master node's failure time points thus determines the data rollback period for the rollback operation.
  • the time period between the election times in the two adjacent monitoring records is acquired as the data rollback period.
  • the above monitoring record records the information of the current master node in the node cluster.
  • a slave node will take over the task of the master node and continue to execute persistent data.
  • After the former master node is repaired and the slave node that took over fails, the former master node takes over the master role again and continues the persistence operation.
  • the monitoring program monitors and generates two adjacent monitoring records of these two nodes.
  • The node names contained in the two adjacent monitoring records are the same, but the monitoring time of the later record (the second monitoring record) is later than that of the earlier record (the first monitoring record). The election time saved in the second monitoring record can be used as the end time point at which the data processing system detected the master node failure, and the completion time of the persistence operation saved in the first monitoring record can be used as the starting time point of the master node failure.
  • The difference between these two time points is calculated, and the time period between them is confirmed as the data rollback period for the data rollback in the node cluster. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
  • The information of the master node and the persistence-operation completion times recorded by the master node and its corresponding slave nodes are periodically obtained; the minimum value among the persistence-operation completion times of the slave nodes corresponding to the master node is obtained; and the correspondence between the master node's information and this minimum value is stored.
  • the multiple nodes include a master node and at least one slave node.
  • When the master node fails, the slave node with the latest persistence-operation completion time is selected from the slave nodes corresponding to the master node as the successor master node.
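One possible sketch of this successor selection, assuming each slave reports its persistence-operation completion time (the field names are illustrative, and the patent elsewhere also describes a minimum-based variant):

```python
def elect_new_master(slaves):
    """Pick the non-faulty slave whose persistence operation completed
    latest, i.e. the one presumed to hold the most up-to-date persisted
    data."""
    healthy = [s for s in slaves if not s.get("faulty", False)]
    if not healthy:
        raise RuntimeError("no healthy slave available to take over")
    return max(healthy, key=lambda s: s["persist_done"])

slaves = [
    {"name": "node-b", "persist_done": 98.0},
    {"name": "node-c", "persist_done": 103.0},
    {"name": "node-d", "persist_done": 101.0, "faulty": True},
]
print(elect_new_master(slaves)["name"])  # node-c
```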
  • the completion time of the persistence operation is the median of the completion time of the persistence operation of each node in the system.
  • Obtaining the median of the persistence-operation completion times of the slave nodes corresponding to the master node includes: both the master node and the slave nodes perform the data-persistence operation; when the master node fails at a certain moment, the slave node with the latest persistence-operation completion time corresponding to the master node is selected and takes over the master node's task to continue persisting data.
  • The persistence-operation completion time contained in the new master node's node information is set to the median of the persistence-operation completion times of all slave nodes corresponding to the failed master node.
  • Storing the correspondence between the successor master node's information and the median means that the above monitoring program can periodically obtain and save the correspondence between the node information of at least one new master node and the median of the data-persistence times.
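The median (and, in the variant described later, the minimum) of the slaves' completion times can be computed directly; this sketch assumes the times are plain numeric timestamps.

```python
import statistics

def completion_time_for_new_master(slave_done_times, mode="median"):
    """Completion time recorded in the new master's node information:
    the median of the slaves' persistence-operation completion times in
    this embodiment, or the minimum in the later variant."""
    if mode == "median":
        return statistics.median(slave_done_times)
    if mode == "min":
        return min(slave_done_times)
    raise ValueError(f"unknown mode: {mode}")

times = [98.0, 101.0, 103.0]
print(completion_time_for_new_master(times))         # 101.0
print(completion_time_for_new_master(times, "min"))  # 98.0
```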
  • both the master node and the slave node perform the operation of persisting data.
  • the slave node will take over the task of the master node to continue the operation of persisting data based on the node cluster recovery mechanism. For another example, after a slave node fails, a new node is created to replace the failed slave node, and then the newly created node will replace the failed slave node to complete the operation of executing the persistent data.
  • The node-cluster recovery mechanism mentioned above is a mechanism established between a master node and its corresponding slave nodes in the node-cluster system. It enables the master node and its slave nodes to establish a recovery/takeover relationship and guarantees the data persisted concurrently by the node cluster.
  • The slave nodes include, but are not limited to, synchronizing local data or updating data that differs from the master node according to the master node's update operations, which ensures that the master node and slave nodes update the same data consistently or update different data concurrently; in addition, when a new master node is created or restarted, a slave node can take over the master node to process the next persistence operation.
  • the monitoring system can be used as a device to control the master node and the slave node. It can monitor whether the master node and the slave node are faulty in real time.
  • the technology used by the monitoring system for real-time monitoring of node failures can refer to the existing technology. The embodiments of the present invention will not be described in detail here.
  • Each node in all node clusters includes but is not limited to carrying a node configuration file.
  • The node configuration file contains the IP address (Internet Protocol address) of each node in the node cluster and the identification information used to identify each node, that is, a universally unique identifier (UUID).
  • When the master node fails, a non-faulty slave node is selected from the slave nodes corresponding to the master node as the successor master node; the persistence-operation completion times recorded by the slave nodes corresponding to the master node are obtained; and the minimum value among these completion times is obtained.
  • Obtaining the minimum value of the slave nodes' persistence-operation completion times includes: both the master node and the slave nodes perform the data-persistence operation; when the master node fails, the slave node with the smallest persistence-operation completion time corresponding to the master node is selected and takes over the master node's task to continue the persistence operation.
  • The persistence-operation completion time contained in the new master node's node information is set to this minimum value.
  • The above monitoring program can periodically obtain and save the correspondence between the node information of at least one new master node and the minimum of the data-persistence times.
  • The method of the present invention is also applicable when both the master node and its corresponding slave node fail: they may fail at the same time; the master node may fail first, with the slave node failing again before the new slave node that replaced the original master node completes data synchronization; or the slave node may fail first, with the master node failing before the new slave node completes data synchronization. In all of these cases, both the master node and the slave node corresponding to it have failed.
  • The monitoring module 201 is used to periodically monitor the persistence operation of the system and generate monitoring records.
  • the system includes multiple nodes.
  • the monitoring records include the node name of the master node that performs the persistence operation in the system, the completion time of the persistence operation, and the monitoring time.
  • The monitoring module 201 is used to start a preset timed monitoring task; the monitoring task monitors the persistence operation of the system at the preset time interval and generates monitoring records. That is, the monitoring module 201 obtains the monitoring information (node information) of a node by monitoring the node's communication port through the monitoring program/monitoring task executed by the monitoring terminal.
  • The monitoring program/monitoring task executed by the monitoring terminal may be driven, but is not limited to, by a preset timer or time-interval threshold; based on this threshold, the timer can periodically start a monitoring program/monitoring task to obtain the information of at least one node.
  • The monitoring program/monitoring task records its own execution time, that is, the monitoring time; the correspondence between this execution time and the aforementioned node information is saved as the monitoring record.
  • the information of the node may include, but is not limited to, the node name, IP address, master node information, slave node information, and the completion time of the persistence operation.
  • the comparison module 202 is used to compare the node names in two monitoring records that are adjacent in monitoring time.
  • The obtaining module 203 is used to obtain, when the node names in the two adjacent monitoring records are different, the time period between the monitoring time in the later monitoring record and the completion time of the persistence operation in the earlier monitoring record as the data rollback period.
  • the comparison module 202 compares two adjacent monitoring records.
  • the two monitoring records may be the first monitoring record and the second monitoring record.
  • the monitoring time of the second monitoring record is later than the monitoring time of the first monitoring record.
  • The obtaining module 203 obtains the difference between the monitoring time contained in the second monitoring record and the completion time of the node's persistence operation contained in the first monitoring record, that is, the time period between the monitoring time in the later monitoring record and the persistence-operation completion time in the earlier monitoring record, as the data rollback period.
  • the above monitoring records record the information of the current master node in the node cluster.
  • The comparison module 202 compares two adjacent monitoring records. If the master node fails at a certain moment, a slave node takes over the master node's task based on the node-cluster recovery mechanism and continues the persistence operation. In this case, the node names contained in the two adjacent monitoring records are the same, but the monitoring time of the later record (the second monitoring record) is later than that of the earlier record (the first monitoring record); the election time saved in the second monitoring record can be used as the end time point at which the data processing system detected the master node failure.
  • Meanwhile, the completion time of the persistence operation saved in the earlier of the two adjacent monitoring records (the first monitoring record) can be used as the starting time point of the master node failure. The obtaining module 203 calculates the difference between the starting time point of the master node failure and the time point at which the failure was discovered, and the time period between these two points is confirmed as the data rollback period for the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
  • during the period from the master node failing until the system discovers the master-node change, the data already persisted in the node cluster needs to be rolled back.
  • the difference between the starting time point of the failure and the time point at which the failure was discovered determines the data rollback period for the rollback operation.
  • the obtaining module is also used to obtain, when the node names in two adjacent monitoring records are the same but the election times of the monitoring records differ, the time period between the election times in the two adjacent monitoring records, as the data rollback period.
  • the above monitoring records record the information of the current master node in the node cluster.
  • if the master node fails, a slave node takes over the master node's task and continues the data-persistence operation; after the former master node is repaired and the slave node fails, the former master node takes over as master again to continue persisting data.
  • in this case the node names in the two adjacent monitoring records are the same, and the election time saved in the later record (the second monitoring record, whose monitoring time is later than that of the first monitoring record) can be used as the time point at which the data processing system discovered the master-node failure.
  • the election time saved in the earlier record (the first monitoring record) of the two adjacent monitoring records can be used as the starting point of the master-node failure.
  • the obtaining module computes the difference between the election times in the two records; the time period between these two points is confirmed as the data rollback period for rolling back the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
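For illustration only, and not as the claimed implementation, the comparison of two adjacent monitoring records described above can be sketched in Python. The record field names (`node_name`, `persist_done`, `monitor_time`, `elected_at`) and the function name are assumptions of this sketch; in the second case it follows the election-time difference described here, while claim 1 instead measures from the earlier record's persistence completion time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class MonitorRecord:
    node_name: str          # name of the current master node
    persist_done: datetime  # completion time of the persistence operation
    monitor_time: datetime  # when the monitoring task ran
    elected_at: datetime    # when this node was elected master

def rollback_period(first: MonitorRecord,
                    second: MonitorRecord) -> Optional[Tuple[datetime, datetime]]:
    """Compare two adjacent records (first is the earlier one) and return
    the data rollback period (start, end), or None if no failover is seen."""
    assert first.monitor_time <= second.monitor_time
    if first.node_name != second.node_name:
        # Case 1: a slave took over as master. Roll back from the old
        # master's last persistence completion up to the monitoring time
        # that observed the change.
        return (first.persist_done, second.monitor_time)
    if first.elected_at != second.elected_at:
        # Case 2: the same node name reappears with a new election time
        # (the master failed and was later re-elected); use the two
        # election times as the bounds.
        return (first.elected_at, second.elected_at)
    return None  # same master throughout, nothing to roll back
```

For example, if the old master's last persistence completed at 10:00 and the record that first shows a different master was taken at 10:06, the rollback period is 10:00 to 10:06.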
  • the configuration module 204 can also configure the nodes in the system with multiple nodes including a master node and at least one slave node, and in the case of a failure of the master node, one of the slave nodes is converted to a new master node.
  • the master node can, but is not limited to, divide a large data persistence task into multiple small data persistence tasks and distribute them to at least one slave node, and each node records the completion time of the corresponding execution data persistence operation.
  • the configuration module 204 is used to select, when the master node fails, the slave node with the largest persistence-operation completion time from the slave nodes as the new master node.
  • the configuration module 204 is also configured to configure the completion time of the persistence operation as the minimum value among the completion times of the persistence operation of each node in the system.
  • the configuration module 204 periodically obtains the master node's information and the persistence-operation completion times recorded by the master node and its corresponding slave nodes; obtains the minimum of the persistence-operation completion times of the slave nodes corresponding to the master node; and stores the correspondence between the master node's information and this minimum.
  • the configuration module 204 is configured to configure the completion time of the persistence operation as the median of the completion time of the persistence operation of each node in the system.
  • the configuration module 204 obtains the median of the persistence-operation completion times of the slave nodes corresponding to the master node as follows: both the master node and the slave nodes perform data-persistence operations; if the master node fails, the slave node with the largest persistence-operation completion time corresponding to the master node is selected to take over the master node's task and continue persisting data.
  • the persistence-operation completion time contained in the new master node's node information is set to the median of the persistence-operation completion times of all slave nodes corresponding to the failed master node.
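A minimal sketch of the selection and configuration rules above, assuming completion times are plain numeric timestamps; `pick_new_master` and `new_master_completion` are illustrative names, not the patented interfaces:

```python
import statistics
from typing import Dict

def pick_new_master(slave_times: Dict[str, float]) -> str:
    """On master failure, pick the slave whose persistence-operation
    completion time is the largest (the most up-to-date slave)."""
    return max(slave_times, key=slave_times.get)

def new_master_completion(slave_times: Dict[str, float],
                          policy: str = "median") -> float:
    """Completion time recorded in the new master's node information:
    either the minimum or the median of the slaves' completion times."""
    values = list(slave_times.values())
    if policy == "min":
        return min(values)
    if policy == "median":
        return statistics.median(values)
    raise ValueError(f"unknown policy: {policy}")
```

With completion times {"s1": 100, "s2": 120, "s3": 110}, slave s2 is promoted, and its recorded completion time becomes 110 under the median policy or 100 under the minimum policy.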
  • the configuration module also periodically monitors the persistent operation of the system based on the adjustable time threshold, and the interval of the time threshold is configured to be one of minutes or seconds.
  • the configuration module here can, but is not limited to, preset a timer or a timing threshold (time threshold), and the timer can periodically start a monitoring program/monitoring task to acquire and save information about at least one node.
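The timer-driven monitoring described above can be sketched as follows; the `collect` callback standing in for the node's access interface is an assumption of this sketch, not part of the disclosed system:

```python
import time
from typing import Callable, Dict, List

def run_monitor(interval_s: float,
                collect: Callable[[], Dict],
                records: List[Dict],
                rounds: int) -> None:
    """Run the monitoring task every `interval_s` seconds: query the
    node information and save it together with the monitoring time."""
    for _ in range(rounds):
        info = collect()                    # query the node's access interface
        info["monitor_time"] = time.time()  # record when the task ran
        records.append(info)                # save the monitoring record
        time.sleep(interval_s)
```

In the disclosed apparatus the interval (time threshold) is adjustable and configured in units of minutes or seconds.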
  • the present application also provides a machine-readable medium with instructions stored thereon that, when executed on a machine, cause the machine to execute a method for determining the data rollback period in a distributed storage system, including: periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; comparing the node names in two monitoring records adjacent in monitoring time; and, when the node names in the two adjacent monitoring records differ, obtaining the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
  • the embodiment of the present application also provides a system, including:
  • Memory used to store instructions executed by one or more processors of the system
  • the processor, one of the processors of the system, is used to execute a method for determining the data rollback period in a distributed storage system, including: periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; comparing the node names in two monitoring records adjacent in monitoring time; and, when the node names in the two adjacent monitoring records differ, obtaining the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
  • the present application also provides a machine-readable medium with instructions stored thereon that, when executed on a machine, cause the machine to execute the above-mentioned method for determining a data rollback period in a distributed storage system.
  • the embodiment of the present application also provides a system, including:
  • Memory used to store instructions executed by one or more processors of the system
  • the processor is one of the processors of the system and is used to execute the above-mentioned method for determining the data rollback period in the distributed storage system.
  • referring now to FIG. 5, shown is a block diagram of a system 500 according to an embodiment of the present application. FIG. 5 schematically illustrates an example system 500 according to various embodiments.
  • the system 500 may include one or more processors 504, system control logic 508 connected to at least one of the processors 504, system memory 512 connected to the system control logic 508, non-volatile memory (NVM) 516 connected to the system control logic 508, and a network interface 520 connected to the system control logic 508.
  • the processor 504 may include one or more single-core or multi-core processors. In some embodiments, the processor 504 may include any combination of a general-purpose processor and a special-purpose processor (for example, a graphics processor, an application processor, a baseband processor, etc.). In an embodiment in which the system 500 adopts an eNB (Evolved Node B, enhanced base station) 101 or a RAN (Radio Access Network, radio access network) controller 102, the processor 504 may be configured to execute various conforming embodiments, For example, one or more of the embodiments shown in FIG. 1.
  • system control logic 508 may include any suitable interface controller to provide any suitable interface to at least one of the processors 504 and/or any suitable device or component in communication with the system control logic 508.
  • system control logic 508 may include one or more memory controllers to provide an interface to the system memory 512.
  • the system memory 512 can be used to load and store data and/or instructions.
  • the memory 512 of the system 500 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
  • the NVM/memory 516 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
  • the NVM/memory 516 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
  • the NVM/memory 516 may include a part of the storage resources on the device where the system 500 is installed, or it may be accessed by the device, but not necessarily a part of the device.
  • the NVM/storage 516 can be accessed over the network via the network interface 520.
  • the system memory 512 and the NVM/memory 516 may include, respectively, a temporary copy and a permanent copy of the instructions 524.
  • the instructions 524 may include instructions that, when executed by at least one of the processors 504, cause the system 500 to implement the method shown in FIG. 1.
  • the instructions 524, hardware, firmware, and/or software components thereof may additionally/alternatively be placed in the system control logic 508, the network interface 520, and/or the processor 504.
  • the network interface 520 may include a transceiver for providing a radio interface for the system 500 to communicate with any other suitable devices (such as a front-end module, an antenna, etc.) through one or more networks.
  • the network interface 520 may be integrated with other components of the system 500.
  • the network interface 520 may be integrated in at least one of the processor 504, the system memory 512, the NVM/memory 516, and a firmware device (not shown) with instructions; when at least one of the processors 504 executes the instructions, the system 500 implements the method shown in FIG. 1.
  • the network interface 520 may further include any suitable hardware and/or firmware to provide a multiple input multiple output radio interface.
  • the network interface 520 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
  • At least one of the processors 504 may be packaged with the logic of one or more controllers for the system control logic 508 to form a system in package (SiP). In one embodiment, at least one of the processors 504 may be integrated on the same die with the logic of one or more controllers for the system control logic 508 to form a system on chip (SoC).
  • the system 500 may further include: an input/output (I/O) device 532.
  • the I/O device 532 may include a user interface to enable a user to interact with the system 500; the design of the peripheral component interface enables the peripheral component to also interact with the system 500.
  • the system 500 further includes a sensor for determining at least one of environmental conditions and location information related to the system 500.
  • the user interface may include, but is not limited to, a display (e.g., liquid crystal display, touch screen display, etc.), speakers, microphones, one or more cameras (e.g., still image cameras and/or video cameras), flashlights (e.g., LED flash) and keyboard.
  • the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
  • the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit.
  • the positioning unit may also be part of or interact with the network interface 520 to communicate with components of the positioning network (eg, global positioning system (GPS) satellites).
  • FIG. 6 shows a block diagram of an SoC (System on Chip) 600 according to an embodiment of the present application.
  • the SoC 600 includes: an interconnection unit 650 coupled to an application processor 615; a system agent unit 670; a bus controller unit 680; an integrated memory controller unit 640; a set of one or more coprocessors 620, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit 630; and a direct memory access (DMA) unit 660.
  • in one embodiment, the coprocessor 620 includes a dedicated processor, such as, for example, a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, or an embedded processor.
  • the various embodiments of the mechanism disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • the embodiments of the present application can be implemented as a computer program or program code executed on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program codes can be applied to input instructions to perform the functions described in this application and generate output information.
  • the output information can be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • the program code can be implemented in a high-level programming language or an object-oriented programming language to communicate with the processing system.
  • assembly language or machine language can also be used to implement the program code.
  • the mechanism described in this application is not limited to the scope of any particular programming language. In either case, the language can be a compiled language or an interpreted language.
  • the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments can also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which can be read and executed by one or more processors.
  • the instructions can be distributed through a network or through other computer-readable media.
  • a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (for example, a computer), including, but not limited to, floppy disks, optical disks, compact discs, read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet using electric, optical, acoustic, or other forms of propagating signals (for example, carrier waves, infrared signals, digital signals, etc.). Therefore, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (for example, a computer).
  • each unit/module mentioned in the device embodiments of this application is a logical unit/module; physically, a logical unit/module can be a physical unit/module, a part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not the most important; the combination of the functions implemented by these logical units/modules is the key to solving the technical problem proposed by this application. In addition, in order to highlight the innovative part of this application, the above device embodiments do not introduce units/modules that are not closely related to solving the technical problem proposed by this application; this does not mean that other units/modules do not exist in the above device embodiments.


Abstract

This application relates to the field of data processing, and provides a method, apparatus, machine-readable medium, and system for determining a data rollback period in a distributed storage system. The method includes: periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the election time of the master node, the election time being used to record when the node was elected master; comparing the node names in two monitoring records adjacent in monitoring time; and obtaining, as the data rollback period, either the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, or the time period between the election time and the completion time of the persistence operation. The method can accurately determine the period during which data was lost.

Description

Method for determining a data rollback period in a distributed storage system — Technical Field
This application relates to the field of data processing, and in particular to a method, apparatus, machine-readable medium, and system for determining a data rollback period in a distributed storage system.
Background
With the rapid development of cloud computing and big data, data volumes are growing explosively. Simply adding disks to expand the storage capacity of a computer file system can no longer meet the exponentially growing storage demands of the information-explosion era, which gave rise to distributed storage systems.
A distributed storage system manages physical storage resources in a client/server model and connects them to nodes over a computer network, effectively solving the problems of storing and managing data. A storage system fixed at one location is extended to arbitrarily many locations/storage systems, with numerous nodes forming a storage network. Each node may be located at a different site, with inter-node communication and data transfer carried out over the network. Users of a distributed storage system need not care which node stores the data or which node it is retrieved from; they simply manage and store the data in the system as they would with a local file system.
However, when a data rollback occurs in a distributed storage system, the system generally just makes a backup of the lost data and raises an alarm to await manual handling. Manual handling is hard to perform promptly, and the exact period during which data was lost is difficult to determine.
Summary
An embodiment of this application provides a method for determining a data rollback period in a distributed storage system, including:
periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the election time of the master node, the election time being used to record when the node was elected master;
comparing the node names in two monitoring records adjacent in monitoring time;
when the node names in the two adjacent monitoring records differ, obtaining the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period;
when the node names in the two adjacent monitoring records are the same but the election times of the records differ, obtaining the time period between the election time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
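The two comparison rules of the method above can be sketched compactly in Python; this is an illustration under the assumption that each record is a `(node_name, persist_done, monitor_time, elected_at)` tuple sorted by monitoring time, not the claimed implementation:

```python
from typing import List, Tuple

# (node_name, persist_done, monitor_time, elected_at)
Record = Tuple[str, float, float, float]

def rollback_periods(records: List[Record]) -> List[Tuple[float, float]]:
    """Scan adjacent record pairs and collect the data rollback periods."""
    periods = []
    for prev, curr in zip(records, records[1:]):
        p_name, p_done, _p_mon, p_elect = prev
        c_name, _c_done, c_mon, c_elect = curr
        if p_name != c_name:
            # Names differ: a new master took over; bound the period by
            # the earlier record's persistence completion and the later
            # record's monitoring time.
            periods.append((p_done, c_mon))
        elif p_elect != c_elect:
            # Same name, new election time: the master was re-elected;
            # bound the period by the earlier record's persistence
            # completion and the later record's election time.
            periods.append((p_done, c_elect))
    return periods
```

Each returned pair delimits a span of persisted data that would be rolled back.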
Optionally,
the completion time of the persistence operation is the minimum of the persistence-operation completion times of the individual nodes in the system.
Optionally,
the completion time of the persistence operation is the median of the persistence-operation completion times of the individual nodes in the system.
Optionally,
the multiple nodes include one master node and at least one slave node, and when the master node fails, one of the slave nodes is converted into the new master node.
Optionally, the method further includes:
when the master node fails, selecting from the slave nodes the slave node with the largest persistence-operation completion time as the new master node.
Optionally, the method further includes:
periodically monitoring the persistence operations of the system based on an adjustable time threshold, the interval of the time threshold being configured in units of minutes or seconds.
An embodiment of this application further provides an apparatus for determining a data rollback period in a distributed storage system, including:
a monitoring module, used to periodically monitor the persistence operations of the system and generate monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, the election time being used to record when the node was elected master;
a comparison module, used to compare the node names in two monitoring records adjacent in monitoring time;
an obtaining module, used to obtain, when the node names in the two adjacent monitoring records differ, the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period;
the obtaining module is further used to obtain, when the node names in the two adjacent monitoring records are the same but the election times of the records differ, the time period between the election time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
Optionally, the apparatus further includes:
a configuration module, used to configure the completion time of the persistence operation as the minimum of the persistence-operation completion times of the individual nodes in the system.
Optionally,
the configuration module is further used to configure the completion time of the persistence operation as the median of the persistence-operation completion times of the individual nodes in the system.
Optionally,
the configuration module is further used to configure the multiple nodes as one master node and at least one slave node, and, when the master node fails, to convert one of the slave nodes into the new master node.
Optionally,
the configuration module is further used to select, when the master node fails, the slave node with the largest persistence-operation completion time as the new master node.
Optionally,
the configuration module is further used to configure periodic monitoring of the system's persistence operations based on an adjustable time threshold, the interval of the time threshold being configured in units of minutes or seconds.
This application further provides a machine-readable medium storing instructions that, when executed on a machine, cause the machine to perform the above method for determining a data rollback period in a distributed storage system.
An embodiment of this application further provides a system, including:
a memory, used to store instructions to be executed by one or more processors of the system, and
a processor, being one of the processors of the system, used to perform the above method for determining a data rollback period in a distributed storage system.
The present invention further provides a method, apparatus, machine-readable medium, and system for determining a data rollback period in a distributed storage system that can automatically detect data rollback situations without manual intervention and accurately determine the period during which data was lost, which is very useful for further handling of the data loss.
Brief Description of the Drawings
FIG. 1 shows a schematic flowchart of a method for determining a data rollback period in a distributed storage system according to some embodiments of this application.
FIG. 2 shows a schematic diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of this application.
FIG. 3 shows a schematic diagram of a method for determining a data rollback period in a distributed storage system according to some embodiments of this application.
FIG. 4 shows a schematic structural diagram of an apparatus for determining a data rollback period in a distributed storage system according to some embodiments of this application.
FIG. 5 shows a block diagram of a system according to some embodiments of this application.
FIG. 6 shows a block diagram of a system on chip (SoC) according to some embodiments of this application.
Detailed Description
It can be understood that, as used herein, the term "module" may refer to or include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or grouped) and/or memory executing one or more software or firmware programs, combinational logic circuitry, and/or other suitable hardware components providing the described functionality, or may be part of such hardware components.
It can be understood that, in the embodiments of this application, the processor may be a microprocessor, a digital signal processor, a microcontroller, etc., and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, etc., and/or any combination thereof.
The embodiments of this application are described in further detail below with reference to the drawings.
The embodiments of the present invention use a persistence mechanism, i.e., a mechanism that converts data between a persistent state and a transient state. Put simply, transient data such as cached data is persisted into persistent data. Persistent data obtained through the persistence mechanism can be saved permanently on a storage device; even if the storage device crashes, the persistent data is not lost as long as it is not corrupted.
Some embodiments of this application disclose a method, apparatus, machine-readable medium, and system for determining a data rollback period in a distributed storage system.
The nodes referred to in the embodiments of the present invention are used to persist output data. For example, if node a is a node that performs data persistence, then after node a finishes executing, the data of node a can be persisted. Typically, the persisted data can be held in memory and then persisted to a storage medium, or it can be saved directly in a file system.
A node can persist cached data into persistent data based on the persistence mechanism in various ways, for example by recording the node's write operations on the cached data into a persistence file. The write operations include insert, delete, and update operations on the data.
Referring first to FIG. 1, which shows a flowchart of the method, the method according to an embodiment of the present invention is described in detail below and includes:
101: periodically monitor the persistence operations of the system and generate monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time;
Here the system may, but is not limited to, preset a timer or a timing threshold (time threshold); the timer can periodically start a monitoring program/task that obtains and saves the information of at least one node. The monitoring program/task can communicate with a node through its access interface to obtain the information stored on the node. The monitoring program/task can be a program that runs in a loop, periodically obtaining and saving the information of at least one node at the interval given by the timing threshold. The timer or timing-threshold interval here can be configured in units of minutes or seconds; that is, the monitoring program/task can be started at intervals of minutes or seconds.
Specifically, a monitoring program/task executed by a monitoring terminal listens on a node's communication port to obtain the node's monitoring information (node information). Communication between the monitoring terminal and the node may be implemented with, but is not limited to, ordinary message middleware; the node's monitoring information (node information) is obtained by listening on the communication port. The way the monitoring terminal executes the monitoring program/task includes, but is not limited to, presetting a timer or timing threshold; the timer can periodically start a monitoring program/task that obtains the information of at least one node. The monitoring program/task can communicate with the node through its access interface to obtain the information stored on the node. The monitoring program can be a program that runs in a loop, periodically obtaining and saving the information of at least one node at the interval given by the timing threshold.
The node information may include, but is not limited to, the node's name, IP address, master-node information, slave-node information, and the completion time of the persistence operation.
The nodes in the system here can be configured to include at least one master node and at least one slave node corresponding to the master node. The master node may, but is not limited to, split a large data-persistence task into multiple small data-persistence tasks and distribute them to at least one slave node, with each node recording the completion time of its corresponding data-persistence operation.
Meanwhile, the monitoring program/task records its own execution time, i.e., the monitoring time, and saves the correspondence between this execution time and the node information as the monitoring record. The saved data includes, but is not limited to, the form of key-value pairs, i.e., the saved data has a key-value form in which the key is the data key and the value is the data value; the data value can take many forms, and the key-value form makes it convenient to manage the saved data.
102: compare the node names in two monitoring records adjacent in monitoring time;
103: when the node names in the two adjacent monitoring records differ, obtain the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
Next, the two adjacent monitoring records are compared. For example, the two records may be a first monitoring record and a second monitoring record, with the monitoring time of the second record later than that of the first. When the node names contained in the first and second monitoring records differ, the difference between the monitoring time contained in the second record and the persistence-operation completion time of the node contained in the first record is obtained — that is, the time period between the monitoring time in the later of the two adjacent records and the completion time of the persistence operation in the earlier record — as the data rollback period.
Specifically, as shown in FIG. 2, the monitoring records record the information of the current master node in the node cluster, and the two adjacent monitoring records are compared: if at some moment the master node fails, a slave node takes over the master node's task and continues the data-persistence operation based on the cluster recovery mechanism. At that point the node information in the two adjacent monitoring records differs. The monitoring time saved in the later record (the second monitoring record), which is later than the monitoring time of the first record, can serve as the time point at which the data processing system discovered the master-node failure, while the persistence-operation completion time of the node information saved in the earlier record (the first monitoring record) can serve as the starting point of the master-node failure. The difference between the starting point of the failure and the point at which the failure was discovered is computed, and the time period between these two points is confirmed as the data rollback period for the rollback of the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back. During the period from the master node failing, through a slave node taking over as master, to the system discovering the master-node change, the data already persisted in the node cluster needs to be rolled back; the data rollback period for the rollback operation is determined by computing the difference between the starting point of the master-node failure and the point at which the failure was discovered.
When the node names in the two adjacent monitoring records are the same but the election times of the records differ, the time period between the election times in the two adjacent records is obtained as the data rollback period.
Specifically, as shown in FIG. 3, the monitoring records record the information of the current master node in the node cluster. If at some moment the master node fails, a slave node takes over the master node's task and continues the data-persistence operation; later the former master node is repaired while the slave node fails, and the former master node takes over the master role again to continue persisting data. At that point the monitoring program has generated two adjacent monitoring records for these two periods, and the following scenario arises: the node names contained in the two adjacent records are the same, but the later record (the second monitoring record) has a monitoring time later than that of the earlier record (the first monitoring record). The election time saved in the second record can serve as the end point at which the data processing system discovered the master-node failure, and the persistence-operation completion time saved in the first record can serve as the starting point of the failure. The difference between the election times in these two records is computed, and the time period between these two points is confirmed as the data rollback period for the rollback of the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
In an embodiment of the present invention, the master node's information and the persistence-operation completion times recorded by the master node and its corresponding slave nodes are obtained periodically; the minimum of the persistence-operation completion times of the slave nodes corresponding to the master node is obtained; and the correspondence between the master node's information and this minimum is stored. The saved data includes, but is not limited to, the form of key-value pairs, i.e., the saved data has a key-value form in which the key is the data key and the value is the data value; the data value can take many forms, and the key-value form makes it convenient to manage the saved data.
Next, the multiple nodes include one master node and at least one slave node. When the master node fails, the slave node with the largest persistence-operation completion time is selected from the slave nodes corresponding to the master node as the successor/new master node. According to some embodiments of this application, the completion time of the persistence operation is the median of the persistence-operation completion times of the individual nodes in the system.
Specifically, obtaining the median of the persistence-operation completion times of the slave nodes corresponding to the master node includes: both the master node and the slave nodes perform data-persistence operations; if at some moment the master node fails, the slave node with the largest persistence-operation completion time corresponding to the master node is selected to take over the master node's task and continue persisting data. The persistence-operation completion time contained in the new master node's node information is set to the median of the persistence-operation completion times of all slave nodes corresponding to the failed master node.
Meanwhile, storing the correspondence between the successor master node's information and the median includes: the above maintenance program can periodically obtain and save the correspondence between the node information of at least one new master node and the median of the data-persistence completion times.
Typically, both the master node and the slave nodes perform data-persistence operations. If at some moment the master node fails, a slave node takes over the master node's task and continues persisting data based on the cluster recovery mechanism. As another example, after a slave node fails, a new node is created to replace it, and the newly created node completes the data-persistence operation in its place.
Typically, the cluster recovery mechanism mentioned above is a mechanism established between a master node and its corresponding slave nodes in a node-cluster system. It establishes a recovery/takeover relationship between the master and its slaves and can guarantee the data handled by concurrent persistence in the node cluster. A slave node may, but is not limited to, synchronize its local data according to the master's update operations or update data that differs from the master's, ensuring that the master and slaves update the same data or concurrently update different data. In addition, when the master node is newly created or restarted, a slave node can take over from the master node to handle the next data-persistence operations.
Typically, the monitoring system can act as the device that controls the master and slave nodes and can monitor in real time whether they have failed; for the techniques used for such real-time failure monitoring, reference may be made to the prior art, which is not described in detail in this embodiment.
Typically, in a node cluster, information can be exchanged among all nodes, and every node can know which other nodes exist in the cluster besides itself. Each node in the cluster includes, but is not limited to, a node configuration file containing the IP address (Internet Protocol Address) of every node in the cluster, together with identification information identifying each node, i.e., the above universally unique identifier (UUID).
In an embodiment of the present invention, when the master node fails, a non-failed slave node is selected from the slave nodes corresponding to the master node as the successor master node; the persistence-operation completion times recorded by the slave nodes corresponding to the master node are obtained; and the minimum of the slave nodes' persistence-operation completion times is obtained.
Next, obtaining the persistence-operation completion times recorded by the slave nodes corresponding to the master node and obtaining the minimum of those completion times includes: both the master node and the slave nodes perform data-persistence operations; if at some moment the master node fails, the slave node with the smallest persistence-operation completion time corresponding to the master node is selected to take over the master node's task and continue persisting data. The persistence-operation completion time contained in the new master node's node information is set to this minimum completion time.
Meanwhile, the correspondence between the successor master node's information and the minimum is stored; the above maintenance program can periodically obtain and save the correspondence between the node information of at least one new master node and the minimum of the data-persistence completion times.
The method of the present invention also applies when both the master node and its corresponding slave nodes fail: the master and slave may fail simultaneously; or the master may fail first and, before the new slave node replacing the original master finishes data synchronization, that slave node fails again — it can be understood that in this case both the master node and its corresponding slave node have failed; or the slave node may fail first and, before a new slave node finishes data synchronization, the master node fails — likewise, in this case both the master node and its corresponding slave node have failed.
Based on the above description, the main workflow of each module in the apparatus for determining a data rollback period in a distributed storage system is described in detail below.
According to some embodiments of this application, in combination with the description of the method for determining a data rollback period in a distributed storage system above, the technical details described there still apply to the apparatus; to avoid repetition, some of them are not repeated here. As shown in FIG. 4, the apparatus specifically includes:
a monitoring module 201, used to periodically monitor the persistence operations of the system and generate monitoring records; the system includes multiple nodes, and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time.
Specifically, the monitoring module 201 is used to start a preset timed monitoring task that monitors the persistence operations of the system at a preset interval and generates monitoring records; that is, the monitoring module 201 listens on a node's communication port through a monitoring program/task executed by a monitoring terminal and obtains the node's monitoring information (node information). Communication between the monitoring terminal and the node may be implemented with, but is not limited to, ordinary message middleware; the node's monitoring information (node information) is obtained by listening on the communication port. The way the monitoring terminal executes the monitoring program/task includes, but is not limited to, presetting a timer or an interval threshold; the timer can periodically start a monitoring program/task that obtains the information of at least one node based on the interval threshold. The monitoring program/task can communicate with the node through its access interface to obtain the information stored on the node. The monitoring program can be a program that runs in a loop, periodically obtaining and saving the information of at least one node at the interval given by the timing threshold; meanwhile, the monitoring program/task records its own execution time, i.e., the monitoring time, and saves the correspondence between this execution time and the node information as the monitoring record.
The node information may include, but is not limited to, the node's name, IP address, master-node information, slave-node information, and the completion time of the persistence operation.
a comparison module 202, used to compare the node names in two monitoring records adjacent in monitoring time.
an obtaining module 203, used to obtain, when the node names in the two adjacent monitoring records differ, the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
Next, the comparison module 202 compares the two adjacent monitoring records. For example, the two records may be a first monitoring record and a second monitoring record, with the monitoring time of the second record later than that of the first. When the node names contained in the first and second records differ, the obtaining module 203 obtains the difference between the monitoring time contained in the second record and the persistence-operation completion time of the node contained in the first record — that is, the time period between the monitoring time in the later of the two adjacent records and the completion time of the persistence operation in the earlier record — as the data rollback period.
Specifically, the monitoring records record the information of the current master node in the node cluster, and the comparison module 202 compares the two adjacent monitoring records: if at some moment the master node fails, a slave node takes over the master node's task and continues the data-persistence operation based on the cluster recovery mechanism. At that point the node names in the two adjacent records differ; the monitoring time of the later record (the second monitoring record) is later than that of the earlier record (the first monitoring record) and can serve as the end point at which the data processing system discovered the master-node failure, while the persistence-operation completion time saved in the earlier record (the first monitoring record) can serve as the starting point of the failure. The obtaining module 203 computes the difference between the starting point of the failure and the point at which the failure was discovered; the time period between these two points is confirmed as the data rollback period for the rollback of the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back. During the period from the master node failing, through a slave node taking over as master, to the system discovering the master-node change, the data already persisted in the node cluster needs to be rolled back; the data rollback period for the rollback operation is determined by computing the difference between the starting point of the failure and the point at which the failure was discovered.
the obtaining module is further used to obtain, when the node names in the two adjacent monitoring records are the same but the election times of the records differ, the time period between the election times in the two adjacent monitoring records, as the data rollback period.
Specifically, the monitoring records record the information of the current master node in the node cluster. If at some moment the master node fails, a slave node takes over the master node's task and continues the data-persistence operation; later the former master node is repaired while the slave node fails, and the former master node takes over the master role again to continue persisting data. At that point the node information in the two adjacent monitoring records is the same; the election time saved in the later record (the second monitoring record, whose monitoring time is later than that of the first record) can serve as the point at which the data processing system discovered the master-node failure, while the election time saved in the earlier record (the first monitoring record) can serve as the starting point of the failure. The obtaining module computes the difference between the election times in the two records; the time period between these two points is confirmed as the data rollback period for the rollback of the node cluster's data. Within the range of this difference (the data rollback period), the data of the node corresponding to the node information contained in the first monitoring record is rolled back.
The configuration module 204 can also configure the nodes in the system such that the multiple nodes include one master node and at least one slave node, and, when the master node fails, one of the slave nodes is converted into the new master node. The master node may, but is not limited to, split a large data-persistence task into multiple small data-persistence tasks and distribute them to at least one slave node, with each node recording the completion time of its corresponding data-persistence operation.
The configuration module 204 is used to select, when the master node fails, the slave node with the largest persistence-operation completion time from the slave nodes as the new master node.
The configuration module 204 is further used to configure the completion time of the persistence operation as the minimum of the persistence-operation completion times of the individual nodes in the system.
Specifically, the configuration module 204 periodically obtains the master node's information and the persistence-operation completion times recorded by the master node and its corresponding slave nodes; obtains the minimum of the persistence-operation completion times of the slave nodes corresponding to the master node; and stores the correspondence between the master node's information and this minimum.
The configuration module 204 is used to configure the completion time of the persistence operation as the median of the persistence-operation completion times of the individual nodes in the system.
Specifically, the configuration module 204 obtains the median of the persistence-operation completion times of the slave nodes corresponding to the master node as follows: both the master node and the slave nodes perform data-persistence operations; if at some moment the master node fails, the slave node with the largest persistence-operation completion time corresponding to the master node is selected to take over the master node's task and continue persisting data. The persistence-operation completion time contained in the new master node's node information is set to the median of the persistence-operation completion times of all slave nodes corresponding to the failed master node.
The configuration module also periodically monitors the persistence operations of the system based on an adjustable time threshold, the interval of the time threshold being configured in units of minutes or seconds. The configuration module here may, but is not limited to, preset a timer or timing threshold (time threshold); the timer can periodically start a monitoring program/task to obtain and save the information of at least one node.
This application further provides a machine-readable medium storing instructions that, when executed on a machine, cause the machine to perform a method for determining a data rollback period in a distributed storage system, including: periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; comparing the node names in two monitoring records adjacent in monitoring time; and, when the node names in the two adjacent monitoring records differ, obtaining the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
An embodiment of this application further provides a system, including:
a memory, used to store instructions to be executed by one or more processors of the system, and
a processor, being one of the processors of the system, used to perform a method for determining a data rollback period in a distributed storage system, including: periodically monitoring the persistence operations of the system and generating monitoring records, where the system includes multiple nodes and each monitoring record includes the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, and the monitoring time; comparing the node names in two monitoring records adjacent in monitoring time; and, when the node names in the two adjacent monitoring records differ, obtaining the time period between the monitoring time in the record with the later monitoring time and the completion time of the persistence operation in the record with the earlier monitoring time, as the data rollback period.
This application further provides a machine-readable medium storing instructions that, when executed on a machine, cause the machine to perform the above method for determining a data rollback period in a distributed storage system.
An embodiment of this application further provides a system, including:
a memory, used to store instructions to be executed by one or more processors of the system, and
a processor, being one of the processors of the system, used to perform the above method for determining a data rollback period in a distributed storage system.
Referring now to FIG. 5, shown is a block diagram of a system 500 according to an embodiment of this application. FIG. 5 schematically illustrates an example system 500 according to various embodiments. In one embodiment, the system 500 may include one or more processors 504, system control logic 508 connected to at least one of the processors 504, system memory 512 connected to the system control logic 508, non-volatile memory (NVM) 516 connected to the system control logic 508, and a network interface 520 connected to the system control logic 508.
In some embodiments, the processor 504 may include one or more single-core or multi-core processors. In some embodiments, the processor 504 may include any combination of general-purpose and special-purpose processors (for example, a graphics processor, an application processor, a baseband processor, etc.). In embodiments in which the system 500 employs an eNB (Evolved Node B) 101 or a RAN (Radio Access Network) controller 102, the processor 504 may be configured to carry out various conforming embodiments, for example one or more of the embodiments shown in FIG. 1.
In some embodiments, the system control logic 508 may include any suitable interface controllers to provide any suitable interface to at least one of the processors 504 and/or to any suitable device or component that communicates with the system control logic 508.
In some embodiments, the system control logic 508 may include one or more memory controllers to provide an interface to the system memory 512. The system memory 512 can be used to load and store data and/or instructions. In some embodiments the memory 512 of the system 500 may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM).
The NVM/memory 516 may include one or more tangible, non-transitory computer-readable media used to store data and/or instructions. In some embodiments, the NVM/memory 516 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
The NVM/memory 516 may include part of the storage resources on the device on which the system 500 is installed, or it may be accessible by the device without necessarily being part of it. For example, the NVM/memory 516 can be accessed over a network via the network interface 520.
In particular, the system memory 512 and the NVM/memory 516 may include, respectively, a temporary copy and a permanent copy of the instructions 524. The instructions 524 may include instructions that, when executed by at least one of the processors 504, cause the system 500 to implement the method shown in FIG. 1. In some embodiments, the instructions 524, the hardware, the firmware, and/or software components thereof may additionally/alternatively reside in the system control logic 508, the network interface 520, and/or the processor 504.
The network interface 520 may include a transceiver used to provide a radio interface for the system 500 to communicate with any other suitable devices (such as front-end modules, antennas, etc.) over one or more networks. In some embodiments, the network interface 520 may be integrated with other components of the system 500. For example, the network interface 520 may be integrated in at least one of the processor 504, the system memory 512, the NVM/memory 516, and a firmware device (not shown) with instructions; when at least one of the processors 504 executes the instructions, the system 500 implements the method shown in FIG. 1.
The network interface 520 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, the network interface 520 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 504 may be packaged together with the logic of one or more controllers for the system control logic 508 to form a system in package (SiP). In one embodiment, at least one of the processors 504 may be integrated on the same die with the logic of one or more controllers for the system control logic 508 to form a system on chip (SoC).
The system 500 may further include an input/output (I/O) device 532. The I/O device 532 may include a user interface that enables a user to interact with the system 500; the peripheral component interfaces are designed so that peripheral components can also interact with the system 500. In some embodiments, the system 500 further includes sensors used to determine at least one of environmental conditions and location information related to the system 500.
In some embodiments, the user interface may include, but is not limited to, a display (for example, a liquid crystal display, a touch screen display, etc.), speakers, a microphone, one or more cameras (for example, still image cameras and/or video cameras), a flashlight (for example, an LED flash), and a keyboard.
In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of, or interact with, the network interface 520 to communicate with components of a positioning network (for example, global positioning system (GPS) satellites).
根据本申请的实施例,图6示出了一种SoC(System on Chip,片上系统)600的框图。在图6中,相似的部件具有同样的附图标记。另外,虚线框是更先进的SoC的可选特征。在图6中,SoC 600包括:互连单元650,其被耦合至应用处理器615;系统代理单元670;总线控制器单元680;集成存储器控制器单元640;一组或一个或多个协处理器620,其可包括集成图形逻辑、图像处理器、音频处理器和视频处理器;静态随机存取存储器(SRAM)单元630;直接存储器存取(DMA)单元660。在一个实施例中,协处理器620包括专用处理器,诸如例如网络或通信处理器、压缩引擎、GPGPU、高吞吐量MIC处理器、或嵌入式处理器等等。
Embodiments of the mechanisms disclosed in the present application may be implemented in hardware, software, firmware, or a combination of these approaches. Embodiments of the present application may be implemented as computer programs or program code executing on a programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and to generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of the present application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. The program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in the present application are not limited in scope to any particular programming language. In any case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some structural or method features may be shown in specific arrangements and/or orders. It should be understood, however, that such specific arrangements and/or orders may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of a structural or method feature in a particular figure does not imply that such a feature is required in all embodiments; in some embodiments, such features may be omitted or combined with other features.
It should be noted that the units/modules mentioned in the device embodiments of the present application are all logical units/modules. Physically, a logical unit/module may be one physical unit/module, a part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not what matters most; rather, the combination of the functions they implement is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are only loosely related to solving the technical problem addressed by the present application; this does not mean that no other units/modules exist in the above device embodiments.
It should be noted that, in the examples and description of this patent, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
Although the present application has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art will understand that various changes in form and detail may be made therein without departing from the spirit and scope of the present application.

Claims (14)

  1. A method for determining a data rollback period in a distributed storage system, comprising:
    periodically monitoring persistence operations of the system and generating monitoring records, wherein the system comprises a plurality of nodes, and each monitoring record comprises the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, the election time recording the time at which the node was elected as the master node;
    comparing the node names in two monitoring records that are adjacent in monitoring time;
    when the node names in the two adjacent monitoring records are different, taking as the data rollback period the time span between the monitoring time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier monitoring record; and
    when the node names in the two adjacent monitoring records are the same but their election times are different, taking as the data rollback period the time span between the election time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier monitoring record.
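The comparison logic of claim 1 can be sketched in code. This is an illustrative, non-normative sketch only: the record type `MonitorRecord`, its field names, and the function `rollback_period` are hypothetical names chosen for readability, not part of the claimed method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MonitorRecord:
    node_name: str        # name of the master node that performed the persistence operation
    persist_done: float   # completion time of the persistence operation (epoch seconds)
    monitor_time: float   # time at which this monitoring record was taken
    elected_at: float     # time at which the node was elected master

def rollback_period(earlier: MonitorRecord, later: MonitorRecord) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the data rollback period for two records
    adjacent in monitoring time, or None if no failover is detected."""
    if earlier.node_name != later.node_name:
        # Master changed between the two monitoring points: data persisted
        # after the earlier record's completion time may have been lost.
        return (earlier.persist_done, later.monitor_time)
    if earlier.elected_at != later.elected_at:
        # Same node name but a new election time (the node failed and was
        # re-elected): the rollback window ends at the new election time.
        return (earlier.persist_done, later.elected_at)
    return None  # same master, same term: no rollback period
```

Note how the two branches mirror the two cases of claim 1: a changed node name bounds the period by the later monitoring time, while an unchanged name with a changed election time bounds it by the later election time.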
  2. The method according to claim 1, wherein:
    the completion time of the persistence operation is the minimum of the persistence operation completion times of the respective nodes in the system.
  3. The method according to claim 2, wherein:
    the completion time of the persistence operation is the median of the persistence operation completion times of the respective nodes in the system.
  4. The method according to claim 1, wherein:
    the plurality of nodes comprise one master node and at least one slave node, and, upon failure of the master node, one of the slave nodes is converted into a new master node.
  5. The method according to claim 4, wherein:
    upon failure of the master node, the slave node with the largest persistence operation completion time is selected from among the slave nodes as the new master node.
  6. The method according to claim 4, wherein:
    the persistence operations of the system are monitored periodically based on an adjustable time threshold, the interval of the time threshold being configured in one of minutes or seconds.
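Claims 2, 3, 5, and 6 can likewise be sketched as plain helper functions. Again this is a hypothetical illustration: `poll_nodes`, `record_sink`, and all names below are assumptions of this sketch, not elements recited by the claims.

```python
import statistics
import time

def aggregate_completion(times, mode="min"):
    """Reduce the per-node persistence completion times to one value:
    claim 2 takes the minimum, claim 3 the median."""
    return min(times) if mode == "min" else statistics.median(times)

def pick_new_master(slave_completion_times):
    """Claim 5: on master failure, promote the slave whose persistence
    operations have progressed furthest (largest completion time)."""
    return max(slave_completion_times, key=slave_completion_times.get)

def monitor_loop(poll_nodes, record_sink, interval_seconds=60, mode="min"):
    """Claim 6: poll at an adjustable interval (configurable in minutes or
    seconds) and emit one monitoring record per cycle."""
    while True:
        master, completion_times, elected_at = poll_nodes()
        record_sink({
            "node_name": master,
            "persist_done": aggregate_completion(completion_times, mode),
            "monitor_time": time.time(),
            "elected_at": elected_at,
        })
        time.sleep(interval_seconds)
```

Taking the minimum completion time is the conservative choice (no node has persisted less than it), while the median tolerates a single straggler node at the cost of a possibly wider rollback window.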
  7. An apparatus for determining a data rollback period in a distributed storage system, comprising:
    a monitoring module configured to periodically monitor persistence operations of the system and generate monitoring records, wherein the system comprises a plurality of nodes, and each monitoring record comprises the node name of the master node performing the persistence operation in the system, the completion time of the persistence operation, the monitoring time, and the election time of the master node, the election time recording the time at which the node was elected as the master node;
    a comparison module configured to compare the node names in two monitoring records that are adjacent in monitoring time; and
    an acquisition module configured to, when the node names in the two adjacent monitoring records are different, take as the data rollback period the time span between the monitoring time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier monitoring record;
    the acquisition module being further configured to, when the node names in the two adjacent monitoring records are the same but their election times are different, take as the data rollback period the time span between the election time in the later of the two adjacent monitoring records and the completion time of the persistence operation in the earlier monitoring record.
  8. The apparatus according to claim 7, further comprising:
    a configuration module configured to set the completion time of the persistence operation to the minimum of the persistence operation completion times of the respective nodes in the system.
  9. The apparatus according to claim 8, wherein:
    the configuration module is configured to set the completion time of the persistence operation to the median of the persistence operation completion times of the respective nodes in the system.
  10. The apparatus according to claim 8, wherein:
    the configuration module is configured such that the plurality of nodes comprise one master node and at least one slave node, and, upon failure of the master node, one of the slave nodes is converted into a new master node.
  11. The apparatus according to claim 10, wherein:
    the configuration module is configured to, upon failure of the master node, select from among the slave nodes the slave node with the largest persistence operation completion time as the new master node.
  12. The apparatus according to claim 8, wherein:
    the configuration module is configured to periodically monitor the persistence operations of the system based on an adjustable time threshold, the interval of the time threshold being configured in one of minutes or seconds.
  13. A machine-readable medium having instructions stored thereon which, when executed on a machine, cause the machine to perform the method for determining a data rollback period in a distributed storage system according to any one of claims 1 to 6.
  14. A system, comprising:
    a memory for storing instructions to be executed by one or more processors of the system, and a processor, being one of the processors of the system, for performing the method for determining a data rollback period in a distributed storage system according to any one of claims 1 to 6.
PCT/CN2020/095846 2020-01-13 2020-06-12 Method for determining a data rollback period in a distributed storage system WO2021143039A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010031344.4A CN111208949B (zh) 2020-01-13 2020-01-13 Method for determining a data rollback period in a distributed storage system
CN202010031344.4 2020-01-13

Publications (1)

Publication Number Publication Date
WO2021143039A1 true WO2021143039A1 (zh) 2021-07-22

Family

ID=70790094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095846 WO2021143039A1 (zh) 2020-01-13 2020-06-12 Method for determining a data rollback period in a distributed storage system

Country Status (2)

Country Link
CN (1) CN111208949B (zh)
WO (1) WO2021143039A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111208949B (zh) * 2020-01-13 2020-12-25 上海依图网络科技有限公司 Method for determining a data rollback period in a distributed storage system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462432A (zh) * 2014-12-15 2015-03-25 成都英力拓信息技术有限公司 Adaptive distributed computing method
CN105468297A (zh) * 2015-11-18 2016-04-06 临沂大学 Method for fast data synchronization between master and slave devices in a cloud storage system
CN107168642A (zh) * 2017-03-30 2017-09-15 北京奇艺世纪科技有限公司 Data storage method and system
CN107577717A (zh) * 2017-08-09 2018-01-12 阿里巴巴集团控股有限公司 Processing method, apparatus and server for guaranteeing data consistency
CN108984779A (zh) * 2018-07-25 2018-12-11 郑州云海信息技术有限公司 Method, apparatus and device for processing snapshot rollback metadata in a distributed file system
US10235066B1 (en) * 2017-04-27 2019-03-19 EMC IP Holding Company LLC Journal destage relay for online system checkpoint creation
WO2019089599A1 (en) * 2017-10-31 2019-05-09 Ab Initio Technology Llc Managing a computing cluster using durability level indicators
CN111208949A (zh) * 2020-01-13 2020-05-29 上海依图网络科技有限公司 Method for determining a data rollback period in a distributed storage system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10324637B1 (en) * 2016-12-13 2019-06-18 EMC IP Holding Company LLC Dual-splitter for high performance replication
CN106814972B (zh) * 2016-12-22 2018-04-17 北京华云网际科技有限公司 Rollback method and apparatus for snapshot nodes in distributed block storage
CN108874552B (zh) * 2018-06-28 2021-09-21 杭州云毅网络科技有限公司 Distributed lock execution method, apparatus and system, application server, and storage medium

Also Published As

Publication number Publication date
CN111208949B (zh) 2020-12-25
CN111208949A (zh) 2020-05-29

Similar Documents

Publication Publication Date Title
WO2017124938A1 (zh) Data synchronization method, apparatus and system
CN110048896B (zh) Cluster data acquisition method, apparatus and device
US11991094B2 (en) Metadata driven static determination of controller availability
CN111225064A (zh) Ceph cluster deployment method, system, device and computer-readable storage medium
CN115562911B (zh) Virtual machine data backup method, apparatus and system, electronic device, and storage medium
US11397632B2 (en) Safely recovering workloads within a finite timeframe from unhealthy cluster nodes
CN111880956A (zh) Data synchronization method and apparatus
JP2019204527A (ja) Method and apparatus for processing the data location of a storage device, computer device, and computer-readable storage medium
CN115328662A (zh) Process and thread resource management control method and system
WO2021143039A1 (zh) Method for determining a data rollback period in a distributed storage system
CN116302352A (zh) Cluster disaster recovery processing method and apparatus, electronic device and storage medium
CN113238778B (zh) Method, system, device and medium for upgrading BIOS firmware
CN111031126A (zh) Cluster cache sharing method, system, device and storage medium
CN112685486B (zh) Data management method and apparatus for a database cluster, electronic device and storage medium
CN112181049B (zh) Cluster time synchronization method, apparatus, system, device and readable storage medium
CN112000850B (zh) Method, apparatus, system and device for data processing
WO2018010603A1 (zh) Storage mode upgrade method, apparatus and system based on a video cloud storage system
CN110750424B (zh) Resource inspection method and apparatus
CN110071778B (zh) Time synchronization method, apparatus, device and medium
CN114363356B (zh) Data synchronization method, system, apparatus, computer device and storage medium
CN114500289B (zh) Control plane recovery method, apparatus, control node and storage medium
WO2015035891A1 (zh) Patch method, device and system
CN106453656B (zh) Cluster master host selection method and apparatus
CN112527561B (zh) Data backup method and apparatus based on Internet-of-Things cloud storage
CN114880717A (zh) Data archiving method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20914187

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20914187

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.03.2023)
