WO2021226905A1 - Data storage method, system, and storage medium - Google Patents
- Publication number
- WO2021226905A1 · Application PCT/CN2020/090138 (CN2020090138W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- written
- master node
- slave nodes
- node
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
Definitions
- the embodiments of the present application relate to the field of data storage, and in particular, to a data storage method, system, and storage medium.
- AMQ Active Message Queue
- KahaDB Kaha database (DataBase, DB)
- NFS network file systems
- AMQ natively uses Kaha database (DataBase, DB) to implement message storage and indexing.
- KahaDB is based on stand-alone storage.
- AMQ relies on network file systems (NFS) and similar network file systems to achieve master-slave data sharing, but the entire cluster then holds only one copy of the data, which leads to low availability of AMQ.
- the embodiments of the present application provide a data storage method, system, and storage medium, which can improve the usability of AMQ.
- An embodiment of the present application provides a data storage method, and the method includes:
- receiving a write request from a client through a master node, and obtaining data to be written according to the write request;
- storing the data to be written to the master node;
- synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes among the multiple slave nodes that have synchronized the data to be written;
- when it is detected that the number of slave nodes meets a submission condition, sending a write success response in response to the write request to the client through the master node.
- An embodiment of the present application provides a data storage system, and the system includes:
- the receiving unit is used to receive the write request of the client through the master node
- the acquiring unit is configured to acquire the data to be written according to the write request
- a storage unit configured to store the data to be written to the master node
- a synchronization unit for synchronizing the data to be written to multiple slave nodes
- a recording unit configured to use the master node to record the number of slave nodes that have synchronized the data to be written among the plurality of slave nodes;
- the sending unit is configured to send a write success response in response to the write request to the client through the master node when it is detected that the number of slave nodes meets the submission condition.
- An embodiment of the present application provides a data storage system.
- the system includes a processor, a memory, a transmitter, a receiver, and a communication bus.
- when the processor executes an operating program stored in the memory, the method described above is implemented.
- the embodiment of the present application provides a computer-readable storage medium with a program stored thereon, which is applied to a data storage system, and when the program is executed by a processor, the method as described in any one of the above is implemented.
- the embodiments of the present application provide a data storage method, system, and storage medium.
- the method may include: receiving a write request from a client through a master node, and obtaining data to be written according to the write request; storing the data to be written to the master node; synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes among the multiple slave nodes that have synchronized the data to be written; and, when it is detected that the number of slave nodes meets the submission condition, sending a write success response in response to the write request to the client through the master node.
- when the master node stores the data to be written sent by the client, the data to be written is synchronized to multiple slave nodes, which realizes distributed storage of the data to be written by the client.
- while the multiple slave nodes synchronize the data to be written, the master node monitors in real time the number of slave nodes that have completed synchronization, and sends a write success response to the client once that number meets the submission condition; this ensures that a submitted message will not be lost, thereby improving the availability of AMQ.
- FIG. 1 is a first flowchart of a data storage method provided by an embodiment of this application
- FIG. 2 is a schematic diagram of an exemplary RaftDB cluster writing a client's message to be written, according to an embodiment of the application;
- FIG. 3 is a second flowchart of a data storage method provided by an embodiment of this application.
- FIG. 4 is a schematic diagram of a process of selecting a master node in an exemplary RaftDB cluster according to an embodiment of the application;
- FIG. 5 is a third flowchart of a data storage method provided by an embodiment of this application.
- FIG. 6 is a first structural diagram of a data storage system provided by an embodiment of this application.
- FIG. 7 is a second structural diagram of a data storage system provided by an embodiment of this application.
- the master-slave cluster implements data sharing through NFS.
- the entire cluster essentially holds only one copy of the data, with no data replica, so it cannot meet availability requirements and is in effect a pseudo-cluster; NFS network transmission also incurs a performance loss, which limits improvement of AMQ's throughput performance.
- KahaDB lacks a data check and data recovery mechanism: when data is torn due to an abnormal shutdown, the node cannot be restarted and the data must be cleared, causing large-scale data loss, which means data recovery fails.
- FIG. 1 is a schematic diagram of the implementation process of a data storage method proposed in the embodiment of the present application.
- the method may include:
- S101 Receive a write request from a client through the master node, and obtain data to be written according to the write request.
- the data storage method provided by the embodiment of the present application is suitable for the scenario of using RaftDB to implement the cluster expansion of the AMQ distributed multi-replica set.
- the data storage system is a RaftDB cluster
- the RaftDB cluster includes multiple RaftDB nodes
- the master node is the leader node of the multiple RaftDB nodes.
- RaftDB exposes the state machine interface to the outside, allowing the state machine of the upper layer of RaftDB to implement snapshots.
- the master node receives the write request from the client, and the state machine records the login operation; the stateful data in it, such as the user name and password, is the data to be written.
- the state machine generates the snapshot and the log corresponding to the snapshot according to the write request.
- the master node determines the data to be written according to the log.
- snapshots are allowed to be sent and loaded.
- data synchronization can be quickly realized by sending and loading snapshots.
- S102 Store the data to be written to the master node.
- when the data storage system receives the write request from the client through the master node and obtains the data to be written according to the write request, it stores the data to be written to the master node.
- after acquiring the data to be written, the master node assigns a unique incremental ID to it; the unique incremental ID ensures the continuity and ordering of data synchronization by the slave nodes.
- the data to be written includes the index file to be written and the data file to be written.
- the index file to be written is used to perform addressing in the master node to obtain an addressing result; after that, the data file to be written is written sequentially at the addressed location.
- the index file to be written uses a memory-mapped file and adopts a sparse index.
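A sparse index of this kind can be sketched as follows. This is an illustrative Python fragment, not the patent's implementation: the `SparseIndex` class, its `interval` parameter, and the in-memory lists are hypothetical stand-ins; a real index would live in a memory-mapped file.

```python
import bisect

class SparseIndex:
    """Sparse index: only every `interval`-th message id is indexed,
    mapping message id -> byte offset in the data file."""
    def __init__(self, interval=4):
        self.interval = interval
        self.ids = []      # sorted indexed message ids
        self.offsets = []  # parallel list of byte offsets

    def maybe_add(self, msg_id, offset):
        # index only a fraction of the entries to keep the index small
        if msg_id % self.interval == 0:
            self.ids.append(msg_id)
            self.offsets.append(offset)

    def locate(self, msg_id):
        """Return the nearest indexed offset at or before msg_id;
        the reader scans forward from there to the exact message."""
        i = bisect.bisect_right(self.ids, msg_id) - 1
        return self.offsets[i] if i >= 0 else 0
```

Addressing a message then costs one binary search in the small index plus a short forward scan in the sequentially written data file, which is the trade-off a sparse index makes against a dense one.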
- S103 Synchronize the data to be written to multiple slave nodes, and use the master node to record the number of slave nodes that have synchronized the data to be written among the multiple slave nodes.
- after the data storage system stores the data to be written to the master node, it synchronizes the data to be written to multiple slave nodes, and uses the master node to record the number of slave nodes among the multiple slave nodes that have synchronized the data to be written.
- the master node synchronizes the data to be written to multiple slave nodes, where the selection of multiple slave nodes can be based on load balancing, service type, etc.
- the specific method of selecting the multiple slave nodes can be chosen according to actual conditions; the embodiment of this application imposes no specific limitation. After a slave node completes the synchronization operation of the data to be written, the slave node returns the latest write offset, updates its internal state, and responds to the client.
- the master node monitors the synchronization status of the multiple slave nodes in real time.
- each time a slave node completes synchronization of the data to be written, the master node increases the recorded number of slave nodes by one.
- S104 When it is detected that the number of slave nodes meets the submission condition, send a write success response in response to the write request to the client through the master node.
- the data storage system sets the submission conditions.
- while the data storage system uses the master node to record the number of slave nodes that have synchronized the data to be written, it compares that number against the submission condition to check whether the condition is met.
- when the master node detects that the number of slave nodes meets the submission condition, the data storage system sends a write success response in response to the write request to the client through the master node.
- specifically, the master node checks in turn whether the number of slave nodes meets the submission condition; once it detects that the number of slave nodes meets the submission condition, the master node sends a write success response to the client to answer the client's write request.
- the master node obtains the first number corresponding to multiple slave nodes, and uses half of the first number as a submission condition.
- the master node compares the number of synchronized slave nodes with half of the first number. When the number of slave nodes is greater than half of the first number, the master node determines that the number of slave nodes meets the submission condition. At this time, the master node sends a write success response in response to the write request to the client.
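The majority submission condition just described can be sketched as a one-line predicate. This is an illustrative Python fragment; `commit_reached` is a hypothetical name, not from the patent.

```python
def commit_reached(synced_slaves: int, total_slaves: int) -> bool:
    """Submission condition: the write is committable once the number of
    slave nodes that have synchronized it exceeds half the first number
    (the total number of slave nodes)."""
    return synced_slaves > total_slaves / 2
```

With a first number of 3 slave nodes, two acknowledgements (2 > 1.5) already satisfy the condition, while exactly half of an even total does not.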
- RaftDB with id 0 is the master node
- the RaftDBs with id 1 and id 2 are the slave nodes.
- the client sends the message to be written to the master node; the data of the message to be written is "A32d322991023421", and the message to be written is stored in the RaftDB with id 0.
- the index of the RaftDB with id 0 is increased to "7" and the corresponding term of office becomes "3"; the RaftDB with id 0 then sends an append-entries request to the RaftDB with id 1, carrying the following information to be written: term: "3", index: "7", data: "A32d322991023421".
- RaftDB with id 1 synchronizes the to-be-written information
- correspondingly, the index of the RaftDB with id 1 is increased to "7", and its corresponding term of office becomes "3".
- the RaftDB with id 1 returns an append-entries response to the RaftDB with id 0.
- the append-entries response carries the following information: "index: 7"; after receiving the response, the RaftDB with id 0 sends a write success response to the client.
- at this point, the process of storing the message to be written with index 7 in the RaftDB with id 0 and the RaftDB with id 1 is complete.
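The exchange in this example can be modeled in a few lines of Python. This is an illustrative sketch, not RaftDB's actual code: `RaftDBNode`, `leader_append`, and `follower_append` are hypothetical names, and a real append-entries message carries additional fields (such as the previous entry's term and index) for consistency checks.

```python
class RaftDBNode:
    """Minimal model of a RaftDB node's log state."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.term = 0
        self.log = {}          # index -> (term, data)
        self.last_index = 0

    def leader_append(self, term, data):
        # the master assigns the next unique, monotonically increasing index
        self.term = term
        self.last_index += 1
        self.log[self.last_index] = (term, data)
        return {"term": term, "index": self.last_index, "data": data}

    def follower_append(self, entry):
        # the slave replicates the entry and acknowledges with its index
        self.term = entry["term"]
        self.log[entry["index"]] = (entry["term"], entry["data"])
        self.last_index = entry["index"]
        return {"index": entry["index"]}
```

Replaying the example: with six entries already on the master, appending "A32d322991023421" at term 3 produces the request {term: 3, index: 7, data: "A32d322991023421"}, and the follower's acknowledgement {index: 7} lets the master answer the client.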
- when the master node stores the data to be written sent by the client, it synchronizes the data to be written to multiple slave nodes, realizing distributed storage of the client's data; while the multiple slave nodes synchronize the data to be written, the master node monitors in real time the number of slave nodes that have completed synchronization, and sends a write success response to the client once that number meets the submission condition, which ensures that a submitted message will not be lost, thereby improving the availability of AMQ.
- Aggregating stateful data to the storage layer can realize the separation of computing and storage architecture, as well as the stateless horizontal expansion of upper-layer applications.
- through the replica-set mechanism of the underlying storage, a single-node system connected to RaftDB storage gains the ability to expand into a distributed multi-replica-set cluster.
- before the data storage system receives the client's write request through the master node, that is, before step S101, the data storage system also performs master node election when implementing the data storage method; as shown in FIG. 3, this specifically includes the following steps:
- S201 When a master node election instruction is obtained, separately obtain multiple tenures and multiple write offsets corresponding to multiple storage databases in the data storage system.
- when the data storage system determines that the connection between the master node and a slave node is interrupted, it obtains the master node election instruction; or, when it determines that the master node is faulty, it obtains the master node election instruction; or, when the data storage system is created, it obtains the master node election instruction; the specific way the instruction is obtained depends on the actual situation, and the embodiment of this application imposes no specific limitation.
- when the link layer senses that the connection between the master node and the slave node is interrupted, the data storage system determines that the connection between the master node and the slave node is interrupted.
- alternatively, whether the connection between the master node and a slave node is interrupted can be judged during the slave nodes' synchronization of the data to be written;
- specifically, when a preset time arrives, the data storage system judges whether the number of slave nodes that have synchronized the data to be written meets the submission condition;
- when the data storage system detects at the preset time that the number of slave nodes does not meet the submission condition, it determines that the connection between the master node and the slave node is interrupted; at this time, the data storage system triggers the master node re-selection process and obtains the master node election instruction.
- that is, when the preset time arrives, the data storage system judges whether the number of slave nodes is greater than half of the first number; when it is not, the number of slave nodes does not meet the submission condition at the preset time, so the connection between the master node and the slave node is determined to be interrupted, and the data storage system starts the master node re-selection process and obtains the master node election instruction.
- the data storage system can determine whether the master node is faulty based on the heartbeat between the master and slave nodes, and when determining that the master node is faulty, trigger the master node reselection process and obtain the master node election instruction.
- the storage database in the embodiment of this application is a RaftDB node; each RaftDB node in the data storage system maintains a term (term of office) and a write offset, where the write offset is updated when the RaftDB node writes a new message.
- the data storage system can determine whether there are the latest logs in multiple RaftDB nodes based on multiple terms and multiple write offsets.
- S202 Determine a first storage database from multiple storage databases according to multiple tenures and multiple write offsets.
- after the data storage system separately obtains the multiple tenures and the multiple write offsets corresponding to the multiple storage databases in the data storage system, it determines the first storage database from the multiple storage databases according to those tenures and write offsets.
- the data storage system compares the multiple terms of office; when a single term with the largest value is determined from the multiple terms of office, the data storage system directly determines the storage database corresponding to that term as the first storage database; when at least two terms share the largest value, the data storage system further determines the first storage database from the at least two storage databases corresponding to those terms.
- specifically, the data storage system first obtains the at least two storage databases corresponding to the at least two terms of office and the at least two write offsets corresponding to those storage databases; it then determines the first write offset, i.e. the largest of the at least two write offsets, and determines the storage database corresponding to the first write offset as the first storage database.
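The two-step comparison, largest term of office first and largest write offset as the tie-breaker, can be sketched as follows. This is an illustrative Python fragment; the node records are hypothetical dictionaries, and `elect_leader` is not a name from the patent.

```python
def elect_leader(nodes):
    """Choose the node with the largest term of office; if several share
    the largest term, the largest write offset breaks the tie, so the
    elected master node holds the latest log."""
    return max(nodes, key=lambda n: (n["term"], n["write_offset"]))
```

Using the document's example, the RaftDB with id 0 at term 3 wins over two nodes at term 2 regardless of their write offsets; only when terms tie does the write offset decide.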
- the data storage system includes RaftDB with id 0, RaftDB with id 1, and RaftDB with id 2.
- the RaftDB with id 0 initiates a voting request to the RaftDB with id 1 and the RaftDB with id 2.
- the term of the RaftDB with id 0 is 3, while the terms of the RaftDB with id 1 and the RaftDB with id 2 are 2, which indicates that the RaftDB with id 0 stores the latest log.
- the RaftDB with id 0 is selected as the master node.
- RaftDB with id 0 can periodically perform heartbeat detection on RaftDB with id 1 and RaftDB with id 2.
- S203 Determine the first storage database as the master node, and determine the second storage database among the multiple storage databases as the slave node, and the second storage database is a storage database other than the first storage database among the multiple storage databases.
- correspondingly, the data storage system determines the first storage database as the master node, and determines each second storage database, i.e. every storage database among the multiple storage databases other than the first storage database, as a slave node.
- the master node is responsible for processing the client's read and write requests, and regularly sends heartbeats to the slave nodes to monitor the status of each slave node.
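The heartbeat-based status monitoring can be sketched as follows. This is an illustrative Python fragment; `HeartbeatMonitor` is a hypothetical name, and the same mechanism serves both the master monitoring its slaves and, per the fault detection described earlier, deciding that the master is faulty so a re-election can be triggered.

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat received from a peer node; if no beat
    arrives within `timeout` seconds, the peer is presumed faulty."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = time.monotonic()

    def beat(self):
        # called whenever a heartbeat message arrives from the peer
        self.last_seen = time.monotonic()

    def peer_alive(self, now=None):
        # `now` may be injected for deterministic checks
        if now is None:
            now = time.monotonic()
        return (now - self.last_seen) <= self.timeout
```

A slave whose monitor reports the master as dead would obtain the master node election instruction and start the re-selection process described above.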
- RaftDB implements a master node election mechanism that, through the term of office and the write offset, restricts the master node to the node holding the latest log, which ensures that a submitted message will not be lost and thus realizes message reliability.
- the foregoing data storage system also performs data recovery when implementing the data storage method, as shown in FIG. 5, which specifically includes the following steps:
- S301 When a first node performs data recovery, search for a snapshot, where the first node is the node performing data recovery among the master node and the slave nodes.
- the subsequent log is truncated and waits for synchronization.
- specifically, the data storage system searches for the snapshot, which is the snapshot generated when the data to be written was written;
- RaftDB scans the data files to be written and can rely on checkpoints to reduce the number of scanned files; since the message storage protocol uses a fixed message header plus a message body, it relies on the cyclic redundancy check (CRC) value in the message header to verify the data, and finally completes the check of the data file.
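A fixed-header-plus-body record with a CRC can be sketched as follows. This is an illustrative Python fragment: the 8-byte header layout (body length, then CRC32 of the body) is an assumption for the sketch, as the patent does not specify the actual header fields.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # assumed layout: (body length, CRC32 of body)

def pack_message(body: bytes) -> bytes:
    # fixed-size header followed by the message body
    return HEADER.pack(len(body), zlib.crc32(body)) + body

def scan_and_verify(buf: bytes):
    """Scan a data-file buffer, checking each record's CRC against its
    header; stop at the first torn or corrupt record so recovery can
    truncate the file there."""
    messages, pos = [], 0
    while pos + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, pos)
        body = buf[pos + HEADER.size : pos + HEADER.size + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break  # torn write detected: everything before `pos` is valid
        messages.append(body)
        pos += HEADER.size + length
    return messages, pos
```

The returned position marks the last fully verified byte; this matches the recovery behavior above, where the log after the verified point is truncated and awaits synchronization, instead of clearing the whole store as KahaDB must.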
- S302 Load the snapshot on the first node, and replay the log corresponding to the snapshot.
- when the data storage system finds the snapshot, it loads the snapshot on the first node and replays the log corresponding to the snapshot.
- after the data storage system loads the snapshot on the first node and replays the log corresponding to the snapshot, it rewrites the data to be written on the first node according to the log.
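The load-then-replay recovery can be sketched as follows. This is an illustrative Python model: `StateMachine`, the snapshot dictionary, and the `(index, key, value)` log-entry shape are hypothetical, chosen only to show the order of operations.

```python
class StateMachine:
    """Recovery model: load the latest snapshot, then replay only the
    log entries recorded after the snapshot was taken."""
    def __init__(self):
        self.state = {}
        self.applied_index = 0

    def load_snapshot(self, snapshot):
        # restore the state captured at snapshot time
        self.state = dict(snapshot["state"])
        self.applied_index = snapshot["last_index"]

    def replay(self, log):
        # entries at or below the snapshot's last index are already applied
        for index, key, value in log:
            if index > self.applied_index:
                self.state[key] = value
                self.applied_index = index
```

Because the snapshot already covers everything up to its last index, replay touches only the tail of the log, which is why sending and loading snapshots lets a lagging node synchronize quickly.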
- FIG. 6 is a schematic diagram 1 of the composition structure of the data storage system proposed in the embodiment of the present application.
- the data storage system 1 proposed in the embodiment of the present application can include:
- the receiving unit 10 is configured to receive a write request from a client through the master node
- the obtaining unit 11 is configured to obtain the data to be written according to the write request
- the storage unit 12 is configured to store the data to be written to the master node
- the synchronization unit 13 is configured to synchronize the data to be written to multiple slave nodes
- the recording unit 14 is configured to use the master node to record the number of slave nodes that have synchronized the data to be written among the plurality of slave nodes;
- the sending unit 15 is configured to send a write success response in response to the write request to the client through the master node when it is detected that the number of slave nodes meets the submission condition.
- the data storage system 1 further includes: a judgment unit;
- the acquiring unit 11 is further configured to acquire the first number corresponding to the multiple slave nodes;
- the judgment unit is configured to judge that the number of slave nodes meets the submission condition when the number of slave nodes is greater than half of the first number
- the sending unit 15 is configured to send a write success response in response to the write request to the client through the master node.
- the data storage system further includes: a determining unit;
- the obtaining unit 11 is further configured to obtain multiple tenures and multiple write offsets corresponding to multiple storage databases in the data storage system when the master node election instruction is obtained;
- the determining unit is configured to determine a first storage database from the multiple storage databases according to the multiple tenures and the multiple write offsets; and determine the first storage database as the
- the master node determines a second storage database among the plurality of storage databases as the slave node, and the second storage database is a storage database other than the first storage database among the plurality of storage databases.
- the data storage system further includes: a comparison unit;
- the comparison unit is used to compare the multiple terms of office
- the determining unit is further configured to determine the storage database corresponding to the one tenure as the first storage database when the one with the largest number of tenures is determined from the multiple tenures;
- the obtaining unit 11 is further configured to obtain at least two storage databases and the at least two storage databases corresponding to the at least two tenures when the at least two tenures with the largest tenure are determined from the plurality of tenures At least two corresponding write offsets;
- the determining unit is further configured to determine a first write offset with the largest write offset from the at least two write offsets, and set a corresponding value to the first write offset
- the storage database is determined to be the first storage database.
- the judging unit is further configured to judge that the connection between the master node and the slave node is interrupted when it is detected that the number of slave nodes does not satisfy the submission condition when the preset time arrives;
- the acquiring unit 11 is also used to trigger the master node re-election process and acquire the master node election instruction.
- the data storage system further includes: an addressing unit and a writing unit;
- the addressing unit is configured to use the index file to be written to perform addressing in the master node to obtain an addressing result
- the writing unit is configured to sequentially write the data file to be written into the addressing result.
- the determining unit is further configured to generate a snapshot and a log corresponding to the snapshot according to the write request; and determine the data to be written according to the log.
- the data storage system further includes: a search unit, a loading unit, and a playback unit;
- the searching unit is configured to search for the snapshot when the first node performs data restoration, and the first node is the node that performs data restoration among the master node and the slave node;
- the loading unit is configured to load the snapshot on the first node
- the replay unit is configured to replay the log corresponding to the snapshot
- the writing unit is further configured to rewrite the data to be written on the first node according to the log.
- FIG. 7 is a second schematic diagram of the composition structure of the data storage system proposed in the embodiment of the application.
- the data storage system 1 proposed in the embodiment of the application may further include a processor 110, a memory 111, a transmitter 112, a receiver 113, and a communication bus 114.
- the above-mentioned acquisition unit 11, synchronization unit 13, recording unit 14, judgment unit, determination unit, comparison unit, addressing unit, writing unit, search unit, loading unit, and playback unit may be implemented by the processor 110 located on the data storage system 1;
- the storage unit 12 may be implemented by the memory 111 located on the data storage system 1
- the sending unit 15 may be implemented by the transmitter 112 located on the data storage system 1
- the receiving unit 10 may be implemented by the receiver 113 located on the data storage system 1.
- the above-mentioned processor 110 may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, and a microprocessor.
- the memory 111 is configured to store executable program code, the program code includes computer operation instructions, the memory 111 may include a high-speed RAM memory, or may also include a non-volatile memory, for example, at least two disk memories.
- the communication bus 114 is used to connect the processor 110, the memory 111, the transmitter 112, and the receiver 113, and to carry the mutual communication among these devices.
- the memory 111 is used to store instructions and data.
- the above-mentioned receiver 113 is configured to receive the write request of the client through the master node
- the above-mentioned processor 110 is configured to obtain the data to be written according to the write request; to synchronize the data to be written to multiple slave nodes; and to use the master node to record the number of slave nodes among the multiple slave nodes that have synchronized the data to be written;
- the aforementioned memory 111 is configured to store the data to be written to the master node
- the above-mentioned transmitter 112 is configured to send a write success response in response to the write request to the client through the master node when it is detected that the number of slave nodes meets the submission condition.
- the above-mentioned processor 110 is further configured to obtain a first number corresponding to the plurality of slave nodes; when the number of slave nodes is greater than half of the first number, it is determined that the number of slave nodes satisfies the submission conditions;
- the above-mentioned transmitter 112 is further configured to send a write success response in response to the write request to the client through the master node.
- the above-mentioned processor 110 is further configured to: when the master node election instruction is obtained, obtain multiple tenures and multiple write offsets corresponding to multiple storage databases in the data storage system; determine a first storage database from the multiple storage databases according to the multiple tenures and the multiple write offsets; determine the first storage database as the master node; and determine each second storage database among the multiple storage databases, i.e. every storage database other than the first storage database, as a slave node.
- the above-mentioned processor 110 is further configured to compare the multiple terms of office; when the term with the largest value is determined from the multiple terms of office, determine the storage database corresponding to that term as the first storage database; when at least two terms with the largest value are determined from the multiple terms of office, acquire the at least two storage databases corresponding to those terms and the at least two write offsets corresponding to those storage databases, determine the first write offset, i.e. the largest of the at least two write offsets, and determine the storage database corresponding to the first write offset as the first storage database.
- the processor 110 is further configured to determine, when it is detected at the preset time that the number of slave nodes does not satisfy the submission condition, that the connection between the master node and the slave node is interrupted; and to trigger the master node re-election process and obtain the master node election instruction.
- the processor 110 is further configured to perform addressing on the master node using the index file to be written, obtaining an addressing result, and to write the data file to be written sequentially into the addressing result.
- the aforementioned processor 110 is further configured to generate a snapshot and a log corresponding to the snapshot according to the write request; and determine the data to be written according to the log.
- the above-mentioned processor 110 is further configured to search for the snapshot when a first node performs data recovery, the first node being whichever of the master node and the slave nodes performs data recovery;
- to load the snapshot on the first node and replay the log corresponding to the snapshot; and to rewrite the data to be written on the first node according to the log.
- the above-mentioned memory 111 may be a volatile memory, such as a Random-Access Memory (RAM), or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD), or a combination of the above kinds of memory, and it provides instructions and data to the processor 110.
- the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or in the form of a software function module.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of this embodiment — in essence, the part that contributes to the prior art, or all or part of the technical solution — can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, server, or network device) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
- a data storage system proposed in an embodiment of this application receives a write request from a client through the master node and obtains the data to be written according to the write request; stores the data to be written on the master node; synchronizes the data to be written to multiple slave nodes and uses the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data; and, when it is detected that the number of slave nodes satisfies the commit condition, sends the client, through the master node, a write success response to the write request.
- when the master node has stored the data to be written sent by the client, the data is synchronized to multiple slave nodes, which realizes distributed storage of the client's data to be written; meanwhile, the number of slave nodes that have completed synchronization is monitored in real time, and a write success response is sent to the client as soon as that number satisfies the commit condition, ensuring that committed messages are not lost and improving the availability of AMQ.
- the embodiments of the present application provide a computer-readable storage medium on which a program is stored, and when the program is executed by a processor, the data storage method as described above is realized.
- the program instructions corresponding to the data storage method in this embodiment can be stored on storage media such as optical discs, hard disks, and USB flash drives; when those program instructions are read or executed by an electronic device, the data storage method described above is implemented.
- this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of hardware embodiments, software embodiments, or embodiments combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- the embodiments of the present application provide a data storage method, system and storage medium.
- when the master node has stored the data to be written sent by the client, the data is synchronized to multiple slave nodes, realizing distributed storage of the client's data to be written; while the multiple slave nodes synchronize the data, the number of slave nodes that have completed synchronization is monitored in real time, and when that number satisfies the commit condition a write success response is sent to the client, ensuring that committed messages are not lost and thereby improving the availability of AMQ.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A data storage method, system, and storage medium. The method includes: receiving a client's write request through a master node and obtaining the data to be written according to the write request (S101); storing the data to be written on the master node (S102); synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written (S103); and, when the number of such slave nodes is detected to satisfy a commit condition, sending the client, through the master node, a write-success response to the write request (S104).
Description
The embodiments of this application relate to the field of data storage, and in particular to a data storage method, system, and storage medium.

Active Message Queue (AMQ) natively uses the Kaha database (KahaDB) for message storage and indexing, and KahaDB is a single-machine store. For clustering and horizontal scaling above the storage layer, AMQ relies on a network file system such as NFS for master-slave data synchronization, but the whole cluster then holds only a single copy of the data, which makes AMQ's availability low.
Summary of the Invention
The embodiments of this application provide a data storage method, system, and storage medium that can improve the availability of AMQ.

The technical solution of the embodiments of this application is implemented as follows.

An embodiment of this application provides a data storage method, the method including:

receiving a client's write request through a master node, and obtaining data to be written according to the write request;

storing the data to be written on the master node;

synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written;

when the number of such slave nodes is detected to satisfy a commit condition, sending the client, through the master node, a write-success response to the write request.
An embodiment of this application provides a data storage system, the system including:

a receiving unit, configured to receive a client's write request through a master node;

an obtaining unit, configured to obtain data to be written according to the write request;

a storage unit, configured to store the data to be written on the master node;

a synchronization unit, configured to synchronize the data to be written to multiple slave nodes;

a recording unit, configured to use the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written;

a sending unit, configured to, when the number of such slave nodes is detected to satisfy a commit condition, send the client, through the master node, a write-success response to the write request.

An embodiment of this application provides a data storage system including a processor, a memory, a transmitter, a receiver, and a communication bus; when the processor executes the program stored in the memory, any of the methods described above is implemented.

An embodiment of this application provides a computer-readable storage medium storing a program for use in a data storage system; when the program is executed by a processor, any of the methods described above is implemented.

The embodiments of this application provide a data storage method, system, and storage medium. The method may include: receiving a client's write request through a master node and obtaining the data to be written according to the write request; storing the data to be written on the master node; synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes that have synchronized the data; and, when that number is detected to satisfy the commit condition, sending the client a write-success response through the master node. Thus, in the embodiments of this application, once the master node has stored the data sent by the client, the data is synchronized to multiple slave nodes, realizing distributed storage of the client's data to be written; meanwhile, while the slave nodes synchronize the data, the number of slave nodes that have completed synchronization is monitored in real time, and a write-success response is sent to the client as soon as that number satisfies the commit condition. This guarantees that committed messages are not lost, thereby improving the availability of AMQ.
FIG. 1 is a first flowchart of a data storage method provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of an exemplary RaftDB cluster writing a client's message to be written, provided by an embodiment of this application;

FIG. 3 is a second flowchart of a data storage method provided by an embodiment of this application;

FIG. 4 is a schematic flowchart of an exemplary RaftDB cluster performing master-node election, provided by an embodiment of this application;

FIG. 5 is a third flowchart of a data storage method provided by an embodiment of this application;

FIG. 6 is a first schematic structural diagram of a data storage system provided by an embodiment of this application;

FIG. 7 is a second schematic structural diagram of a data storage system provided by an embodiment of this application.
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described here merely explain the relevant application and do not limit it. It should also be noted that, for ease of description, only the parts relevant to the application are shown in the drawings.

When KahaDB is used for message storage and indexing, the master-slave cluster shares data through NFS; the whole cluster essentially holds a single copy of the data with no replicas, so availability cannot be met (it is a pseudo-cluster), and NFS network transfer incurs a performance penalty that in turn limits AMQ's throughput. The specific problems are as follows:

1. Data inconsistency: KahaDB's data flush and index/redo-log flush are not in the same transaction, so an abnormal crash can leave the data files and the index/redo files inconsistent.

2. Torn-page recovery failure: KahaDB lacks data-verification and data-recovery mechanisms; when an abnormal crash tears a page, the node cannot be restarted, and clearing the data causes large-scale data loss, so data recovery fails.

3. Low availability: restarting a node requires scanning all data files, and the long data-recovery time is unacceptable to the business.

To solve the above problems, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.
In one embodiment, an embodiment of this application provides a data storage method. FIG. 1 is a schematic flowchart of the method, which may include the following steps.

S101: Receive a client's write request through the master node, and obtain the data to be written according to the write request.

The data storage method provided by the embodiments of this application is suitable for scenarios in which RaftDB is used to extend AMQ into a distributed multi-replica cluster.

In the embodiments of this application, the data storage system is a RaftDB cluster containing multiple RaftDB nodes, and the master node is the leader node among the multiple RaftDB nodes.

In the embodiments of this application, RaftDB exposes a state-machine interface that allows the state machine above RaftDB to take snapshots. When a user performs an operation such as logging in at the client, the master node receives the client's write request; the state machine records the stateful data of the login operation, such as the username and password, and this stateful data is the data to be written. The state machine generates a snapshot and a log corresponding to the snapshot according to the write request, and the master node determines the data to be written according to the log.

In the embodiments of this application, snapshots can be sent and loaded; for a new node in the data storage system, or a node whose synchronization lags behind, sending and loading a snapshot quickly brings it up to date.
S102: Store the data to be written on the master node.

After the data storage system receives the client's write request through the master node and obtains the data to be written, it stores the data to be written on the master node.

In the embodiments of this application, after obtaining the data to be written, the master node assigns it a unique, monotonically increasing ID, which guarantees the continuity and ordering of data synchronization on the slave nodes.

In the embodiments of this application, the data to be written comprises an index file to be written and a data file to be written; the index file is used to perform addressing on the master node, yielding an addressing result, after which the data file is written sequentially into the addressing result.

In the embodiments of this application, the index file to be written uses a memory-mapped file and adopts a sparse index.
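The sequential-write scheme above — a unique increasing ID, sequential appends to a data file, and a sparse index used for addressing — can be sketched as follows. This is a minimal in-memory illustration under assumed record layout and helper names, not the patent's implementation.

```python
import struct

class AppendLog:
    """Append-only data log with a sparse in-memory index.

    Only every INDEX_STRIDE-th record's (id, offset) pair is kept, so a
    lookup seeks to the nearest indexed offset and scans forward.
    """
    INDEX_STRIDE = 4  # small stride for illustration

    def __init__(self):
        self.data = bytearray()   # stands in for the data file
        self.sparse_index = []    # [(record_id, byte_offset)]
        self.next_id = 0          # unique, monotonically increasing ID

    def append(self, payload: bytes) -> int:
        rid = self.next_id
        self.next_id += 1
        if rid % self.INDEX_STRIDE == 0:
            self.sparse_index.append((rid, len(self.data)))
        # fixed header (id, length) + body, written sequentially at the tail
        self.data += struct.pack(">QI", rid, len(payload)) + payload
        return rid

    def read(self, rid: int) -> bytes:
        # addressing: find the nearest indexed record at or before rid
        off = 0
        for iid, ioff in self.sparse_index:
            if iid <= rid:
                off = ioff
        # scan forward from the indexed offset
        while off < len(self.data):
            cur, length = struct.unpack_from(">QI", self.data, off)
            if cur == rid:
                return bytes(self.data[off + 12: off + 12 + length])
            off += 12 + length
        raise KeyError(rid)
```

A real store would back `self.data` with a memory-mapped file; the dictionary-free sparse index keeps memory cost proportional to the stride rather than to the record count.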
S103: Synchronize the data to be written to multiple slave nodes, and use the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data.

After the data storage system stores the data to be written on the master node, it synchronizes the data to multiple slave nodes and uses the master node to record how many of the multiple slave nodes have synchronized the data.

In the embodiments of this application, the master node synchronizes the data to be written to multiple slave nodes; the slave nodes may be selected based on load balancing, business type, and so on — the specific selection method can be chosen according to the actual situation and is not specifically limited by the embodiments of this application. After a slave node completes the synchronization of the data to be written, it returns its latest write offset, updates its internal state, and then responds.

In the embodiments of this application, while the data to be written is being synchronized to the multiple slave nodes, the master node monitors their synchronization state in real time and, each time it detects that a slave node has finished synchronizing the data, increments the count of synchronized slave nodes by one.
S104: When the number of synchronized slave nodes is detected to satisfy the commit condition, send the client, through the master node, a write-success response to the write request.

The data storage system sets a commit condition. While the master node records the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written, it matches that number against the commit condition to detect whether it is satisfied; when the master node detects that the number of slave nodes satisfies the commit condition, the system sends the client, through the master node, a write-success response to the write request.

In the embodiments of this application, each time the count of slave nodes increases by one, the master node checks whether it satisfies the commit condition; as soon as it detects that the count does, the master node sends the client a write-success response to the write request.

In the embodiments of this application, the master node obtains a first quantity corresponding to the multiple slave nodes and uses half of that first quantity as the commit condition: it compares the number of slave nodes that have completed synchronization against half of the first quantity and, when the count exceeds half of the first quantity, judges that the commit condition is satisfied and, at that point, sends the client the write-success response to the write request.
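The majority commit check above can be sketched as follows; the class and method names are assumptions for illustration, not the patent's implementation.

```python
class CommitTracker:
    """Tracks how many slaves have synchronized an entry and decides
    when it may be acknowledged to the client (strict majority of the
    slave count, the "first quantity")."""

    def __init__(self, total_slaves: int):
        self.total_slaves = total_slaves  # the first quantity
        self.synced = 0                   # slaves that finished syncing

    def record_sync(self) -> bool:
        """Called once per slave acknowledgement; returns True as soon
        as the commit condition (count > half of total) is met."""
        self.synced += 1
        return self.synced > self.total_slaves / 2
```

With four slaves, for example, the third acknowledgement is the first one for which `record_sync` returns `True`, and that is the moment the write-success response is sent.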
For example, as shown in FIG. 2, the RaftDB with id 0 is the master node and the RaftDBs with ids 1 and 2 are slave nodes. The client sends the master node a message to be written whose data is "A32d322991023421", and the message is stored in the RaftDB with id 0, whose index advances to "7" and whose corresponding term advances to "3". The RaftDB with id 0 then sends the RaftDB with id 1 an append-entries request carrying the message to be written (term: "3", index: "7", data: "A32d322991023421"). The RaftDB with id 1 synchronizes the message; when synchronization completes, its index advances to "7" and its corresponding term to "3", and it returns the RaftDB with id 0 an append-entries response carrying "index: 7". On receiving the append-entries response, the RaftDB with id 0 sends the client a write-success response. At this point, the process of storing the message with index 7 in both the RaftDB with id 0 and the RaftDB with id 1 is complete.

It can be understood that, once the master node has stored the data to be written sent by the client, synchronizing it to multiple slave nodes realizes distributed storage of the client's data to be written; at the same time, monitoring in real time the number of slave nodes that have completed synchronization while the multiple slave nodes synchronize, and sending the client a write-success response as soon as that number satisfies the commit condition, guarantees that committed messages are not lost, thereby improving the availability of AMQ. Aggregating stateful data into the storage layer enables a compute-storage-separation architecture and stateless horizontal scaling of the upper-layer applications. Through the replica-set mechanism of the underlying storage, a single-node system that adopts RaftDB storage gains the ability to scale out as a distributed multi-replica cluster.
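The FIG. 2 exchange — store locally, replicate via append-entries, and acknowledge once a majority of the cluster holds the entry — can be sketched as follows. The node structure and function names are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class RaftDB:
    node_id: int
    term: int = 0
    index: int = 0
    entries: dict = field(default_factory=dict)

    def append_entries(self, term: int, index: int, data: str) -> int:
        """Slave side: synchronize the entry, then acknowledge its index
        (the append-entries response carries the index)."""
        self.term = term
        self.index = index
        self.entries[index] = data
        return index

def client_write(master: RaftDB, slaves: list[RaftDB], data: str) -> bool:
    """Master side: store the entry locally, replicate it, and return
    True (send write-success) once a majority of the whole cluster,
    master included, holds the entry."""
    master.index += 1
    master.entries[master.index] = data
    acks = 1  # the master itself counts toward the majority
    for s in slaves:
        if s.append_entries(master.term, master.index, data) == master.index:
            acks += 1
    return acks > (len(slaves) + 1) / 2
```

In the FIG. 2 scenario a single successful replication already yields a two-of-three majority, which is why the master acknowledges the client after hearing from just one slave.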
Based on the above embodiment, in an embodiment of this application, before receiving the client's write request through the master node — that is, before step S101 — the data storage system also performs master-node election when implementing the data storage method, as shown in FIG. 3, specifically including the following steps.

S201: When a master-node election instruction is obtained, obtain the multiple terms and multiple write offsets corresponding to the multiple storage databases in the data storage system.

In the embodiments of this application, the data storage system obtains the master-node election instruction when it determines that the connection between the master node and the slave nodes is interrupted; or when it determines that the master node has failed; or when the data storage system has just been created. The specific timing at which the system obtains the election instruction is chosen according to the actual situation and is not specifically limited by the embodiments of this application.

In the embodiments of this application, when the data storage system detects, on arrival of a preset time, that the number of slave nodes that have synchronized the data to be written does not satisfy the commit condition, the link layer perceives that the connection between the master node and the slave nodes is interrupted, and at that point the system judges the master-slave connection interrupted.

In practice, when the master node and a slave node are not in the same network partition, the connection between them breaks. Whether this has happened can be judged during the slave nodes' synchronization of the data to be written: on arrival of the preset time, the system checks whether the number of synchronized slave nodes satisfies the commit condition; when the system detects on arrival of the preset time that the number does not satisfy the commit condition, it judges the master-slave connection interrupted, triggers the master-node re-election process, and obtains the master-node election instruction.

In practice, on arrival of the preset time the system checks whether the number of slave nodes exceeds half of the first quantity; when it does not, this indicates that the commit condition is unmet at the preset time, so the system judges the master-slave connection interrupted, triggers the master-node re-election process, and obtains the election instruction.

In the embodiments of this application, the data storage system can also judge, from the heartbeat between the master and slave nodes, whether the master node has failed, and on judging that it has, trigger the master-node re-election process and obtain the master-node election instruction.

The storage databases in the embodiments of this application are the RaftDB nodes. Each RaftDB in the data storage system has a term and a write offset, the write offset being the one generated when the RaftDB node writes a new message; from the multiple terms and multiple write offsets, the system can determine which of the multiple RaftDB nodes holds the most recent log.
S202: Determine a first storage database from the multiple storage databases according to the multiple terms and the multiple write offsets.

After the data storage system obtains the multiple terms and multiple write offsets corresponding to the multiple storage databases, it determines the first storage database from the multiple storage databases according to them.

In the embodiments of this application, the data storage system compares the multiple terms. When a single largest term is determined from the multiple terms, the system directly takes the storage database corresponding to that term as the first storage database. When at least two equal largest terms are determined from the multiple terms, the system further decides among the at least two corresponding storage databases: specifically, it first obtains the at least two storage databases corresponding to those terms and the at least two write offsets corresponding to those storage databases; it then determines, from the at least two write offsets, the first write offset — the largest among them — and takes the storage database corresponding to the first write offset as the first storage database.
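The selection rule above — highest term first, largest write offset as the tie-breaker — can be sketched as follows; the node structure and function name are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    node_id: int
    term: int          # the node's current term
    write_offset: int  # offset of its latest write

def elect_master(nodes: list[Candidate]) -> Candidate:
    """Pick the node with the highest term; among nodes tied on the
    highest term, pick the one with the largest write offset — i.e.
    the node holding the most recent log."""
    max_term = max(n.term for n in nodes)
    tied = [n for n in nodes if n.term == max_term]
    return max(tied, key=lambda n: n.write_offset)
```

Because the winner must have both the maximal term and, among those, the maximal write offset, the elected master is guaranteed to hold every committed entry, which is the reliability property the election mechanism is after.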
For example, as shown in FIG. 4, the data storage system contains the RaftDBs with ids 0, 1, and 2. A vote request is first initiated to elect the master node from among them. As can be seen from FIG. 2, the term in the RaftDB with id 0 is 3 while the terms in the RaftDBs with ids 1 and 2 are 2, indicating that the RaftDB with id 0 holds the most recent log; at that point the RaftDB with id 0 is elected master, and it can periodically send heartbeats to the RaftDBs with ids 1 and 2 to monitor their state.

S203: Determine the first storage database as the master node, and determine the second storage databases among the multiple storage databases as the slave nodes, the second storage databases being the storage databases among the multiple storage databases other than the first storage database.

After the data storage system determines the first storage database from the multiple storage databases, it takes the first storage database as the master node and takes the second storage databases — those among the multiple storage databases other than the first — as the slave nodes.

In the embodiments of this application, the master node handles the clients' read and write requests and periodically sends heartbeats to the slave nodes to monitor the state of each slave node.

It can be understood that RaftDB implements a master-node election mechanism in which the term and the write offset constrain the elected master to hold the most recent log, guaranteeing that committed messages are not lost and thereby achieving message reliability.
Based on the above embodiments, in an embodiment of this application the data storage system also performs data recovery when implementing the data storage method, as shown in FIG. 5, specifically including the following steps.

S301: When a first node performs data recovery, search for the snapshot; the first node is whichever of the master node and the slave nodes performs data recovery.

In the embodiments of this application, when the first node among the master and slave nodes of the data storage system recovers after a crash, torn-page data is handled by truncating the subsequent log and waiting for resynchronization; during startup, the data storage system searches for the snapshot, namely the snapshot generated when the data to be written was written.

Further, RaftDB scans the data files to be written, and can rely on checkpoints to reduce the number of files scanned. Since the message storage protocol is implemented as a fixed message header plus a message body, the Cyclic Redundancy Check (CRC) in the message header is relied on to verify the data, finally completing the verification of the data files.
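A minimal sketch of that header-plus-body CRC scan follows. The record layout assumed here (body length and CRC32 in a fixed header, followed by the body) is an illustrative assumption; the patent does not specify the on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # (body_length, crc32_of_body) — assumed layout

def encode_record(body: bytes) -> bytes:
    """Fixed header (length, CRC of body) + message body."""
    return HEADER.pack(len(body), zlib.crc32(body)) + body

def verify_records(blob: bytes) -> int:
    """Scan a data-file image record by record, verifying each body
    against the CRC stored in its header.  Returns the number of valid
    records before the first corruption (or the end of the file)."""
    off, valid = 0, 0
    while off + HEADER.size <= len(blob):
        length, crc = HEADER.unpack_from(blob, off)
        body = blob[off + HEADER.size: off + HEADER.size + length]
        if len(body) != length or zlib.crc32(body) != crc:
            break  # torn page or truncated tail: stop scanning here
        valid += 1
        off += HEADER.size + length
    return valid
```

Everything after the first failed check corresponds to the "truncate the subsequent log and wait for resynchronization" step: the valid prefix is kept and the damaged tail is dropped and re-fetched from the master.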
S302: Load the snapshot on the first node, and replay the log corresponding to the snapshot.

When the data storage system finds the snapshot, it loads the snapshot on the first node and replays the log corresponding to the snapshot.

S303: Rewrite the data to be written on the first node according to the log.

After the data storage system loads the snapshot on the first node and replays the corresponding log, it rewrites the data to be written on the first node according to the log.

It can be understood that the snapshot-based data verification and recovery mechanism solves native AMQ's problem of long recovery after data loss, meeting current Internet services' demands on middleware for high performance, high availability, and disaster tolerance.
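The snapshot-plus-log recovery in S301 to S303 can be sketched as follows; modeling the node state as a key-value dict is an assumption made purely for illustration.

```python
def recover(snapshot: dict, log: list[tuple[str, str]]) -> dict:
    """Rebuild node state by loading the snapshot and then replaying,
    in order, the log entries recorded after the snapshot was taken."""
    state = dict(snapshot)      # S302: load the snapshot
    for key, value in log:      # S302: replay the corresponding log
        state[key] = value      # S303: rewrite the data to be written
    return state
```

The appeal of the scheme is that only the log suffix written after the snapshot must be replayed, so restart time is bounded by the snapshot interval rather than by the total data volume.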
Based on the above embodiments, in a further embodiment of this application, FIG. 6 is a first schematic structural diagram of the data storage system proposed by the embodiments of this application. As shown in FIG. 6, the data storage system 1 proposed by the embodiments of this application may include:

a receiving unit 10, configured to receive a client's write request through the master node;

an obtaining unit 11, configured to obtain the data to be written according to the write request;

a storage unit 12, configured to store the data to be written on the master node;

a synchronization unit 13, configured to synchronize the data to be written to multiple slave nodes;

a recording unit 14, configured to use the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written;

a sending unit 15, configured to, when the number of such slave nodes is detected to satisfy the commit condition, send the client, through the master node, a write-success response to the write request.

Further, the data storage system 1 also includes a judging unit;

the obtaining unit 11 is further configured to obtain the first quantity corresponding to the multiple slave nodes;

the judging unit is configured to judge that the number of slave nodes satisfies the commit condition when it exceeds half of the first quantity;

the sending unit 15 is configured to send the client, through the master node, the write-success response to the write request.

Further, the data storage system also includes a determining unit;

the obtaining unit 11 is further configured to, when the master-node election instruction is obtained, obtain the multiple terms and multiple write offsets corresponding to the multiple storage databases in the data storage system;

the determining unit is configured to determine a first storage database from the multiple storage databases according to the multiple terms and the multiple write offsets; to determine the first storage database as the master node; and to determine the second storage databases among the multiple storage databases as the slave nodes, the second storage databases being the storage databases among the multiple storage databases other than the first storage database.

Further, the data storage system also includes a comparing unit;

the comparing unit is configured to compare the multiple terms;

the determining unit is further configured to, when a single largest term is determined from the multiple terms, determine the storage database corresponding to that term as the first storage database;

the obtaining unit 11 is further configured to, when at least two equal largest terms are determined from the multiple terms, obtain the at least two storage databases corresponding to those terms and the at least two write offsets corresponding to those storage databases;

the determining unit is further configured to determine, from the at least two write offsets, the first write offset — the largest — and to determine the storage database corresponding to the first write offset as the first storage database.

Further, the judging unit is further configured to judge that the connection between the master node and the slave nodes is interrupted when, on arrival of the preset time, the number of slave nodes is detected not to satisfy the commit condition;

the obtaining unit 11 is further configured to trigger the master-node re-election process and obtain the master-node election instruction.

Further, the data storage system also includes an addressing unit and a writing unit;

the addressing unit is configured to perform addressing on the master node with the index file to be written, obtaining an addressing result;

the writing unit is configured to write the data file to be written sequentially into the addressing result.

Further, the determining unit is further configured to generate a snapshot and the log corresponding to the snapshot according to the write request, and to determine the data to be written according to the log.

Further, the data storage system also includes a searching unit, a loading unit, and a replaying unit;

the searching unit is configured to search for the snapshot when the first node performs data recovery, the first node being whichever of the master node and the slave nodes performs data recovery;

the loading unit is configured to load the snapshot on the first node;

the replaying unit is configured to replay the log corresponding to the snapshot;

the writing unit is further configured to rewrite the data to be written on the first node according to the log.
FIG. 7 is a second schematic structural diagram of the data storage system proposed by the embodiments of this application. As shown in FIG. 7, the data storage system 1 may further include a processor 110, a memory 111, a transmitter 112, a receiver 113, and a communication bus 114.

In specific embodiments, the obtaining unit 11, synchronization unit 13, recording unit 14, judging unit, determining unit, comparing unit, addressing unit, writing unit, searching unit, loading unit, and replaying unit may be implemented by the processor 110 on the data storage system 1; the storage unit 12 by the memory 111; the sending unit 15 by the transmitter 112; and the receiving unit 10 by the receiver 113. In the embodiments of this application, the processor 110 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor. Understandably, for different devices, other electronic components may be used to implement the processor's functions, and the embodiments of this application do not specifically limit this. The memory 111 is used to store executable program code including computer operation instructions, and may comprise a high-speed RAM and possibly also non-volatile memory, for example at least two disk memories.

In the embodiments of this application, the communication bus 114 connects the processor 110, the memory 111, the transmitter 112, and the receiver 113 and carries the communication among these components.

In the embodiments of this application, the memory 111 is used to store instructions and data.
Further, in the embodiments of this application, the receiver 113 is configured to receive a client's write request through the master node;

the processor 110 is configured to obtain the data to be written according to the write request; to synchronize the data to be written to multiple slave nodes; and to use the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written;

the memory 111 is configured to store the data to be written on the master node;

the transmitter 112 is configured to, when the number of such slave nodes is detected to satisfy the commit condition, send the client, through the master node, a write-success response to the write request.

Further, the processor 110 is further configured to obtain the first quantity corresponding to the multiple slave nodes, and to judge that the number of slave nodes satisfies the commit condition when it exceeds half of the first quantity;

the transmitter 112 is further configured to send the client, through the master node, the write-success response to the write request.

Further, the processor 110 is further configured to, when the master-node election instruction is obtained, obtain the multiple terms and multiple write offsets corresponding to the multiple storage databases in the data storage system; to determine a first storage database from the multiple storage databases according to the multiple terms and the multiple write offsets; to determine the first storage database as the master node; and to determine the second storage databases among the multiple storage databases as the slave nodes, the second storage databases being the storage databases among the multiple storage databases other than the first storage database.

Further, the processor 110 is further configured to compare the multiple terms; when a single largest term is determined from the multiple terms, to determine the storage database corresponding to that term as the first storage database; when at least two equal largest terms are determined from the multiple terms, to obtain the at least two storage databases corresponding to those terms and the at least two write offsets corresponding to those storage databases, determine from the at least two write offsets the first write offset — the largest — and determine the storage database corresponding to the first write offset as the first storage database.

Further, the processor 110 is further configured to judge that the connection between the master node and the slave nodes is interrupted when, on arrival of the preset time, the number of slave nodes is detected not to satisfy the commit condition; and to trigger the master-node re-election process and obtain the master-node election instruction.

Further, the processor 110 is further configured to perform addressing on the master node with the index file to be written, obtaining an addressing result, and to write the data file to be written sequentially into the addressing result.

Further, the processor 110 is further configured to generate a snapshot and the log corresponding to the snapshot according to the write request, and to determine the data to be written according to the log.

Further, the processor 110 is further configured to search for the snapshot when the first node performs data recovery, the first node being whichever of the master node and the slave nodes performs data recovery; to load the snapshot on the first node and replay the log corresponding to the snapshot; and to rewrite the data to be written on the first node according to the log.
In practice, the memory 111 may be a volatile memory, such as a Random-Access Memory (RAM), or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD), or a combination of the above kinds of memory, and it provides instructions and data to the processor 110.

In addition, the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be realized either in the form of hardware or in the form of a software function module.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment — in essence, the part that contributes to the prior art, or all or part of the technical solution — can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, server, or network device) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The data storage system proposed by the embodiments of this application receives a client's write request through the master node and obtains the data to be written according to the write request; stores the data to be written on the master node; synchronizes the data to be written to multiple slave nodes while using the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data; and, when that number is detected to satisfy the commit condition, sends the client, through the master node, a write-success response to the write request. Thus, in the embodiments of this application, once the master node has stored the data to be written sent by the client, synchronizing it to multiple slave nodes realizes distributed storage of the client's data to be written; monitoring in real time the number of slave nodes that have completed synchronization while the multiple slave nodes synchronize, and sending the client a write-success response as soon as that number satisfies the commit condition, guarantees that committed messages are not lost, thereby improving the availability of AMQ.

The embodiments of this application provide a computer-readable storage medium storing a program which, when executed by a processor, implements the data storage method described above.

Specifically, the program instructions corresponding to the data storage method of this embodiment may be stored on storage media such as optical discs, hard disks, and USB flash drives; when the program instructions corresponding to the data storage method in the storage medium are read or executed by an electronic device, any of the data storage methods described above is implemented.

Those skilled in the art should understand that the embodiments of this application may be provided as methods, systems, or computer program products. Therefore, this application may take the form of hardware embodiments, software embodiments, or embodiments combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program code.

This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products of the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in that computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of this application and are not intended to limit its scope of protection.

The embodiments of this application provide a data storage method, system, and storage medium. When the master node has stored the data to be written sent by the client, the data is synchronized to multiple slave nodes, realizing distributed storage of the client's data to be written; while the multiple slave nodes synchronize the data, the number of slave nodes that have completed synchronization is monitored in real time, and a write-success response is sent to the client as soon as that number satisfies the commit condition, guaranteeing that committed messages are not lost and thereby improving the availability of AMQ.
Claims (11)
- A data storage method, the method comprising: receiving a client's write request through a master node, and obtaining data to be written according to the write request; storing the data to be written on the master node; synchronizing the data to be written to multiple slave nodes, and using the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written; and, when the number of such slave nodes is detected to satisfy a commit condition, sending the client, through the master node, a write-success response to the write request.
- The method according to claim 1, wherein sending the client, through the master node, the write-success response to the write request when the number of slave nodes is detected to satisfy the commit condition comprises: obtaining a first quantity corresponding to the multiple slave nodes; judging that the number of slave nodes satisfies the commit condition when it exceeds half of the first quantity; and sending the client, through the master node, the write-success response to the write request.
- The method according to claim 1, wherein the method further comprises: when a master-node election instruction is obtained, obtaining multiple terms and multiple write offsets corresponding to multiple storage databases in the data storage system; determining a first storage database from the multiple storage databases according to the multiple terms and the multiple write offsets; and determining the first storage database as the master node and determining second storage databases among the multiple storage databases as the slave nodes, the second storage databases being the storage databases among the multiple storage databases other than the first storage database.
- The method according to claim 3, wherein determining the first storage database from the multiple storage databases according to the multiple terms and the multiple write offsets comprises: comparing the multiple terms; when a single largest term is determined from the multiple terms, determining the storage database corresponding to that term as the first storage database; when at least two equal largest terms are determined from the multiple terms, obtaining the at least two storage databases corresponding to the at least two terms and the at least two write offsets corresponding to those storage databases; and determining, from the at least two write offsets, a first write offset that is the largest, and determining the storage database corresponding to the first write offset as the first storage database.
- The method according to claim 3, wherein, after using the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written, the method further comprises: judging that the connection between the master node and the slave nodes is interrupted when, on arrival of a preset time, the number of such slave nodes is detected not to satisfy the commit condition; and triggering a master-node re-election process and obtaining the master-node election instruction.
- The method according to claim 1, wherein the data to be written comprises an index file to be written and a data file to be written, and storing the data to be written on the master node comprises: performing addressing on the master node with the index file to be written to obtain an addressing result; and writing the data file to be written sequentially into the addressing result.
- The method according to claim 1, wherein obtaining the data to be written according to the write request comprises: generating a snapshot and a log corresponding to the snapshot according to the write request; and determining the data to be written according to the log.
- The method according to claim 7, wherein, after determining the data to be written according to the log, the method further comprises: when a first node performs data recovery, searching for the snapshot, the first node being whichever of the master node and the slave nodes performs data recovery; loading the snapshot on the first node and replaying the log corresponding to the snapshot; and rewriting the data to be written on the first node according to the log.
- A data storage system, the system comprising: a receiving unit, configured to receive a client's write request through a master node; an obtaining unit, configured to obtain data to be written according to the write request; a storage unit, configured to store the data to be written on the master node; a synchronization unit, configured to synchronize the data to be written to multiple slave nodes; a recording unit, configured to use the master node to record the number of slave nodes, among the multiple slave nodes, that have synchronized the data to be written; and a sending unit, configured to, when the number of such slave nodes is detected to satisfy a commit condition, send the client, through the master node, a write-success response to the write request.
- A data storage system, the system comprising a processor, a memory, a transmitter, a receiver, and a communication bus, the processor implementing the method according to any one of claims 1 to 8 when executing the program stored in the memory.
- A computer-readable storage medium storing a program for use in a data storage system, the program implementing the method according to any one of claims 1 to 8 when executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/090138 WO2021226905A1 (zh) | 2020-05-14 | 2020-05-14 | 一种数据存储方法、系统及存储介质 |
CN202080100499.7A CN115486052B (zh) | 2020-05-14 | 一种数据存储方法、系统及存储介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/090138 WO2021226905A1 (zh) | 2020-05-14 | 2020-05-14 | 一种数据存储方法、系统及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021226905A1 true WO2021226905A1 (zh) | 2021-11-18 |
Family
ID=78526185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/090138 WO2021226905A1 (zh) | 2020-05-14 | 2020-05-14 | 一种数据存储方法、系统及存储介质 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021226905A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201117A (zh) * | 2021-12-22 | 2022-03-18 | 苏州浪潮智能科技有限公司 | 缓存数据的处理方法、装置、计算机设备及存储介质 |
CN114490540A (zh) * | 2022-02-14 | 2022-05-13 | 杭州朗和科技有限公司 | 数据存储方法、介质、装置和计算设备 |
CN115297144A (zh) * | 2022-07-29 | 2022-11-04 | 中汽创智科技有限公司 | 节点数据交互方法和装置 |
CN115599747A (zh) * | 2022-04-22 | 2023-01-13 | 北京志凌海纳科技有限公司(Cn) | 一种分布式存储系统的元数据同步方法、系统及设备 |
CN116107814A (zh) * | 2023-04-04 | 2023-05-12 | 阿里云计算有限公司 | 数据库容灾方法、设备、系统及存储介质 |
CN117666970A (zh) * | 2024-01-30 | 2024-03-08 | 杭州阿里云飞天信息技术有限公司 | 数据存储方法以及数据存储装置 |
CN118368296A (zh) * | 2024-06-20 | 2024-07-19 | 杭州宇信数字科技有限公司 | 一种跨数据中心的应用数据实时同步方法、装置和系统 |
WO2024199464A1 (zh) * | 2023-03-31 | 2024-10-03 | 华为云计算技术有限公司 | 数据库系统及其数据管理方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070174541A1 (en) * | 1998-02-13 | 2007-07-26 | Sashikanth Chandrasekaran | Methods to perform disk writes in a distributed shared disk system needing consistency across failures |
CN101741911A (zh) * | 2009-12-18 | 2010-06-16 | 中兴通讯股份有限公司 | 基于多副本协同的写操作方法、系统及节点 |
CN105278877A (zh) * | 2015-09-30 | 2016-01-27 | 成都华为技术有限公司 | 一种对象存储方法和装置 |
CN111124301A (zh) * | 2019-12-18 | 2020-05-08 | 深圳供电局有限公司 | 一种对象存储设备的数据一致性存储方法及系统 |
-
2020
- 2020-05-14 WO PCT/CN2020/090138 patent/WO2021226905A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070174541A1 (en) * | 1998-02-13 | 2007-07-26 | Sashikanth Chandrasekaran | Methods to perform disk writes in a distributed shared disk system needing consistency across failures |
CN101741911A (zh) * | 2009-12-18 | 2010-06-16 | 中兴通讯股份有限公司 | 基于多副本协同的写操作方法、系统及节点 |
CN105278877A (zh) * | 2015-09-30 | 2016-01-27 | 成都华为技术有限公司 | 一种对象存储方法和装置 |
CN111124301A (zh) * | 2019-12-18 | 2020-05-08 | 深圳供电局有限公司 | 一种对象存储设备的数据一致性存储方法及系统 |
Non-Patent Citations (1)
Title |
---|
HUANG, XIAO-WEI ET AL.: "Data Synchronous Method Based on Snapshot Log Analysis", JOURNAL OF LOGISTICAL ENGINEERING UNIVERSITY, no. 2, 31 December 2006 (2006-12-31), pages 59 - 62,67, XP055867569, ISSN: 1672--784 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201117B (zh) * | 2021-12-22 | 2023-09-01 | 苏州浪潮智能科技有限公司 | 缓存数据的处理方法、装置、计算机设备及存储介质 |
CN114201117A (zh) * | 2021-12-22 | 2022-03-18 | 苏州浪潮智能科技有限公司 | 缓存数据的处理方法、装置、计算机设备及存储介质 |
CN114490540A (zh) * | 2022-02-14 | 2022-05-13 | 杭州朗和科技有限公司 | 数据存储方法、介质、装置和计算设备 |
CN114490540B (zh) * | 2022-02-14 | 2024-05-28 | 杭州网易数之帆科技有限公司 | 数据存储方法、介质、装置和计算设备 |
CN115599747A (zh) * | 2022-04-22 | 2023-01-13 | 北京志凌海纳科技有限公司(Cn) | 一种分布式存储系统的元数据同步方法、系统及设备 |
CN115599747B (zh) * | 2022-04-22 | 2023-06-06 | 北京志凌海纳科技有限公司 | 一种分布式存储系统的元数据同步方法、系统及设备 |
CN115297144B (zh) * | 2022-07-29 | 2024-03-08 | 中汽创智科技有限公司 | 节点数据交互方法和装置 |
CN115297144A (zh) * | 2022-07-29 | 2022-11-04 | 中汽创智科技有限公司 | 节点数据交互方法和装置 |
WO2024199464A1 (zh) * | 2023-03-31 | 2024-10-03 | 华为云计算技术有限公司 | 数据库系统及其数据管理方法 |
CN116107814A (zh) * | 2023-04-04 | 2023-05-12 | 阿里云计算有限公司 | 数据库容灾方法、设备、系统及存储介质 |
CN116107814B (zh) * | 2023-04-04 | 2023-09-22 | 阿里云计算有限公司 | 数据库容灾方法、设备、系统及存储介质 |
CN117666970A (zh) * | 2024-01-30 | 2024-03-08 | 杭州阿里云飞天信息技术有限公司 | 数据存储方法以及数据存储装置 |
CN117666970B (zh) * | 2024-01-30 | 2024-05-07 | 杭州阿里云飞天信息技术有限公司 | 数据存储方法以及数据存储装置 |
CN118368296A (zh) * | 2024-06-20 | 2024-07-19 | 杭州宇信数字科技有限公司 | 一种跨数据中心的应用数据实时同步方法、装置和系统 |
Also Published As
Publication number | Publication date |
---|---|
CN115486052A (zh) | 2022-12-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021226905A1 (zh) | 一种数据存储方法、系统及存储介质 | |
US11481289B2 (en) | Method and apparatus for reading and writing committed data | |
US10496669B2 (en) | System and method for augmenting consensus election in a distributed database | |
US7685459B1 (en) | Parallel backup | |
US7657796B1 (en) | System and method for distributed storage verification | |
US9582382B1 (en) | Snapshot hardening | |
WO2017177941A1 (zh) | 主备数据库切换方法和装置 | |
CN106776130B (zh) | 一种日志恢复方法、存储装置和存储节点 | |
EP2434729A2 (en) | Method for providing access to data items from a distributed storage system | |
US20150113317A1 (en) | Method for a source storage device sending data to a backup storage device for storage, and storage device | |
US20150213100A1 (en) | Data synchronization method and system | |
CN110543386B (zh) | 一种数据存储方法、装置、设备和存储介质 | |
CN106339278A (zh) | 一种网络文件系统的数据备份及恢复方法 | |
KR20150137884A (ko) | 부분동기화 지원 데이터베이스 관리 시스템 및 데이터베이스 관리 시스템에서 부분동기화 방법 | |
US10452680B1 (en) | Catch-up replication with log peer | |
JP5292351B2 (ja) | メッセージキュー管理システム及びロックサーバ及びメッセージキュー管理方法及びメッセージキュー管理プログラム | |
US10282256B1 (en) | System and method to enable deduplication engine to sustain operational continuity | |
US7809898B1 (en) | Detecting and repairing inconsistencies in storage mirrors | |
CN106452836B (zh) | 主节点设置方法及装置 | |
US11748215B2 (en) | Log management method, server, and database system | |
EP3896571B1 (en) | Data backup method, apparatus and system | |
CN113326251B (zh) | 数据管理方法、系统、设备和存储介质 | |
CN107402841B (zh) | 大规模分布式文件系统数据修复方法及设备 | |
US11693844B2 (en) | Processing delete requests based on change feed of updates | |
CN112711382B (zh) | 基于分布式系统的数据存储方法、装置和存储节点 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20936030 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/04/2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20936030 Country of ref document: EP Kind code of ref document: A1 |