CN113885809A - Data management system and method - Google Patents


Publication number
CN113885809A
Authority
CN
China
Prior art keywords
data
data block
log
storage device
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111479575.2A
Other languages
Chinese (zh)
Other versions
CN113885809B (en)
Inventor
张洋
黄岩
任崇彬
罗强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunhe Enmo Beijing Information Technology Co ltd
Original Assignee
Yunhe Enmo Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunhe Enmo Beijing Information Technology Co ltd
Priority to CN202111479575.2A
Publication of CN113885809A
Application granted
Publication of CN113885809B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention discloses a data management system and method. The system is arranged on a local storage device and is used for managing the consistency protocol data of a target system. The data management system comprises a log storage module and a data block searching module. The log storage module is used for storing a consistency protocol log and is in data connection with an input port of the local storage device, wherein the consistency protocol log records the consistency operations performed by the target system. The data block searching module is used for searching for data blocks stored in the local storage device and is in data connection with the input port of the local storage device, wherein the data blocks are the data blocks on which the target system performs consistency operations. The invention solves the technical problem in the related art that a distributed system stores its data and logs locally, where they duplicate the logs of the local file system and thus waste local storage space.

Description

Data management system and method
Technical Field
The invention relates to the field of data processing, in particular to a data management system and a data management method.
Background
Enterprise storage is critical to an enterprise: if important data is damaged, production is severely affected, so data must be backed up to keep it safe. Enterprise storage holds far more data than personal storage, and because of this volume it is harder to manage and maintain.
Enterprise storage is divided into centralized storage and distributed storage. Centralized storage concentrates the physical media in one place; it places high demands on the machine room environment, requires dedicated servers and storage equipment, and has high equipment and maintenance costs. Distributed storage can use low-end, small-capacity storage devices, and the physical media can be distributed across different locations, so equipment and maintenance costs are low. Distributed storage can also use multiple storage servers to share the storage load, which makes expansion convenient.
With the spread of electronic devices, the amount of data stored worldwide has grown explosively, and centralized storage, in which all business units are deployed together on one or a few mainframes, is increasingly inadequate for today's computer systems. Centralized storage scales poorly: when the data volume or service volume grows, the only option is to upgrade to a higher configuration, and a single server in a centralized architecture is expensive. Centralized storage also has a long-standing single-point-of-failure problem: once one large host fails, the whole system becomes unavailable, and recovering the data is a heavy burden. By comparison, distributed storage scales better and adapts quickly to growth in data volume; it can spread the load of data recovery across multiple nodes, so recovery is faster than with centralized storage. Distributed storage will therefore become increasingly important.
Data reliability is the first requirement of a storage system. To provide it, a distributed storage system makes multiple copies of the data and stores them on different hard disks, so that even if one hard disk is damaged, the data is not lost as long as copies remain on other disks. After a hard disk is damaged, the storage system generally detects the loss and restores the missing copies in time. Multiple copies bring reliability but also the problem of data consistency: the data in the copies on different hard disks must be exactly the same, so that data recovered from a copy is identical to the original. If the copies are not synchronized in time, data that was written earlier may not be readable, or data that was read earlier may not be readable later. Data consistency is divided into strong consistency, in which replication is synchronous, and weak consistency, in which replication is asynchronous. Under strong consistency, when a user writes data to the distributed storage system and reads it immediately after the write returns successfully, the same data is obtained. Under weak consistency, a read issued after the write returns successfully may lag behind and be inconsistent with the written data. Strong consistency is generally achieved with a distributed consistency protocol such as Paxos or Raft. Raft is easier to implement than Paxos and has been adopted by many distributed storage systems, and the data consistency problem can be solved with the Raft protocol.
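The difference between synchronous and asynchronous replication described above can be illustrated with a minimal Python sketch. It is not Raft or Paxos, and the names `Replica`, `ReplicatedStore`, and `sync` are hypothetical; the sketch only models when a write becomes visible on the other copies.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data on a separate disk (hypothetical model)."""
    store: dict = field(default_factory=dict)


class ReplicatedStore:
    def __init__(self, n_replicas: int, synchronous: bool):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.synchronous = synchronous
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        # The first replica always applies the write immediately.
        self.replicas[0].store[key] = value
        if self.synchronous:
            # Strong consistency: copy to every replica before returning.
            for replica in self.replicas[1:]:
                replica.store[key] = value
        else:
            # Weak consistency: return now and copy later.
            self.pending.append((key, value))

    def sync(self):
        """Apply the delayed writes to the remaining replicas."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.store[key] = value
        self.pending.clear()

    def read(self, key, replica_index: int):
        return self.replicas[replica_index].store.get(key)
```

With `synchronous=True`, a read from any replica immediately after `write` returns the new value; with `synchronous=False`, a read from another replica returns `None` until `sync` has run.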
The snapshot is an image of data of the storage system at a certain time, and the snapshot mainly has the function of online data backup and recovery. When the application of the storage device fails or the file is damaged, the data can be quickly recovered, and the data can be recovered to the state of a certain available time point. The snapshot has another function of providing another data access channel for the storage user, so that when the original data is subjected to online application processing, the user can access the snapshot data and can also utilize the snapshot to perform work such as testing.
There are two main techniques for implementing snapshots: copy-on-write (COW) and redirect-on-write (ROW). ROW is currently the main approach to snapshots in distributed storage. When a snapshot is created, ROW allocates a snapshot volume for the source volume. After the snapshot is created, all subsequent write I/O goes to the snapshot volume, while read I/O may be served from the source volume or the snapshot volume, depending on whether the block has changed since the snapshot was created. The point-in-time image of the snapshot data is the source volume itself, because the source volume is read-only once the snapshot has been created. Taking multiple snapshots produces a snapshot chain, and the disk volume is always mounted at the tail of the chain; that is, all writes land in the tail snapshot volume, whereas a read must search the chain in sequence. The biggest problem of ROW in a traditional storage scenario is therefore its large impact on read performance.
Using the native Raft interface in distributed storage wastes space unnecessarily. The native Raft interface stores the Raft data and the Raft log on local storage, and the local file system keeps a log of its own, so two logs exist and the data is duplicated. This wastes disk space, and because the local file system must store a log as well, its load and the performance overhead both increase.
The local storage of most distributed storage systems does not fully exploit the capabilities of high-performance hardware. Hardware is developing rapidly and advanced storage devices keep emerging, but the local storage management of distributed storage systems is rarely optimized for them. For example, the Ceph distributed storage system was originally designed for mechanical hard disk drives (HDD) and does not fully consider all-flash configurations built on solid-state disks (SSD), such as PCIe-interface SSDs (PCIe, Peripheral Component Interconnect Express, is a high-speed serial computer expansion bus standard), or NVRAM (Non-Volatile Random Access Memory). As a result, the physical performance of this hardware is not fully realized in Ceph, especially in terms of latency and IOPS (the number of read/write operations per second), which wastes resources.
From the above it can be seen that the local storage management of existing distributed storage systems leaves considerable room for optimization. First, using the native Raft interface wastes space unnecessarily, burdens the local file system, and increases performance overhead. Second, local storage does not bring out the best performance of current hardware, and such details can greatly affect the user experience. Third, a traditional ROW snapshot expresses snapshot information by linking data blocks with pointers into a directed acyclic graph. Its drawbacks are that newly written data requires a large number of metadata updates, the amount of data physically written is several times the amount of data logically written, the scheme is poorly suited to solid-state disks (SSD), and it is difficult to apply to the storage and management of KV (key-value) data. In addition, if a large number of snapshots are taken, read performance suffers badly.
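The read cost of a ROW snapshot chain can be made concrete with a short Python sketch. It models each volume in the chain as a dictionary of blocks; the class `RowVolume` and its methods are hypothetical and are not taken from any particular storage system.

```python
class RowVolume:
    """Redirect-on-write volume modelled as a chain of block maps."""

    def __init__(self):
        # chain[0] is the source volume; chain[-1] receives all writes.
        self.chain = [{}]

    def write(self, block_id, data):
        # Writes always land in the tail of the snapshot chain.
        self.chain[-1][block_id] = data

    def snapshot(self):
        """Freeze the current tail and append a new writable tail.

        Returns the index of the frozen layer, used here as the snapshot id.
        """
        self.chain.append({})
        return len(self.chain) - 2

    def read(self, block_id, snapshot_id=None):
        """Return (data, layers_visited) for the newest visible version.

        With snapshot_id=None the live volume (the tail) is read.
        """
        start = len(self.chain) - 1 if snapshot_id is None else snapshot_id
        visited = 0
        for layer in range(start, -1, -1):
            visited += 1
            if block_id in self.chain[layer]:
                return self.chain[layer][block_id], visited
        return None, visited
```

A block that has not been rewritten since the first snapshot is found only after every layer has been visited, so the number of lookups per read grows with the number of snapshots, which is the read-performance problem noted above.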
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a data management system and a data management method, which at least solve the technical problem in the related art that a distributed system stores its data and logs locally, where they duplicate the logs of the local file system and thus waste local storage space.
According to an aspect of the embodiments of the present invention, there is provided a data management system, disposed in a local storage device, for managing consistency protocol data of a target system, where the data management system includes a log storage module and a data block searching module; the log storage module is used for storing a consistency protocol log and is in data connection with an input port of the local storage device, where the consistency protocol log records the consistency operations performed by the target system; the data block searching module is used for searching for a data block stored in the local storage device and is in data connection with the input port of the local storage device, where the data block is a data block on which the target system performs a consistency operation.
Optionally, the data block searching module includes a data block map unit and a data block index unit; the data block map unit is used for storing root information of a plurality of data block index units and is in data connection with the input port of the local storage device, where the root information is used for locating the data block index units; the data block index unit is used for storing index information of data blocks and is in data connection with the input port of the local storage device, where the index information is used for finding the storage positions of the corresponding data blocks, and the data blocks are stored in a placement group of the local storage device.
Optionally, the data management system further includes a root directory; the root directory stores placement group metadata and is in data connection with the input port of the local storage device, and is also in data connection with the log storage module and with the data block map unit and the data block index unit of the data block searching module, where the placement group metadata is used to identify the data blocks of the consistency protocol data that belong to the corresponding placement group; the root directory also stores metadata of the local storage device, and the placement group metadata and the metadata of the local storage device both include multiple versions of metadata; the data block map unit includes multiple versions of data block map data, where each version corresponds to a version of a consistency operation.
Optionally, the data management system further includes a verification module and an inspection module; the verification module is configured to verify an input/output request, where the input/output request includes a write input/output request and a read input/output request; the inspection module is configured to check the data blocks periodically and record an inspection log.
Optionally, the data management system further includes a log access interface; the log access interface is in data connection with the log storage module and with the input port of the local storage device, respectively; the log access interface is a dedicated interface that is called directly by the target system to obtain the consistency protocol log from the log storage module and to perform read and write operations on it.
According to another aspect of the embodiments of the present invention, there is provided a data management method, including: storing, through a dedicated interface, the consistency protocol log generated by the consistency operations of the target system in a log storage module on the local storage device; storing the consistency protocol data on which the target system performs consistency operations in the local storage device in units of data blocks organized into placement groups, and recording the storage position of each data block through a data block searching module; and, on receiving an input/output request for a data block, calling the data block searching module to search for the data block.
Optionally, the data block searching module includes a data block map unit and a data block index unit, and calling the data block searching module to search for the data block on receiving an input/output request for the data block includes: receiving the input/output request for the data block, where the request includes two-tuple information of the data block, and the two-tuple information includes a first identifier of the volume to which the data block belongs and a second identifier of the data block within the volume; calculating a hash value of the two-tuple, calling the data block map unit according to the hash value, and determining the data block index unit to be called; calling the data block index unit and determining the position of a data block object according to triple information, where the triple information includes the first identifier, the second identifier, and a third identifier of a consistency operation version, and the data block object is the storage unit of the data block in the local storage device; and searching the data block object for the data block.
Optionally, the consistency protocol log is stored in a first storage device of the local storage device that has lower read-write performance, where the local storage device includes a plurality of storage devices with different read-write performance; the data blocks of the consistency protocol data are stored in a second storage device of the local storage device that has higher read-write performance; and/or the data blocks of different consistency operation versions of the same data block of the consistency protocol data are stored at adjacent positions in the second storage device.
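The two-stage search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the number of index units, the use of SHA-256, the modulo routing, and the names `index_unit_for` and `ChunkLookup` are all hypothetical. The (volume identifier, block identifier) pair selects an index unit through the map unit, and the triple with the consistency operation version then selects the data block object.

```python
import hashlib

NUM_INDEX_UNITS = 4  # assumed value; the source does not fix a number


def index_unit_for(volume_id: int, chunk_id: int) -> int:
    """Hash the (volume, block) pair to choose a data block index unit."""
    digest = hashlib.sha256(f"{volume_id}:{chunk_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_INDEX_UNITS


class ChunkLookup:
    def __init__(self):
        # Data block map unit: slot number -> data block index unit.
        self.chunk_map = [dict() for _ in range(NUM_INDEX_UNITS)]

    def record(self, volume_id, chunk_id, version, location):
        """Record where one version of a data block is stored."""
        unit = self.chunk_map[index_unit_for(volume_id, chunk_id)]
        unit[(volume_id, chunk_id, version)] = location

    def find(self, volume_id, chunk_id, version):
        """Stage 1: hash the pair to pick the index unit.
        Stage 2: look up the triple inside that index unit."""
        unit = self.chunk_map[index_unit_for(volume_id, chunk_id)]
        return unit.get((volume_id, chunk_id, version))
```

Because only the pair is hashed, every version of one data block is routed to the same index unit, and the version identifier is needed only in the second stage.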
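One way to keep the versions of a data block close together is to order records by volume, block, and version. The Python sketch below shows only this ordering idea; the class `VersionedLayout` is hypothetical, it assumes integer identifiers with non-negative versions, and it does not model physical placement on a device.

```python
import bisect


class VersionedLayout:
    """Keeps keys sorted by (volume, block, version) so that all versions
    of one data block occupy neighbouring slots."""

    def __init__(self):
        self.keys = []

    def add(self, volume_id: int, chunk_id: int, version: int):
        # insort keeps the list sorted after every insertion.
        bisect.insort(self.keys, (volume_id, chunk_id, version))

    def versions_of(self, volume_id: int, chunk_id: int):
        # Versions are assumed to be >= 0, so -1 sorts before all of them.
        lo = bisect.bisect_left(self.keys, (volume_id, chunk_id, -1))
        hi = bisect.bisect_left(self.keys, (volume_id, chunk_id + 1, -1))
        return self.keys[lo:hi]
```

With this ordering, reading several versions of the same block becomes a single contiguous range scan rather than a set of scattered lookups.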
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes to perform the data management method described in any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided a computer storage medium, where the computer storage medium includes a stored program, and when the program runs, the apparatus where the computer storage medium is located is controlled to execute any one of the above data management methods.
In the embodiments of the present invention, a data management system is disposed in a local storage device and manages the consistency protocol data of a target system. The data management system includes a log storage module and a data block searching module. The log storage module stores a consistency protocol log and is in data connection with an input port of the local storage device, where the consistency protocol log records the consistency operations performed by the target system. The data block searching module searches for data blocks stored in the local storage device and is in data connection with the input port of the local storage device, where the data blocks are the data blocks on which the target system performs consistency operations. By providing a dedicated consistency protocol log interface and a corresponding data searching method, the system reduces the waste of disk space managed by the distributed system, thereby improving the storage space utilization of the local storage device and making the most of the hardware performance. This solves the technical problem in the related art that a distributed system stores its data and logs locally, where they duplicate the logs of the local file system and thus waste local storage space.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a data management system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of data management according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an overall design layout of a localstore according to an embodiment of the invention;
FIG. 4 is a schematic diagram of data block overwriting and checking according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a modification of the Raft protocol interface according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of module partitioning according to an embodiment of the present invention;
FIG. 7 is a diagram of viewing certain snapshot data of CHUNK, according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of logical address adjacency between snapshots according to an embodiment of the invention;
FIG. 9 is a schematic diagram of snapshot deletion according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating snapshot read and write and rollback according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the present embodiment, a data management system is provided. FIG. 1 is a schematic diagram of a data management system according to an embodiment of the present invention. As shown in FIG. 1, the data management system is disposed in a local storage device and is configured to manage consistency protocol data of a target system, where the data management system includes: a log storage module 11 and a data block searching module 12;
the log storage module 11 is configured to store a consistency protocol log and is in data connection with an input port of the local storage device, where the consistency protocol log records the consistency operations performed by the target system; the data block searching module 12 is configured to search for a data block stored in the local storage device and is in data connection with the input port of the local storage device, where the data block is a data block on which the target system performs a consistency operation.
The data management system described above is disposed in the local storage device and manages the consistency protocol data of the target system. The log storage module stores a consistency protocol log and is in data connection with an input port of the local storage device, where the consistency protocol log records the consistency operations performed by the target system. The data block searching module searches for data blocks stored in the local storage device and is in data connection with the input port of the local storage device, where the data blocks are the data blocks on which the target system performs consistency operations. By providing a dedicated consistency protocol log interface and a corresponding data searching method, the system reduces the waste of disk space managed by the distributed system, thereby improving the storage space utilization of the local storage device and making the most of the hardware performance. This solves the technical problem in the related art that a distributed system stores its data and logs locally, where they duplicate the logs of the local file system and thus waste local storage space.
The consistency operation ensures the consistency of the multiple copies of data stored on multiple nodes of the distributed storage system. The copies may include user data, control data, and metadata generated by functions such as snapshot, clone, and compression. The consistency operation may be a snapshot operation that stores the entire current system state in a snapshot, or it may be a data backup operation such as a mirroring operation or a copy operation.
A snapshot can be regarded as a stored backup of the data state of the system at different points in time during operation. Through the consistency operation, the metadata, the data block map unit, and the data block index unit can each have different versions.
The log storage module is configured to store the consistency protocol log, which is generated when the system performs a consistency operation. The consistency protocol log can be regarded as a record of all operations, such as reads, writes, searches, and accesses, carried out during the consistency operation; for example, when the consistency operation is a snapshot operation, the consistency protocol log may include a snapshot log. The consistency protocol log may be stored on the local storage device through a dedicated interface. In the related art, by contrast, when the consistency protocol log is synchronized to the local storage device through the consistency protocol, the local storage device also generates its own locally stored log, whose content largely duplicates the consistency protocol log, which wastes the storage resources of the local storage device. The data management system in this embodiment therefore stores the consistency protocol log only in the dedicated log storage module and controls the local storage device so that it no longer generates a corresponding local storage log. This reduces the waste of disk space managed by the distributed system, improving the storage space utilization of the distributed system and making the most of the hardware performance.
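A dedicated log interface of the kind described above might look like the following Python sketch. The class `ConsensusLogStore`, its method names, and the `term` field (a Raft concept) are assumptions introduced for illustration; the source does not define the interface. The point of the sketch is that the consistency protocol appends to and reads from one log directly, so no second, file-system-level log of the same entries is needed.

```python
class ConsensusLogStore:
    """Hypothetical log storage module with a dedicated access interface."""

    def __init__(self):
        self._entries = []      # list of (term, payload) tuples
        self._first_index = 1   # log index of self._entries[0]

    def append(self, term: int, payload: bytes) -> int:
        """Append one entry and return its log index."""
        self._entries.append((term, payload))
        return self._first_index + len(self._entries) - 1

    def read(self, index: int):
        """Return the entry at a log index, or None if it is not held."""
        pos = index - self._first_index
        if 0 <= pos < len(self._entries):
            return self._entries[pos]
        return None

    def truncate_prefix(self, first_kept: int):
        """Discard entries before first_kept, e.g. after a snapshot."""
        drop = max(0, min(first_kept - self._first_index, len(self._entries)))
        del self._entries[:drop]
        self._first_index += drop

    def last_index(self) -> int:
        return self._first_index + len(self._entries) - 1
```

In this model the target system would call `append` and `read` directly, and `truncate_prefix` would reclaim space once a snapshot makes older entries unnecessary.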
The data block searching module may be connected to an input port of the local storage device, may receive a search request for a data block through the input port of the local storage device, and searches the data block in the local storage device according to the request information for searching the data block.
By applying the consistency operation, the log storage module and the data block searching module to the local storage device of the distributed system, duplication between the locally stored log and the log of the local file system is avoided, the performance of the system hardware is exploited to the maximum extent, and the storage space utilization of the distributed system is improved.
Optionally, the data block searching module includes: a data block map unit, a data block index unit; the data block map unit is used for storing root information of the data block index units, the data block map unit is in data connection with an input port of the local storage device, and the root information is used for searching the positions of the data block index units; the data block indexing unit is used for storing index information of the data blocks, the data block indexing unit is in data connection with the input port of the local storage device, the index information is used for searching storage positions of the corresponding data blocks, and the data blocks are stored in the placement group of the local storage device.
The data block searching module comprises a data block map unit and a data block index unit. The data block map unit is used for storing root information of the data block index units; the root information may comprise the logical address of each data block index unit and the correspondence between that logical address and the request information in a data block search request. In one embodiment, the data block search request carries a two-tuple comprising a volume identifier and a block identifier, and a hash value is calculated from the two-tuple of the search request. The hash value is used as an index to search the data block map unit for the address interval in which the two-tuple is located, which determines the data block index unit corresponding to that address interval. In the data block index unit, the corresponding target key-value pair is searched for, by the volume identifier, the block identifier and a snapshot identifier, among the key-value pair records corresponding to the address interval, and the corresponding target data block is determined from the target key-value pair. The snapshot identifier is used for identifying different snapshot volumes. The data blocks are stored in the local storage device in the form of placement groups, and one placement group (PG) may include a plurality of data blocks.
The data block searching module consisting of the data block map unit and the data block index unit is established, so that the purpose of searching the target position of the data block according to the request is achieved, and the technical effects of improving the data searching efficiency and the data input and output speed are achieved.
Optionally, the root directory stores the placement group metadata, the root directory is in data connection with the input port of the local storage device, the root directory is also in data connection with the log storage module and the data block map unit and the data block index unit of the data block searching module, wherein the placement group metadata is used for identifying the data blocks of the consistency protocol data belonging to the corresponding placement group; the root directory also stores metadata of the local storage device, and the metadata of the placement group metadata and the metadata of the local storage device both comprise a plurality of versions of metadata; the data block map unit includes multiple versions of data block map data, where a version is a version of a coherency operation.
The root directory may store placement group metadata, the placement group may be a mapping relationship to a storage location or a storage manner of a data block in a data storage process, the placement group metadata is used to identify a data block of the consistency protocol data belonging to the corresponding placement group, the placement group metadata includes multiple versions, the root directory further stores metadata of the local storage device, the metadata in the local storage device may also include multiple versions, and the data block map unit may also include multiple versions of data block map data. The version of the metadata corresponds to the version of the consistency operation, for example, the consistency operation is a snapshot operation, multiple snapshots correspond to different snapshot versions, snapshot metadata is generated every snapshot, and the version of the metadata is also a snapshot version.
The metadata of the local storage device is the metadata with which the storage device stores data blocks. It may indicate the data generated while the local storage device stores a version of the consistency protocol data, the data blocks of the consistency protocol data stored on the local storage device, and related attribute information of those data blocks on the local storage device, such as the size, the identifier and the index of each data block. The version of the metadata of the local storage device may correspond to the version of the consistency operation described above. For both the placement group metadata and the metadata of the local storage device, each version may carry a check value for that version of the metadata, so that the correctness of the metadata can be judged by verification.
The root directory can be connected with the input port of the local storage device, and can also be in data connection with the log storage module and with the data block map unit and the data block index unit of the data block searching module; data can be searched, input and output through these connections.
Specifically, the metadata SSD meta of the local storage device and the placement group metadata PG meta in the root directory SuperBlob may each be stored in multiple copies. Each copy has a version number in its header, and each copy carries a checksum. When reading, the checksum of the header is calculated first, the copies whose checksum is correct are selected from the multiple copies, and among them the copy with the latest version number is read into the memory. When writing metadata, only one version is written, overwriting the oldest version. When reading data, the multiple copies of the metadata are read simultaneously, and copies with a failed check or a wrong version are discarded, so that the metadata that is read is free of errors.
The connection of the root directory with the log storage module and the data block searching module achieves the purpose of data access and improves the data processing efficiency of the system. Storing different versions of metadata with check values in the root directory achieves the purpose of backing up the metadata and improves the reliability of the data.
Optionally, the system further includes: the verification module is used for verifying the input and output requests, and the input and output requests comprise write input and output requests and read input and output requests; and the inspection module is used for regularly checking the data blocks and recording an inspection log.
The verification module may use end-to-end verification to check input/output data, and the check method may be a cyclic redundancy check (CRC) or a parity check. Specifically, for each write IO (Input/Output) request, a check field is added to each data block; the check is added from the point at which zStorage (cloud storage) first receives the data (or at the block device driver layer of the operating system), and the check field is written to the hard disk together with the data. For each read IO request, each data block is verified before it is submitted to the client; if the verification fails, an error is reported and the read IO returns the error.
The inspection module may read the data blocks and their checksums from the disk periodically or at random and verify the data blocks against their checksums. If the check fails, the redundant data block is used to recover the data block and the error is logged; if the number of errors reaches a certain threshold, the system can raise an alarm.
Through the verification module and the periodic checks of the inspection module, the accuracy of the data in the read and write paths of the system's input and output is guaranteed, the purpose of checking the data is achieved, and the reliability of the data is improved.
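The write-path and read-path checks can be illustrated with a minimal Python sketch. It assumes CRC32 as the check method and a 4-byte trailing check field; the function names `seal` and `unseal` and the field layout are hypothetical choices for illustration and are not taken from the embodiment.

```python
import struct
import zlib


def seal(block: bytes) -> bytes:
    """Write path: append a 4-byte CRC32 check field to a data block."""
    return block + struct.pack("<I", zlib.crc32(block))


def unseal(stored: bytes) -> bytes:
    """Read path: verify the check field and return the payload.

    Raises IOError when the check fails, so the read IO returns an error
    instead of handing wrong data to the client.
    """
    if len(stored) < 4:
        raise IOError("stored block is too short to hold a check field")
    block = stored[:-4]
    (expected,) = struct.unpack("<I", stored[-4:])
    if zlib.crc32(block) != expected:
        raise IOError("checksum mismatch")
    return block
```

In this sketch the check field travels with the data from the moment it is sealed, so corruption introduced anywhere between sealing and unsealing is detected on read.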
Optionally, the log access interface is in data connection with the log storage module and the input port of the local storage device, and is a dedicated interface, and the dedicated interface is used to be directly called by the target system, acquire the consistency protocol log from the log storage module, and perform read-write operation on the consistency protocol log.
The log access interface is respectively in data connection with the log storage module and an input port of a local storage device, and the local storage device can be a hard disk or a system memory or other devices with a storage function. The log access interface may be a dedicated interface. The log access interface may be invoked by the target system and may support storing a coherence protocol state, a coherence protocol log, and a coherence protocol state machine (Chunk Pool).
By establishing the dedicated log access interface, which communicates directly with the log storage module, read and write operations on the consistency protocol log in the log storage module can be carried out quickly and efficiently through the log access interface. This achieves direct interaction between the data of the log storage module and the data of the local storage device, and improves log access efficiency.
In accordance with an embodiment of the present invention, there is provided a method embodiment of a data management method, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than that herein.
Fig. 2 is a flowchart of a data management method according to an embodiment of the present invention, as shown in fig. 2, the method includes the following steps:
step S202, storing a consistency protocol log generated by the consistency operation of the target system in a log storage module on the local storage device through a dedicated interface;
step S204, storing the consistency protocol data of the consistency operation of the target system in a local storage device in a form of a placement group by taking the data block as a unit, and recording the storage position of the data block through a data block searching module;
and step S206, under the condition of receiving the input and output requests of the data blocks, calling a data block searching module to search the data blocks.
Through the above steps, the consistency protocol log generated by the consistency operation of the target system is stored, through a dedicated interface, in a log storage module on the local storage device; the consistency protocol data of the consistency operation of the target system is stored in the local storage device in the form of placement groups, in units of data blocks, and the storage positions of the data blocks are recorded by the data block searching module; and when an input/output request for a data block is received, the data block searching module is called to search for the data block. By providing a dedicated consistency protocol log interface and a corresponding data searching method, the waste of disk space managed by the distributed system is reduced, the storage space utilization of the local storage device is improved and the hardware performance is exploited to the maximum extent, thereby solving the technical problem in the related art that local storage space is wasted because the data and logs stored locally by the distributed system duplicate the logs of the local file system.
The consistency protocol log generated by the consistency operation of the target system is stored in the log storage module on the local device through the special interface, and the local file system in the local storage device does not need to generate the locally stored log, so that the situation that the storage resource of the local storage device is wasted due to the fact that the locally stored log and the consistency protocol log are repeated is avoided.
When the distributed system needs to access the log, it only needs to use the dedicated interface. The consistency protocol data of the consistency operation can be stored in the local storage device in the form of placement groups, in units of data blocks, and the data blocks can be accessed through the data block searching module.
In the data management system in this embodiment, only the consistency protocol log is stored by using a special log storage module, and the local storage device is controlled not to generate a corresponding local storage log any more. The purpose of reducing the waste of the disk space managed by the distributed system is achieved, and therefore the technical effects of improving the storage space utilization rate of the distributed system and maximally developing the hardware performance are achieved.
Because the logs are stored in the local storage device through the dedicated interface, the burden on the distributed system of managing and storing the logs is reduced, and the performance of the storage devices of the distributed system is improved.
Optionally, the data block searching module includes a data block map unit and a data block index unit, and calling the data block searching module to search for the data block when an input/output request for the data block is received includes: receiving an input/output request for a data block, wherein the input/output request comprises two-tuple information of the data block, and the two-tuple information comprises a first identifier of the volume to which the data block belongs and a second identifier of the data block within the volume; calculating a hash value of the two-tuple, calling the data block map unit according to the hash value, and determining the data block index unit that needs to be called; calling the data block index unit, and determining the position of a data block object according to triple information, wherein the triple information comprises the first identifier, the second identifier and a third identifier of the consistency operation version, and the data block object is the storage unit of a data block in the local storage device; and looking up the data block in the data block object.
The data block searching module receives an input/output request for a target data block. The request may comprise a two-tuple consisting of a first identifier of the volume to which the data block belongs and a second identifier of the data block within the volume; the first identifier may be a volume identifier and the second identifier may be a block identifier. A hash value is calculated from the two-tuple and used as an index into the data block map unit to find the interval in which the two-tuple is located, where the interval may store ordered key-value pair records. In the data block index unit, the corresponding target key-value pair is then searched for, by the first identifier, the second identifier and the third identifier, among the key-value pair records corresponding to the interval, and the corresponding target data block is determined from the target key-value pair. The third identifier may be a snapshot identifier, which is used for identifying different snapshot volumes.
A write request for a second target data block of the snapshot data is received, wherein the second target data block is the data block at any first data block position in the snapshot data; if no data block exists at the first data block position of the current volume, the second target data block is written to the first data block position of the current volume of the snapshot data; if a data block already exists at the first data block position of the current volume, the data block at the first data block position is overwritten with the second target data block.
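The two-stage lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the bucket count `NUM_BUCKETS`, the use of CRC32 as the hash, the class name `ChunkIndex` and the in-memory lists are all hypothetical and do not come from the embodiment, which stores these structures in Blobs on the local storage device.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # hypothetical number of data block index units


def bucket_of(vol_id: int, ck_id: int) -> int:
    """Stage 1: hash the (volume, block) two-tuple to pick an index unit."""
    return zlib.crc32(f"{vol_id}:{ck_id}".encode()) % NUM_BUCKETS


class ChunkIndex:
    """Each bucket keeps ordered (vol_id, ck_id, snap_id) keys and their values."""

    def __init__(self):
        self.keys = [[] for _ in range(NUM_BUCKETS)]
        self.vals = [[] for _ in range(NUM_BUCKETS)]

    def put(self, vol_id, ck_id, snap_id, blob_id):
        b = bucket_of(vol_id, ck_id)
        key = (vol_id, ck_id, snap_id)
        i = bisect.bisect_left(self.keys[b], key)
        if i < len(self.keys[b]) and self.keys[b][i] == key:
            self.vals[b][i] = blob_id  # update an existing record
        else:
            self.keys[b].insert(i, key)  # keep the records ordered
            self.vals[b].insert(i, blob_id)

    def get(self, vol_id, ck_id, snap_id):
        """Stage 2: binary-search the triple inside the chosen bucket."""
        b = bucket_of(vol_id, ck_id)
        key = (vol_id, ck_id, snap_id)
        i = bisect.bisect_left(self.keys[b], key)
        if i < len(self.keys[b]) and self.keys[b][i] == key:
            return self.vals[b][i]
        return None
```

Because the hash covers only the two-tuple, all snapshot versions of one data block land in the same bucket and sit next to each other in key order.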
Receiving a read request for reading a third target data block, wherein the third target data block is a data block at any second data block position in the snapshot data; under the condition that a data block exists at the position of a second data block of the current volume, reading the data block at the position of the second data block of the current volume; under the condition that no data block exists in the second data block position of the current volume, reading the data block in the second data block position of the snapshot volume before the current volume; in the event that all snapshot volumes at the second data block location have no data blocks, all zero data blocks are returned in response to the read request.
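The write and read rules for snapshot data can be sketched as follows. This is a minimal in-memory model, assuming a hypothetical `SnapshotChain` class in which the last dictionary is the current volume and earlier dictionaries are older snapshot volumes; the tiny `BLOCK_SIZE` is chosen only to keep the example short.

```python
BLOCK_SIZE = 8  # hypothetical block size, kept tiny for illustration


class SnapshotChain:
    """Volumes ordered from the oldest snapshot to the current volume (last)."""

    def __init__(self):
        self.volumes = [{}]  # each dict maps block position -> bytes

    def take_snapshot(self):
        # Freeze the current volume and start a new, empty current volume.
        self.volumes.append({})

    def write(self, pos: int, data: bytes):
        # Insert if absent, overwrite if present; only the current volume changes.
        self.volumes[-1][pos] = data

    def read(self, pos: int) -> bytes:
        # Try the current volume first, then progressively older snapshots.
        for vol in reversed(self.volumes):
            if pos in vol:
                return vol[pos]
        # No volume holds a block at this position: return an all-zero block.
        return bytes(BLOCK_SIZE)
```

In this model a write never touches a frozen snapshot volume, so older snapshots keep the data they held when they were taken.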
Optionally, the consistency protocol log is stored in a first storage device with poorer read-write performance in the local storage device, where the local storage device includes a plurality of storage devices with different read-write performance; the data blocks of the consistency protocol data are stored in a second storage device with better read-write performance in the local storage device; and/or the data blocks of different consistency operation versions of the same data block of the consistency protocol data are stored in adjacent locations in the second storage device.
The physical storage resources can be managed by class according to storage medium or storage performance. The consistency protocol log does not have high requirements for access speed and can be stored in the first storage device with poorer read-write performance, whereas the data blocks of the consistency protocol data have high requirements for access speed and can be stored in the second storage device with better read-write performance. The first storage device with poorer read-write performance may be one or more storage devices with lower read-write speed among the plurality of storage devices of the local storage device, or may be a storage device whose read-write speed does not exceed a preset speed. The second storage device with better read-write performance may be one or more storage devices with higher read-write speed among the plurality of storage devices of the local storage device, or may be a storage device whose read-write speed exceeds the preset speed.
By storing data in different classes of device according to its access-speed requirements, hybrid deployment of fast and slow storage devices is achieved, and the storage performance is fully exploited.
It should be noted that the present application also provides an alternative implementation, and the details of the implementation are described below.
The implementation mode is specially customized for the consistency protocol Raft log and the state machine of the upper layer, and is completely compatible with a Chunk service Chunk Server and a consistency protocol Raft interface, so that the waste of space is reduced, and the burden of the upper layer is reduced; the data reliability is greatly improved compared with the prior distributed local storage management system by carrying out background polling and verification on the data; the mixed deployment of fast and slow media is supported, the universality is strong, and the performance of hardware is exerted to the maximum extent; compared with the ROW snapshot, the method uses a different method to store the metadata of the snapshot, reduces the updated metadata amount needed by newly written data, and reduces the written physical data amount compared with the traditional snapshot; it is easy to store snapshot metadata using key-value KV data.
The technical points of the embodiment are as follows:
firstly, the overall layout of the embodiment is realized;
fig. 3 is a schematic diagram of the overall design layout of the localstore (local storage) according to the embodiment. As shown in fig. 3, each block in fig. 3 represents one Blob, as described below:
1. SuperBlob: the super Blob. A Blob (Binary Large Object, which may also be regarded as an ordered list composed of clusters) can be created, deleted, resized and persisted, and persists across power-down and restart. The SuperBlob is the "root" of all data on an OSD (Object Storage Device), and the locations of the various other data structures can be obtained directly or indirectly through the SuperBlob.
2. Raft Log: the consistency protocol log, i.e., the data files storing the Raft log and the level1 SST.
3. CkBucket Map: the data block map unit, a Blob that stores the index of the CkBuckets; by looking up this table, the CkBucket (data block index unit) to which a given two-tuple (volume identifier VolId, block identifier CkId) belongs can be found.
4. CkBucket: the data block index unit, a Blob used for storing the correspondence between a triple (volume identifier VolId, block identifier CkId, snapshot identifier SnapId) and the Id of a Chunk Data Blob.
5. Chunk Meta: stores the data of the Chunk, wherein each snapshot (4MB) of a Chunk corresponds to one Chunk Data Blob.
6. Metadata reliability assurance:
(1) at least four copies of the metadata SSD meta of the local storage device and of the placement group metadata PG meta are stored in the SuperBlob; each copy has a version number in its header, and the four copies hold different versions; (2) each copy of the metadata carries a check value (cyclic redundancy check, CRC32 by default); when the metadata is read, the header checksum is calculated first, the copies whose checksum is correct are selected from the 4 copies, and among them the copy with the latest version number is read into the memory;
(3) the CkBucket Map is also stored in four copies, each holding a different version.
FIG. 4 is a schematic diagram of data block overwriting and checking in the present embodiment. As shown in FIG. 4, when writing metadata, only one version is written, overwriting the oldest version; when reading data, the 4 copies of the metadata are read simultaneously, and copies with a failed check or a wrong version are discarded, so that the metadata obtained by the user is guaranteed to be free of errors and the reliability of the data is greatly improved.
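The multi-copy metadata scheme can be sketched in Python as follows. This is a hedged illustration, not the embodiment's on-disk format: the JSON encoding, the 4-byte CRC32 prefix and the function names `encode`, `decode`, `read_latest` and `write_next` are assumptions made for the example.

```python
import json
import zlib


def encode(version: int, payload: dict) -> bytes:
    """Serialize one metadata copy as: 4-byte CRC32 + JSON body with a version."""
    body = json.dumps({"version": version, "payload": payload}, sort_keys=True).encode()
    return zlib.crc32(body).to_bytes(4, "little") + body


def decode(raw: bytes):
    """Return (version, payload), or None when the check fails."""
    if len(raw) < 4:
        return None
    crc, body = raw[:4], raw[4:]
    if zlib.crc32(body).to_bytes(4, "little") != crc:
        return None
    rec = json.loads(body)
    return rec["version"], rec["payload"]


def read_latest(slots):
    """Read all copies, discard the ones that fail the check, keep the newest."""
    valid = [d for d in map(decode, slots) if d is not None]
    if not valid:
        raise IOError("no valid metadata copy")
    return max(valid, key=lambda d: d[0])


def write_next(slots, payload: dict) -> int:
    """Write exactly one copy, overwriting the oldest (or a corrupt) slot."""
    versions = [d[0] if d else -1 for d in map(decode, slots)]
    new_version = max(versions) + 1
    victim = versions.index(min(versions))  # corrupt slots count as oldest
    slots[victim] = encode(new_version, payload)
    return new_version
```

Since each write replaces only one slot, a write that is interrupted or corrupted damages at most that slot, and the next read falls back to the newest copy that still passes the check.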
Secondly, background polling and checking of the data block, so that the reliability and the availability of the data are improved;
1. End-to-end verification is used: for each write IO (input/output) request, a check field is added to each data block; the check is added from the point at which zStorage first receives the data (or at the block device driver layer of the operating system), and the check field is written to the hard disk together with the data. For each read IO request, each data block is verified before it is submitted to the client; if the verification fails, an error is reported and the read IO returns the error. This largely prevents the user from receiving wrong data.
2. All data blocks and their checksums are periodically read from the disk and verified. If the check fails, the redundant data block is used to recover the data block and the error is logged; if the number of errors reaches a certain threshold, an alarm is raised. By adjusting the polling period appropriately, the performance of the storage device is not affected too much, and the data in the storage device is kept valid.
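One pass of the background polling can be sketched as below. This is a simplified model under stated assumptions: blocks, checksums and redundant copies are plain dictionaries, CRC32 is the check method, and the function name `scrub` and the `alarm_threshold` parameter are hypothetical.

```python
import zlib


def scrub(blocks, checksums, replicas, error_log, alarm_threshold=3):
    """Verify every block once; repair bad blocks from a redundant copy.

    blocks, checksums and replicas map block id -> bytes / CRC32 / bytes.
    Each failed check is appended to error_log. Returns True when the
    number of logged errors has reached alarm_threshold.
    """
    for block_id, data in list(blocks.items()):
        if zlib.crc32(data) == checksums[block_id]:
            continue  # block is intact
        error_log.append(block_id)
        good = replicas.get(block_id)
        # Only restore from the redundant copy if that copy itself checks out.
        if good is not None and zlib.crc32(good) == checksums[block_id]:
            blocks[block_id] = good
    return len(error_log) >= alarm_threshold
```

A caller would run such a pass on a timer; a longer period lowers the IO cost of scrubbing, at the price of detecting silent corruption later.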
Thirdly, a special Raft interface is provided, and waste of space is reduced;
fig. 5 is a schematic diagram of a modified Raft protocol interface according to an embodiment of the present invention. As shown in fig. 5, a Raft protocol interface customized for the Raft protocol log and the state machine is provided to the upper-layer application; it supports storage of the Raft protocol state (term, voteFor), the Raft protocol log and the Raft state machine (Chunk Pool). The log is stored only in the local storage, and the distributed system only needs to use the corresponding Raft protocol interface when it needs to access the log, so that the burden on the distributed system of managing log storage is reduced.
The local storage is divided into two layers: a Raft protocol storage management layer and a Blob management layer. The Raft protocol storage management layer provides the dedicated Raft protocol interface, which serves the upper layer and stores and manages the Raft protocol log; the Blob management layer provides read-write storage and data backup for data. Fig. 6 is a schematic diagram of the module division according to an embodiment of the present invention.
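The shape of such a dedicated interface can be sketched with an in-memory stand-in. The class name `RaftStore` and its method names are hypothetical and are not the interface of the embodiment; only the stored items (term, voteFor and the log) follow the description, and a real implementation would persist them through the Blob management layer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RaftStore:
    """In-memory stand-in for a dedicated Raft storage interface."""

    term: int = 0
    vote_for: Optional[str] = None
    log: List[Tuple[int, bytes]] = field(default_factory=list)  # (term, command)

    def save_state(self, term: int, vote_for: Optional[str]) -> None:
        """Store the Raft protocol state (term, voteFor)."""
        self.term, self.vote_for = term, vote_for

    def append(self, term: int, command: bytes) -> int:
        """Append one log entry and return its 1-based index."""
        self.log.append((term, command))
        return len(self.log)

    def entry(self, index: int) -> Tuple[int, bytes]:
        """Return the log entry at a 1-based index."""
        return self.log[index - 1]

    def truncate_from(self, index: int) -> None:
        """Drop the entry at index and all later entries."""
        del self.log[index - 1:]
```

Because the upper layer reads and writes the log only through calls like these, it does not need a separate log of its own alongside the one kept by the local storage.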
Fifthly, mixed deployment of fast and slow media is supported, universality is achieved, and the performance of storage hardware is fully exerted;
all storage resources (such as a mechanical hard disk (HDD), a Solid State Disk (SSD) and the like) in a cluster belong to a unified physical storage resource pool, and storage space is allocated in the unified physical storage resource pool for all storage objects (such as files, block devices, objects and the like) to carry out unified management;
storage media with different performance are managed by class, including byte-addressable persistent memory, NVMe SSD, SATA SSD, SAS HDD, SATA HDD and the like;
chunk data is stored on HDD (or ordinary SSD), while the WAL and higher-level SSTs are stored on SSD (or Optane storage);
sixthly, snapshot realization:
1. finding a snapshot of chunk:
fig. 7 is a schematic diagram of looking up certain snapshot data of a CHUNK according to an embodiment of the present invention. As shown in fig. 7, a Hash value is calculated from the two-tuple (volume identifier VOLUME_ID, block identifier CHUNK_ID); using the Hash value as an index, the interval in which the two-tuple (VOLUME_ID, CHUNK_ID) is located is found; ordered KV records are stored in the interval, and the logical address of the corresponding CHUNK is found by binary search using the triple (VOLUME_ID, CHUNK_ID, SNAP_ID).
Fig. 8 is a schematic diagram of logical address adjacency between snapshots according to an embodiment of the present invention. As shown in fig. 8, the data of all snapshots of the same data block are stored together, logically adjacent to one another, and are preferably stored in order of block address. The advantage is that, for a read operation, if the data block does not exist under the snapshot ID of the current primary volume, it can be found in a nearby historical snapshot, so read performance is high. The disadvantages are: (1) when historical snapshots exist, each write operation requires the data block written to the current primary volume to be inserted between two old data blocks; (2) when there is a large amount of snapshot data, a sequential read of a large block of data becomes a non-sequential read on the disk medium, because a large number of historical snapshot data blocks lie in between. The solid state disk (SSD), used as the medium carrying the online data of a database, is characterized by high random read-write performance, so a discontinuous layout of the data blocks of one snapshot is not a major problem for an SSD.
2. Snapshot deletion:
when deleting a snapshot, snapshot data that is still referenced cannot be deleted; only data that is no longer referenced can be deleted. Fig. 9 is a schematic diagram of snapshot deletion according to an embodiment of the present invention. For example, the 2nd data block in fig. 9 (data 987) cannot be deleted, because it is still referenced by the current primary volume snap 9995. Conversely, some data blocks that should have been deleted in an earlier deletion were retained because they were still referenced; if the snapshot that references them is deleted in the current deletion, those data blocks must be deleted as well, such as data block No. 4 of snap999997 and data block No. 6 of snap 9998 in fig. 9. These data blocks were left behind because snapshot snap999996 needed to reference them. Since snap999996 is now also deleted, no other snapshot references them, and the space they occupy can be reclaimed.
3. Snapshot read-write and rollback:
fig. 10 is a schematic diagram of snapshot reading, writing and rollback according to an embodiment of the present invention. As shown in fig. 10, after a write request for a certain data block of the current primary volume is received, if the data block does not exist on the current primary volume, the data block is inserted directly; if it exists, the current data is overwritten.
When a read request is received, it is checked whether a corresponding data block exists under the snapshot ID corresponding to the current primary volume (999995 in fig. 10). If so, the data block is returned; if not, the snapshots earlier than the primary volume snapshot ID (which have larger numbers) are searched for the data block, and if it is found, it is returned; if it is not found, an all-zero data block is returned.
It should be noted that a read operation should ignore data blocks that have been rolled back, but should not ignore data blocks that have been deleted.
The key points of this embodiment are: a unique overall design layout; support for mixed deployment of fast and slow media, so that different storage media can be used; background polling and verification of the data blocks, so that the quality of the data blocks can be monitored at any time; a customized Raft protocol interface, which reduces the burden on the upper layer; and a unique snapshot implementation.
This embodiment addresses the problem that the local storage of most existing distributed storage systems pays little attention to the efficiency and availability of local storage management, so that local storage is slow and unreliable and the performance of the whole distributed storage system suffers. Through a unique design, it reduces unnecessary space overhead, supports different storage media and makes full use of the hardware performance; compared with the local storage management of conventional distributed storage systems, it enhances reliability and availability and improves local storage efficiency.
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to run a program, wherein the program, when run, performs the data management method of any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided a computer storage medium, where the computer storage medium includes a stored program, and when the program runs, the apparatus where the computer storage medium is located is controlled to execute the data management method of any one of the above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A data management system, provided in a local storage device, for managing consistency protocol data of a target system, wherein the data management system comprises: a log storage module and a data block searching module;
the log storage module is used for storing a consistency protocol log, and the log storage module is in data connection with an input port of the local storage device, wherein the consistency protocol log is used for recording a log of a consistency operation performed by the target system;
the data block searching module is used for searching a data block stored in the local storage device, and the data block searching module is in data connection with an input port of the local storage device, wherein the data block is a data block for performing consistency operation on the target system.
2. The data management system of claim 1, wherein the data block searching module comprises: a data block map unit and a data block index unit;
the data block map unit is used for storing root information of a plurality of data block index units, the data block map unit is in data connection with the input port of the local storage device, and the root information is used for searching the positions of the data block index units;
the data block index unit is used for storing index information of data blocks, and the data block index unit is in data connection with the input port of the local storage device, wherein the index information is used for searching the storage positions of the corresponding data blocks, and the data blocks are stored in a placement group of the local storage device.
3. The data management system of claim 2, further comprising: a root directory,
the root directory stores placement group metadata, the root directory is in data connection with an input port of the local storage device, and the root directory is also in data connection with the log storage module and with the data block map unit and the data block index unit of the data block searching module, wherein the placement group metadata is used for identifying the data blocks of consistency protocol data belonging to the corresponding placement group;
the root directory also stores metadata of the local storage device, and the placement group metadata and the metadata of the local storage device both comprise a plurality of versions of metadata;
the data block map unit includes multiple versions of data block map data, wherein the versions are versions of a consistency operation.
4. The data management system of claim 3, further comprising: a verification module and an inspection module,
the verification module is used for verifying input/output requests, the input/output requests comprising write input/output requests and read input/output requests;
the inspection module is used for periodically checking the data blocks and recording an inspection log.
5. The data management system of claim 4, further comprising: a log access interface,
the log access interface is respectively in data connection with the log storage module and the input port of the local storage device, the log access interface is a special interface, and the special interface is used for being directly called by the target system, acquiring the consistency protocol log from the log storage module and performing read-write operation on the consistency protocol log.
6. A data management method, comprising:
storing, through a special interface, a consistency protocol log generated by a consistency operation of a target system in a log storage module on a local storage device;
storing the consistency protocol data on which the target system performs the consistency operation in the local storage device in the form of placement groups, with the data block as the unit, and recording the storage position of each data block through a data block searching module;
and, upon receiving an input/output request for a data block, calling the data block searching module to search for the data block.
7. The method of claim 6, wherein the data block searching module comprises a data block map unit and a data block index unit, and calling the data block searching module to search for the data block upon receiving the input/output request for the data block comprises:
receiving an input/output request for a data block, wherein the input/output request comprises two-tuple information of the data block, the two-tuple information comprising a first identifier of the volume to which the data block belongs and a second identifier of the data block within the volume;
calculating a hash value of the two-tuple, calling the data block map unit according to the hash value, and determining the data block index unit to be called;
calling the data block index unit, and determining the position of a data block object according to triple information, wherein the triple information comprises the first identifier, the second identifier and a third identifier of a consistency operation version, and the data block object is the storage unit of the data block in the local storage device;
and searching the data block object for the data block.
8. The method of claim 6, further comprising:
storing the consistency protocol log in a first storage device with lower read-write performance in the local storage device, wherein the local storage device comprises a plurality of storage devices with different read-write performance;
storing the data blocks of the consistency protocol data in a second storage device with higher read-write performance in the local storage device;
and/or storing the data blocks of different consistency operation versions of the same data block of the consistency protocol data at nearby positions in the second storage device.
9. A processor, configured to run a program, wherein the program when running performs the data management method of any one of claims 6 to 8.
10. A computer storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer storage medium is located to perform the data management method of any one of claims 6 to 8.
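
The lookup path recited in claim 7 can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the class names `BlockMapUnit` and `BlockIndexUnit`, the helper functions `record` and `lookup`, the use of SHA-256 as the hash, the fixed number of index units, and the integer "location" standing in for a data block object are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class BlockIndexUnit:
    """Hypothetical index unit: maps a triple to a data block object location."""

    def __init__(self) -> None:
        # (volume_id, block_id, version) -> location of the data block object
        self.entries: Dict[Tuple[int, int, int], int] = {}


class BlockMapUnit:
    """Hypothetical map unit: selects an index unit from the two-tuple hash."""

    def __init__(self, num_index_units: int) -> None:
        self.index_units: List[BlockIndexUnit] = [
            BlockIndexUnit() for _ in range(num_index_units)
        ]

    def select(self, volume_id: int, block_id: int) -> BlockIndexUnit:
        # Hash only the two-tuple, so that all versions of one block
        # are found through the same index unit.
        digest = hashlib.sha256(f"{volume_id}:{block_id}".encode()).digest()
        slot = int.from_bytes(digest[:8], "big") % len(self.index_units)
        return self.index_units[slot]


def record(m: BlockMapUnit, volume_id: int, block_id: int,
           version: int, location: int) -> None:
    """Record where one version of a data block is stored."""
    m.select(volume_id, block_id).entries[(volume_id, block_id, version)] = location


def lookup(m: BlockMapUnit, volume_id: int, block_id: int,
           version: int) -> Optional[int]:
    """Return the location for the triple, or None if it is not indexed."""
    return m.select(volume_id, block_id).entries.get((volume_id, block_id, version))
```

Hashing the two-tuple rather than the triple is what allows the map unit to pick one index unit per data block, while the consistency operation version is resolved only inside that index unit.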
CN202111479575.2A 2021-12-07 2021-12-07 Data management system and method Active CN113885809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111479575.2A CN113885809B (en) 2021-12-07 2021-12-07 Data management system and method

Publications (2)

Publication Number Publication Date
CN113885809A true CN113885809A (en) 2022-01-04
CN113885809B CN113885809B (en) 2022-03-18

Family

ID=79015697

Country Status (1)

Country Link
CN (1) CN113885809B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant