CN116257186A - Data object erasure code storage method, device, equipment and medium - Google Patents

Publication number
CN116257186A
Authority
CN
China
Prior art keywords
data object
read
local
target
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310201833.3A
Other languages
Chinese (zh)
Inventor
樊云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Jinan data Technology Co ltd
Original Assignee
Inspur Jinan data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Jinan data Technology Co ltd
Priority to CN202310201833.3A
Publication of CN116257186A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0622 Securing storage systems in relation to access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data object erasure code storage method, apparatus, device and medium in the technical field of cloud computing data centers. The method comprises: acquiring a data object read-write request and storing the request locally; performing a read-write operation on data objects based on the request, determining the current remaining local storage capacity, judging whether the current remaining capacity is larger than a preset threshold and, if so, screening the target data objects to be stored out of all local data objects; judging the type of each target data object and, if it is a primary copy, erasure-coding the target data object to obtain target sub-data objects; and sending each target sub-data object to, and storing it on, the other nodes in the cluster (all nodes except the local one). The method improves the storage utilization achieved by erasure-coding data objects, reduces cost, and increases the stability of the distributed storage system.

Description

Data object erasure code storage method, device, equipment and medium
Technical Field
The present invention relates to the field of cloud computing data centers, and in particular, to a method, an apparatus, a device, and a medium for storing erasure codes of data objects.
Background
A distributed storage system stores data in a scattered manner across multiple independent devices. It uses a scalable architecture in which multiple storage servers share the storage load and a location server locates the stored information. Software-defined storage, represented by Ceph (a distributed file system) and VSAN (VMware Virtual SAN), is a scale-out, automatically balanced and self-healing form of distributed storage: hardware resources such as commodity x86 servers, solid state disks and mechanical hard disks are integrated into a thin-provisioned resource pool, and storage services are offered through several interfaces such as block storage, file storage, object storage and RESTful API. At present, distributed storage systems generally protect data with multiple copies or with erasure codes. Erasure coding splits a data object into fragments and then encodes them. The encoded object is still a single copy, but because the coding algorithm must add redundant parity fragments, it occupies more storage space than the original data. The principle is that the data object is divided into n parts, m additional parts are computed from them and stored as redundant data, and the original data object can be restored from any n of the n+m parts according to the algorithm used.
If the n+m parts are distributed over different nodes of the storage system, then for the failure of any m or fewer nodes (that is, the loss of up to m parts) the original data can be restored from the remaining parts. A multi-copy scheme instead stores several identical copies of the data object on several nodes of the distributed storage system. Multiple copies are clearly highly reliable and simple to implement: service is unaffected as long as at least one copy of the object exists in the cluster. However, because the same data object is stored several times, space utilization is low, which quietly raises storage cost for enterprise users, and the consistency model among the copies can also reduce write performance.
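The trade-off described above can be made concrete with a small sketch. It is illustrative only: the function names are hypothetical, and XOR single parity (m = 1) is used as the simplest code with the "any n of n+m parts" property, not as the coding algorithm of this application, which the text does not specify.

```python
from functools import reduce
from typing import List, Optional


def usable_ratio_replicas(copies: int) -> float:
    """Fraction of raw capacity that holds user data when each object is stored `copies` times."""
    return 1 / copies


def usable_ratio_ec(n: int, m: int) -> float:
    """Fraction of raw capacity that holds user data with n data parts and m redundant parts."""
    return n / (n + m)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(data: bytes, n: int) -> List[bytes]:
    """Split `data` into n equal fragments (zero-padded) and append one XOR parity fragment."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(n)]
    return fragments + [reduce(xor_bytes, fragments)]


def rebuild(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Restore at most one lost fragment (marked None) by XOR-ing the surviving ones."""
    lost = [i for i, frag in enumerate(fragments) if frag is None]
    if len(lost) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    restored = list(fragments)
    if lost:
        restored[lost[0]] = reduce(xor_bytes, [f for f in fragments if f is not None])
    return restored


pieces = encode_single_parity(b"hello world!", n=3)    # 3 data fragments + 1 parity
damaged = [pieces[0], None, pieces[2], pieces[3]]      # one node's fragment is lost
assert rebuild(damaged) == pieces
```

Under these formulas, three full copies leave one third of the raw capacity usable, whereas a 4+2 erasure code leaves two thirds usable while still tolerating the loss of two parts.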
From the above, how to improve the utilization rate of erasure code storage of data objects, reduce the cost, and increase the stability of the distributed storage system is a problem to be solved in the art.
Disclosure of Invention
Accordingly, the present invention is directed to a method, apparatus, device and medium for storing erasure codes of data objects, which can improve the utilization rate of erasure code storage of data objects, reduce the cost and increase the stability of a distributed storage system. The specific scheme is as follows:
In a first aspect, the present application discloses a data object erasure code storage method, applied to any node in a cluster, including:
acquiring a data object read-write request, and storing the data object read-write request locally;
performing a read-write operation on data objects based on the data object read-write request, determining the current remaining local storage space capacity, judging whether the current remaining storage space capacity is larger than a preset threshold value, and, if it is, screening the target data objects to be stored out of all local data objects;
judging the type of the target data object, and, if the type of the target data object is a primary copy, performing an erasure coding operation on the target data object to obtain target sub-data objects;
and sending each target sub-data object to, and storing it on, the other nodes in the cluster, that is, the nodes other than the local node.
Optionally, acquiring the data object read-write request and storing it locally includes:
acquiring a data object read-write request, and storing the data object read-write request in a local gateway;
and sending the data object read-write request from the gateway to a local storage layer.
Optionally, the performing read-write operation on the data object based on the data object read-write request, and determining the local current remaining storage space capacity, includes:
performing read-write operation on the data object by utilizing the storage layer and based on the data object read-write request so as to obtain the storage space capacity occupied by read-write;
and determining the current residual storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the read-write.
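The capacity bookkeeping in this optional step can be sketched as follows. The class and function names are hypothetical. The comparison follows the claim wording literally (remaining capacity compared with a preset threshold); the description elsewhere speaks of the Cache layer reaching a threshold, so the direction of the comparison is an assumption to adapt to the deployment.

```python
from dataclasses import dataclass


@dataclass
class CacheCapacity:
    total_bytes: int          # total capacity of the local storage (Cache) layer
    occupied_bytes: int = 0   # capacity occupied by read-write operations so far

    def record_write(self, size: int) -> None:
        """Account for the space that one write operation occupies."""
        if size < 0 or self.occupied_bytes + size > self.total_bytes:
            raise ValueError("write does not fit in the storage layer")
        self.occupied_bytes += size

    @property
    def remaining_bytes(self) -> int:
        """Current remaining capacity = total capacity - occupied capacity."""
        return self.total_bytes - self.occupied_bytes


def exceeds_threshold(capacity: CacheCapacity, threshold_bytes: int) -> bool:
    """Literal reading of the claim: is the remaining capacity larger than the preset threshold?"""
    return capacity.remaining_bytes > threshold_bytes


capacity = CacheCapacity(total_bytes=100)
capacity.record_write(30)
assert capacity.remaining_bytes == 70
```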
Optionally, the screening the target data object to be stored from all the local data objects includes:
determining a dirty data object judgment rule based on the service requirement;
and screening all the data objects in the local storage layer according to the dirty data object judging rule to obtain a target data object to be stored.
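One way to picture a dirty data object judgment rule is as a predicate built from a service requirement and applied to every object in the local storage layer. The sketch below is an assumption-laden illustration: the `CachedObject` fields, the idle-time criterion and all names are hypothetical, since the application leaves the concrete rule to the service requirement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CachedObject:
    name: str
    dirty: bool           # modified in the storage layer and not yet persisted elsewhere
    idle_seconds: float   # time since last access (hypothetical attribute)


DirtyRule = Callable[[CachedObject], bool]


def make_rule(min_idle_seconds: float) -> DirtyRule:
    """Build a judgment rule from a (hypothetical) service requirement on idle time."""
    return lambda obj: obj.dirty and obj.idle_seconds >= min_idle_seconds


def screen_targets(cache: Dict[str, CachedObject], rule: DirtyRule) -> List[CachedObject]:
    """Apply the rule to all objects in the local storage layer and keep the matches."""
    return [obj for obj in cache.values() if rule(obj)]


cache = {
    "a": CachedObject("a", dirty=True, idle_seconds=120.0),
    "b": CachedObject("b", dirty=False, idle_seconds=500.0),
    "c": CachedObject("c", dirty=True, idle_seconds=5.0),
}
targets = screen_targets(cache, make_rule(min_idle_seconds=60.0))
assert [obj.name for obj in targets] == ["a"]
```

Keeping the rule as a separate callable means the screening loop stays unchanged when the service requirement changes.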
Optionally, performing the erasure coding operation on the target data object to obtain the target sub-data objects includes:
performing the erasure code encoding operation on the target data object asynchronously, using a local EC (erasure code) module, to obtain the target sub-data objects.
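Asynchronous encoding can be sketched with a background worker so that the caller is not blocked while a target data object is encoded. This is a minimal sketch under stated assumptions: `EcModule` and `ec_encode` are hypothetical names, and the encoder is a stand-in (two data fragments plus one XOR parity fragment), not the encoder of this application.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def ec_encode(data: bytes, n: int = 2) -> List[bytes]:
    """Stand-in encoder: n zero-padded data fragments plus one XOR parity fragment."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(n)]
    parity = bytearray(size)
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return fragments + [bytes(parity)]


class EcModule:
    """Hypothetical local EC module that encodes in a background worker thread."""

    def __init__(self, workers: int = 1) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def encode_async(self, data: bytes) -> "Future[List[bytes]]":
        # submit() returns immediately, so the caller can keep serving service IO.
        return self._pool.submit(ec_encode, data)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


ec = EcModule()
pending = ec.encode_async(b"target object payload")
# Foreground read-write requests would continue to be served here.
fragments = pending.result()   # collect the target sub-data objects when they are needed
ec.close()
assert len(fragments) == 3
```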
Optionally, after each target sub-data object has been sent to and stored on the other nodes in the cluster (the nodes other than the local node), the method further includes:
generating a message indicating that the sending succeeded;
and broadcasting the message indicating successful sending to all nodes in the cluster.
Optionally, the data object erasure code storage method further includes:
after the message indicating successful sending is received, deleting the target data object from the local storage layer.
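The broadcast-then-delete behaviour can be sketched as below. It is an in-process illustration only: `Node`, `on_flush_success` and `broadcast_flush_success` are hypothetical names, and a real cluster would deliver the message over its own messaging layer.

```python
from typing import Dict, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}   # local storage (Cache) layer

    def on_flush_success(self, object_id: str) -> None:
        """Handle the broadcast: drop the cached copy if this node holds one."""
        self.cache.pop(object_id, None)


def broadcast_flush_success(cluster: List[Node], object_id: str) -> None:
    """Deliver the success message to every node in the cluster, the sender included."""
    for node in cluster:
        node.on_flush_success(object_id)


a, b, c = Node("A"), Node("B"), Node("C")
a.cache["obj-1"] = b"payload"    # copy on node A
c.cache["obj-1"] = b"payload"    # copy on node C
b.cache["obj-2"] = b"other"      # unrelated object on node B
broadcast_flush_success([a, b, c], "obj-1")
assert "obj-1" not in a.cache and "obj-1" not in c.cache
assert b.cache == {"obj-2": b"other"}
```

Because every node handles the same message, copies held by non-primary nodes are released without a separate per-node request.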
In a second aspect, the present application discloses a data object erasure code storage apparatus, comprising:
a request acquisition module, configured to acquire a data object read-write request and store the data object read-write request locally;
a target data object determining module, configured to perform a read-write operation on data objects based on the data object read-write request, determine the current remaining local storage space capacity, judge whether the current remaining storage space capacity is larger than a preset threshold value, and, if it is, screen the target data objects to be stored out of all local data objects;
a judging module, configured to judge the type of the target data object and, if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain target sub-data objects;
and a data object storage module, configured to send each target sub-data object to, and store it on, the other nodes in the cluster, that is, the nodes other than the local node.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the data object erasure code storage method.
In a fourth aspect, the present application discloses a computer storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the data object erasure code storage method disclosed previously.
It can be seen that the present application provides a data object erasure code storage method, which includes: acquiring a data object read-write request and storing the request locally; performing a read-write operation on data objects based on the request, determining the current remaining local storage space capacity, judging whether the current remaining storage space capacity is larger than a preset threshold value, and, if it is, screening the target data objects to be stored out of all local data objects; judging the type of the target data object and, if it is a primary copy, performing an erasure coding operation on the target data object to obtain target sub-data objects; and sending each target sub-data object to, and storing it on, the other nodes in the cluster (the nodes other than the local node). A multi-copy strategy is thus adopted in the storage layer, which guarantees the reliability and performance of the distributed storage system. The target data objects to be stored are sent to the data layer by an asynchronous process that does not affect running service IO, and an erasure code strategy is adopted in the data layer, which improves the space utilization of the distributed storage system. In addition, defining a primary copy keeps the distributed storage system stable and reliable, so the system achieves high storage utilization together with good performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for storing erasure codes of data objects disclosed in the present application;
FIG. 2 is a diagram of a distributed storage model disclosed herein;
FIG. 3 is a flow chart of a method for storing erasure codes for data objects disclosed in the present application;
FIG. 4 is a schematic diagram of a data object erasure code storage according to the present disclosure;
FIG. 5 is a schematic diagram of a data object erasure code storage apparatus disclosed in the present application;
FIG. 6 is a block diagram of an electronic device provided in the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method is applied to any node in the distributed storage model shown in FIG. 2. Compared with a traditional storage system model, a distributed storage system that adopts a multi-copy strategy not only improves the reliability, availability and access efficiency of the system, but also brings high scalability and very low cost, and is capable of building and providing enterprise-level storage. Comparing multiple copies with erasure codes: multiple copies give good read-write performance but a low disk yield (the share of raw disk capacity that is usable for data), whereas erasure codes give a high disk yield but incur a performance cost when the parity codes are computed. A multi-copy strategy is therefore generally adopted for key services with high performance requirements, while erasure codes can be adopted for mass storage scenarios to raise the utilization of the storage system and reduce cost. In the present scheme, the multi-copy strategy is used in the Cache layer, which ensures the reliability and performance of the distributed storage system. Flushing Cache layer data down to the Data layer is an asynchronous process that does not affect running service IO, so the data objects are stored in the Data layer with erasure codes at flush time, which raises the disk yield of the Data layer. Since the disk yield of the distributed storage system depends on the Data layer, the scheme maintains high performance while improving the disk yield of the system.
Referring to fig. 1, the embodiment of the invention discloses a data object erasure code storage method, which specifically includes:
Step S11: Acquire a data object read-write request and store the data object read-write request locally.
In this embodiment, a data object read-write request is acquired and stored in a local gateway, and the request in the gateway is then sent to a local storage layer. That is, the cluster comprises a plurality of different nodes. After the data object read-write request is acquired, the Gateway in the node forwards the request to the nodes, and each node that receives the request stores it locally.
Step S12: Perform a read-write operation on data objects based on the data object read-write request, determine the current remaining local storage space capacity, judge whether the current remaining storage space capacity is larger than a preset threshold value, and, if it is, screen the target data objects to be stored out of all local data objects.
In this embodiment, the storage layer performs the read-write operation on the data objects based on the data object read-write request, which yields the storage space capacity occupied by the read-write. The current remaining storage space capacity is determined from the total local storage space capacity and the capacity occupied by the read-write. It is then judged whether the current remaining storage space capacity is larger than the preset threshold value. If it is, a dirty data object judgment rule is determined based on the service requirement, and all data objects in the local storage layer are screened according to that rule to obtain the target data objects to be stored.
In other words, the Cache layer (i.e., the storage layer) of the node reads and writes the data objects, and the storage space capacity occupied by the read-write is determined. Subtracting that occupied capacity from the total storage space capacity of the node gives the current remaining storage space capacity. When the Cache layer capacity reaches the predetermined threshold value, Cache management performs the dirty data flush in the Cache layer: the dirty data is screened out of the data objects in the Cache layer according to the service requirement and taken as the target data objects to be stored.
Step S13: Judge the type of the target data object and, if the type of the target data object is a primary copy, perform an erasure coding operation on the target data object to obtain target sub-data objects.
Step S14: Send each target sub-data object to, and store it on, the other nodes in the cluster, that is, the nodes other than the local node.
In this embodiment, it is determined whether the target data object to be stored in the node is a primary copy. If it is not, the flush is skipped. If it is the primary copy, the dirty data is flushed down: an erasure code encoding operation is performed on the target data object to obtain the target sub-data objects, and each target sub-data object is then sent to the other nodes in the cluster, that is, the nodes other than the local node.
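The primary-copy check and the distribution to the other nodes can be sketched as a single placement function. The function name, the string node identifiers and the round-robin assignment are all assumptions made for illustration; the application states only that non-primary copies are skipped and that sub-data objects go to nodes other than the local one.

```python
from typing import Dict, List


def flush_if_primary(local: str, primary: str, nodes: List[str],
                     sub_objects: List[bytes]) -> Dict[str, List[bytes]]:
    """Return a placement {node: sub-objects}; empty when the local copy is not the primary."""
    if local != primary:
        return {}                                  # non-primary copies skip the flush
    peers = [n for n in nodes if n != local]       # every node except the local one
    if not peers:
        raise ValueError("no other node available to store sub-objects")
    placement: Dict[str, List[bytes]] = {n: [] for n in peers}
    for i, sub in enumerate(sub_objects):
        # Round-robin assignment is an assumption, not taken from the application.
        placement[peers[i % len(peers)]].append(sub)
    return placement


cluster = ["A", "B", "C"]
subs = [b"s0", b"s1", b"s2"]
assert flush_if_primary("A", "A", cluster, subs) == {"B": [b"s0", b"s2"], "C": [b"s1"]}
assert flush_if_primary("C", "A", cluster, subs) == {}
```

A likely benefit of the check is that an object held as several copies is encoded and distributed once, by the holder of the primary copy, rather than once per copy. Note that placing more than one sub-data object on the same node, as the round-robin does when there are more sub-objects than peers, weakens fault tolerance; a real placement would spread them over distinct nodes where possible.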
The method is applied to any node in the distributed storage model shown in FIG. 2. In a traditional distributed storage system model that adopts a multi-copy strategy, the Gateway is responsible for forwarding requests among nodes. The Cache layer (i.e., the storage layer) is a space built from high-performance media; its capacity is limited, but it provides higher performance. The Data layer is a space built from large-capacity media; its capacity is larger, but its performance is poorer. An object request is forwarded to the different copy nodes by the Gateway module. When the copy requests reach node A and node C, they are first processed by the Cache layer. Because Cache layer space is limited, once it reaches a preset threshold the dirty data objects must be flushed down to the Data layer: the data objects in the Cache layer are written to the Data layer and the Cache space is then released, which reclaims the Cache space. To keep the flush from affecting service IO, it is usually performed asynchronously. Distributed storage systems typically apply a multi-copy strategy in both the Cache layer and the Data layer, which results in low space utilization. In the present scheme, the multi-copy strategy is used in the Cache layer, which ensures the reliability and performance of the distributed storage system. Because flushing Cache layer data to the Data layer is an asynchronous process that does not affect running service IO, the data objects are stored in the Data layer with erasure codes at flush time, which raises the disk yield (the share of raw disk capacity that is usable for data) of the Data layer. Since the disk yield of the distributed storage system depends on the Data layer, the scheme maintains high performance while improving the disk yield of the system.
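The multi-copy write path through the Gateway can be sketched as follows. This is a hedged illustration: the `Gateway` and `Node` classes are hypothetical, and choosing copy nodes by a CRC32 hash of the object identifier is an assumption, since the application does not say how copy nodes are selected.

```python
import zlib
from typing import Dict, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}   # Cache layer: holds full copies of objects


class Gateway:
    """Hypothetical Gateway that forwards each object request to its copy nodes."""

    def __init__(self, nodes: List[Node], copies: int) -> None:
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self.nodes = nodes
        self.copies = copies

    def replica_nodes(self, object_id: str) -> List[Node]:
        # Hash-based selection is an assumption made only for this sketch.
        start = zlib.crc32(object_id.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.copies)]

    def write(self, object_id: str, payload: bytes) -> List[str]:
        """Forward the write to every copy node; each stores a full copy in its Cache layer."""
        targets = self.replica_nodes(object_id)
        for node in targets:
            node.cache[object_id] = payload
        return [node.name for node in targets]


gateway = Gateway([Node("A"), Node("B"), Node("C")], copies=2)
written_to = gateway.write("obj-1", b"payload")
assert len(set(written_to)) == 2
```

Each write lands as full copies in the Cache layer of the selected nodes, which is the state from which the later flush to the Data layer starts.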
In this embodiment, a data object read-write request is acquired and stored locally. A read-write operation is performed on data objects based on the request, the current remaining local storage space capacity is determined, and it is judged whether that capacity is larger than a preset threshold value; if it is, the target data objects to be stored are screened out of all local data objects. The type of each target data object is judged and, if it is a primary copy, an erasure coding operation is performed on it to obtain target sub-data objects, each of which is sent to and stored on the other nodes in the cluster (the nodes other than the local node). A multi-copy strategy is thus adopted in the storage layer, which guarantees the reliability and performance of the distributed storage system. The target data objects to be stored are sent to the data layer by an asynchronous process that does not affect running service IO, and an erasure code strategy is adopted in the data layer, which improves the space utilization of the distributed storage system. In addition, defining a primary copy keeps the distributed storage system stable and reliable, so the system achieves high storage utilization together with good performance.
Referring to fig. 3, the embodiment of the invention discloses a data object erasure code storage method, which specifically includes:
Step S21: Acquire a data object read-write request and store the data object read-write request locally.
Step S22: Perform a read-write operation on data objects based on the data object read-write request, determine the current remaining local storage space capacity, judge whether the current remaining storage space capacity is larger than a preset threshold value, and, if it is, screen the target data objects to be stored out of all local data objects.
Step S23: Judge the type of the target data object and, if the type of the target data object is a primary copy, perform the erasure code encoding operation on the target data object asynchronously, using the local EC (erasure code) module, to obtain target sub-data objects.
Step S24: Send each target sub-data object to, and store it on, the other nodes in the cluster, that is, the nodes other than the local node.
Step S25: Generate a message indicating that the sending succeeded, then broadcast the message to all nodes in the cluster.
In this embodiment, after the message indicating successful sending is received, the target data object is deleted from the local storage layer.
In this embodiment, when data is flushed down, the target data object is erasure-coded by the EC (erasure code) module and the result is distributed over the Data layer space of the nodes in the cluster. After the flush, a cluster message is broadcast to notify the Cache management of all nodes to delete their copies in the Cache space, which releases the Cache space. This realizes the erasure code storage scheme for data objects in the Data layer space. As shown in FIG. 4, a primary copy rule is defined, that is, one primary copy is designated within the multi-copy strategy. Suppose the cluster includes node A, node B and node C. Node A acquires a data object read-write request, and the request is sent to node C through the Gateway layer of node A. The Cache layer of node A then performs the read-write operation on the data object and obtains the current remaining storage space capacity. It is judged whether the current remaining storage space capacity is larger than the preset threshold value; if it is, the target data objects to be stored are screened out of all local data objects. The EC module performs the erasure code encoding operation on the target data objects to obtain target sub-data objects, which are sent to node B and node C respectively. Node A then generates a message indicating that the sending succeeded and broadcasts it to all nodes in the cluster. After receiving the message, node A deletes the target data object from its Cache layer, and any other node whose Cache layer also holds the target data object deletes it as well.
In this embodiment, a data object read-write request is obtained and saved locally; a read-write operation is performed on the data object based on the request, the local current remaining storage space capacity is determined, and whether it is greater than a preset threshold is judged; if it is greater than the preset threshold, the target data object to be stored is screened out from all local data objects; the type of the target data object is judged, and if the type is a primary copy, an erasure code encoding operation is performed on the target data object to obtain the sub-target data objects; and the sub-target data objects are respectively sent to, and stored on, the nodes in the cluster other than the local node. In this way, a multi-copy policy is used in the storage layer, which guarantees the reliability and performance of the distributed storage system; the target data object to be stored is sent to the data layer by an asynchronous process that does not affect running service IO; and an erasure code policy is used in the data layer, which improves the space utilization of the distributed storage system. Meanwhile, defining the primary copy keeps the distributed storage system stable and reliable, so that the system achieves high storage utilization while maintaining good performance.
Referring to fig. 5, an embodiment of the present invention discloses a data object erasure code storage apparatus, which may specifically include:
the request acquisition module 11 is configured to acquire a data object read-write request and store the data object read-write request locally;
the target data object determining module 12 is configured to perform a read-write operation on a data object based on the data object read-write request, determine the local current remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screen out a target data object to be stored from all local data objects;
the judging module 13 is configured to judge the type of the target data object and, if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object;
and the data object storage module 14 is configured to respectively send each sub-target data object to, and store it on, the nodes in the cluster other than the local node.
The method is applied to any node in a distributed storage model. In a traditional distributed storage system model that uses a multi-copy policy, the Gateway is responsible for forwarding requests between nodes; the Cache layer (i.e., the storage layer) is a space built from high-performance media, with limited capacity but higher performance; and the Data layer (i.e., the data layer) is a space built from large-capacity media, with larger capacity but lower performance. An object request is forwarded to the different copy nodes by the Gateway module. When the copy requests reach node A and node C, they are first processed by the Cache layer. Because the Cache layer space is limited, once it reaches a preset threshold the dirty data objects must be flushed down to the Data layer space, that is, the data objects in the Cache layer are written to the Data layer and the Cache space is then released, so that the Cache space is reclaimed. To keep service IO from being affected, the flush is usually carried out asynchronously. Distributed storage systems typically apply a multi-copy policy at both the Cache layer and the Data layer, which results in low space utilization. Using the multi-copy policy in the Cache layer nevertheless guarantees the reliability and performance of the distributed storage system. For the Data layer space, flushing Cache layer data is an asynchronous process that does not affect running service IO, so data objects can be stored with erasure codes in the Data layer when they are flushed, which raises the disk yield of the Data layer, i.e., the share of raw disk capacity available for user data. Since the disk yield of the distributed storage system depends on the Data layer, this scheme maintains high performance while improving the disk yield of the system.
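To make the space-utilization argument concrete, the following sketch compares the fraction of raw capacity that holds user data under an n-copy policy (1/n) with that under a k+m erasure code (k/(k+m)). The function names are hypothetical, and the parameters used in the usage note (3 copies, 4 data plus 2 parity fragments) are illustrative assumptions rather than values given in this application.

```python
from fractions import Fraction


def replica_utilization(copies: int) -> Fraction:
    """Fraction of raw capacity holding user data when every object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(1, copies)


def erasure_code_utilization(k: int, m: int) -> Fraction:
    """Fraction of raw capacity holding user data with k data and m parity fragments."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must be non-negative")
    return Fraction(k, k + m)
```

Under these assumed parameters, `replica_utilization(3)` is 1/3 while `erasure_code_utilization(4, 2)` is 2/3, so the same raw capacity in the Data layer would hold twice as much user data.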
In some specific embodiments, the request acquiring module 11 may specifically include:
the request acquisition module is used for acquiring a data object read-write request and storing the data object read-write request to a local gateway;
and the request sending module is used for sending the data object read-write request in the gateway to a local storage layer.
In some specific embodiments, the target data object determining module 12 may specifically include:
the read-write module is used for performing read-write operation on the data object by utilizing the storage layer and based on the data object read-write request so as to obtain the storage space capacity occupied by read-write;
and the current residual storage space capacity determining module is used for determining the current residual storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the reading and writing.
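A minimal sketch of the capacity calculation, assuming that the current remaining capacity is simply the local total capacity minus the capacity occupied by read-write operations (the text does not spell out the formula). The function names and byte units are hypothetical.

```python
def remaining_capacity(total_bytes: int, occupied_bytes: int) -> int:
    """Assumed formula: remaining = total - occupied."""
    if occupied_bytes < 0 or occupied_bytes > total_bytes:
        raise ValueError("occupied capacity must lie between 0 and the total capacity")
    return total_bytes - occupied_bytes


def exceeds_threshold(remaining_bytes: int, threshold_bytes: int) -> bool:
    # The comparison direction follows the wording of the embodiment.
    return remaining_bytes > threshold_bytes
```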
In some specific embodiments, the target data object determining module 12 may specifically include:
the judging rule determining module is used for determining a dirty data object judging rule based on the service requirement;
and the target data object determining module is used for screening all the data objects in the local storage layer according to the dirty data object judging rule so as to obtain the target data object to be stored.
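The screening step can be illustrated with a pluggable rule. The sketch below assumes, purely for illustration, a rule of the form "modified since the last flush and idle for at least a given time"; the actual dirty data object judgment rule depends on the service requirement and is not specified here. `CachedObject`, `make_idle_rule` and `select_targets` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CachedObject:
    name: str
    modified_since_flush: bool
    idle_seconds: float


DirtyRule = Callable[[CachedObject], bool]


def make_idle_rule(min_idle_seconds: float) -> DirtyRule:
    """Build a rule: the object was modified and has been idle long enough."""
    def rule(obj: CachedObject) -> bool:
        return obj.modified_since_flush and obj.idle_seconds >= min_idle_seconds
    return rule


def select_targets(objects: Iterable[CachedObject], rule: DirtyRule) -> List[CachedObject]:
    """Return the objects in the storage layer that the rule marks as flush targets."""
    return [obj for obj in objects if rule(obj)]
```

Keeping the rule as a separate callable lets a different service requirement swap in a different rule without touching the screening loop.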
In some specific embodiments, the determining module 13 may specifically include:
and the sub-target data object determining module is used for performing erasure code encoding operation on the target data object by utilizing a local super module and adopting an asynchronous processing mode so as to obtain each sub-target data object.
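As an illustration of asynchronous erasure code encoding, the sketch below submits the encoding to a background thread pool so that the caller is not blocked. The text does not say which erasure code is used; a single XOR parity fragment is substituted here because it is the simplest code that tolerates the loss of any one fragment, whereas production systems commonly use codes such as Reed-Solomon. All function names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(payload: bytes, k: int) -> list:
    """Split payload into k zero-padded fragments and append one XOR parity fragment."""
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    fragments.append(reduce(xor_bytes, fragments))
    return fragments


def recover(fragments: list, missing: int) -> bytes:
    """Rebuild the fragment at index `missing` by XOR-ing all the other fragments."""
    others = [f for i, f in enumerate(fragments) if i != missing]
    return reduce(xor_bytes, others)


def encode_async(executor: ThreadPoolExecutor, payload: bytes, k: int) -> Future:
    """Run the encoding in the background and return a Future for the fragments."""
    return executor.submit(encode, payload, k)
```

A caller would keep serving read-write requests after `encode_async` returns and collect the fragments later through `Future.result()`. Note that the zero padding means the original payload length must be recorded separately to strip the padding on read.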
In some embodiments, the data object storage module 14 may specifically include:
the message generation module is used for generating a message indicating that the transmission succeeded;
and the message sending module is used for sending the message indicating that the transmission succeeded to all nodes in the cluster in broadcast form.
In some embodiments, the data object storage module 14 may specifically include:
and the deleting module is used for deleting the target data object from the local storage layer after the message indicating that the transmission succeeded is acquired.
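The message generation, broadcast and deletion steps might be sketched as follows. The message fields, the `CacheNode` class and the `broadcast` helper are hypothetical, and delivery is modeled as a direct in-process call rather than a real cluster transport.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FlushSucceeded:
    """Hypothetical message indicating that an object's fragments were stored."""
    object_name: str
    sender: str


@dataclass
class CacheNode:
    name: str
    cache: Dict[str, bytes] = field(default_factory=dict)
    received: List[FlushSucceeded] = field(default_factory=list)

    def handle(self, message: FlushSucceeded) -> bool:
        """Record the message and delete the local copy; return True if a copy was deleted."""
        self.received.append(message)
        return self.cache.pop(message.object_name, None) is not None


def broadcast(nodes: Iterable[CacheNode], message: FlushSucceeded) -> List[str]:
    """Deliver the message to every node; return the names of nodes that deleted a copy."""
    return [node.name for node in nodes if node.handle(message)]
```

Because `handle` ignores objects that are no longer cached, delivering the same message twice is harmless in this sketch.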
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, where the computer program is loaded and executed by the processor 21 to implement relevant steps in the data object erasure code storage method performed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring input data from the outside or outputting data to the outside, and its specific interface type may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon include an operating system 221, a computer program 222, and data 223, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the data 223 in the memory 22. The operating system may be Windows, Unix, Linux, or the like. In addition to the computer program capable of performing the data object erasure code storage method performed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks. In addition to data transmitted by an external device and received by the data object erasure code storage device, the data 223 may include data collected through the device's own input/output interface 25, and so on.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Further, the embodiment of the application also discloses a computer readable storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the steps of the data object erasure code storage method disclosed in any embodiment are realized.
Finally, it should be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The data object erasure code storage method, apparatus, device and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims (10)

1. A data object erasure code storage method, characterized in that the method is applied to any node in a cluster and comprises:
acquiring a data object read-write request, and storing the data object read-write request locally;
performing a read-write operation on a data object based on the data object read-write request, determining the local current remaining storage space capacity, judging whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screening out a target data object to be stored from all local data objects;
judging the type of the target data object, and, if the type of the target data object is a primary copy, performing an erasure code encoding operation on the target data object to obtain each sub-target data object;
and respectively sending each sub-target data object to, and storing it on, the nodes in the cluster other than the node itself.
2. The data object erasure code storage method according to claim 1, wherein the acquiring a data object read-write request and storing the data object read-write request locally comprises:
acquiring a data object read-write request, and storing the data object read-write request to a local gateway;
and sending the data object read-write request in the gateway to a local storage layer.
3. The data object erasure code storage method according to claim 2, wherein the performing a read-write operation on a data object based on the data object read-write request and determining the local current remaining storage space capacity comprises:
performing the read-write operation on the data object by using the storage layer and based on the data object read-write request, so as to obtain the storage space capacity occupied by the read-write operation;
and determining the current remaining storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the read-write operation.
4. The method for storing erasure codes for data objects according to claim 3, wherein said screening out target data objects to be stored from all local data objects comprises:
determining a dirty data object judgment rule based on the service requirement;
and screening all the data objects in the local storage layer according to the dirty data object judging rule to obtain a target data object to be stored.
5. The method for storing erasure codes for data objects according to claim 1, wherein said performing erasure code encoding operation on said target data object to obtain each sub-target data object comprises:
and performing erasure code encoding operation on the target data object by using a local super module and adopting an asynchronous processing mode to obtain each sub-target data object.
6. The data object erasure code storage method according to any one of claims 1 to 5, wherein after each sub-target data object is respectively sent to and stored on the nodes in the cluster other than the node itself, the method further comprises:
generating a message indicating that the transmission succeeded;
and sending the message indicating that the transmission succeeded to all nodes in the cluster in broadcast form.
7. The data object erasure code storage method according to claim 6, further comprising:
after the message indicating that the transmission succeeded is acquired, deleting the target data object from the local storage layer.
8. A data object erasure code storage apparatus, comprising:
a request acquisition module, configured to acquire a data object read-write request and store the data object read-write request locally;
a target data object determining module, configured to perform a read-write operation on a data object based on the data object read-write request, determine the local current remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screen out a target data object to be stored from all local data objects;
a judging module, configured to judge the type of the target data object and, if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object;
and a data object storage module, configured to respectively send each sub-target data object to, and store it on, the nodes in the cluster other than the local node.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the data object erasure code storage method according to any of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the data object erasure code storage method according to any of claims 1 to 7.
CN202310201833.3A 2023-02-28 2023-02-28 Data object erasure code storage method, device, equipment and medium Pending CN116257186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310201833.3A CN116257186A (en) 2023-02-28 2023-02-28 Data object erasure code storage method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310201833.3A CN116257186A (en) 2023-02-28 2023-02-28 Data object erasure code storage method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116257186A true CN116257186A (en) 2023-06-13

Family

ID=86684087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310201833.3A Pending CN116257186A (en) 2023-02-28 2023-02-28 Data object erasure code storage method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116257186A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954523A (en) * 2023-09-20 2023-10-27 苏州元脑智能科技有限公司 Storage system, data storage method, data reading method and storage medium
CN116954523B (en) * 2023-09-20 2024-01-26 苏州元脑智能科技有限公司 Storage system, data storage method, data reading method and storage medium
CN117240873A (en) * 2023-11-08 2023-12-15 阿里云计算有限公司 Cloud storage system, data reading and writing method, device and storage medium
CN117240873B (en) * 2023-11-08 2024-03-29 阿里云计算有限公司 Cloud storage system, data reading and writing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination