CN116257186A - Data object erasure code storage method, device, equipment and medium - Google Patents
Data object erasure code storage method, device, equipment and medium
- Publication number
- CN116257186A (application number CN202310201833.3A)
- Authority
- CN
- China
- Prior art keywords
- data object
- read
- local
- target
- target data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/062—Securing storage systems
- G06F3/0622—Securing storage systems in relation to access
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a data object erasure code storage method, apparatus, device and medium, relating to the technical field of cloud computing data centers. The method comprises the following steps: acquiring a data object read-write request and saving the request locally; performing a read-write operation on data objects based on the request, determining the current local remaining storage space capacity, judging whether the current remaining storage space capacity is greater than a preset threshold, and, if so, screening out the target data objects to be stored from all local data objects; judging the type of each target data object and, if the target data object is a primary copy, performing an erasure code encoding operation on it to obtain sub-target data objects; and sending each sub-target data object to the other nodes in the cluster, excluding the local node, for storage. The method can improve the storage utilization of erasure-coded data objects, reduce cost, and increase the stability of the distributed storage system.
Description
Technical Field
The present invention relates to the field of cloud computing data centers, and in particular, to a method, an apparatus, a device, and a medium for storing erasure codes of data objects.
Background
A distributed storage system stores data across multiple independent devices. It adopts a scalable architecture in which several storage servers share the storage load and a location server tracks where data is stored. Software-defined storage, represented by Ceph (a distributed file system) and VSAN (VMware Virtual SAN), is a scale-out, automatically balancing, self-healing form of distributed storage: it pools hardware resources such as commodity x86 servers, solid-state disks and mechanical hard disks into a thin-provisioned resource pool and provides storage services through interfaces such as block storage, file storage, object storage and RESTful APIs. At present, distributed storage systems generally protect data with either multiple copies or erasure codes. Erasure coding splits a data object into fragments and then encodes them. The object is still stored as a single copy, but because the encoding algorithm must add redundant parity data, it occupies more space than the raw copy alone. The principle is that the data object is divided into n fragments, m redundant fragments are computed from them and stored alongside, and the original data object can be restored from any n of the n+m fragments according to the algorithm used.
If the n+m fragments are placed on different nodes of the storage system, the original data can be restored from the remaining fragments whenever m or fewer nodes fail (that is, m or fewer fragments are lost). A multi-copy strategy instead stores several identical copies of the data object on several nodes of the distributed storage system. Multiple copies are highly reliable and simple to implement, and service is unaffected as long as at least one copy of the object exists in the cluster. However, because the same data object is stored several times, space utilization is low, which quietly raises storage costs for enterprise users, and the consistency model among the copies can also reduce write performance.
From the above, how to improve the utilization rate of erasure code storage of data objects, reduce the cost, and increase the stability of the distributed storage system is a problem to be solved in the art.
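The space trade-off described above can be made concrete with a short calculation. The following is a minimal sketch, not part of the application: the function names are hypothetical, and the figures used in the usage note (3 copies, n = 4, m = 2) are illustrative assumptions only.

```python
from fractions import Fraction


def replica_utilization(copies: int) -> Fraction:
    """Usable fraction of raw capacity when each object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(1, copies)


def erasure_code_utilization(n: int, m: int) -> Fraction:
    """Usable fraction of raw capacity with n data fragments and m redundant fragments."""
    if n < 1 or m < 0:
        raise ValueError("n must be >= 1 and m must be >= 0")
    return Fraction(n, n + m)


def replica_tolerated_failures(copies: int) -> int:
    """Service survives as long as one copy remains, so copies - 1 losses are tolerated."""
    return copies - 1
```

Under these assumptions, three copies and a 4+2 erasure code both tolerate the loss of two nodes, but `replica_utilization(3)` gives 1/3 while `erasure_code_utilization(4, 2)` gives 2/3 of raw capacity as usable space.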
Disclosure of Invention
Accordingly, the present invention is directed to a method, apparatus, device and medium for storing erasure codes of data objects, which can improve the utilization rate of erasure code storage of data objects, reduce the cost and increase the stability of a distributed storage system. The specific scheme is as follows:
In a first aspect, the present application discloses a data object erasure code storage method, applied to any node in a cluster, including:
acquiring a data object read-write request, and saving the data object read-write request locally;
performing a read-write operation on data objects based on the data object read-write request, determining the current local remaining storage space capacity, judging whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screening target data objects to be stored from all local data objects;
judging the type of the target data object, and if the type of the target data object is a primary copy, performing an erasure code encoding operation on the target data object to obtain each sub-target data object;
and sending each sub-target data object to the other nodes in the cluster, excluding the local node, for storage.
Optionally, the acquiring a data object read-write request and saving the data object read-write request locally includes:
acquiring a data object read-write request, and storing the data object read-write request to a local gateway;
and sending the data object read-write request in the gateway to a local storage layer.
Optionally, the performing read-write operation on the data object based on the data object read-write request, and determining the local current remaining storage space capacity, includes:
performing read-write operation on the data object by utilizing the storage layer and based on the data object read-write request so as to obtain the storage space capacity occupied by read-write;
and determining the current residual storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the read-write.
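The capacity check in the two steps above can be sketched as follows. This is an illustration under stated assumptions: the function names and byte units are hypothetical, and the comparison direction (remaining capacity greater than the threshold) is taken from the text as written, even though the detailed description elsewhere phrases the trigger as the Cache layer capacity reaching a threshold.

```python
def remaining_capacity(total_bytes: int, used_bytes: int) -> int:
    """Current remaining capacity = local total capacity - capacity occupied by read-write."""
    if used_bytes < 0 or used_bytes > total_bytes:
        raise ValueError("used_bytes must be between 0 and total_bytes")
    return total_bytes - used_bytes


def should_screen_targets(total_bytes: int, used_bytes: int, threshold_bytes: int) -> bool:
    """Apply the comparison exactly as stated in the text: remaining > preset threshold."""
    return remaining_capacity(total_bytes, used_bytes) > threshold_bytes
```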
Optionally, the screening the target data object to be stored from all the local data objects includes:
determining a dirty data object judgment rule based on the service requirement;
and screening all the data objects in the local storage layer according to the dirty data object judging rule to obtain a target data object to be stored.
Optionally, the performing erasure coding operation on the target data object to obtain each sub-target data object includes:
performing the erasure code encoding operation on the target data object asynchronously, using a local erasure code (EC) module, to obtain each sub-target data object.
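As an illustration of asynchronous encoding, the sketch below uses a single XOR parity fragment, which is the simplest erasure code (m = 1). The application does not name a coding algorithm, so the XOR scheme, the zero padding, the thread pool and all function names here are assumptions; a production system would more likely use a code such as Reed-Solomon.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> List[bytes]:
    """Split data into n equal-size fragments (zero padded) plus one XOR parity fragment."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1 byte per fragment
    padded = data.ljust(size * n, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(n)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost fragment")
    result = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        result[missing[0]] = reduce(xor_bytes, present)
    return result


def encode_async(pool: ThreadPoolExecutor, data: bytes, n: int) -> Future:
    """Submit the encoding to a worker so the caller (service IO path) is not blocked."""
    return pool.submit(encode, data, n)
```

The caller receives a `Future` immediately and can collect the fragments later with `.result()`, which mirrors the intent that encoding should not hold up foreground requests.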
Optionally, after sending each sub-target data object to the other nodes in the cluster, excluding the local node, for storage, the method further includes:
generating a message indicating that the sending succeeded;
and broadcasting the message indicating that the sending succeeded to all nodes in the cluster.
Optionally, the data object erasure code storage method further includes:
after the message indicating that the sending succeeded is acquired, deleting the target data object from the local storage layer.
In a second aspect, the present application discloses a data object erasure code storage apparatus, comprising:
a request acquisition module, configured to acquire a data object read-write request and save the data object read-write request locally;
a target data object determining module, configured to perform a read-write operation on data objects based on the data object read-write request, determine the current local remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screen target data objects to be stored from all local data objects;
a judging module, configured to judge the type of the target data object and, if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object;
and a data object storage module, configured to send each sub-target data object to the other nodes in the cluster, excluding the local node, for storage.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the data object erasure code storage method.
In a fourth aspect, the present application discloses a computer storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the data object erasure code storage method disclosed previously.
It can be seen that the present application provides a data object erasure code storage method, which includes: acquiring a data object read-write request and saving the data object read-write request locally; performing a read-write operation on data objects based on the data object read-write request, determining the current local remaining storage space capacity, judging whether the current remaining storage space capacity is greater than a preset threshold, and, if so, screening target data objects to be stored from all local data objects; judging the type of the target data object and, if the type of the target data object is a primary copy, performing an erasure code encoding operation on the target data object to obtain each sub-target data object; and sending each sub-target data object to the other nodes in the cluster, excluding the local node, for storage. The application thus uses a multi-copy strategy in the storage layer, which guarantees the reliability and performance of the distributed storage system; it sends the target data objects to be stored to the data layer through an asynchronous process that does not affect running service IO; and it uses an erasure code strategy in the data layer, which improves the space utilization of the distributed storage system. In addition, defining a primary copy keeps the distributed storage system stable and reliable, so that the system achieves high storage utilization together with good performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for storing erasure codes of data objects disclosed in the present application;
FIG. 2 is a diagram of a distributed storage model disclosed herein;
FIG. 3 is a flow chart of a method for storing erasure codes for data objects disclosed in the present application;
FIG. 4 is a schematic diagram of a data object erasure code storage according to the present disclosure;
FIG. 5 is a schematic diagram of a data object erasure code storage apparatus disclosed in the present application;
FIG. 6 is a block diagram of an electronic device provided in the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method is applied to any node in the distributed storage model shown in FIG. 2. In the traditional distributed storage system model that adopts a multi-copy strategy, the distributed storage system improves the reliability, availability and access efficiency of the system, offers high scalability at very low cost, and is capable of providing enterprise-grade storage. Comparing multiple copies with erasure codes: multiple copies give good read-write performance but a low usable-capacity ratio, whereas erasure codes give a high usable-capacity ratio but incur a performance cost when computing parity. Key services with high performance requirements therefore generally use a multi-copy strategy, while mass-storage scenarios can use erasure codes to raise the utilization of the storage system and reduce cost. In this scheme, the multi-copy strategy is used in the Cache layer, which ensures the reliability and performance of the distributed storage system. Flushing Cache layer data down to the Data layer is an asynchronous process that does not affect running service IO, so data objects are stored with erasure codes in the Data layer when they are flushed, which raises the usable-capacity ratio of the Data layer. Because the usable-capacity ratio of the whole distributed storage system depends on the Data layer, the scheme maintains high performance while improving the usable-capacity ratio of the system.
Referring to fig. 1, the embodiment of the invention discloses a data object erasure code storage method, which specifically includes:
Step S11: acquire a data object read-write request, and save the data object read-write request locally.
In this embodiment, a data object read-write request is acquired and saved to a local gateway, and the gateway then sends the request to the local storage layer. That is, the cluster comprises a plurality of different nodes; after a data object read-write request is acquired, the Gateway in the node forwards the request to each node, and each node that receives the request saves it locally.
Step S12: perform a read-write operation on data objects based on the data object read-write request, determine the current local remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screen out target data objects to be stored from all local data objects.
In this embodiment, the storage layer performs the read-write operation on the data object based on the data object read-write request, which yields the storage space capacity occupied by the read-write. The current remaining storage space capacity is determined from the local total storage space capacity and the capacity occupied by the read-write. It is then judged whether the current remaining storage space capacity is greater than the preset threshold; if it is, a dirty data object judgment rule is determined based on service requirements, and all data objects in the local storage layer are screened according to that rule to obtain the target data objects to be stored.
It can be understood that the Cache layer (i.e., the storage layer) in the node reads and writes the data object, and the storage space capacity occupied by the read-write is determined. The occupied capacity is subtracted from the total storage space capacity of the node to obtain the current remaining storage space capacity. When the Cache layer capacity reaches the preset threshold, Cache management performs a dirty-data flush on the Cache layer: dirty data is screened out from the data objects in the Cache layer according to service requirements and is taken as the target data objects to be stored.
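The screening of dirty data can be sketched as a predicate applied to the objects in the Cache layer. The application leaves the dirty data object judgment rule to service requirements, so the rule below (dirty and idle for at least a given time), the `CachedObject` fields and all names are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CachedObject:
    name: str
    dirty: bool           # modified in the cache and not yet written to the data layer
    idle_seconds: float   # time since the object was last accessed


# A dirty-data rule is any predicate over a cached object.
DirtyRule = Callable[[CachedObject], bool]


def make_idle_rule(min_idle_seconds: float) -> DirtyRule:
    """Example rule (an assumption): flush objects that are dirty and sufficiently idle."""
    def rule(obj: CachedObject) -> bool:
        return obj.dirty and obj.idle_seconds >= min_idle_seconds
    return rule


def screen_targets(objects: Iterable[CachedObject], rule: DirtyRule) -> List[CachedObject]:
    """Return the target data objects to be stored, preserving cache order."""
    return [obj for obj in objects if rule(obj)]
```

Expressing the rule as a function keeps the screening step unchanged when the service requirement changes; only the predicate is swapped.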
Step S13: judge the type of the target data object, and if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object.
Step S14: send each sub-target data object to the other nodes in the cluster, excluding the local node, for storage.
In this embodiment, it is judged whether the target data object to be stored in the node is a primary copy. If it is not, the object is skipped. If it is, the dirty data is flushed down: an erasure code encoding operation is performed on the target data object to obtain each sub-target data object, and the sub-target data objects are then sent to the other nodes in the cluster, excluding the local node.
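The primary-copy gate and the distribution of fragments to the other nodes can be sketched as below. This is an assumption-laden illustration: the encoder is passed in as a parameter because the application does not fix an algorithm, the round-robin placement and the `object_id/index` fragment keys are hypothetical, and nodes are represented only by their names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Any function that turns an object payload into a list of fragments.
Encoder = Callable[[bytes], List[bytes]]


def flush_if_primary(
    object_id: str,
    payload: bytes,
    is_primary: bool,
    local_node: str,
    cluster_nodes: Sequence[str],
    encode: Encoder,
) -> Dict[str, List[Tuple[str, bytes]]]:
    """Return a placement map {peer: [(fragment_key, fragment), ...]}.

    Non-primary copies are skipped, so only one node in the cluster
    encodes and distributes a given object.
    """
    if not is_primary:
        return {}
    peers = [node for node in cluster_nodes if node != local_node]
    if not peers:
        raise ValueError("no other nodes available to receive fragments")
    placement: Dict[str, List[Tuple[str, bytes]]] = {peer: [] for peer in peers}
    for index, fragment in enumerate(encode(payload)):
        peer = peers[index % len(peers)]  # hypothetical round-robin placement
        placement[peer].append((f"{object_id}/{index}", fragment))
    return placement
```

Note that when there are more fragments than peers, round-robin places several fragments on one node, which reduces the number of node failures the code can tolerate; a real placement policy would need to account for this.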
The method is applied to any node in the distributed storage model shown in FIG. 2. In the traditional distributed storage system model that adopts a multi-copy strategy, the Gateway is responsible for forwarding requests among nodes. The Cache layer (i.e., the storage layer) is a space built from high-performance media; its capacity is limited, but it provides higher performance. The Data layer is a space built from large-capacity media; its capacity is larger, but its performance is lower. An object request is forwarded to the different copy nodes through the Gateway module. When the copy request reaches node A and node C, it is first processed by the Cache layer. Because Cache layer space is limited, dirty data objects must be flushed down to the Data layer once the Cache layer space reaches a preset threshold: the data objects in the Cache layer are written to the Data layer, and the Cache space is then released so that it can be reclaimed. To keep the flush from affecting service IO, it is typically implemented asynchronously. Distributed storage systems typically use a multi-copy strategy in both the Cache layer and the Data layer, which results in low space utilization. In this scheme, by contrast, the multi-copy strategy is used in the Cache layer, which ensures the reliability and performance of the distributed storage system, while data objects are stored with erasure codes in the Data layer when they are flushed. Because the flush is asynchronous and does not affect running service IO, and because the usable-capacity ratio of the distributed storage system depends on the Data layer, the scheme maintains high performance while improving the usable-capacity ratio of the system.
Referring to fig. 3, the embodiment of the invention discloses a data object erasure code storage method, which specifically includes:
Step S21: acquire a data object read-write request, and save the data object read-write request locally.
Step S22: perform a read-write operation on data objects based on the data object read-write request, determine the current local remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and, if the current remaining storage space capacity is greater than the preset threshold, screen out target data objects to be stored from all local data objects.
Step S23: judge the type of the target data object, and if the type of the target data object is a primary copy, perform the erasure code encoding operation on the target data object asynchronously, using a local erasure code (EC) module, to obtain each sub-target data object.
Step S24: send each sub-target data object to the other nodes in the cluster, excluding the local node, for storage.
Step S25: generate a message indicating that the sending succeeded, and then broadcast the message to all nodes in the cluster.
In this embodiment, after the message indicating that the sending succeeded is acquired, the target data object is deleted from the local storage layer.
In this embodiment, when data is flushed down, the target data object is erasure-coded by the EC module (i.e., the erasure code module), and the resulting data is distributed across the Data layer space of all the nodes in the cluster. After the flush completes, a cluster message is broadcast to notify the Cache management of all nodes to delete their copies from the Cache space, which releases the Cache space. This realizes erasure code storage of data objects in the Data layer space. As shown in FIG. 4, a primary-copy rule is defined, that is, one copy in the multi-copy strategy is designated as the primary copy. Suppose, for example, that the cluster includes node A, node B and node C. Node A acquires a data object read-write request, and the request is sent to node C through the Gateway layer in node A. The Cache layer in node A performs the read-write operation on the data object and obtains the current remaining storage space capacity, then judges whether the current remaining storage space capacity is greater than the preset threshold. If it is, the target data objects to be stored are screened out from all local data objects, and the EC module performs the erasure code encoding operation on them to obtain sub-target data objects, which are sent to node B and node C respectively. Node A then generates a message indicating that the sending succeeded and broadcasts it to all nodes in the cluster. After receiving the message, node A deletes the target data object from its Cache layer, and any other node whose Cache layer also holds the target data object deletes it as well.
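The broadcast-and-delete step in the example above can be sketched with an in-memory model of the cluster. The `Node` class, its two dictionaries and the function names are hypothetical, and a direct function call stands in for the cluster broadcast message; the application does not describe the message transport.

```python
from typing import Dict, List


class Node:
    """In-memory stand-in for a cluster node with a Cache layer and a Data layer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}  # Cache layer: full copies of data objects
        self.data: Dict[str, bytes] = {}   # Data layer: erasure-coded fragments

    def on_flush_success(self, object_id: str) -> None:
        """Handle the success message: drop the cached copy if this node holds one."""
        self.cache.pop(object_id, None)


def broadcast_flush_success(cluster: List[Node], object_id: str) -> None:
    """Deliver the success message to every node in the cluster, sender included."""
    for node in cluster:
        node.on_flush_success(object_id)
```

Only the Cache layer copies are removed; fragments already stored in each node's Data layer are left untouched, so the object remains recoverable from the erasure-coded data.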
In this embodiment, a data object read-write request is obtained and saved locally. A read-write operation is performed on the data object based on the request, the local current remaining storage space capacity is determined, and it is judged whether that capacity is greater than a preset threshold; if so, the target data objects to be stored are screened out from all local data objects. The type of each target data object is then judged, and if the type is primary copy, an erasure code encoding operation is performed on the target data object to obtain the sub-target data objects. Each sub-target data object is sent to, and stored on, the other nodes in the cluster except the node itself. In this way, a multi-copy policy is used in the storage layer, which guarantees the reliability and performance of the distributed storage system; the target data objects to be stored are sent to the data layer by an asynchronous process that does not affect running service IO; and an erasure code policy is used in the data layer, which improves the space utilization of the distributed storage system. In addition, defining a primary copy keeps the distributed storage system stable and reliable, so that the system achieves high storage utilization together with good performance.
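The primary-copy check and the encoding step can be illustrated with a small sketch. The embodiment does not name a specific erasure code, so the example below substitutes the simplest one, a single XOR parity fragment over k data fragments, which can repair one lost fragment. The function names and the `"primary"` / `"secondary"` type labels are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(payload, k):
    """Split payload into k equal-size data fragments plus one XOR parity fragment."""
    size = -(-len(payload) // k)              # ceiling division
    padded = payload.ljust(size * k, b"\0")   # zero-pad the last fragment
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments):
    """Rebuild at most one missing fragment (given as None) from the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


def encode_if_primary(copy_type, payload, k):
    """Only the primary copy is encoded; any other copy type yields None."""
    return encode(payload, k) if copy_type == "primary" else None
```

Because only the primary copy is encoded, each object is written to the data layer once even though several nodes hold a cached copy. A production system would more likely use a code that tolerates several lost fragments; the original payload length would also need to be recorded so the zero padding can be stripped on read.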
Referring to fig. 5, an embodiment of the present invention discloses a data object erasure code storage apparatus, which may specifically include:
the request acquisition module 11, configured to acquire a data object read-write request and save the data object read-write request locally;
the target data object determining module 12, configured to perform a read-write operation on a data object based on the data object read-write request, determine the local current remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and if the current remaining storage space capacity is greater than the preset threshold, screen out a target data object to be stored from all local data objects;
the judging module 13, configured to judge the type of the target data object, and if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object;
and the data object storage module 14, configured to send each sub-target data object to, and store it on, the other nodes in the cluster except the node itself.
The method is applied to any node in the distributed storage model. In a traditional distributed storage system model that adopts a multi-copy policy, the Gateway is responsible for forwarding requests among nodes; the Cache layer (i.e., the storage layer) is a space built from high-performance media, which has limited capacity but provides higher performance; and the Data layer is a space built from large-capacity media, which has larger capacity but poorer performance. An object request is forwarded to the different copy nodes through the Gateway module. When the copy requests reach node A and node C, each request is first processed by the Cache layer. Because the Cache layer space is limited, once the Cache layer space reaches a preset threshold, dirty data objects need to be flushed down to the Data layer space; that is, the data objects in the Cache layer are written to the Data layer and the Cache space is then released, so that the Cache space is reclaimed. To keep the flush from affecting service IO, it is typically implemented as asynchronous processing. Distributed storage systems typically adopt a multi-copy policy in both the Cache layer and the Data layer, which results in low space utilization. Using the multi-copy policy in the Cache layer, however, guarantees the reliability and performance of the distributed storage system. For the Data layer space, flushing Cache layer data is an asynchronous process that does not affect running service IO, so the data objects can be stored with erasure codes in the Data layer when they are flushed, which improves the disk yield (i.e., the space utilization) of the Data layer. Because the disk yield of the distributed storage system depends on the Data layer, this scheme both maintains high performance and improves the disk yield of the system.
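The gain in disk yield can be made concrete with a short calculation. The sketch below compares the usable fraction of raw capacity under a multi-copy policy and under an erasure code with k data fragments and m parity fragments. The specific figures used in the note that follows (three copies, a 4+2 code) are illustrative assumptions; this document does not specify them.

```python
def replica_yield(copies):
    """Usable fraction of raw capacity when every object is stored `copies` times."""
    return 1 / copies


def erasure_code_yield(k, m):
    """Usable fraction when each object becomes k data + m parity fragments."""
    return k / (k + m)
```

With three copies the Data layer yields 1/3 of its raw capacity, while a 4+2 erasure code yields 4/6 = 2/3, so the same disks hold twice as much user data.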
In some specific embodiments, the request acquiring module 11 may specifically include:
the request acquisition module, configured to acquire a data object read-write request and save the data object read-write request to a local gateway;
and the request sending module, configured to send the data object read-write request in the gateway to the local storage layer.
In some specific embodiments, the target data object determining module 12 may specifically include:
the read-write module, configured to perform a read-write operation on the data object by using the storage layer, based on the data object read-write request, so as to obtain the storage space capacity occupied by the read-write operation;
and the current remaining storage space capacity determining module, configured to determine the current remaining storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the read-write operation.
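The capacity bookkeeping performed by these two modules amounts to a subtraction followed by a comparison. In the sketch below the function names and the use of bytes as the unit are assumptions, and the comparison direction is copied from the wording of this embodiment (remaining capacity greater than the preset threshold).

```python
def remaining_capacity(total_bytes, occupied_bytes):
    """Current remaining capacity = local total capacity - capacity occupied by read-write."""
    if occupied_bytes > total_bytes:
        raise ValueError("occupied capacity exceeds total capacity")
    return total_bytes - occupied_bytes


def should_screen(remaining_bytes, threshold_bytes):
    """Comparison as worded in this embodiment: remaining capacity greater than the threshold."""
    return remaining_bytes > threshold_bytes
```

For example, a storage layer of 100 units with 30 units occupied has 70 units remaining, and `should_screen(70, 50)` returns `True`.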
In some specific embodiments, the target data object determining module 12 may specifically include:
the judging rule determining module, configured to determine a dirty data object judging rule based on service requirements;
and the target data object determining module, configured to screen all data objects in the local storage layer according to the dirty data object judging rule, so as to obtain the target data object to be stored.
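Since the dirty data object judging rule depends on service requirements, one natural way to model it is as a predicate that is chosen separately from the screening loop. The rule shown here (an object is dirty if it has been modified and has been idle for a minimum time) is purely a hypothetical example, as are the `CachedObject` fields and function names.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    object_id: str
    modified: bool       # changed in the storage layer since it was last written down
    last_access: float   # timestamp in seconds


def make_idle_dirty_rule(min_idle_seconds, now):
    """Hypothetical rule: modified objects that have been idle long enough are dirty."""
    def rule(obj):
        return obj.modified and (now - obj.last_access) >= min_idle_seconds
    return rule


def screen_targets(cache_objects, rule):
    """Return the objects in the local storage layer that satisfy the rule."""
    return [obj for obj in cache_objects if rule(obj)]
```

Keeping the rule as a separate callable lets a deployment swap in a different criterion without touching the screening step.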
In some specific embodiments, the determining module 13 may specifically include:
and the sub-target data object determining module, configured to perform the erasure code encoding operation on the target data object by using the local super module (i.e., the EC module) in an asynchronous processing mode, so as to obtain each sub-target data object.
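The asynchronous processing mode can be sketched with a background worker: the encoding job is handed off and the caller continues serving IO. This embodiment states only that the processing is asynchronous; the thread pool, the `AsyncEncoder` class and its method names are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncEncoder:
    """Runs an encoder on a background thread so the caller is not blocked."""

    def __init__(self, encode, workers=1):
        self._encode = encode
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, payload):
        # Returns immediately with a Future; encoding happens in the background.
        return self._pool.submit(self._encode, payload)

    def close(self):
        # Wait for queued encoding jobs to finish, then stop the worker threads.
        self._pool.shutdown(wait=True)
```

The caller keeps the returned `Future` and collects the fragments with `future.result()` once it is ready to send them, so foreground read-write requests are not held up while encoding runs.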
In some embodiments, the data object storage module 14 may specifically include:
the message generation module, configured to generate a message indicating successful transmission;
and the message sending module, configured to send the message indicating successful transmission to all nodes in the cluster in broadcast form.
In some embodiments, the data object storage module 14 may specifically include:
and the deleting module, configured to delete the target data object from the local storage layer after the message indicating successful transmission is received.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, where the computer program is loaded and executed by the processor 21 to implement relevant steps in the data object erasure code storage method performed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon include an operating system 221, a computer program 222, and data 223, and the storage may be temporary storage or permanent storage.
The operating system 221, which may be Windows, Unix, Linux or the like, is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the data 223 in the memory 22. In addition to the computer program that performs the data object erasure code storage method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks. The data 223 may include data received by the data object erasure code storage device from an external device, as well as data collected through its own input/output interface 25, and so on.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Further, the embodiment of the application also discloses a computer readable storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the steps of the data object erasure code storage method disclosed in any embodiment are realized.
Finally, it is further noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The method, apparatus, device and storage medium for data object erasure code storage provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.
Claims (10)
1. A data object erasure code storage method, characterized in that the method is applied to any node in a cluster and comprises:
acquiring a data object read-write request, and saving the data object read-write request locally;
performing a read-write operation on a data object based on the data object read-write request, determining the local current remaining storage space capacity, judging whether the current remaining storage space capacity is greater than a preset threshold, and if the current remaining storage space capacity is greater than the preset threshold, screening out a target data object to be stored from all local data objects;
judging the type of the target data object, and if the type of the target data object is a primary copy, performing an erasure code encoding operation on the target data object to obtain each sub-target data object;
and respectively sending each sub-target data object to, and storing it on, the other nodes in the cluster except the node itself.
2. The data object erasure code storage method according to claim 1, wherein the acquiring a data object read-write request and saving the data object read-write request locally comprises:
acquiring a data object read-write request, and saving the data object read-write request to a local gateway;
and sending the data object read-write request in the gateway to a local storage layer.
3. The data object erasure code storage method according to claim 2, wherein the performing a read-write operation on a data object based on the data object read-write request and determining the local current remaining storage space capacity comprises:
performing the read-write operation on the data object by using the storage layer, based on the data object read-write request, so as to obtain the storage space capacity occupied by the read-write operation;
and determining the current remaining storage space capacity based on the local total storage space capacity and the storage space capacity occupied by the read-write operation.
4. The method for storing erasure codes for data objects according to claim 3, wherein said screening out target data objects to be stored from all local data objects comprises:
determining a dirty data object judgment rule based on the service requirement;
and screening all the data objects in the local storage layer according to the dirty data object judging rule to obtain a target data object to be stored.
5. The data object erasure code storage method according to claim 1, wherein the performing an erasure code encoding operation on the target data object to obtain each sub-target data object comprises:
performing the erasure code encoding operation on the target data object by using a local super module in an asynchronous processing mode, to obtain each sub-target data object.
6. The data object erasure code storage method according to any of claims 1 to 5, wherein after each sub-target data object is sent to, and stored on, the other nodes in the cluster except the node itself, the method further comprises:
generating a message indicating successful transmission;
and sending the message indicating successful transmission to all nodes in the cluster in broadcast form.
7. The data object erasure code storage method according to claim 6, further comprising:
after the message indicating successful transmission is received, deleting the target data object from the local storage layer.
8. A data object erasure code storage apparatus, comprising:
a request acquisition module, configured to acquire a data object read-write request and save the data object read-write request locally;
a target data object determining module, configured to perform a read-write operation on a data object based on the data object read-write request, determine the local current remaining storage space capacity, judge whether the current remaining storage space capacity is greater than a preset threshold, and if the current remaining storage space capacity is greater than the preset threshold, screen out a target data object to be stored from all local data objects;
a judging module, configured to judge the type of the target data object, and if the type of the target data object is a primary copy, perform an erasure code encoding operation on the target data object to obtain each sub-target data object;
and a data object storage module, configured to respectively send each sub-target data object to, and store it on, the other nodes in the cluster except the node itself.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the data object erasure code storage method according to any of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the data object erasure code storage method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310201833.3A CN116257186A (en) | 2023-02-28 | 2023-02-28 | Data object erasure code storage method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116257186A true CN116257186A (en) | 2023-06-13 |
Family
ID=86684087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310201833.3A Pending CN116257186A (en) | 2023-02-28 | 2023-02-28 | Data object erasure code storage method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116257186A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116954523A (en) * | 2023-09-20 | 2023-10-27 | 苏州元脑智能科技有限公司 | Storage system, data storage method, data reading method and storage medium |
CN116954523B (en) * | 2023-09-20 | 2024-01-26 | 苏州元脑智能科技有限公司 | Storage system, data storage method, data reading method and storage medium |
CN117240873A (en) * | 2023-11-08 | 2023-12-15 | 阿里云计算有限公司 | Cloud storage system, data reading and writing method, device and storage medium |
CN117240873B (en) * | 2023-11-08 | 2024-03-29 | 阿里云计算有限公司 | Cloud storage system, data reading and writing method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116257186A (en) | Data object erasure code storage method, device, equipment and medium | |
JP4696089B2 (en) | Distributed storage system | |
US20190196728A1 (en) | Distributed storage system-based data processing method and storage device | |
US9361034B2 (en) | Transferring storage resources between snapshot storage pools and volume storage pools in a distributed network | |
WO1991014230A1 (en) | Message communication processing system | |
AU2012398211A1 (en) | Caching method for distributed storage system, a lock server node, and a lock client node | |
CN112783445A (en) | Data storage method, device, system, electronic equipment and readable storage medium | |
CN110780819A (en) | Data read-write method of distributed storage system | |
US20200142634A1 (en) | Hybrid distributed storage system to dynamically modify storage overhead and improve access performance | |
WO2019062856A1 (en) | Data reconstruction method and apparatus, and data storage system | |
CN112130758B (en) | Data reading request processing method and system, electronic equipment and storage medium | |
US12131051B2 (en) | Migrating data between different medium layers of a storage system | |
CN109254958A (en) | Distributed data reading/writing method, equipment and system | |
CN111399760B (en) | NAS cluster metadata processing method and device, NAS gateway and medium | |
JP5475702B2 (en) | Mail storage backup system and backup method | |
CN117914675A (en) | Method and device for constructing distributed cache system | |
CN112104729A (en) | Storage system and caching method thereof | |
CN103685359B (en) | Data processing method and device | |
CN115981559A (en) | Distributed data storage method and device, electronic equipment and readable medium | |
CN103488768A (en) | File management method and file management system based on cloud computing | |
CN115706727A (en) | Cloud desktop data migration method, node and server | |
CN113301086A (en) | DNS data management system and management method | |
CN111488324A (en) | Distributed network file system based on message middleware and working method thereof | |
CN110968257A (en) | Method, apparatus and computer program product for storage management | |
US10938701B2 (en) | Efficient heartbeat with remote servers by NAS cluster nodes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||