CN115454344A - Data storage method and device, electronic equipment and storage medium - Google Patents

Publication number
CN115454344A
Authority
CN
China
Prior art keywords
data
copy
sub
stored
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211129755.2A
Other languages
Chinese (zh)
Inventor
张廷全
纪志祥
吴瑞强
惠润海
卜庆忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongke Shuguang Storage Technology Co ltd
Original Assignee
Tianjin Zhongke Shuguang Storage Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongke Shuguang Storage Technology Co ltd
Priority to CN202211129755.2A
Publication of CN115454344A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage method applied to a storage node of a distributed storage system, comprising: in response to an object creation instruction sent by a management node, creating a copy object and copy sub-objects; saving the data to be stored sent by the management node through the copy object; and determining a target copy sub-object among the copy sub-objects according to the copy number, storing the sub-data to be stored in the target copy sub-object to a cache module, and storing the sub-data to be stored in the non-target copy sub-objects to a hard disk module. According to the technical scheme of the embodiments of the invention, copy storage of the data to be stored, that is, data backup, is achieved; the cache modules of the storage nodes do not hold repeated data, which avoids wasting cache resources in the distributed storage system; and the management node can read the complete stored data from the cache modules of a plurality of storage nodes, which improves the reading efficiency of the stored data.

Description

Data storage method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data storage, and in particular, to a data storage method and apparatus, an electronic device, and a storage medium.
Background
In order to ensure the availability of data, a distributed storage system usually adopts a copy mode to perform data redundancy storage, that is, the same data is stored in a plurality of copies to realize data backup.
In order to reduce the response time of the read-write interface, a cache is usually deployed on the storage nodes, and the data to be stored is stored in the hard disk through the cache.
However, in the above data storage manner, the same stored data is held in the cache of every storage node, although a single cached copy is enough to meet the read requests of the management node. The redundant data occupies a large amount of cache space and wastes cache resources.
Disclosure of Invention
The invention provides a data storage method and a data storage device, which aim to solve the problem that redundant data occupies more cache resources.
According to an aspect of the present invention, there is provided a data storage method applied to a storage node of a distributed storage system, including:
in response to acquiring an object creation instruction sent by the management node, creating a copy object according to the acquired copy number, and creating copy sub-objects in the copy object according to the acquired number of copies;
saving the data to be stored sent by the management node through the copy object, so that the copy sub-objects in the copy object are matched one by one with the sub-data to be stored in the data to be stored;
and determining a target copy sub-object among the copy sub-objects according to the copy number, storing the sub-data to be stored in the target copy sub-object to a cache module, and storing the sub-data to be stored in the non-target copy sub-objects to a hard disk module.
After determining a target copy sub-object in each copy sub-object according to the copy number, the method further includes: storing the data to be stored in the copy object to a cache module, and copying the data to be stored to a hard disk module through the cache module; and in the cache module, continuously storing the subdata to be stored corresponding to the target copy sub-object, and deleting the subdata to be stored corresponding to the non-target copy sub-object. The cache module of the storage node does not include repeated data, so that the cache resource waste in the distributed storage system is avoided, the asynchronous storage of the data is realized, and the data storage efficiency is improved.
After the to-be-stored data sent by the management node is saved by the replica object, so that each replica sub-object in the replica object is matched with each to-be-stored sub-data in the to-be-stored data one by one, the method further includes: setting weight for each copy sub-object according to the correction coefficient, the copy number and the copy sub-number of each copy sub-object; determining a target copy sub-object in each copy sub-object according to the copy number, including: determining a target copy sub-object in each copy sub-object according to the weight of each copy sub-object and a preset weight threshold; if the weight of the current copy sub-object is greater than or equal to a preset weight threshold, the current copy sub-object is a target copy sub-object; if the weight of the current copy sub-object is smaller than a preset weight threshold value, the current copy sub-object is a non-target copy sub-object; the preset weight threshold is related to the remaining storage capacity of the cache module. The storage node sets the weight for the copy sub-object in the current copy object, so that when the residual storage space in the cache module is large, a large amount of sub-data to be stored is stored in the cache module, the characteristic of high read-write speed of the cache module is fully utilized, and the rapid write-in and read-out of data are realized; when the residual storage space in the cache module is small, only a small amount of subdata to be stored is stored in the cache module so as to realize accurate storage of data, and the integrity of the current data to be stored in the cache module of the distributed storage system is still ensured while the waste of cache resources is reduced.
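The threshold rule described above can be sketched as follows. This is only an illustration: the weight values are hypothetical inputs (the weight formula built from the correction coefficient, copy number and copy sub-number is not reproduced here), and the assumption that a lower threshold corresponds to more free cache space is inferred from the surrounding text. The function name `select_targets` is not from the patent.

```python
def select_targets(weights: dict, threshold: float) -> set:
    """Return the sub-numbers of copy sub-objects treated as targets.

    A copy sub-object is a target when its weight is greater than or
    equal to the preset weight threshold; otherwise it is a non-target.
    """
    return {sub_number for sub_number, weight in weights.items()
            if weight >= threshold}


# Hypothetical weights for copy sub-objects 1..3 of one copy object.
weights = {1: 0.9, 2: 0.6, 3: 0.3}

# Assumed: plenty of free cache space maps to a low threshold,
# so more sub-data is cached.
roomy = select_targets(weights, threshold=0.5)

# Assumed: little free cache space maps to a high threshold,
# so only the highest-weight sub-data is cached.
tight = select_targets(weights, threshold=0.8)
```

With these inputs, the lower threshold selects sub-objects 1 and 2, while the higher threshold selects only sub-object 1.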
The data storage method further comprises the following steps: in response to the fact that the residual storage capacity of the cache module is smaller than or equal to a first storage threshold value, acquiring the weight corresponding to each stored data in the cache module; according to the sorting result of the weight of each stored data, sequentially deleting the target stored data with the lowest weight until the residual storage capacity of the cache module is greater than or equal to a second storage threshold value; wherein the second storage threshold is greater than the first storage threshold. Through the weight corresponding to each stored data, the importance comparison among sub-data in the same data is realized, and the importance comparison among sub-data in different data can also be realized.
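The two-threshold deletion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Cache` class, its fields, and the byte-based capacity accounting are hypothetical and serve only to show the trigger condition (remaining capacity at or below the first threshold) and the stop condition (remaining capacity at or above the second threshold).

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    capacity: int                                # total size the cache can hold
    entries: dict = field(default_factory=dict)  # key -> (weight, size)

    def remaining(self) -> int:
        used = sum(size for _, size in self.entries.values())
        return self.capacity - used


def evict_by_weight(cache: Cache, first_threshold: int,
                    second_threshold: int) -> list:
    """Delete lowest-weight entries until enough space is free again."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    evicted = []
    if cache.remaining() > first_threshold:
        return evicted                           # trigger condition not met
    # Visit stored data from the lowest weight to the highest.
    for key in sorted(cache.entries, key=lambda k: cache.entries[k][0]):
        if cache.remaining() >= second_threshold:
            break
        del cache.entries[key]
        evicted.append(key)
    return evicted


cache = Cache(capacity=100,
              entries={"a": (0.9, 40), "b": (0.2, 30), "c": (0.5, 25)})
evicted = evict_by_weight(cache, first_threshold=10, second_threshold=50)
```

Keeping the second threshold above the first leaves a gap between the trigger point and the stop point, so a single small write right after a deletion pass does not immediately trigger another pass.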
According to another aspect of the present invention, there is provided a data storage method applied to a management node of a distributed storage system, including:
in response to acquiring data to be stored, sending object creation instructions to a plurality of corresponding storage nodes respectively, so that each storage node creates a copy object according to the acquired copy number and creates copy sub-objects in the copy object according to the acquired number of copies; wherein different storage nodes correspond to different copy numbers;
carrying out data splitting on the data to be stored according to the number of the copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored;
and sending the data to be stored after data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the obtained copy number, stores the sub-data to be stored in the target copy sub-object to a cache module, and stores the sub-data to be stored in the non-target copy sub-object to a hard disk module.
The data storage method further comprises the following steps: in response to the acquisition of a data reading instruction, acquiring a plurality of target storage nodes matched with the data reading instruction and target copy objects in the target storage nodes; respectively acquiring target copy sub-objects matched with the copy number of the current target copy object through the target copy objects in each target storage node; respectively acquiring matched stored data from corresponding cache modules through target copy sub-objects of the target storage nodes; and performing data splicing on the stored data to obtain spliced target storage data, and responding to the data reading instruction based on the target storage data. The management node obtains complete storage data from the cache modules of the plurality of storage nodes, so that the reading efficiency of the stored data is improved, and meanwhile, the cache modules of the storage nodes do not comprise repeated data, so that the waste of cache resources in a distributed storage system is avoided.
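The read path described above can be sketched as follows. This is an illustration only: the dictionaries standing in for cache modules, the node names, and the function `read_data` are hypothetical, and the sketch assumes the basic rule in which the target copy sub-object on a node is the one whose sub-number equals that node's copy number.

```python
def read_data(node_caches: dict, copy_numbers: dict) -> bytes:
    """Splice the complete stored data from several cache modules.

    node_caches:  node name -> {sub-number: bytes held in that node's cache}
    copy_numbers: node name -> copy number of the copy object on that node
    """
    pieces = {}
    for node, copy_number in copy_numbers.items():
        # Fetch the sub-data of the target copy sub-object from the cache.
        pieces[copy_number] = node_caches[node][copy_number]
    # Splice the sub-data in sub-number order.
    return b"".join(pieces[number] for number in sorted(pieces))


node_caches = {"C": {1: b"AAA"}, "D": {2: b"BBB"}, "E": {3: b"CC"}}
copy_numbers = {"C": 1, "D": 2, "E": 3}
stored = read_data(node_caches, copy_numbers)
```

Each node contributes a different piece, so the complete data is served from cache even though no single cache holds all of it.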
According to another aspect of the present invention, there is provided a data storage apparatus applied to a storage node of a distributed storage system, including:
the copy object creating module is used for, in response to acquiring an object creation instruction sent by the management node, creating a copy object according to the acquired copy number and creating copy sub-objects in the copy object according to the acquired number of copies;
the matching storage execution module is used for storing the data to be stored sent by the management node through the copy object so as to enable each copy sub-object in the copy object to be matched with each sub-data to be stored in the data to be stored one by one;
and the data storage execution module is used for determining a target copy sub-object in each copy sub-object according to the copy number, storing the sub-data to be stored in the target copy sub-object to the cache module, and storing the sub-data to be stored in the non-target copy sub-object to the hard disk module.
According to another aspect of the present invention, there is provided a data storage apparatus applied to a management node of a distributed storage system, including:
the object creation instruction sending module is used for, in response to acquiring data to be stored, sending object creation instructions to the corresponding storage nodes respectively, so that each storage node creates a copy object according to the acquired copy number and creates copy sub-objects in the copy object according to the acquired number of copies; wherein different storage nodes correspond to different copy numbers;
the data splitting execution module is used for carrying out data splitting on the data to be stored according to the number of the copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored;
and the data sending execution module is used for sending the data to be stored after the data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the acquired copy number, stores the sub-data to be stored in the target copy sub-object to the cache module, and stores the sub-data to be stored in the non-target copy sub-object to the hard disk module.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to execute the data storage method according to the first embodiment of the present invention or execute the data storage method according to the second embodiment of the present invention.
According to another aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for causing a processor to implement the data storage method according to the first embodiment of the present invention or the data storage method according to the second embodiment of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, after the storage node creates the copy object and the copy sub-objects, the sub-data to be stored of the data to be stored is stored one by one through each copy sub-object, the sub-data to be stored in the target copy sub-object is stored in the cache module, and the sub-data to be stored in the non-target copy sub-object is stored in the hard disk module, so that not only is the copy storage of the data to be stored realized, namely the data backup is realized, but also the cache module of each storage node does not comprise repeated data, the waste of cache resources in a distributed storage system is avoided, in addition, the management node can read the complete stored data through the cache modules of a plurality of storage nodes, and the reading efficiency of the stored data is improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1A is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
Fig. 1B is a flowchart of a data storage method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data storage method according to a second embodiment of the present invention;
Fig. 3 is a flowchart illustrating a data storage operation performed by the distributed storage system according to a third embodiment of the present invention;
Fig. 4 is a flowchart of a data read operation performed by the distributed storage system according to the fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data storage device according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a data storage device according to a sixth embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device implementing the data storage method according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1A is a schematic structural diagram of a distributed storage system in an embodiment of the present invention, and as shown in fig. 1A, the distributed storage system includes a management node (i.e., a head node) 100 and at least one storage node 200; since the management node 100 itself can also be used as a storage node 200, the minimum structure of the distributed storage system is composed of two electronic devices, one electronic device is used as the management node 100, and the other electronic device is used as the storage node 200.
When external data is written into the distributed storage system, the external data is firstly written into the management node 100, and the management node 100 allocates a plurality of storage nodes 200 for the currently written data to be used for executing copy storage; then sending the current write data to each allocated storage node 200; the storage node 200 is used for storing the external write data; when the external device reads data from the distributed storage system, the data read command is also written into the management node 100, and the management node 100 then fetches the stored data from the corresponding storage nodes 200 in response to the data read command.
The storage node comprises a cache module 201 and a hard disk module 202, wherein the cache module 201 is connected with the hard disk module 202. The cache module 201 may include a memory, and the hard disk module 202 may include a mechanical hard disk, i.e., a Hard Disk Drive (HDD). The cache module 201 has higher read-write efficiency than the hard disk module 202, and the hard disk module 202 has larger storage capacity than the cache module 201. In particular, since a Solid State Disk (SSD) has better reading efficiency than a mechanical hard disk, a solid state disk may also be used as the cache module 201; that is, the cache module 201 may include a memory and/or a solid state disk, and the memory and the solid state disk are connected in series or in parallel.
Embodiment One
Fig. 1B is a flowchart of a data storage method according to an embodiment of the present invention, where this embodiment is applicable to a storage node to manage data to be stored by creating a copy object and a copy sub-object, and the method may be executed by a data storage device in the fifth embodiment, where the data storage device may be implemented in a form of hardware and/or software, and the data storage device may be configured in an electronic device such as a server.
As shown in fig. 1B, the method includes:
S101, in response to acquiring an object creation instruction sent by a management node, creating a copy object according to the acquired copy number, and creating copy sub-objects in the copy object according to the acquired number of copies.
The number of copies is the number of copies of the data that the distributed storage system stores simultaneously. In the embodiment of the invention, the management node can be set to a fixed copy mode, in which the number of copies is fixed: whenever the management node acquires data to be stored, it allocates storage nodes for the current data to be stored using the same number of copies. For example, if the management node is set to a three-copy mode, the number of copies is three; each time the management node acquires data to be stored, it stores the data through three storage nodes, and each storage node stores one complete copy of the data.
The management node can also be set to be in a non-fixed copy mode, namely the copy number is not fixed, and when the management node acquires each data to be stored, the management node allocates storage nodes for the current data to be stored according to different copy demands of different data to be stored; for example, the data a to be stored is stored in a two-copy mode, and the data B to be stored is stored in a three-copy mode; particularly, the number of copies of different data to be stored can be determined by the importance degree of the current data to be stored, and the importance degree and the number of copies are in positive correlation; the importance degree of the data to be stored may be determined according to the data type and/or the data source (i.e., the data sender) of the data to be stored.
The management node allocates that number of storage nodes to the current data to be stored through an allocation method such as a hash algorithm, so as to keep storage space balanced among the storage nodes in the distributed storage system. The copy number is the serial number of each copy of the current data to be stored. The object creation instruction sent by the management node to each storage node includes the number of copies and the copy number of the copy to be created. Taking the above technical solution as an example, the data B to be stored is stored in three-copy mode with copy numbers 1, 2, and 3, and storage node C, storage node D, and storage node E are allocated to the data B to be stored through the hash algorithm to store the copies; the number of copies (i.e., 3) and copy number 1 are then sent to storage node C, the number of copies (i.e., 3) and copy number 2 to storage node D, and the number of copies (i.e., 3) and copy number 3 to storage node E.
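As an illustration of the allocation step, the sketch below picks storage nodes with a hash of a data key and builds one object creation instruction per node. The specific scheme (SHA-256 of a key, then consecutive nodes from the hashed start position) is an assumption chosen for brevity; the text only says "a hash algorithm". The function name, the data key, and the dictionary layout of an instruction are likewise hypothetical.

```python
import hashlib


def allocate_nodes(data_key: str, nodes: list, copy_count: int) -> list:
    """Pick copy_count distinct nodes and build object creation instructions."""
    if not 1 <= copy_count <= len(nodes):
        raise ValueError("copy count must be between 1 and the node count")
    digest = hashlib.sha256(data_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    chosen = [nodes[(start + i) % len(nodes)] for i in range(copy_count)]
    # Every instruction carries the number of copies plus a distinct copy number.
    return [
        {"node": node, "copy_count": copy_count, "copy_number": i + 1}
        for i, node in enumerate(chosen)
    ]


instructions = allocate_nodes("data-B", ["C", "D", "E", "F"], copy_count=3)
```

Because the start position depends on the hashed key, different data items tend to start at different nodes, which is one simple way to spread copies across the cluster.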
After acquiring the object creation instruction, the storage node creates a copy object according to the copy number. The copy object is the object that performs copy storage and is also the discrete unit by which the storage node manages data; in the embodiment of the invention, the distributed storage system stores copies of the data to be stored by means of object storage. The copy sub-objects are sub-units of the copy object: each copy sub-object stores one part of the data to be stored corresponding to the current copy object, namely one piece of sub-data to be stored, and the sub-data to be stored in all the copy sub-objects together form the complete data to be stored. In the embodiment of the invention, the number of copy sub-objects equals the number of copies, and the storage node creates that many copy sub-objects in the current copy object according to the acquired number of copies.
For example, for the data B to be stored, after acquiring the number of copies (i.e., 3) and the copy number (i.e., copy number 1), the storage node C creates a copy object with copy number 1, and creates 3 copy sub-objects in the copy object, i.e., copy sub-object 1, copy sub-object 2, and copy sub-object 3.
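The object creation in this example can be sketched with two small data classes. The class and field names (`CopyObject`, `CopySubObject`, `sub_number`) are hypothetical and only mirror the terms used in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CopySubObject:
    sub_number: int                 # copy sub-number, starting at 1
    data: Optional[bytes] = None    # sub-data to be stored, filled in later


@dataclass
class CopyObject:
    copy_number: int
    sub_objects: list = field(default_factory=list)


def create_copy_object(copy_number: int, copy_count: int) -> CopyObject:
    """Create a copy object holding as many sub-objects as there are copies."""
    if not 1 <= copy_number <= copy_count:
        raise ValueError("copy number must lie between 1 and the copy count")
    subs = [CopySubObject(n) for n in range(1, copy_count + 1)]
    return CopyObject(copy_number, subs)


# Storage node C receives the number of copies (3) and copy number 1.
copy_object = create_copy_object(copy_number=1, copy_count=3)
```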
S102, storing the data to be stored sent by the management node through the copy object, so that each copy sub-object in the copy object is matched with each sub-data to be stored in the data to be stored one by one.
After acquiring the data to be stored, the management node splits it into a plurality of pieces of sub-data to be stored; the number of pieces of sub-data to be stored is the same as the number of copies. The management node can split the data to be stored into equal parts according to its data size, or can place continuous data with strong relevance into the same sub-data to be stored according to the relevance among the data in the data to be stored. The management node then sends the complete data to be stored to each copy object.
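The equal-size split can be sketched as below. How leftover bytes are distributed when the length is not divisible by the number of copies is an assumption of this sketch (the earliest parts receive one extra byte each); the relevance-based split mentioned above is not shown.

```python
def split_equally(data: bytes, copy_count: int) -> list:
    """Split data into copy_count contiguous parts of near-equal size."""
    if copy_count < 1:
        raise ValueError("copy count must be at least 1")
    base, extra = divmod(len(data), copy_count)
    parts, offset = [], 0
    for i in range(copy_count):
        size = base + (1 if i < extra else 0)   # spread the remainder
        parts.append(data[offset:offset + size])
        offset += size
    return parts


parts = split_equally(b"abcdefgh", 3)
```

Joining the parts in order always reproduces the original data, which is what allows the read path to splice sub-data back together.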
After acquiring the data to be stored sent by the management node, the storage node saves the sub-data to be stored in the copy sub-objects in order, matching them one to one; the copy sub-objects are numbered sequentially. For example, storage node C has already created copy object 1 before acquiring the data B to be stored, so when the data B to be stored is saved through copy object 1, sub-data B1 to be stored is saved in copy sub-object 1, sub-data B2 to be stored in copy sub-object 2, and sub-data B3 to be stored in copy sub-object 3. Thus one copy object (i.e., copy object 1) holds the complete data to be stored (i.e., the data B to be stored), and each copy sub-object in a copy object holds different sub-data to be stored.
S103, determining a target copy sub-object in each copy sub-object according to the copy number, storing the sub-data to be stored in the target copy sub-object to a cache module, and storing the sub-data to be stored in the non-target copy sub-object to a hard disk module.
After the storage node stores the data to be stored in the copy object, in each copy sub-object, the number of the copy sub-object (i.e., the copy sub-number) may be compared with the number of the copy object (i.e., the copy number), and the copy sub-object corresponding to the copy sub-number with the same number as the current copy number is used as a target copy sub-object; taking the above technical solution as an example, for the data B to be stored, the storage node C takes the copy sub-object 1 in the copy object 1 as a target copy sub-object, and the copy sub-object 1 stores the sub-data B1 to be stored; the storage node D takes the copy sub-object 2 in the copy object 2 as a target copy sub-object, and the copy sub-object 2 stores the sub-data B2 to be stored; the storage node E takes the copy sub-object 3 in the copy object 3 as a target copy sub-object, and the copy sub-object 3 stores the sub-data B3 to be stored therein.
After acquiring the target sub-data to be stored in the target copy sub-object, the storage node stores the target sub-data to be stored in the cache module, and directly stores the sub-data to be stored in other copy sub-objects (namely, non-target copy sub-objects) except the target copy sub-object in the hard disk module; taking the above technical solution as an example, for the data B to be stored, the storage node C stores the sub data B1 to be stored in the replica sub-object 1 of the replica object 1 to the cache module, and directly stores the sub data B2 to be stored in the replica sub-object 2 of the replica object 1 and the sub data B3 to be stored in the replica sub-object 3 of the replica object 1 to the hard disk module; particularly, the storage node may copy the target sub data to be stored in the cache module and store the copied data in the hard disk module by means of data back-flushing, so as to store the complete data B to be stored in the hard disk module.
Similarly, the storage node D stores the sub data B2 to be stored in the replica sub-object 2 of the replica object 2 to the cache module, and stores the sub data B1 to be stored in the replica sub-object 1 of the replica object 2 and the sub data B3 to be stored in the replica sub-object 3 of the replica object 2 to the hard disk module; the storage node E stores the sub data B3 to be stored in the replica sub-object 3 of the replica object 3 to the cache module, and stores the sub data B1 to be stored in the replica sub-object 1 of the replica object 3 and the sub data B2 to be stored in the replica sub-object 2 of the replica object 3 to the hard disk module.
Obviously, in the above technical solution, not only is the copy storage of the data to be stored realized, that is, the complete data to be stored is stored by the plurality of storage nodes, respectively, so that the data backup is realized; meanwhile, only a part of the data to be stored, namely subdata to be stored, is reserved in the cache module of each storage node, the subdata to be stored in the cache module of each storage node is different, and the subdata to be stored in the cache module of each storage node jointly form the complete data to be stored, so that the management node can read the complete stored data through the cache modules of the plurality of storage nodes, the reading efficiency of the stored data is improved, and meanwhile, the cache module of each storage node does not comprise repeated data, and the waste of cache resources in a distributed storage system is avoided.
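The placement rule of S103 can be illustrated with a minimal Python sketch. The function name `place_sub_data` and the dictionary layout are assumptions made for illustration only and do not appear in the original disclosure; the optional back-flushing of the target sub-data to the hard disk module is left out.

```python
def place_sub_data(copy_number, sub_data):
    """Split one copy object's sub-data between cache and hard disk.

    sub_data maps copy sub-number -> payload. The copy sub-object whose
    sub-number equals the copy number is treated as the target.
    """
    cache, disk = {}, {}
    for sub_number, payload in sub_data.items():
        if sub_number == copy_number:
            cache[sub_number] = payload   # target copy sub-object
        else:
            disk[sub_number] = payload    # non-target copy sub-objects
    return cache, disk


# Data B with three copies on storage nodes C, D and E (copy numbers 1, 2, 3).
data_b = {1: b"B1", 2: b"B2", 3: b"B3"}
node_copy_numbers = {"C": 1, "D": 2, "E": 3}
placements = {node: place_sub_data(x, data_b)
              for node, x in node_copy_numbers.items()}

# Sub-numbers held in each node's cache; together they cover all of data B.
cached = {node: sorted(cache) for node, (cache, _) in placements.items()}
```

In this sketch each cache holds a different sub-data, so no cache entry is duplicated across nodes, while every node still holds the remaining sub-data on disk.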
Optionally, in this embodiment of the present invention, after determining a target copy sub-object in each copy sub-object according to the copy number, the method further includes: storing the data to be stored in the copy object to a cache module, and copying the data to be stored to a hard disk module through the cache module; and in the cache module, continuously storing the subdata to be stored corresponding to the target copy sub-object, and deleting the subdata to be stored corresponding to the non-target copy sub-object.
Specifically, the storage node may also store all the data to be stored in the copy object into the cache module, and then copy and store the data to be stored into the hard disk module by using a data back-flushing mode and the like through the cache module, and in the cache module, only the target sub-data to be stored in the target copy sub-object is retained, and the sub-data to be stored in the non-target copy sub-object is deleted; the cache module of the storage node does not include repeated data, so that the cache resource waste in the distributed storage system is avoided, the asynchronous storage of the data is realized, and the data storage efficiency is improved.
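A minimal sketch of this optional variant follows, assuming a single data item per node and in-memory dictionaries standing in for the cache module and the hard disk module. The class `StorageNode` and the method `store_via_cache` are hypothetical names, and the flush is shown synchronously for brevity although the text describes it as asynchronous.

```python
class StorageNode:
    """Toy storage node; cache and disk are plain dicts (an assumption)."""

    def __init__(self, copy_number):
        self.copy_number = copy_number
        self.cache = {}
        self.disk = {}

    def store_via_cache(self, sub_data):
        # 1. Stage all sub-data of the copy object in the cache module.
        self.cache.update(sub_data)
        # 2. Copy the staged data to the hard disk module through the cache.
        for sub_number in sub_data:
            self.disk[sub_number] = self.cache[sub_number]
        # 3. Keep only the target sub-data in the cache; delete the rest.
        for sub_number in list(self.cache):
            if sub_number != self.copy_number:
                del self.cache[sub_number]


node_d = StorageNode(copy_number=2)
node_d.store_via_cache({1: b"B1", 2: b"B2", 3: b"B3"})
```

After the call, the disk holds the complete data B and the cache retains only B2.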
Optionally, in this embodiment of the present invention, after the copy object stores the data to be stored, which is sent by the management node, so that each copy sub-object in the copy object is matched with each sub-data to be stored in the data to be stored one by one, the method further includes: setting weight for each copy sub-object according to the correction coefficient, the copy number and the copy sub-number of each copy sub-object; determining a target copy sub-object in each copy sub-object according to the copy number, including: determining a target copy sub-object in each copy sub-object according to the weight of each copy sub-object and a preset weight threshold; if the weight of the current copy sub-object is greater than or equal to a preset weight threshold, the current copy sub-object is a target copy sub-object; if the weight of the current copy sub-object is smaller than a preset weight threshold value, the current copy sub-object is a non-target copy sub-object; the preset weight threshold is related to the remaining storage capacity of the cache module.
Specifically, the storage node may set the weight of each replica sub-object in the current replica object according to the following formula;
[Formula for the weight M_i, given as image BDA0003849685240000141 in the original publication]
wherein i is the number of the copy sub-object, i.e. the copy sub-number; x is the number of the copy object, i.e. the copy number; M_i is the weight of the i-th copy sub-object; δ is a correction coefficient used for adjusting the proportion or difference of the weights between the copy sub-objects in the same copy object, and may be set to any positive value as required, for example, δ may be set to 1; N is the number of copies corresponding to the current data to be stored. Obviously, when each data to be stored corresponds to the same correction coefficient and the same number of copies, the weight of each copy sub-object depends only on the absolute value of the difference between the copy number (i.e., the number of the copy object) and the copy sub-number (i.e., the number of the copy sub-object). Taking the above technical solution as an example, the correction coefficient of the data B to be stored is 1, the number of copies is 3, and the storage node E corresponds to the copy object 3; in the copy object 3, the weight of the copy sub-object 1 is 17 (results are rounded to integers), the weight of the copy sub-object 2 is 33, and the weight of the copy sub-object 3 is 100.
The preset weight threshold value is a preset screening numerical value and is used for screening the weight of each copy sub-object; when the preset weight threshold is set to be a smaller value, the number of the target copy sub-objects may be multiple, that is, the sub-data to be stored in the multiple copy sub-objects are simultaneously stored in the cache module, so as to fully utilize the characteristic that the cache module reads and writes data faster; specifically, when δ is set to 1 and the preset weight threshold is set to 100, only one target copy sub-object exists in each copy object; the preset weight threshold value can be determined according to the current residual storage capacity of the cache module, the residual storage capacity and the preset weight threshold value are in a negative correlation relationship, that is, when the residual storage capacity of the cache module is large, the preset weight threshold value is set to be a small value, so that a large number of sub-data to be stored are stored in the cache module as much as possible, that is, a large amount of data is written into the cache module; when the remaining storage capacity of the cache module is smaller, the preset weight threshold is set to a larger value, so as to reduce the number of the subdata to be stored in the cache module, that is, write a smaller amount of data into the cache module.
The storage node sets the weight for the copy sub-object in the current copy object, so that when the residual storage space in the cache module is large, a large amount of sub-data to be stored is stored in the cache module, the characteristic of high read-write speed of the cache module is fully utilized, and the rapid write-in and read-out of data are realized; when the residual storage space in the cache module is small, only a small amount of subdata to be stored is stored in the cache module so as to realize accurate storage of data, and the integrity of the current data to be stored in the cache module of the distributed storage system is still ensured while the waste of cache resources is reduced. Specifically, the weight of each replica sub-object in the replica object may be calculated and obtained by the management node, and each replica weight in each replica object obtained by calculation may be sent to the corresponding storage node.
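The threshold-based selection can be sketched as follows. The weights 17, 33 and 100 are the example values given for the copy object 3; the weight formula itself is not reproduced here. The function `threshold_for` is a purely hypothetical linear mapping that only illustrates the stated negative correlation between the remaining cache capacity and the preset weight threshold.

```python
def select_targets(weights, threshold):
    """Return the copy sub-numbers whose weight reaches the threshold."""
    return sorted(i for i, w in weights.items() if w >= threshold)


def threshold_for(remaining_fraction, low=17.0, high=100.0):
    """Hypothetical mapping: the more free cache, the lower the threshold.

    remaining_fraction is the free share of the cache, clamped to [0, 1].
    """
    frac = min(max(remaining_fraction, 0.0), 1.0)
    return high - (high - low) * frac


# Example weights of copy sub-objects 1..3 in copy object 3 (storage node E).
weights_e = {1: 17, 2: 33, 3: 100}

only_one = select_targets(weights_e, 100)                  # nearly full cache
two = select_targets(weights_e, 30)                        # lower threshold
all_three = select_targets(weights_e, threshold_for(1.0))  # empty cache
```

With a threshold of 100 only the copy sub-object 3 is a target; lowering the threshold admits the sub-objects 2 and then 1 into the cache as well.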
Optionally, in this embodiment of the present invention, the data storage method further includes: in response to the fact that the residual storage capacity of the cache module is smaller than or equal to a first storage threshold value, acquiring the weight corresponding to each stored data in the cache module; according to the sorting result of the weight of each stored data, sequentially deleting the target stored data with the lowest weight until the residual storage capacity of the cache module is greater than or equal to a second storage threshold value; wherein the second storage threshold is greater than the first storage threshold.
Specifically, as the storage node continuously executes storage operations, the remaining storage capacity of the cache module keeps decreasing; sorting and deleting the stored data therefore ensures that the cache module provides sufficient storage space for subsequent data to be stored. When the remaining storage capacity of the cache module is small, that is, smaller than or equal to the first storage threshold, the storage node may screen directly on the weight corresponding to each stored data in the cache module, or may combine the weight with an existing cache elimination algorithm, for example, LFU (Least Frequently Used), LRU (Least Recently Used), ARC (Adaptive Replacement Cache), FIFO (First In First Out), 2Q (Two Queues), and the like: after calculating the non-elimination probability of each stored data with such an algorithm, the storage node multiplies it by the corresponding weight, and then sorts and deletes according to the product.
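The two-threshold deletion can be sketched as below. The function `evict`, its parameters, and the representation of cache entries as `(size, weight)` pairs are assumptions for illustration; the optional `keep_prob` argument stands in for the non-elimination probability that an algorithm such as LRU or LFU would supply, and how that probability is computed is not modeled.

```python
def evict(entries, capacity, first_threshold, second_threshold, keep_prob=None):
    """Delete lowest-scored entries once free space drops to first_threshold.

    entries maps key -> (size, weight) and is modified in place.
    Deletion stops when free space reaches second_threshold.
    Returns the list of deleted keys in deletion order.
    """
    remaining = capacity - sum(size for size, _ in entries.values())
    if remaining > first_threshold:
        return []  # enough free space, nothing to do

    def score(key):
        _, weight = entries[key]
        # Weight alone, or weight times the non-elimination probability.
        return weight * (keep_prob[key] if keep_prob else 1.0)

    evicted = []
    for key in sorted(entries, key=score):
        if remaining >= second_threshold:
            break
        size, _ = entries.pop(key)
        remaining += size
        evicted.append(key)
    return evicted


cache_entries = {"A1": (40, 100), "B2": (30, 33), "C1": (25, 17)}
deleted = evict(cache_entries, capacity=100,
                first_threshold=10, second_threshold=40)
```

Because the second storage threshold is greater than the first, one deletion pass frees a margin of space rather than triggering again on the very next write.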
Through the weight corresponding to each stored data, the importance comparison among sub-data in the same data is realized, and the importance comparison among sub-data in different data can also be realized.
According to the technical scheme of the embodiment of the invention, after the storage node creates the copy object and the copy sub-objects, the sub-data to be stored of the data to be stored is stored one by one through each copy sub-object, the sub-data to be stored in the target copy sub-object is stored in the cache module, and the sub-data to be stored in the non-target copy sub-object is stored in the hard disk module, so that not only is the copy storage of the data to be stored realized, namely the data backup is realized, but also the cache module of each storage node does not comprise repeated data, the waste of cache resources in a distributed storage system is avoided, in addition, the management node can read the complete stored data through the cache modules of a plurality of storage nodes, and the reading efficiency of the stored data is improved.
Example two
Fig. 2 is a flowchart of a data storage method according to a second embodiment of the present invention, where this embodiment is applicable to a management node splitting data to be stored, so that each storage node performs data storage based on the split data to be stored, and this method may be executed by a data storage device according to a sixth embodiment, where the data storage device may be implemented in a form of hardware and/or software, and the data storage device may be configured in an electronic device such as a server. As shown in fig. 2, the method includes:
S201, in response to acquiring the data to be stored, respectively sending object creating instructions to a plurality of corresponding storage nodes, so that each storage node creates a copy object according to the acquired copy number, and creates copy sub-objects in the copy object according to the acquired number of copies; wherein different storage nodes correspond to different copy numbers.
S202, performing data splitting on the data to be stored according to the number of the copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored.
The management node can set splitting marks at designated positions of the data to be stored, so that the splitting marks indicate how the data to be stored is divided into a plurality of sub-data to be stored.
S203, sending the data to be stored after the data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the acquired copy number, stores the sub-data to be stored in the target copy sub-object to a cache module, and stores the sub-data to be stored in the non-target copy sub-object to a hard disk module.
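The splitting step S202 can be sketched as follows. The text does not specify where the splitting marks are placed; the sketch assumes near-equal byte ranges, and the names `split_marks` and `split_data` are hypothetical.

```python
def split_marks(length, copies):
    """Return boundary offsets dividing `length` bytes into `copies` parts.

    Assumption: parts are as equal in size as possible, with any
    remainder spread over the first parts.
    """
    base, extra = divmod(length, copies)
    marks, pos = [0], 0
    for i in range(copies):
        pos += base + (1 if i < extra else 0)
        marks.append(pos)
    return marks


def split_data(data, copies):
    """Map copy sub-number (1-based) -> sub-data between adjacent marks."""
    marks = split_marks(len(data), copies)
    return {i + 1: data[marks[i]:marks[i + 1]] for i in range(copies)}


marks = split_marks(10, 3)
parts = split_data(b"0123456789", 3)
```

Keeping the marks as offsets into the unsplit data means the same full payload can be sent to every storage node, with each node deriving its sub-data from the marks.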
Optionally, in this embodiment of the present invention, the data storage method further includes: in response to the data reading instruction, acquiring a plurality of target storage nodes matched with the data reading instruction and target copy objects in the target storage nodes; respectively acquiring target copy sub-objects matched with the copy number of the current target copy object through the target copy objects in each target storage node; respectively acquiring matched stored data from corresponding cache modules through target copy sub-objects of the target storage nodes; and performing data splicing on the stored data to obtain spliced target stored data, and responding to the data reading instruction based on the target stored data.
Specifically, taking the above technical solution as an example, when the management node determines from the data reading instruction that the data to be read is data B, it determines that the data B is stored in the copy object 1 of the storage node C, the copy object 2 of the storage node D, and the copy object 3 of the storage node E; then, according to the copy number, it determines the copy sub-object 1 in the copy object 1, the copy sub-object 2 in the copy object 2, and the copy sub-object 3 in the copy object 3; then the stored data B1 is obtained in the storage node C through the copy sub-object 1, the stored data B2 is obtained in the storage node D through the copy sub-object 2, and the stored data B3 is obtained in the storage node E through the copy sub-object 3; finally, after data splicing is carried out on the stored data B1, the stored data B2 and the stored data B3, the complete data B is obtained, and the data reading instruction is responded to according to the data B. The management node obtains the complete stored data from the cache modules of the plurality of storage nodes, so that the reading efficiency of the stored data is improved; meanwhile, the cache modules of the storage nodes do not contain repeated data, so that the waste of cache resources in the distributed storage system is avoided.
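The read path for data B can be sketched as below, assuming each node's cache is a dictionary keyed by copy sub-number and that sub-data are spliced in ascending order of copy number. The function name `read_from_caches` is hypothetical.

```python
def read_from_caches(caches):
    """Splice the target sub-data held in each node's cache.

    caches maps copy number -> that node's cache (sub-number -> payload).
    In each copy object, the target sub-object is the one whose
    sub-number equals the copy number.
    """
    parts = []
    for copy_number in sorted(caches):
        parts.append(caches[copy_number][copy_number])
    return b"".join(parts)


# Caches of storage nodes C, D and E (copy numbers 1, 2 and 3).
caches = {1: {1: b"B1"}, 2: {2: b"B2"}, 3: {3: b"B3"}}
data_b = read_from_caches(caches)
```

Each node contributes exactly one cached sub-data, so the full data B is rebuilt from three cache reads on three different nodes.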
According to the technical scheme, after the management node respectively sends the object creating instruction to the storage nodes to enable the storage nodes to create the copy object and the copy sub-object, the data to be stored is divided into the plurality of subdata to be stored, so that the storage nodes store the subdata to be stored in the target copy sub-object into the cache module and store the subdata to be stored in the non-target copy sub-object into the hard disk module, not only is copy storage of the data to be stored realized, namely data backup is realized, but also the cache module of each storage node does not comprise repeated data, waste of cache resources in the distributed storage system is avoided, in addition, the management node can read complete stored data through the cache modules of the plurality of storage nodes, and the reading efficiency of the stored data is improved.
Example three
Fig. 3 is a flowchart of a distributed storage system according to an embodiment of the present invention when performing a data storage operation (i.e., acquiring data to be stored), as shown in fig. 3:
The management node, in response to acquiring the data to be stored, allocates a plurality of corresponding storage nodes according to the number of copies; the management node sends an object creating instruction to the plurality of storage nodes, wherein the object creating instruction comprises a copy number and the number of copies; after the storage node acquires the object creating instruction sent by the management node, it creates a copy object according to the acquired copy number, and creates copy sub-objects in the copy object according to the acquired number of copies; the storage node then feeds back to the management node that object creation is complete.
The management node acquires that the object of the storage node is created, and then performs data splitting on the data to be stored according to the number of copies, wherein the data to be stored after the data splitting comprises a plurality of sub data to be stored; the management node sends the data to be stored after the data are split to each storage node; and the storage node stores the data to be stored sent by the management node through the copy object, so that each copy sub-object in the copy object is matched with each sub-data to be stored in the data to be stored one by one.
The storage node determines a target copy sub-object in each copy sub-object according to the copy number; and the storage node stores the subdata to be stored in the target copy sub-object to the cache module and stores the subdata to be stored in the non-target copy sub-object to the hard disk module.
According to the technical scheme of the embodiment of the invention, not only is the copy storage of the data to be stored realized, namely, the data backup is realized, but also the cache module of each storage node does not comprise repeated data, so that the waste of cache resources in a distributed storage system is avoided, and in addition, the management node can read complete stored data through the cache modules of a plurality of storage nodes, so that the reading efficiency of the stored data is improved.
Example four
Fig. 4 is a flowchart of a distributed storage system according to an embodiment of the present invention when a data read operation is performed (i.e. a data read instruction is obtained), as shown in fig. 4:
The management node obtains a data reading instruction sent by an external device, and determines, according to the data reading instruction, the target storage nodes corresponding to the data to be read and the target copy object in each target storage node; the management node respectively acquires a target copy sub-object from each target copy object according to the copy number of that target copy object; and the management node sends a data reading instruction to the target copy sub-object of each target storage node.
After the storage node acquires the data reading instruction, determining the storage position of the stored data through the corresponding target copy sub-object; and if the storage node determines that the current stored data is located in the cache module, the stored data is obtained through the cache module, and if the storage node determines that the current stored data is not located in the cache module, the stored data is obtained through the hard disk module.
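The lookup on the storage node can be sketched as a cache check with a hard disk fallback. The function `read_sub_data` and the returned source label are assumptions for illustration only.

```python
def read_sub_data(sub_number, cache, disk):
    """Return (payload, source), preferring the cache module.

    Falls back to the hard disk module when the sub-data is no longer
    cached, for example after it has been deleted to free cache space.
    """
    if sub_number in cache:
        return cache[sub_number], "cache"
    return disk[sub_number], "disk"


# Storage node D: B2 is cached, the complete data B is on disk.
cache_d = {2: b"B2"}
disk_d = {1: b"B1", 2: b"B2", 3: b"B3"}

hit = read_sub_data(2, cache_d, disk_d)
miss = read_sub_data(2, {}, disk_d)   # same request after cache deletion
```

The fallback relies on the hard disk module holding the complete data, which is why the target sub-data is also copied to disk in the storage procedure.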
After the management node acquires the stored data fed back by each target storage node, data splicing is carried out on each stored data to acquire spliced target storage data; the management node responds to the data read instruction based on the target storage data.
According to the technical scheme of the embodiment of the invention, the management node acquires complete stored data from the cache modules of the plurality of storage nodes, so that the reading efficiency of the stored data is improved, and meanwhile, the cache modules of the storage nodes do not contain repeated data, so that the waste of cache resources in a distributed storage system is avoided.
Example five
Fig. 5 is a block diagram of a data storage device according to a fifth embodiment of the present invention, where the data storage device specifically includes:
a copy object creating module 501, configured to create, in response to acquiring an object creating instruction sent by a management node, a copy object according to the acquired copy number, and create copy sub-objects in the copy object according to the acquired number of copies;
a matching storage executing module 502, configured to store, by using the replica object, to-be-stored data sent by the management node, so that each replica sub-object in the replica object is matched with each sub-data to be stored in the to-be-stored data one by one;
a data storage executing module 503, configured to determine a target copy sub-object in each copy sub-object according to the copy number, store sub-data to be stored in the target copy sub-object in a cache module, and store sub-data to be stored in a non-target copy sub-object in a hard disk module.
According to the technical scheme of the embodiment of the invention, after the storage node creates the copy object and the copy sub-objects, the sub-data to be stored of the data to be stored is stored one by one through each copy sub-object, the sub-data to be stored in the target copy sub-object is stored in the cache module, and the sub-data to be stored in the non-target copy sub-object is stored in the hard disk module, so that not only is the copy storage of the data to be stored realized, namely the data backup is realized, but also the cache module of each storage node does not comprise repeated data, the waste of cache resources in a distributed storage system is avoided, in addition, the management node can read the complete stored data through the cache modules of a plurality of storage nodes, and the reading efficiency of the stored data is improved.
Optionally, the data storage executing module 503 is further configured to store the data to be stored in the copy object to a cache module, and copy the data to be stored to a hard disk module through the cache module; and continuously storing the subdata to be stored corresponding to the target copy sub-object in the cache module, and deleting the subdata to be stored corresponding to the non-target copy sub-object.
Optionally, the data storage device further includes:
the weight setting module is used for setting weight for each copy sub-object according to a correction coefficient, the copy number and the copy sub-number of each copy sub-object;
a data storage executing module 503, specifically configured to determine a target copy sub-object in each copy sub-object according to the weight of each copy sub-object and a preset weight threshold; if the weight of the current copy sub-object is greater than or equal to a preset weight threshold, the current copy sub-object is a target copy sub-object; if the weight of the current copy sub-object is smaller than a preset weight threshold value, the current copy sub-object is a non-target copy sub-object; the preset weight threshold is related to the remaining storage capacity of the cache module.
Optionally, the data storage device further includes:
the weight acquisition module is used for acquiring the weight corresponding to each stored data in the cache module in response to the fact that the detected residual storage capacity of the cache module is smaller than or equal to a first storage threshold value;
the sequencing execution module is used for sequentially deleting the target stored data with the lowest weight according to the sequencing result of the weight of each stored data until the residual storage capacity of the cache module is greater than or equal to a second storage threshold; wherein the second storage threshold is greater than the first storage threshold.
The device can execute the data storage method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the data storage method provided in the first embodiment of the present invention.
Example six
Fig. 6 is a block diagram of a data storage device according to a sixth embodiment of the present invention, where the data storage device includes:
an object creation instruction sending module 601, configured to respectively send, in response to acquiring data to be stored, object creating instructions to a plurality of corresponding storage nodes, so that each storage node creates a copy object according to the acquired copy number, and creates copy sub-objects in the copy object according to the acquired number of copies; wherein different storage nodes correspond to different copy numbers;
a data splitting executing module 602, configured to perform data splitting on the data to be stored according to the number of copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored;
the data sending execution module 603 is configured to send the data to be stored after data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the obtained copy number, stores the sub-data to be stored in the target copy sub-object in the cache module, and stores the sub-data to be stored in the non-target copy sub-object in the hard disk module.
According to the technical scheme, after the management node respectively sends the object creating instruction to the storage nodes to enable the storage nodes to create the copy object and the copy sub-object, the data to be stored is divided into the plurality of subdata to be stored, so that the storage nodes store the subdata to be stored in the target copy sub-object into the cache module and store the subdata to be stored in the non-target copy sub-object into the hard disk module, not only is copy storage of the data to be stored realized, namely data backup is realized, but also the cache module of each storage node does not comprise repeated data, waste of cache resources in the distributed storage system is avoided, in addition, the management node can read complete stored data through the cache modules of the plurality of storage nodes, and the reading efficiency of the stored data is improved.
Optionally, the data storage device further includes:
the target copy object acquisition module is used for responding to the acquired data reading instruction and acquiring a plurality of target storage nodes matched with the data reading instruction and a target copy object in each target storage node;
a target copy sub-object obtaining module, configured to obtain, through a target copy object in each target storage node, a target copy sub-object that matches a copy number of a current target copy object;
the stored data acquisition module is used for acquiring matched stored data from the corresponding cache module through the target copy sub-object of each target storage node;
and the data splicing module is used for performing data splicing on the stored data to acquire spliced target stored data and responding to the data reading instruction based on the target stored data.
The device can execute the data storage method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the data storage method provided in the second embodiment of the present invention.
Example seven
FIG. 7 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a data storage method.
In some embodiments, the data storage method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM and/or the communication unit. When the computer program is loaded into the RAM and executed by a processor, it may perform one or more of the steps of the data storage method described above. Alternatively, in other embodiments, the processor may be configured to perform the data storage method by any other suitable means (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described herein may be implemented on a heterogeneous hardware accelerator having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the heterogeneous hardware accelerators. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and which overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
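As a non-limiting illustration, the cache and hard-disk placement performed on a storage node can be sketched in Python as follows. The sketch assumes the write path in which every piece of sub-data is copied to the hard disk module and only the sub-data of target copy sub-objects stays in the cache module, so that evicting from the cache loses no data. The class StorageNode, the function example_weight, its formula, and all numeric values are hypothetical; the claims only state that the weight depends on a correction coefficient, the copy number, and the copy sub-number. For simplicity the weight threshold is kept fixed here rather than being tied to the remaining cache capacity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (copy number, copy sub-number)


def example_weight(copy_number: int, sub_number: int, coefficient: float = 1.0) -> float:
    """Hypothetical weight formula; the real formula is not given here."""
    return coefficient / (1 + copy_number + sub_number)


@dataclass
class StorageNode:
    cache_capacity: int        # size of the cache module, in bytes
    weight_threshold: float    # preset weight threshold
    first_threshold: int       # start evicting when free space <= this value
    second_threshold: int      # stop evicting when free space >= this value
    weight_fn: Callable[[int, int], float]
    cache: Dict[Key, bytes] = field(default_factory=dict)
    disk: Dict[Key, bytes] = field(default_factory=dict)

    def remaining(self) -> int:
        """Remaining storage capacity of the cache module."""
        return self.cache_capacity - sum(len(v) for v in self.cache.values())

    def store(self, copy_number: int, sub_data: List[bytes]) -> None:
        """Match each piece of sub-data to one copy sub-object and place it."""
        for sub_number, chunk in enumerate(sub_data):
            key = (copy_number, sub_number)
            self.disk[key] = chunk  # assumed: everything is copied to disk
            if self.weight_fn(copy_number, sub_number) >= self.weight_threshold:
                self.cache[key] = chunk  # target copy sub-object stays cached
        self.evict_if_needed()

    def evict_if_needed(self) -> None:
        """Delete lowest-weight cached data until enough space is free."""
        if self.remaining() > self.first_threshold:
            return
        for key in sorted(self.cache, key=lambda k: self.weight_fn(*k)):
            if self.remaining() >= self.second_threshold:
                break
            del self.cache[key]
```

Because the second storage threshold is greater than the first, eviction starts late and frees more space than strictly needed, which avoids triggering a new eviction on every subsequent write.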

Claims (10)

1. A data storage method, applied to a storage node of a distributed storage system, comprising:
in response to the acquisition of an object creation instruction sent by a management node, creating a copy object according to the acquired copy number, and creating a copy sub-object in the copy object according to the acquired copy number;
saving, through the copy object, the data to be stored sent by the management node, so that each copy sub-object in the copy object is matched one-to-one with a piece of sub-data to be stored in the data to be stored;
and determining a target copy sub-object in each copy sub-object according to the copy number, storing the sub-data to be stored in the target copy sub-object to a cache module, and storing the sub-data to be stored in the non-target copy sub-object to a hard disk module.
2. The method according to claim 1, further comprising, after determining a target copy sub-object among the copy sub-objects according to the copy number:
storing the data to be stored in the copy object to a cache module, and copying the data to be stored to a hard disk module through the cache module;
and, in the cache module, retaining the sub-data to be stored corresponding to the target copy sub-object, and deleting the sub-data to be stored corresponding to the non-target copy sub-object.
3. The method according to claim 1 or 2, wherein after the copy object stores the data to be stored sent by the management node, so that each copy sub-object in the copy object matches each sub-data to be stored in the data to be stored one by one, the method further comprises:
setting a weight for each copy sub-object according to a correction coefficient, the copy number and the copy sub-number of each copy sub-object;
determining a target copy sub-object in each copy sub-object according to the copy number, including:
determining a target copy sub-object in each copy sub-object according to the weight of each copy sub-object and a preset weight threshold;
if the weight of the current copy sub-object is greater than or equal to a preset weight threshold, the current copy sub-object is a target copy sub-object; if the weight of the current copy sub-object is smaller than a preset weight threshold value, the current copy sub-object is a non-target copy sub-object; the preset weight threshold is related to the remaining storage capacity of the cache module.
4. The method of claim 3, wherein the data storage method further comprises:
in response to the remaining storage capacity of the cache module being smaller than or equal to a first storage threshold, acquiring the weight corresponding to each piece of stored data in the cache module;
sorting the stored data by weight and, according to the sorting result, deleting the stored data with the lowest weight in sequence until the remaining storage capacity of the cache module is greater than or equal to a second storage threshold; wherein the second storage threshold is greater than the first storage threshold.
5. A data storage method, applied to a management node of a distributed storage system, comprising:
responding to the acquired data to be stored, respectively sending object creating instructions to a plurality of corresponding storage nodes, so that each storage node creates a copy object according to the acquired copy number, and creates a copy sub-object in the copy object according to the acquired copy number; wherein, different storage nodes correspond to different copy numbers;
carrying out data splitting on the data to be stored according to the number of the copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored;
and sending the data to be stored after data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the obtained copy number, stores the sub-data to be stored in the target copy sub-object to a cache module, and stores the sub-data to be stored in the non-target copy sub-object to a hard disk module.
6. The method of claim 5, wherein the data storage method further comprises:
in response to the data reading instruction, acquiring a plurality of target storage nodes matched with the data reading instruction and target copy objects in the target storage nodes;
respectively acquiring target copy sub-objects matched with the copy number of the current target copy object through the target copy objects in each target storage node;
respectively acquiring matched stored data from corresponding cache modules through target copy sub-objects of the target storage nodes;
and performing data splicing on the stored data to obtain spliced target storage data, and responding to the data reading instruction based on the target storage data.
7. A data storage device, applied to a storage node of a distributed storage system, comprising:
the copy object creating module is used for creating a copy object according to the acquired copy number in response to acquiring an object creating instruction sent by the management node, and creating a copy sub-object in the copy object according to the acquired copy number;
the matching storage execution module is used for storing the data to be stored sent by the management node through the copy object so as to enable each copy sub-object in the copy object to be matched with each sub-data to be stored in the data to be stored one by one;
and the data storage execution module is used for determining a target copy sub-object in each copy sub-object according to the copy number, storing the sub-data to be stored in the target copy sub-object to the cache module, and storing the sub-data to be stored in the non-target copy sub-object to the hard disk module.
8. A data storage apparatus, applied to a management node of a distributed storage system, comprising:
the object creation instruction sending module is used for responding to the acquired data to be stored, and respectively sending object creation instructions to the corresponding storage nodes so as to enable the storage nodes to create copy objects according to the acquired copy numbers and create copy sub-objects in the copy objects according to the acquired copy number; wherein, different storage nodes correspond to different copy numbers;
the data splitting execution module is used for carrying out data splitting on the data to be stored according to the number of the copies; the data to be stored after the data splitting comprises a plurality of subdata to be stored;
and the data sending execution module is used for sending the data to be stored after the data splitting to each storage node, so that each storage node determines a target copy sub-object in each copy sub-object according to the acquired copy number, stores the sub-data to be stored in the target copy sub-object to the cache module, and stores the sub-data to be stored in the non-target copy sub-object to the hard disk module.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data storage method of any one of claims 1-4 or to perform the data storage method of claim 5 or 6.
10. A computer-readable storage medium, characterized in that it stores computer instructions for causing a processor, when executed, to implement the data storage method of any one of claims 1-4, or to implement the data storage method of claim 5 or 6.
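The management-node write and read flow recited in claims 5 and 6 can be illustrated with a minimal Python sketch. It rests on several assumptions that the claims do not fix: the number of copies equals the number of storage nodes, data is split into contiguous pieces of near-equal size, and a copy sub-object is a target on a given node when its sub-number equals that node's copy number. The names split_data, ReplicaNode, write_data, and read_data are hypothetical.

```python
from typing import Dict, List


def split_data(data: bytes, copy_count: int) -> List[bytes]:
    """Split data into copy_count contiguous pieces of near-equal size."""
    if copy_count <= 0:
        raise ValueError("copy_count must be positive")
    size = -(-len(data) // copy_count)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(copy_count)]


class ReplicaNode:
    """Storage node holding one copy object with one sub-object per piece."""

    def __init__(self, copy_number: int) -> None:
        self.copy_number = copy_number
        self.cache: Dict[int, bytes] = {}  # cache module
        self.disk: Dict[int, bytes] = {}   # hard disk module

    def store(self, sub_data: List[bytes]) -> None:
        for sub_number, chunk in enumerate(sub_data):
            if sub_number == self.copy_number:  # assumed target rule
                self.cache[sub_number] = chunk
            else:
                self.disk[sub_number] = chunk


def write_data(nodes: List[ReplicaNode], data: bytes) -> None:
    """Split once, then send the full set of pieces to every storage node."""
    sub_data = split_data(data, len(nodes))
    for node in nodes:
        node.store(sub_data)


def read_data(nodes: List[ReplicaNode]) -> bytes:
    """Fetch each node's cached piece and splice the pieces in order."""
    pieces: Dict[int, bytes] = {}
    for node in nodes:
        pieces.update(node.cache)
    return b"".join(pieces[i] for i in sorted(pieces))
```

Under these assumptions each node caches a different piece, so a read can be served entirely from the cache modules of the nodes while every node still holds a complete copy across its cache and hard disk.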
CN202211129755.2A 2022-09-16 2022-09-16 Data storage method and device, electronic equipment and storage medium Pending CN115454344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211129755.2A CN115454344A (en) 2022-09-16 2022-09-16 Data storage method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211129755.2A CN115454344A (en) 2022-09-16 2022-09-16 Data storage method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115454344A true CN115454344A (en) 2022-12-09

Family

ID=84305621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211129755.2A Pending CN115454344A (en) 2022-09-16 2022-09-16 Data storage method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115454344A (en)

Similar Documents

Publication Publication Date Title
CN114861911B (en) Deep learning model training method, device, system, equipment and medium
CN113656501B (en) Data reading method, device, equipment and storage medium
CN113364877A (en) Data processing method, device, electronic equipment and medium
CN115631273A (en) Big data duplicate removal method, device, equipment and medium
CN116540938A (en) Data reading method, device, distributed storage system, equipment and storage medium
CN115099175B (en) Method and device for acquiring time sequence netlist, electronic equipment and storage medium
CN112783417A (en) Data reduction method and device, computing equipment and storage medium
CN115438007A (en) File merging method and device, electronic equipment and medium
CN115454344A (en) Data storage method and device, electronic equipment and storage medium
CN113868254B (en) Method, device and storage medium for removing duplication of entity node in graph database
CN113641688B (en) Node updating method, related device and computer program product
CN115617549A (en) Thread decoupling method and device, electronic equipment and storage medium
CN115563310A (en) Method, device, equipment and medium for determining key service node
CN114722048A (en) Data processing method and device, electronic equipment and storage medium
CN114564149A (en) Data storage method, device, equipment and storage medium
CN113961641A (en) Database synchronization method, device, equipment and storage medium
CN113553216A (en) Data recovery method and device, electronic equipment and storage medium
US12007965B2 (en) Method, device and storage medium for deduplicating entity nodes in graph database
EP4131017A2 (en) Distributed data storage
CN113326890B (en) Labeling data processing method, related device and computer program product
CN113220230B (en) Data export method and device, electronic equipment and storage medium
CN115186032A (en) Database expansion method and device, electronic equipment and storage medium
CN114416687A (en) Time layering merging method, device, equipment and medium for time sequence data
CN115617811A (en) Data processing method and device, electronic equipment and storage medium
CN117082046A (en) Data uploading method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination