CN114756620A - Data storage method, distributed storage system and storage medium - Google Patents

Info

Publication number
CN114756620A
Authority
CN
China
Prior art keywords
storage
data
group
stored
control application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011560049.4A
Other languages
Chinese (zh)
Inventor
周炜
霍道安
雍帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202011560049.4A
Publication of CN114756620A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data storage method, a distributed storage system and a storage medium. The method comprises: receiving data to be stored; determining, according to the identifier of the storage node that received the data, a target storage group for storing the data, wherein the primary storage control application associated with the target storage group is deployed on the storage node corresponding to that identifier; and sending a storage request for the data to the primary storage control application in the target storage group. By replacing purely random selection of the storage group, the method avoids the case in which the storage node that receives the data differs from the storage node hosting the primary storage control application of the selected storage group. The step of forwarding the storage request to a primary storage control application on another node is therefore omitted, which saves network traffic, improves the overall storage rate, and reduces access latency.

Description

Data storage method, distributed storage system and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data storage and retrieval, in particular to a data storage method, a distributed storage system and a storage medium.
Background
In a distributed storage scenario, a user usually accesses a distributed storage system through a protocol gateway on a storage node: data sent by the user first reaches the protocol gateway, which then submits a storage request for that data to the distributed storage system. Current distributed storage systems include a plurality of storage groups, each associated with a plurality of storage control applications (for example, a plurality of placement groups (PGs), each associated with a plurality of object storage devices (OSDs)). For each storage group, one of the associated storage control applications is the primary storage control application and the others are secondary storage control applications.
In such a system, the data to be stored is randomly assigned to a storage group and sent first to the primary storage control application of that group. The primary storage control application performs the storage operation and forwards the data to the secondary storage control applications, which store the replicas.
Because the storage group is selected at random, when the protocol gateway on one storage node receives the data to be stored, the primary storage control application of the selected storage group may be deployed on a different storage node. The data must then be sent to that other node, which introduces additional network traffic, limits the overall storage rate, and increases access latency.
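As a rough illustration of this cost, consider two simplifying assumptions that are not stated in the source: the primary storage control applications are spread uniformly over N storage nodes, and the storage group is chosen independently of the receiving node. The probability that a write has to be forwarded to another node is then:

```latex
P(\text{primary on another node}) \;=\; 1 - \frac{1}{N} \;=\; \frac{N-1}{N},
\qquad \text{e.g. } N = 3 \;\Rightarrow\; P = \tfrac{2}{3}.
```

Under these assumptions the share of forwarded writes grows with the number of nodes, which is the overhead the method below aims to remove.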
Disclosure of Invention
The invention provides a data storage method, a distributed storage system and a storage medium, which can save network traffic during data storage, improve the storage rate and reduce access latency.
The technical solution of the invention is realized as follows:
An embodiment of the invention provides a data storage method applied to a distributed storage system that comprises a plurality of storage nodes, the method comprising:
receiving data to be stored, and dividing the data to be stored into a plurality of data objects;
determining, according to the identifier of the storage node that received the data to be stored, a target storage group for storing each data object of the data to be stored, wherein the primary storage control application associated with the target storage group is deployed on the storage node corresponding to the storage node identifier; the distributed storage system comprises a plurality of storage groups, and each storage group is associated with a plurality of storage control applications;
sending a storage request to store the data object to the primary storage control application in the target storage group.
In the above solution, the distributed storage system includes a plurality of data storage pools, each data storage pool corresponds to each storage node one to one, each data storage pool includes a plurality of storage groups, and a primary storage control application associated with each storage group is deployed on a storage node corresponding to a data storage pool to which the storage group belongs; for each data storage pool, the storage control applications in the distributed storage system are uniformly distributed in the storage groups of the data storage pool;
correspondingly, the determining a target storage group for storing the data to be stored according to the storage node identifier receiving the data to be stored includes:
determining a data storage pool identifier corresponding to a storage node identifier according to the storage node identifier for receiving data to be stored;
randomly determining, among the storage groups corresponding to the data storage pool identifier, a target storage group for storing the data to be stored.
In the above solution, before determining, according to the identifier of the storage node that received the data to be stored, the data storage pool identifier corresponding to the storage node identifier, the method further includes:
constructing a plurality of data storage pools, wherein each data storage pool corresponds to each storage node one by one;
for each data storage pool, uniformly distributing the storage control applications in the distributed storage system in the storage groups of the data storage pool to obtain the available storage control applications associated with each storage group in the data storage pool;
for each storage group, determining a primary storage control application associated with the storage group based on the respective available storage control applications associated with the storage group, so that the primary storage control application is deployed on the storage node corresponding to the data storage pool to which the storage group belongs.
In the above solution, for each storage group, determining the primary storage control application associated with the storage group based on the respective available storage control applications associated with the storage group includes:
if an available storage control application exists on the storage node corresponding to the data storage pool to which the storage group belongs, determining that available storage control application as the primary storage control application of the storage group;
and if no available storage control application exists on that storage node, determining one of the available storage control applications associated with the storage group as the primary storage control application.
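The selection rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the `app_node` lookup table and the `osd.N` and `node-x` identifiers are hypothetical, and taking the first available application in the fallback case is an arbitrary choice, since the source only says "one of" them.

```python
from typing import Dict, List


def choose_primary(pool_node: str, available: List[str], app_node: Dict[str, str]) -> str:
    """Pick the primary storage control application for one storage group.

    pool_node: storage node bound to the data storage pool that owns the group.
    available: available storage control applications associated with the group.
    app_node:  maps each application ID to the node it is deployed on.
    """
    if not available:
        raise ValueError("storage group has no available storage control application")
    # Preferred case: an available application is deployed on the pool's own node.
    for app in available:
        if app_node[app] == pool_node:
            return app
    # Fallback: nothing is available locally, so pick one of the remaining
    # applications (first one here; the choice is an assumption).
    return available[0]
```

In the fallback case the primary is remote, so writes to that storage group still incur one forwarding hop until a local application becomes available again.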
In the foregoing solution, after the sending the storage request for storing the data to be stored to the primary storage control application in the target storage group, the method further includes:
the primary storage control application stores the data to be stored in the storage unit corresponding to the primary storage control application;
and the primary storage control application sends the data to be stored to the remaining storage control applications in the target storage group to instruct them to store replicas of the data.
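A minimal in-memory sketch of this write path, assuming each storage control application owns one storage unit modelled as a dictionary. The class and function names are hypothetical, and real replication would involve network calls and acknowledgements that are omitted here.

```python
from typing import Dict, List


class StorageControlApp:
    """Toy stand-in for a storage control application and its storage unit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.unit: Dict[str, bytes] = {}  # the storage unit behind this application

    def store(self, key: str, data: bytes) -> None:
        self.unit[key] = data


def handle_store_request(primary: StorageControlApp,
                         secondaries: List[StorageControlApp],
                         key: str, data: bytes) -> int:
    """Store on the primary first, then instruct every secondary to store a replica.

    Returns the total number of copies written.
    """
    primary.store(key, data)
    for app in secondaries:
        app.store(key, data)
    return 1 + len(secondaries)
```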
In the above scheme, after the receiving the data to be stored, the method further includes:
dividing the data to be stored into a plurality of data objects of the same size;
correspondingly, the randomly determining a target storage group in which the data to be stored is stored in each storage group corresponding to the data storage pool identifier includes:
for each data object to be stored, randomly determining, based on the data object identifier, a target storage group for storing the data object among the storage groups corresponding to the data storage pool identifier;
correspondingly, the sending the storage request for storing the data to be stored to the primary storage control application in the target storage group includes:
for each data object, a storage request to store the data object is sent to the primary storage control application associated with the target storage group corresponding to the data object.
In the foregoing solution, for each data object to be stored, based on the data object identifier, randomly determining a target storage group in which the data object is stored in each storage group corresponding to the data storage pool identifier, includes:
for each data object to be stored, acquiring an object identifier of the data object;
applying a preset random algorithm to the object identifier of the data object to obtain a random value corresponding to the data object;
determining, based on the random value, the random group to which the random value belongs;
and selecting, from the storage groups corresponding to the data storage pool identifier, the storage group corresponding to that random group as the target storage group.
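The four steps above can be sketched as follows. The source does not name the "preset random algorithm"; this sketch assumes SHA-256 as the hash and a modulo over the number of storage groups as the mapping from random value to random group. Both choices, and the function name, are illustrative assumptions.

```python
import hashlib
from typing import List


def select_target_group(object_id: str, pool_groups: List[str]) -> str:
    """Map an object identifier to one storage group of the chosen pool.

    pool_groups: identifiers of the storage groups in the data storage pool
                 bound to the receiving storage node.
    """
    if not pool_groups:
        raise ValueError("data storage pool has no storage groups")
    # Step 1-2: hash the object identifier to get a pseudo-random value.
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    random_value = int.from_bytes(digest[:8], "big")
    # Step 3: the "random group" is the bucket the value falls into.
    random_group = random_value % len(pool_groups)
    # Step 4: pick the storage group that corresponds to that bucket.
    return pool_groups[random_group]
```

Because the mapping is deterministic in the object identifier, the same computation can be repeated at retrieval time to find the storage group again without storing a per-object location.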
In the above solution, after the step of determining, according to the storage node identifier receiving the data to be stored, the data storage pool identifier corresponding to the storage node identifier, the solution further includes:
acquiring data identification information of the data to be stored;
storing a first mapping relationship between the data identification information and the identification of the data storage pool;
the data storage method further comprises:
and storing a second mapping relation between the data identification information and the identification of each data object.
In the above scheme, the method further comprises:
receiving a retrieval request, wherein the retrieval request carries data identification information of data to be retrieved;
determining a data storage pool corresponding to the data to be retrieved in the first mapping relation based on the data identification information of the data to be retrieved;
determining the identification of each data object corresponding to the data to be retrieved in the second mapping relation based on the data identification information of the data to be retrieved;
applying the preset random algorithm to the identifier of each data object corresponding to the data to be retrieved to obtain the random group of each data object, and thereby determining the storage group in which each data object is located;
looking up each data object based on the data storage pool determined for the data to be retrieved and the storage group in which each data object is located;
and combining the data objects to obtain the data to be retrieved.
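A toy end-to-end sketch of the two mappings and the retrieval path, under stated assumptions: storage groups are plain dictionaries, object identifiers are formed as `<data_id>.<index>`, and the preset random algorithm is SHA-256 followed by a modulo. None of these details come from the source.

```python
import hashlib
from typing import Dict, List


def group_index(object_id: str, group_count: int) -> int:
    """Assumed 'preset random algorithm': hash the object ID, then take a modulo."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count


class TinyStore:
    def __init__(self, pools: Dict[str, int]) -> None:
        # pools maps a pool ID to its number of storage groups.
        self.groups = {pool: [dict() for _ in range(n)] for pool, n in pools.items()}
        self.data_to_pool: Dict[str, str] = {}           # first mapping
        self.data_to_objects: Dict[str, List[str]] = {}  # second mapping

    def put(self, pool_id: str, data_id: str, chunks: List[bytes]) -> None:
        object_ids = []
        for i, chunk in enumerate(chunks):
            oid = f"{data_id}.{i}"  # hypothetical object ID scheme
            idx = group_index(oid, len(self.groups[pool_id]))
            self.groups[pool_id][idx][oid] = chunk
            object_ids.append(oid)
        self.data_to_pool[data_id] = pool_id
        self.data_to_objects[data_id] = object_ids

    def get(self, data_id: str) -> bytes:
        pool_id = self.data_to_pool[data_id]         # pool via the first mapping
        groups = self.groups[pool_id]
        parts = []
        for oid in self.data_to_objects[data_id]:    # objects via the second mapping
            parts.append(groups[group_index(oid, len(groups))][oid])
        return b"".join(parts)                       # combine in original order
```

Only the two mappings are persisted per data item; the storage group of each object is recomputed from its identifier.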
An embodiment of the present invention further provides a distributed storage system, which includes a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the above method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the above method.
In the embodiment of the invention, data to be stored is received; a target storage group for storing the data is determined according to the identifier of the storage node that received the data, wherein the primary storage control application associated with the target storage group is deployed on the storage node corresponding to that identifier; and a storage request for the data is sent to the primary storage control application in the target storage group. By replacing purely random selection of the storage group, the method avoids the case in which the storage node that receives the data differs from the storage node hosting the primary storage control application of the selected storage group. The step of forwarding the storage request to a primary storage control application on another node is therefore omitted, which saves network traffic, improves the overall storage rate, and reduces access latency.
Drawings
Fig. 1 is a first flowchart of a data storage method according to an embodiment of the present invention;
Fig. 2 is a schematic view of an application scenario of a data storage method according to an embodiment of the present invention;
Fig. 3 is a schematic view of an application scenario of a data storage method according to an embodiment of the present invention;
Fig. 4 is a second flowchart of a data storage method according to an embodiment of the present invention;
Fig. 5 is a third flowchart of a data storage method according to an embodiment of the present invention;
Fig. 6 is a fourth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 7 is a fifth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 8 is a sixth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 9 is a seventh flowchart of a data storage method according to an embodiment of the present invention;
Fig. 10 is an eighth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 11 is a ninth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 12 is a tenth flowchart of a data storage method according to an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
Fig. 14 is a schematic entity diagram of a distributed storage system according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions are described in further detail below with reference to the drawings and the embodiments. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Where the terms "first/second/third" appear in this document, they merely distinguish similar objects and do not imply a particular ordering of those objects. It should be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Fig. 1 is a first flowchart of a data storage method provided by an embodiment of the present invention, which will be described with reference to the steps shown in fig. 1.
S01, receiving the data to be stored.
In the embodiment of the invention, the client establishes a communication line with the distributed storage system in advance, when the client needs to store the data to be stored, the client sends the data to be stored to the distributed storage system, and the distributed storage system receives the data to be stored. The client can be a mobile phone, a computer, an intelligent terminal, a vehicle-mounted computer and the like.
In the embodiment of the invention, a protocol conversion layer of a storage node corresponding to a client in the distributed storage system receives data to be stored, and then the protocol conversion layer sends a storage request to a metadata access layer of the distributed storage system. The storage request informs a metadata access layer of the distributed storage system that there is data to be stored. The storage request includes an identification of the storage node at which the protocol translation layer is located.
In the embodiment of the present invention, with reference to fig. 2, a distributed storage system 200 establishes a communication line 201 with a client 1 and a client 2 in advance. The client 1 is a mobile terminal, and the client 2 is a computer. The client 1 or the client 2 may transmit data to be stored to the distributed storage system 200 through the pre-established communication line 201. The distributed storage system 200 receives a storage request over a communication line 201.
In the embodiment of the present invention, the data to be stored may be: picture data, text data, video data, voice data, or an encoding string.
S02, determining a target storage group for storing the data to be stored according to the identifier of the storage node that received the data to be stored.
In the embodiment of the invention, the distributed storage system acquires the storage node identification fed back by the protocol conversion layer of the storage node corresponding to the client. The storage node identification represents address information of a storage node which is connected with the client and receives data to be stored. And the distributed storage system determines a target storage group of the data to be stored in the data storage pool corresponding to the storage node identification according to the storage node identification.
In an embodiment of the present invention, the distributed storage system may include a plurality of data storage pools, each data storage pool corresponding to each storage node. Each data storage pool includes a plurality of storage groups. The main storage control application associated with each storage group is deployed on the storage node corresponding to the data storage pool to which the storage group belongs. The distributed storage system may determine, according to the storage node identifier, a data storage pool corresponding to the storage node that sends the data to be stored, that is, a data storage pool corresponding to the data to be stored. The distributed storage system determines a target storage group for storing data to be stored in a plurality of storage groups in the data storage pool.
In the embodiment of the invention, before storing the data to be stored, the distributed storage system binds each storage node to one corresponding data storage pool. To do so, the distributed storage system may obtain the storage node identifier of each storage node and establish a mapping relationship between the storage nodes and the corresponding data storage pools. Using this mapping relationship, the distributed storage system can determine the data storage pool corresponding to a storage node identifier, that is, the data storage pool corresponding to the data to be stored. A target storage group for storing the data is then determined among the plurality of storage groups within that data storage pool.
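The binding and lookup described here amount to a one-to-one table from storage node identifiers to data storage pool identifiers. A minimal sketch follows; the function names and the `pool-<node_id>` naming scheme are hypothetical.

```python
from typing import Dict, Iterable


def bind_pools(node_ids: Iterable[str]) -> Dict[str, str]:
    """Bind each storage node ID to exactly one data storage pool ID."""
    mapping: Dict[str, str] = {}
    for node_id in node_ids:
        if node_id in mapping:
            raise ValueError(f"duplicate storage node ID: {node_id}")
        mapping[node_id] = f"pool-{node_id}"  # hypothetical naming scheme
    return mapping


def pool_for_node(mapping: Dict[str, str], node_id: str) -> str:
    """Return the pool bound to the node that received the data to be stored."""
    try:
        return mapping[node_id]
    except KeyError:
        raise LookupError(f"no data storage pool bound to node {node_id}") from None
```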
S03, sending the storage request for storing the data to be stored to the main storage control application in the target storage group.
In the embodiment of the invention, the distributed storage system sends the storage request for the data to be stored to the primary storage control application of the target storage group corresponding to the data, so that the primary storage control application stores the data in its corresponding storage unit.
In addition, after receiving the data to be stored, the main storage control application can also send the data to be stored to the other auxiliary storage control applications in the storage group, so that the auxiliary storage control applications store the data to be stored to the storage units corresponding to the auxiliary storage control applications, thereby forming distributed storage of the data to be stored and providing disaster tolerance.
In the embodiment of the present application, each storage node may host one storage unit or a plurality of storage units.
In this embodiment of the present invention, the storage group may include a placement group (PG), and the storage control application may include an object storage device (OSD); accordingly, the primary storage control application includes a primary OSD.
Of course, in the embodiment of the present invention, the storage control application may also be a Secure Digital memory card (SD), a storage hard disk, or an optical disc.
In the embodiment of the invention, a distributed storage system receives data to be stored; determines a target storage group for storing the data according to the identifier of the storage node that received the data, wherein the primary storage control application associated with the target storage group is deployed on the storage node corresponding to that identifier; and sends a storage request for the data to the primary storage control application in the target storage group. The step of forwarding the storage request to a primary storage control application on another node is omitted, which saves network traffic, improves the overall storage rate, and reduces access latency.
Those skilled in the art will appreciate that steps S01 to S03 replace purely random selection of the storage group, so that the storage node receiving the data to be stored is consistent with the storage node hosting the primary storage control application of the selected storage group. This avoids, to some extent, forwarding the storage request to another node and thus saves network traffic. However, some storage groups may then store a large amount of data while others store little, so the amount of data held by the storage units of the distributed storage system becomes uneven, which is unfavorable for load balancing.
Therefore, the present application provides a further improved data storage method. Specifically, the distributed storage system includes a plurality of data storage pools in one-to-one correspondence with the storage nodes; each data storage pool includes a plurality of storage groups; and the primary storage control application associated with each storage group is deployed on the storage node corresponding to the data storage pool to which the storage group belongs. For each data storage pool, the storage control applications in the distributed storage system are uniformly distributed among the storage groups of that pool. Put simply, if the storage units behind the storage control applications all have the same capacity, the storage groups are spread evenly over the storage control applications, so each storage control application is associated with the same number of storage groups; if the capacities differ, a storage control application with a larger capacity can be associated with more storage groups. In the embodiments of the present application, an existing algorithm such as the CRUSH algorithm may be used to achieve this distribution.
Accordingly, the step S02 may include the following steps: determining a data storage pool identifier corresponding to a storage node identifier according to the storage node identifier for receiving data to be stored; and randomly determining a target storage group in which the data to be stored is stored in each storage group corresponding to the data storage pool identification.
With this further improvement, the storage group is determined at random only after the data storage pool has been determined. The stored data can therefore be spread, to a certain extent, evenly over the storage units of the distributed storage system while network traffic is still saved.
In addition, in order to further improve the load balance of data storage, the data to be stored may be divided into a plurality of data objects with the same storage size. However, as will be readily understood by those skilled in the art, for a certain data to be stored, it may not be possible to strictly divide the data into data objects with consistent sizes under certain circumstances, and therefore, the present application is not limited thereto.
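A minimal sketch of such a split, assuming a fixed object size in bytes (the source gives no concrete size). As the paragraph notes, the last object may be shorter when the data length is not a multiple of the object size.

```python
from typing import List


def split_into_objects(data: bytes, object_size: int) -> List[bytes]:
    """Cut data into consecutive objects of object_size bytes; the tail may be shorter."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]
```

For example, `split_into_objects(b"abcdefghij", 4)` yields two 4-byte objects followed by one 2-byte object.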
In some embodiments, referring to fig. 4, fig. 4 is an optional flowchart of the data storage method provided by the embodiment of the present invention, and S02 shown in fig. 1 may also be implemented through S04 to S05, which will be described with reference to steps.
S04, determining the data storage pool identifier corresponding to the storage node identifier according to the identifier of the storage node that received the data to be stored.
In the embodiment of the present invention, according to the storage node identifier that receives the data to be stored, the distributed storage system determines the identifier of the data storage pool corresponding to the storage node identifier, that is, the data storage pool identifier.
In an embodiment of the present invention, a distributed storage system includes a plurality of data storage pools, each data storage pool corresponding one-to-one to each storage node. Each data storage pool comprises a plurality of storage groups, and the main storage control application associated with each storage group is deployed on the storage node corresponding to the data storage pool to which the storage group belongs; for each data storage pool, the individual storage control applications in the distributed storage system are evenly distributed among the storage groups of that data storage pool.
In the embodiment of the invention, the distributed storage system allocates a group of storage control applications to each storage group in the data storage pool according to a distributed selection algorithm for data storage. For example, when the object storage devices of the distributed storage system have the same capacity, the system assigns the same number of storage groups to each object storage device according to the algorithm (such as the CRUSH algorithm), which keeps the capacity of every storage group the same. When the capacities differ, the system assigns more storage groups to the object storage devices with larger capacity, so that the storage groups still have the same capacity and the larger devices can store more data.
S05, randomly determining, among the storage groups corresponding to the data storage pool identifier, a target storage group for storing the data to be stored.
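The capacity rule can be illustrated with a simple proportional allocation. This is not the CRUSH algorithm: it is a largest-remainder split of a fixed number of storage groups across devices in proportion to capacity, with hypothetical names and integer capacities assumed for the sketch.

```python
from typing import Dict


def allocate_groups(capacities: Dict[str, int], total_groups: int) -> Dict[str, int]:
    """Assign storage-group counts to devices in proportion to their capacity."""
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    counts: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for device, capacity in capacities.items():
        # Integer share plus the remainder used to break ties fairly.
        counts[device], remainders[device] = divmod(total_groups * capacity, total_capacity)
    leftover = total_groups - sum(counts.values())
    # Hand the groups lost to rounding to the devices with the largest remainders.
    for device in sorted(capacities, key=lambda d: remainders[d], reverse=True)[:leftover]:
        counts[device] += 1
    return counts
```

With equal capacities every device receives the same count (up to rounding); a device with twice the capacity receives about twice as many storage groups.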
In the embodiment of the invention, the distributed storage system can perform a hash calculation on the identification information of the data to be stored to obtain a random value corresponding to the data. The random value may be a character string. Among the storage groups corresponding to the data storage pool, the distributed storage system determines the storage group that corresponds to this character string, that is, the target storage group of the data to be stored.
In the embodiment of the present invention, the identification information of the data to be stored may be attribute information of the data to be stored, or may be data type information of the data to be stored.
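As a hedged illustration of S05, the sketch below hashes the identification information and uses the result to pick one storage group of the pool. The text only says "hash calculation"; the use of SHA-256, the modulo step, and the names `pick_target_group` and `groups` are assumptions made for the example.

```python
import hashlib


def pick_target_group(identifier: str, storage_groups: list) -> str:
    """Pick a storage group pseudo-randomly but repeatably from an identifier."""
    if not storage_groups:
        raise ValueError("the data storage pool has no storage groups")
    # The hex digest plays the role of the character-string random value.
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    # Assumption: reduce the digest to an index with a modulo operation.
    index = int(digest, 16) % len(storage_groups)
    return storage_groups[index]


groups = ["storage group 1", "storage group 2", "storage group 3"]
target = pick_target_group("example-data-id", groups)
```

Because the hash is deterministic, the same identifier always maps to the same storage group, which is what later allows stored data to be located again.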
In the embodiment of the invention, the distributed storage system receives data to be stored and divides it into a plurality of data objects to be stored. According to the storage node identifier of the storage node that receives the data to be stored, the distributed storage system determines the data storage pool corresponding to the storage node identifier, and randomly determines, among the storage groups corresponding to the data storage pool identifier, the target storage group in which each data object is stored, wherein the main storage control application associated with the target storage group is deployed on the storage node corresponding to the storage node identifier. The distributed storage system then sends a storage request for storing the data object to the main storage control application in the target storage group. Because that application is deployed on the receiving storage node, the step of forwarding the storage request of the data object to the main storage control application is omitted, which saves network traffic, improves the overall storage rate, and reduces access latency. In addition, the scheme shown in fig. 4 also ensures storage consistency and load balance while saving network traffic.
In some embodiments, referring to fig. 5, fig. 5 is an optional flowchart of the data storage method according to the embodiment of the present invention, and S06 to S08 are further included before S01 shown in fig. 1, which will be described with reference to the steps.
S06, constructing a plurality of data storage pools, wherein each data storage pool is in one-to-one correspondence with each storage node.
In the embodiment of the invention, the distributed storage system comprises a plurality of storage nodes connected with a client. The distributed storage system acquires the node identifications corresponding to the plurality of storage nodes. The node identification comprises address information or other label information of the corresponding storage node. The distributed storage system matches one data storage pool to each storage node.
In the embodiment of the present invention, the distributed storage system may establish a mapping relationship between the storage pool identifier of each data storage pool and the storage node identifier of the corresponding storage node. Each data storage pool includes every storage control application in the distributed storage system.
In the embodiment of the present invention, the storage node may include a protocol conversion layer connected to the client in the distributed storage system. The storage node converts the standard access protocol of the client into the internal access protocol of the distributed storage system, and the storage node is responsible for accessing the data to be stored.
In the embodiment of the present invention, the distributed storage system may also match a storage node with a plurality of storage pools, and establish a mapping relationship between the storage pool identifiers of the plurality of storage pools and the node identifier of the corresponding storage node. When one of the storage pools matched with a storage node is disconnected from that storage node, the distributed storage system deletes the correspondence between the disconnected storage pool and the node identifier from the mapping relationship.
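The mapping between storage pool identifiers and node identifiers, including the removal of an entry when a pool is disconnected, might be kept in a structure like the following. This is only a sketch; the class name `PoolNodeMap` and its methods are hypothetical and do not come from the text.

```python
class PoolNodeMap:
    """Hypothetical in-memory mapping: node identifier -> storage pool identifiers."""

    def __init__(self):
        self._pools_by_node = {}

    def bind(self, node_id, pool_id):
        # Record that the storage pool is matched with the storage node.
        self._pools_by_node.setdefault(node_id, set()).add(pool_id)

    def unbind(self, node_id, pool_id):
        # Delete the correspondence when the pool is disconnected from the node.
        pools = self._pools_by_node.get(node_id, set())
        pools.discard(pool_id)
        if not pools:
            self._pools_by_node.pop(node_id, None)

    def pools_for(self, node_id):
        return sorted(self._pools_by_node.get(node_id, set()))
```

A node that is matched with two pools keeps only the remaining pool after one of them is unbound.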
S07, for each data storage pool, the storage control applications in the distributed storage system are uniformly distributed in the storage groups of the data storage pool, and the available storage control applications associated with each storage group in the data storage pool are obtained.
In the embodiment of the invention, each data storage pool comprises every storage control application in the distributed storage system. The distributed storage system can divide the plurality of storage control applications in the data storage pool evenly into a plurality of storage groups. This may be implemented with reference to the existing CRUSH algorithm.
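To make the idea of an even division concrete, the sketch below distributes storage control applications over storage groups in round-robin order. Round-robin is a stand-in chosen for brevity and is not the CRUSH algorithm; the function name and the numbering of the applications are assumptions.

```python
def distribute_evenly(apps, group_count):
    """Assign storage control applications to storage groups in round-robin order."""
    if group_count <= 0:
        raise ValueError("group_count must be positive")
    groups = [[] for _ in range(group_count)]
    for i, app in enumerate(apps):
        # Application i goes to group i mod group_count, so sizes differ by at most 1.
        groups[i % group_count].append(app)
    return groups


# Nine hypothetical storage control applications divided into three storage groups.
layout = distribute_evenly(list(range(1, 10)), 3)
```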
S08, for each storage group, determining a primary storage control application associated with the storage group based on the respective available storage control applications associated with the storage group.
In the embodiment of the invention, for each storage group, the distributed storage system takes a storage control application of the storage group that is deployed on the storage node corresponding to the data storage pool in which the storage group is located as the main storage control application. When several storage control applications of the storage group are deployed on that storage node, the distributed storage system selects one of them as the primary storage control application.
In an embodiment of the present invention, the distributed storage system may use other storage control applications in each storage group, except the primary storage control application, as secondary storage control applications. The secondary storage control application is used to store copies of data objects within the primary storage control applications within the storage group.
In the embodiment of the present invention, the distributed storage system may deploy the plurality of storage control applications in each storage group to the plurality of storage nodes, respectively. The distributed storage system may deploy the plurality of storage control applications for each storage group on the plurality of storage nodes in an encoding order of the storage control applications.
In the embodiment of the present invention, the distributed storage system may determine the storage control applications included in each storage group through the CRUSH (Controlled Replication Under Scalable Hashing) algorithm.
The primary storage control application in each storage group is used for storing data objects, and the secondary storage control applications are used for storing copies of the data objects in the primary storage control application, namely the secondary data objects corresponding to the data objects. Referring to fig. 3, the storage pool has three storage groups, namely storage group 1, storage group 2, and storage group 3. Storage group 1 comprises three storage control applications: storage control application 2, storage control application 5, and storage control application 8. The distributed storage system selects storage control application 2 as the primary storage control application of storage group 1, and storage control applications 5 and 8 are the secondary storage control applications of storage group 1.
In the embodiment of the invention, the distributed storage system can send the storage request of the data object of the data to be stored to the main storage control application in the target storage group after receiving the data to be stored by establishing the corresponding relation between each storage node and each data storage pool and configuring one storage control application corresponding to the storage node in each storage group in the data storage pool; the process of forwarding the storage request of the data object to the main storage control application is omitted, the network flow is saved, the overall storage rate is improved, and the access delay is reduced.
In some embodiments, referring to fig. 6, fig. 6 is an optional flowchart of the data storage method provided by the embodiment of the present invention, and S08 shown in fig. 5 can also be implemented through S09, which will be described with reference to steps.
S09, if the storage node corresponding to the data storage pool to which each storage group belongs has an available storage control application, determining the available storage control application as the primary storage control application of the storage group.
In the embodiment of the present invention, if the storage node corresponding to the data storage pool to which a storage group belongs has an available storage control application, the distributed storage system selects, from the plurality of storage control applications of the storage group, one available storage control application deployed on that storage node as the main storage control application of the storage group.
In the embodiment of the present invention, in the multiple storage control applications of each storage group, the distributed storage system may select a first available storage control application deployed on a storage node corresponding to a data storage pool to which the storage group belongs, as a primary storage control application of the storage group.
In an embodiment of the present invention, the distributed storage system may use other storage control applications in the storage group, except the primary storage control application, as the secondary storage control application.
In this case, if there is no available storage control application on the corresponding storage node, the primary storage control application of the storage group may be temporarily set to an unavailable storage control application. Alternatively, the primary storage control application may be set as described later in step S10 of fig. 7.
In some embodiments, referring to fig. 7, fig. 7 is an optional flowchart of the data storage method provided in the embodiment of the present invention, and S08 shown in fig. 5 may also be implemented through S10, which will be described with reference to steps.
S10, if there is no available storage control application on the storage node corresponding to the data storage pool to which each storage group belongs, determining one of the available storage control applications included in the obtained storage group as the primary storage control application.
In the embodiment of the invention, if the storage node corresponding to the data storage pool to which a storage group belongs does not have an available storage control application, the distributed storage system selects one of the plurality of storage control applications of the storage group as the primary storage control application of the storage group.
In the embodiment of the present invention, if the storage node corresponding to the data storage pool to which the storage group belongs does not have an available storage control application, the distributed storage system may select, from the plurality of storage control applications of the storage group, the first storage control application configured by the CRUSH algorithm as the main storage control application of the storage group.
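The selection rule of S09 and S10 can be sketched as a single function: prefer the first available storage control application on the node that corresponds to the pool, and otherwise fall back to the first available application of the storage group. The function name, the argument layout, and the node names are hypothetical; the group membership 2, 5, 8 follows the fig. 3 example, while the placement of those applications on nodes is assumed.

```python
def choose_primary(group_apps, app_node, available, pool_node):
    """Return the primary storage control application of one storage group.

    group_apps: applications of the group, in the order the placement algorithm produced
    app_node:   dict application -> node on which it is deployed
    available:  set of currently available applications
    pool_node:  node corresponding to the data storage pool of this group
    """
    candidates = [app for app in group_apps if app in available]
    for app in candidates:
        if app_node[app] == pool_node:
            return app  # S09: an available application on the pool's own node
    # S10: no local application is available, take the first available one.
    return candidates[0] if candidates else None


group = [2, 5, 8]
placement = {2: "node-A", 5: "node-B", 8: "node-C"}  # assumed placement
local_primary = choose_primary(group, placement, {2, 5, 8}, "node-A")
fallback_primary = choose_primary(group, placement, {5, 8}, "node-A")
```

Returning `None` when nothing is available is a choice made for the sketch; the text instead describes temporarily keeping an unavailable application as the primary.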
In some embodiments, referring to fig. 8, fig. 8 is an optional flowchart of the data storage method according to the embodiments of the present invention, and S03 shown in fig. 1 may further be implemented through S11 to S12, which will be described with reference to the steps.
S11, the main storage control application stores the data to be stored in the storage unit corresponding to the main storage control application.
In the embodiment of the invention, the distributed storage system sends the storage request of the data to be stored to the main storage control application of the target storage group corresponding to the data to be stored. The main storage control application acquires the data to be stored, and stores the data to be stored to the storage unit corresponding to the main storage control application.
Of course, in the embodiment of the present invention, the storage unit may also be: a memory in a distributed storage system.
S12, the main storage control application sends the data to be stored to the rest of the storage control applications in the target storage group to instruct the rest of the storage control applications to execute the storage copy operation.
In the embodiment of the present invention, after the main storage control application acquires the data to be stored, the main storage control application may further send the data to be stored to the other auxiliary storage control applications in the storage group, so that the auxiliary storage control applications store the data to be stored to the storage units corresponding to the auxiliary storage control applications, thereby forming distributed storage of the data to be stored, and providing disaster tolerance.
In the embodiment of the invention, after the main storage control application acquires the data to be stored, the main storage control application copies the data to be stored to form a copy corresponding to the data to be stored. And respectively sending the copies of the data to be stored to the other secondary storage control applications in the storage group by the primary storage control application. And the secondary storage control application stores the copy of the data to be stored to the storage unit corresponding to the secondary storage control application, so that distributed storage of the data to be stored is formed.
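Steps S11 and S12 amount to a primary-copy write: the primary stores the data first and then hands a copy to every secondary in the same storage group. The sketch below models each storage unit as a dictionary; the class `StorageControlApp`, the function `primary_write`, and the application names are assumptions for illustration.

```python
class StorageControlApp:
    """Hypothetical storage control application with a dict as its storage unit."""

    def __init__(self, name):
        self.name = name
        self.unit = {}

    def store(self, object_id, payload):
        self.unit[object_id] = payload


def primary_write(primary, secondaries, object_id, payload):
    """Store on the primary (S11), then replicate to the secondaries (S12)."""
    primary.store(object_id, payload)
    for secondary in secondaries:
        # Each secondary receives its own copy of the data.
        secondary.store(object_id, bytes(payload))
    return 1 + len(secondaries)  # number of stored replicas


primary = StorageControlApp("app-2")
secondaries = [StorageControlApp("app-5"), StorageControlApp("app-8")]
replicas = primary_write(primary, secondaries, "obj-1", b"data")
```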
In some embodiments, referring to fig. 9, fig. 9 is an optional flowchart of the data storage method according to the embodiment of the present invention, and S13 is further included between S01 and S04 shown in fig. 4, which will be described with reference to steps.
S13, dividing the data to be stored into a plurality of data objects to be stored that have the same data size.
In the embodiment of the invention, after the client sends the data to be stored to the distributed storage system, the distributed storage system receives the data to be stored. The distributed storage system divides data to be stored into a plurality of data objects to be stored. Wherein the data size of each data object may be the same.
In the embodiment of the present invention, the distributed storage system may divide a piece of data to be stored with a size of 4M into 4 data objects with sizes of 1M.
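The division in S13 can be sketched as fixed-size slicing. The example reads "4M" and "1M" as 4 MiB and 1 MiB, which is an assumption, and the function name is hypothetical. If the data length is not a multiple of the object size, the last object in this sketch is simply shorter.

```python
def split_into_objects(data: bytes, object_size: int):
    """Split data into consecutive data objects of at most object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]


MIB = 1024 * 1024
objects = split_into_objects(bytes(4 * MIB), MIB)  # four objects of 1 MiB each
```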
In some embodiments, referring to fig. 9, fig. 9 is an optional flowchart of the data storage method according to the embodiment of the present invention, and S05 to S03 shown in fig. 4 may also be implemented through S15 to S16, which will be described with reference to each step.
S15, for each data object to be stored, randomly determining, based on the data object identification, a target storage group in which the data object is stored among the storage groups corresponding to the data storage pool identification.
In the embodiment of the present invention, the distributed storage system may perform hash calculation on the identification information of each data object to obtain a random value corresponding to each data object. Wherein the random value may be a character string of the corresponding data object. The distributed storage system determines a target storage group corresponding to each character string, namely a target storage group of each data object, according to the character string of each data object in each storage group corresponding to the data storage pool.
In the embodiment of the present invention, the identification information of the data object may be attribute information of the data object, or may be data type information of the data object.
In the embodiment of the present invention, the distributed storage system may calculate the identifiers of 4 data objects with a size of 1M, and obtain random values respectively corresponding to the 4 data objects.
S16, for each data object, sending the storage request for storing the data object to the main storage control application associated with the target storage group corresponding to the data object.
In the embodiment of the invention, the distributed storage system sends the storage request of each data object of the data to be stored to the main storage control application associated with the target storage group corresponding to each data object. Further, the distributed storage system stores each data object in a storage location corresponding to the corresponding primary storage control application. The main storage control application associated with the target storage group is deployed on the storage node corresponding to the data storage pool where the target storage group is located.
In the embodiment of the present invention, the distributed storage system may send one storage request corresponding to each data object to the primary storage control application of the target storage group corresponding to the data object for storage. The distributed storage system can also respectively send each data object to a main storage control application of a target storage group corresponding to the data object for storage; the distributed storage system may also send a storage request corresponding to the plurality of data objects to a primary storage control application of a target storage group corresponding to the plurality of data objects for storage. The specific storage implementation is not limited.
In the embodiment of the present invention, with reference to fig. 3, the distributed storage system divides the data to be stored into three data objects of the same size, namely data object 1, data object 2, and data object 3. The distributed storage system stores data object 1 on the storage unit corresponding to the primary storage control application of storage group 1. The primary storage control application of storage group 1 is storage control application 2, which is configured on the storage node sending the data to be stored. The distributed storage system stores data object 2 on the storage unit corresponding to the primary storage control application of storage group 2. The primary storage control application of storage group 2 is storage control application 4, which is configured on the storage node sending the data to be stored. The distributed storage system stores data object 3 on the storage unit corresponding to the primary storage control application of storage group 3. The primary storage control application of storage group 3 is storage control application 3, which is configured on the storage node sending the data to be stored.
In some embodiments, referring to fig. 10, fig. 10 is an optional flowchart of the data storage method provided in the embodiment of the present invention, and S15 shown in fig. 9 can also be implemented through S17 to S20, which will be described with reference to steps.
S17, for each data object to be stored, obtaining the identification of the data object.
In the embodiment of the invention, the distributed storage system acquires the identifier of each data object corresponding to the data to be stored. The identifier of the data object may be attribute information of the data object, or may be data type information of the data object.
S18, calculating the object identifier of the data object based on a preset random algorithm to obtain a random value corresponding to the data object.
In the embodiment of the invention, the distributed storage system calculates the identification information of each data object based on a preset random algorithm to obtain a random value corresponding to the data object.
In the embodiment of the invention, the distributed storage system can calculate the identification information of each data object through a hash algorithm to obtain a random value corresponding to the data object.
In the embodiment of the present invention, since the random value calculated by the preset random algorithm generally falls within a fixed range, the distributed storage system may divide the range into a plurality of random groups. The distributed storage system can then establish a correspondence between the random groups and the storage groups in each data storage pool. For example, the distributed storage system may determine that the random value lies within the range 1-9. The distributed storage system divides the range into three random groups, 1-3, 3-6, and 7-9, and establishes a correspondence between these random groups and the storage groups.
S19, determining the random group to which the random value belongs based on the random value.
In this embodiment of the present invention, the distributed storage system may determine, according to the random value, a random group to which the random value belongs.
S20, based on the random group to which the random value belongs, selecting the storage group corresponding to that random group as the target storage group from the storage groups corresponding to the data storage pool identification.
In the embodiment of the invention, after the random value of the identification information of each data object is acquired by the distributed storage system, the target storage group corresponding to each identification information is determined according to the corresponding relation between the random group to which the random value of the identification information belongs and the storage group. I.e., the target storage group to which each data object corresponds is determined.
In the embodiment of the invention, if the random value of the identification information is 5, the random group corresponding to the identification information is determined to be the random group 3-6. According to the correspondence between the random groups and the storage groups, the distributed storage system determines the storage group corresponding to the random group 3-6 to be the target storage group for the identification information whose random value is 5.
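Steps S17 to S20 can be sketched as a two-stage lookup: hash the object identifier into a bounded random value, then find the random group that contains it and return the associated storage group. The text lists the middle random group as 3-6; the sketch uses 4-6 so that no value falls into two groups, which is an assumption, as are SHA-256, the reduction to the range 1-9, and all names.

```python
import hashlib

# Hypothetical correspondence between random groups and storage groups.
# Assumption: non-overlapping ranges 1-3, 4-6 and 7-9.
RANDOM_GROUPS = [
    (range(1, 4), "storage group 1"),
    (range(4, 7), "storage group 2"),
    (range(7, 10), "storage group 3"),
]


def random_value(object_id: str) -> int:
    """S18: map an object identifier to a value in the fixed range 1-9."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 9 + 1


def target_group_for_value(value: int) -> str:
    """S19 and S20: find the random group of the value and its storage group."""
    for values, storage_group in RANDOM_GROUPS:
        if value in values:
            return storage_group
    raise ValueError(f"random value {value} is outside the fixed range")


def target_group(object_id: str) -> str:
    return target_group_for_value(random_value(object_id))
```

With this table, a random value of 5 resolves to the second storage group, matching the example in the text.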
In some embodiments, referring to fig. 11, fig. 11 is an optional flowchart of the data storage method provided by the embodiment of the present invention, and S21 to S22 are further included after S04 shown in fig. 10, which will be described with reference to the steps.
S21, acquiring data identification information of the data to be stored.
In the embodiment of the invention, after the distributed storage system acquires the data to be stored, the distributed storage system also acquires the data identification information of the data to be stored. The data identification information may be attribute information of the data to be stored.
S22, a first mapping of data identification information to data storage pool identification is stored.
In the embodiment of the invention, the distributed storage system establishes a corresponding first mapping relation between the acquired data identification information and the identification of the data storage pool corresponding to the data to be stored. The distributed storage system may store the first mapping relationship in a database of the distributed storage system.
In other embodiments, the distributed storage system may also store the first mapping relationship in a storage unit corresponding to the storage node.
In some embodiments, referring to fig. 11, fig. 11 is an optional flowchart of the data storage method according to the embodiment of the present invention, and S23 is further included after S17 shown in fig. 8, which will be described with reference to the steps.
S23, storing a second mapping relation between the data identification information and the identification of each data object.
In the embodiment of the invention, because the plurality of data objects of the data to be stored are obtained by dividing the data to be stored, the data identification information of the data to be stored corresponds to the identification information of the plurality of data objects. The distributed storage system may establish a second mapping relationship between the data identification information of the data to be stored and the identifications of the corresponding plurality of data objects. The distributed storage system may store the second mapping relationship in a data storage pool.
In other embodiments, the distributed storage system may store the second mapping relationship in the data storage pool after storing the plurality of data objects in the corresponding primary storage control application. For example, the distributed storage system may store the second mapping relationship in the data storage pool after S03.
In some embodiments, referring to fig. 12, fig. 12 is an optional flowchart of a data storage method according to an embodiment of the present invention, and will be described with reference to steps.
S24, receiving a retrieval request, wherein the retrieval request carries data identification information of the data to be retrieved.
In the embodiment of the invention, the distributed storage system receives a retrieval request sent by a client, wherein the retrieval request carries data identification information of data to be retrieved.
The data identification information of the data to be retrieved can represent attribute information of the data to be retrieved, and can also represent storage address information of the data to be retrieved.
S25, based on the data identification information of the data to be retrieved, in the first mapping relation, determining the data storage pool corresponding to the data to be retrieved.
In the embodiment of the present invention, after the distributed storage system obtains the data identification information of the data to be retrieved, the distributed storage system may find, in the first mapping relationship, the data storage pool corresponding to the data identification information of the data to be retrieved.
S26, determining, in the second mapping relationship, an identifier of each data object corresponding to the data to be retrieved based on the data identifier information of the data to be retrieved.
In the embodiment of the present invention, after the distributed storage system acquires the data identification information of the data to be retrieved, the distributed storage system may determine, in the second mapping relationship, the identification of each data object corresponding to the data to be retrieved.
In this embodiment of the present invention, the first mapping relationship and the second mapping relationship may be merged into one mapping relationship, where the merged mapping relationship includes the mapping relationship between the data identification information and the data storage pool, and the mapping relationship between the data identification information and the identifications of the data objects.
S27, applying a preset random algorithm to the identifiers of the data objects corresponding to the data to be retrieved to obtain the random group corresponding to each data object, and thereby determining the storage group in which each data object of the data to be retrieved is located.
In the embodiment of the invention, the distributed storage system calculates the identification of the data object respectively corresponding to the data to be retrieved through a Hash algorithm to obtain the random values corresponding to a plurality of data objects. The distributed storage system can determine the storage groups of the data objects corresponding to the data to be retrieved respectively according to the corresponding relation between the random group to which the random values corresponding to the plurality of data objects belong and the storage group.
In the embodiment of the present invention, the distributed storage system calculates the identifier of one data object through a hash algorithm, and the obtained random value corresponding to the identifier may be 100010. The distributed storage system may determine that the random group corresponding to the data object is the random group of 6-digit values to which this random value belongs. The distributed storage system can then determine the storage group corresponding to the identifier according to the correspondence between the random group and the storage group. That is, the storage group corresponding to the data object is determined.
S28, finding each data object based on the data storage pool determined for the data to be retrieved and the storage group in which each data object is located.
In the embodiment of the invention, after the distributed storage system determines the storage groups in which the plurality of data objects of the data to be retrieved are stored, the distributed storage system acquires the data objects from the storage units corresponding to the main storage control application of each storage group through a preset program.
In the embodiment of the present invention, the distributed storage system may obtain the data object in the storage unit corresponding to the primary storage control application in each storage group, and the distributed storage system may also obtain the copy of the data object in the storage unit corresponding to the secondary storage control application in each storage group.
S29, integrating the data objects to obtain the data to be retrieved.
In the embodiment of the invention, the distributed storage system combines a plurality of acquired data objects to form the data to be retrieved, and the distributed storage system sends the data to be retrieved to the client through the pre-established communication line.
In the embodiment of the invention, the distributed storage system can determine the data storage pool in which the data to be retrieved is stored and can determine the storage group in which the data object corresponding to the data to be retrieved is stored by receiving the retrieval request and according to the data identification information carried in the retrieval request; therefore, the data objects can be rapidly acquired in the storage group to form the data to be retrieved.
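The retrieval flow of S24 to S29, together with the two mapping relationships it relies on, can be sketched end to end as follows. Everything here is a simplified illustration under stated assumptions: each storage group is modeled as a dictionary standing in for the storage unit of its primary storage control application, the hash is SHA-256 reduced by a modulo, and the names `PoolSketch`, `store`, `retrieve`, and the object identifier scheme `data_id.index` are hypothetical.

```python
import hashlib


def group_index(object_id: str, group_count: int) -> int:
    """Repeatable storage group choice for an object identifier (assumed hash)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % group_count


class PoolSketch:
    """Hypothetical data storage pool; each dict stands in for a primary's storage unit."""

    def __init__(self, pool_id, group_count):
        self.pool_id = pool_id
        self.groups = [{} for _ in range(group_count)]

    def put(self, object_id, payload):
        self.groups[group_index(object_id, len(self.groups))][object_id] = payload

    def get(self, object_id):
        return self.groups[group_index(object_id, len(self.groups))][object_id]


def store(data_id, chunks, pool, first_mapping, second_mapping):
    object_ids = [f"{data_id}.{i}" for i in range(len(chunks))]
    for object_id, chunk in zip(object_ids, chunks):
        pool.put(object_id, chunk)
    first_mapping[data_id] = pool.pool_id  # S22: data id -> storage pool id
    second_mapping[data_id] = object_ids   # S23: data id -> data object ids


def retrieve(data_id, pools, first_mapping, second_mapping):
    pool = pools[first_mapping[data_id]]   # S25: find the data storage pool
    object_ids = second_mapping[data_id]   # S26: find the data object ids
    # S27 to S29: locate each object through its hash and join the pieces in order.
    return b"".join(pool.get(object_id) for object_id in object_ids)
```

Since the same hash is used on the write path and the read path, the storage group of each data object never has to be recorded; only the two mappings are kept.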
Fig. 13 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention. An embodiment of the present invention further provides a distributed storage system 800, including: a receiving unit 803, a processing unit 804 and a storage unit 805.
A receiving unit 803, configured to receive data to be stored;
the processing unit 804 is configured to determine, according to the storage node identifier that receives the data to be stored, a target storage group for storing the data to be stored, where a main storage control application associated with the target storage group is deployed in a storage node corresponding to the storage node identifier; the distributed storage system comprises a plurality of storage groups, and each storage group is associated with a plurality of storage control applications;
a storage unit 805, configured to send a storage request for storing data to be stored to a primary storage control application in a target storage group.
In this embodiment of the present invention, the distributed storage system 800 includes a plurality of data storage pools, each data storage pool corresponds to each storage node one to one, each data storage pool includes a plurality of storage groups, and a primary storage control application associated with each storage group is deployed on a storage node corresponding to a data storage pool to which the storage group belongs; for each data storage pool, the storage control applications in the distributed storage system are uniformly distributed in the storage groups of the data storage pool.
In this embodiment of the present invention, the processing unit 804 of the distributed storage system 800 is configured to determine, according to a storage node identifier that receives data to be stored, a data storage pool identifier corresponding to the storage node identifier, and to randomly determine a target storage group in which data to be stored is stored in each storage group corresponding to the data storage pool identification.
In the embodiment of the present invention, the distributed storage system 800 constructs a plurality of data storage pools, wherein each data storage pool corresponds to each storage node one to one; for each data storage pool, uniformly distributing each storage control application in the distributed storage system in a storage group of the data storage pool to obtain each available storage control application associated with each storage group in the data storage pool; for each storage group, determining a primary storage control application associated with the storage group based on the available storage control applications associated with the storage group, so that the primary storage control application is deployed on a storage node corresponding to the data storage pool to which the storage group belongs.
In this embodiment of the present invention, for each storage group in the distributed storage system 800, if an available storage control application exists on the storage node corresponding to the data storage pool to which the storage group belongs, that available storage control application is determined as the primary storage control application of the storage group; if no available storage control application exists on that storage node, one of the available storage control applications associated with the storage group is determined as the primary storage control application.
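As a non-limiting illustration, the primary selection rule may be sketched as follows. The `App` record and the `select_primary` function are hypothetical names. The embodiment only requires that "one of" the available applications be chosen in the fallback case; taking the first one is an assumption of this sketch.

```python
from typing import List, NamedTuple


class App(NamedTuple):
    app_id: str   # identifier of a storage control application (hypothetical)
    node_id: str  # storage node on which the application is deployed


def select_primary(group_apps: List[App], pool_node: str) -> App:
    """Pick the primary storage control application for one storage group.

    group_apps: available applications associated with the storage group.
    pool_node:  storage node corresponding to the data storage pool
                to which the storage group belongs.
    """
    if not group_apps:
        raise ValueError("storage group has no available storage control application")
    # Preferred case: an available application is deployed on the pool's node.
    for app in group_apps:
        if app.node_id == pool_node:
            return app
    # Fallback: any available application of the group (first one, by assumption).
    return group_apps[0]
```

For example, for a group whose applications are deployed on nodes `n1` and `n2`, a pool bound to `n2` yields the application on `n2`, while a pool bound to a node hosting none of the group's applications falls back to the first listed application.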
In this embodiment of the present invention, the primary storage control application of the distributed storage system 800 stores the data to be stored in the storage unit corresponding to the primary storage control application, and sends the data to be stored to the remaining storage control applications in the target storage group to instruct them to perform the replica storage operation.
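As a non-limiting illustration, the write path inside a target storage group may be sketched as follows. The class `StorageControlApp`, its dictionary-backed `store`, and the function `handle_storage_request` are hypothetical; a real system would send the replicas over the network and handle acknowledgements and failures, which this sketch omits.

```python
from typing import Dict, List


class StorageControlApp:
    """Hypothetical storage control application backed by an in-memory dict."""

    def __init__(self, app_id: str):
        self.app_id = app_id
        self.store: Dict[str, bytes] = {}  # stands in for the associated storage unit

    def store_replica(self, key: str, data: bytes) -> None:
        self.store[key] = data


def handle_storage_request(primary: StorageControlApp,
                           others: List[StorageControlApp],
                           key: str,
                           data: bytes) -> int:
    """Store on the primary first, then instruct the remaining applications.

    Returns the number of copies written.
    """
    primary.store[key] = data          # primary writes to its own storage unit
    for app in others:                 # then forwards the data for replica storage
        app.store_replica(key, data)
    return 1 + len(others)
```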
In this embodiment of the present invention, the processing unit 804 of the distributed storage system 800 is configured to divide the data to be stored into a plurality of data objects of equal size; for each data object to be stored, the processing unit 804 is configured to randomly determine, based on the identifier of the data object, the target storage group for the data object from among the storage groups corresponding to the data storage pool identifier; and the storage unit 805 is configured to send, for each data object, a storage request for storing the data object to the primary storage control application associated with the target storage group corresponding to the data object.
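As a non-limiting illustration, dividing the data into equally sized data objects may be sketched as follows. The embodiment does not state how a final partial object is handled; padding it with zero bytes and recording the original length is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_into_objects(data: bytes, object_size: int) -> Tuple[List[bytes], int]:
    """Split data into objects of exactly object_size bytes.

    Returns the objects and the original length, so that the zero padding
    added to the last object (an assumption) can be removed on read.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    objects = []
    for start in range(0, len(data), object_size):
        chunk = data[start:start + object_size]
        objects.append(chunk.ljust(object_size, b"\x00"))  # pad the last chunk
    return objects, len(data)


def join_objects(objects: List[bytes], original_length: int) -> bytes:
    """Reassemble the data and strip the padding."""
    return b"".join(objects)[:original_length]
```

Each object can then be routed independently to its own target storage group, which spreads one piece of data across several storage groups of the same data storage pool.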
The processing unit 804 of the distributed storage system 800 is configured to, for each data object to be stored: obtain the object identifier of the data object; apply a preset random algorithm to the object identifier to obtain a random value corresponding to the data object; determine the random group to which the random value belongs; and select, from the storage groups corresponding to the data storage pool identifier, the storage group corresponding to that random group as the target storage group.
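As a non-limiting illustration, the mapping from an object identifier to a target storage group may be sketched as follows. The embodiment does not name the preset random algorithm or the rule that assigns a random value to a random group; using SHA-256 as the algorithm and a modulo over the number of storage groups as the grouping rule are assumptions of this sketch, as are the function names.

```python
import hashlib
from typing import List


def random_value(object_id: str) -> int:
    """Derive a deterministic pseudo-random value from the object identifier.

    SHA-256 stands in for the 'preset random algorithm' (an assumption).
    """
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def target_storage_group(object_id: str, pool_groups: List[str]) -> str:
    """Select the target storage group among the groups of one data storage pool."""
    if not pool_groups:
        raise ValueError("data storage pool has no storage groups")
    value = random_value(object_id)
    random_group = value % len(pool_groups)  # random group the value belongs to
    return pool_groups[random_group]
```

Because the value depends only on the object identifier, the same computation can be repeated at retrieval time to locate the object without storing a per-object placement record.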
In this embodiment of the present invention, the receiving unit 803 of the distributed storage system 800 is configured to obtain data identification information of data to be stored; a storage unit 805 of the distributed storage system 800 is configured to store a first mapping relationship between the data identification information and the identification of the determined data storage pool; the storage unit 805 of the distributed storage system 800 is further configured to store a second mapping relationship between the data identification information and the identification of each data object.
In this embodiment of the present invention, the receiving unit 803 of the distributed storage system 800 is configured to receive a retrieval request, where the retrieval request carries data identification information of data to be retrieved. The processing unit 804 is configured to: determine, from the first mapping relationship, the data storage pool corresponding to the data to be retrieved based on the data identification information of the data to be retrieved; determine, from the second mapping relationship, the identifier of each data object corresponding to the data to be retrieved based on the same data identification information; apply the preset random algorithm to the identifier of each data object to obtain the random group corresponding to that data object, and thereby determine the storage group in which each data object of the data to be retrieved is located; look up each data object based on the determined data storage pool and the storage group in which the data object is located; and combine the data objects to obtain the data to be retrieved.
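As a non-limiting illustration, storage and retrieval using the first and second mapping relationships may be sketched as follows. The class `PoolStore`, the object identifier format `"<data id>.<index>"`, and the SHA-256-with-modulo placement are assumptions of this sketch; storage groups are modeled as in-memory dictionaries rather than as groups of storage control applications.

```python
import hashlib
from typing import Dict, List


def group_index(object_id: str, num_groups: int) -> int:
    # Assumed 'preset random algorithm': SHA-256, reduced modulo the group count.
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


class PoolStore:
    """Hypothetical in-memory model: pool id -> list of storage groups."""

    def __init__(self, pool_ids: List[str], groups_per_pool: int):
        self.pools: Dict[str, List[Dict[str, bytes]]] = {
            p: [dict() for _ in range(groups_per_pool)] for p in pool_ids
        }
        self.data_to_pool: Dict[str, str] = {}           # first mapping relationship
        self.data_to_objects: Dict[str, List[str]] = {}  # second mapping relationship

    def store(self, data_id: str, pool_id: str, chunks: List[bytes]) -> None:
        groups = self.pools[pool_id]
        object_ids = []
        for i, chunk in enumerate(chunks):
            oid = f"{data_id}.{i}"  # hypothetical object identifier format
            groups[group_index(oid, len(groups))][oid] = chunk
            object_ids.append(oid)
        self.data_to_pool[data_id] = pool_id
        self.data_to_objects[data_id] = object_ids

    def retrieve(self, data_id: str) -> bytes:
        pool_id = self.data_to_pool[data_id]           # look up the data storage pool
        groups = self.pools[pool_id]
        parts = []
        for oid in self.data_to_objects[data_id]:      # look up the object identifiers
            group = groups[group_index(oid, len(groups))]  # recompute the placement
            parts.append(group[oid])
        return b"".join(parts)                         # combine the data objects
```

Only the two mappings are persisted per piece of data; the storage group of each object is recomputed from its identifier at retrieval time.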
The distributed storage system 800 according to the embodiment of the present invention receives the data to be stored through the receiving unit 803; the processing unit 804 determines the target storage group for storing the data to be stored according to the identifier of the storage node that receives the data to be stored, where the primary storage control application associated with the target storage group is deployed on the storage node corresponding to the storage node identifier; and the storage unit 805 sends the storage request for the data to be stored to the primary storage control application in the target storage group. Because the primary storage control application resides on the receiving storage node, the step of forwarding the storage request to the primary storage control application is omitted, which saves network traffic, improves the overall storage rate, and reduces access latency.
It should be noted that, in the embodiment of the present invention, if the data storage method is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the related art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling the distributed storage system 800 (which may be a personal computer, a distributed storage system, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
Correspondingly, the embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the above-mentioned method.
Correspondingly, the embodiment of the present invention provides a distributed storage system 800, which includes a memory 802 and a processor 801, where the memory 802 stores a computer program that can be executed on the processor 801, and the processor 801 executes the computer program to implement the steps in the method described above.
Here, it should be noted that: the above description of the storage medium and device embodiments, similar to the description of the method embodiments above, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention.
Fig. 14 is a schematic diagram of a hardware entity of a distributed storage system according to an embodiment of the present invention. As shown in Fig. 14, the hardware entity of the distributed storage system 800 includes a processor 801 and a memory 802, wherein:
The processor 801 generally controls the overall operation of the distributed storage system 800.
The memory 802 is configured to store instructions and applications executable by the processor 801, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 801 and the modules in the distributed storage system 800; it may be implemented by a flash memory (FLASH) or a random access memory (RAM).
It should be noted that the distributed storage system 800 may include a plurality of terminals, each of which may have the processor 801 and the memory 802, and the above steps may be executed in one terminal or may be executed in a plurality of different terminals respectively. The terminal may be a storage node of the distributed storage system 800.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units; and some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a distributed storage system, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and all such changes or substitutions are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (11)

1. A data storage method applied to a distributed storage system, wherein the distributed storage system comprises a plurality of storage nodes, the method comprising:
receiving data to be stored;
determining a target storage group for storing the data to be stored according to the identifier of the storage node that receives the data to be stored, wherein a primary storage control application associated with the target storage group is deployed on the storage node corresponding to the storage node identifier; the distributed storage system comprises a plurality of storage groups, and each storage group is associated with a plurality of storage control applications;
and sending a storage request for storing the data to be stored to the primary storage control application in the target storage group.
2. The data storage method according to claim 1, wherein the distributed storage system comprises a plurality of data storage pools in one-to-one correspondence with the storage nodes, each data storage pool comprises a plurality of storage groups, and the primary storage control application associated with each storage group is deployed on the storage node corresponding to the data storage pool to which the storage group belongs; for each data storage pool, the storage control applications in the distributed storage system are uniformly distributed among the storage groups of the data storage pool;
correspondingly, the determining a target storage group for storing the data to be stored according to the identifier of the storage node that receives the data to be stored comprises:
determining, according to the identifier of the storage node that receives the data to be stored, a data storage pool identifier corresponding to the storage node identifier;
and randomly determining, from the storage groups corresponding to the data storage pool identifier, the target storage group in which the data to be stored is to be stored.
3. The data storage method according to claim 2, wherein before determining, according to the storage node identifier receiving the data to be stored, the data storage pool identifier corresponding to the storage node identifier, the method further comprises:
constructing a plurality of data storage pools, wherein each data storage pool corresponds to each storage node one by one;
for each data storage pool, uniformly distributing the storage control applications in the distributed storage system in the storage groups of the data storage pool to obtain the available storage control applications associated with each storage group in the data storage pool;
for each storage group, determining a primary storage control application associated with the storage group based on the available storage control applications associated with the storage group, so that the primary storage control application is deployed on a storage node corresponding to the data storage pool to which the storage group belongs.
4. The data storage method of claim 3, wherein determining, for each storage group, the primary storage control application associated with the storage group based on the respective available storage control applications associated with the storage group comprises:
if an available storage control application exists on the storage node corresponding to the data storage pool to which the storage group belongs, determining that available storage control application as the primary storage control application of the storage group;
and if no available storage control application exists on the storage node corresponding to the data storage pool to which the storage group belongs, determining one of the available storage control applications associated with the storage group as the primary storage control application.
5. The data storage method of any of claims 1 to 4, wherein after sending the storage request to store the data to be stored to the primary storage control application in the target storage group, the method further comprises:
storing, by the primary storage control application, the data to be stored in a storage unit corresponding to the primary storage control application;
and sending, by the primary storage control application, the data to be stored to the remaining storage control applications in the target storage group to instruct the remaining storage control applications to perform the replica storage operation.
6. The data storage method of any of claims 2 to 4, wherein after said receiving data to be stored, the method further comprises:
dividing the data to be stored into a plurality of data objects to be stored, the data objects being of equal size;
correspondingly, the randomly determining a target storage group in which the data to be stored is stored in each storage group corresponding to the data storage pool identifier includes:
for each data object to be stored, randomly determining a target storage group for storing the data object in each storage group corresponding to the data storage pool identification based on the data object identification;
correspondingly, the sending the storage request for storing the data to be stored to the primary storage control application in the target storage group includes:
for each data object, a storage request to store the data object is sent to the primary storage control application associated with the corresponding target storage group for the data object.
7. The data storage method of claim 6, wherein for each data object to be stored, randomly determining a target storage group for storing the data object in the respective storage group corresponding to the data storage pool identifier based on the data object identifier comprises:
for each data object to be stored, acquiring an identifier of the data object;
based on a preset random algorithm, calculating the identifier of the data object to obtain a random value corresponding to the data object;
determining a random group to which the random value belongs based on the random value;
and selecting, from the storage groups corresponding to the data storage pool identifier, the storage group corresponding to the random group to which the random value belongs as the target storage group.
8. The data storage method of claim 7, wherein after the step of determining, according to the storage node identifier receiving the data to be stored, the data storage pool identifier corresponding to the storage node identifier, the method further comprises:
acquiring data identification information of the data to be stored;
storing a first mapping relationship between the data identification information and the data storage pool identification;
the data storage method further comprises the following steps:
and storing a second mapping relation between the data identification information and the identification of each data object.
9. The data storage method of claim 8, wherein the method further comprises:
receiving a retrieval request, wherein the retrieval request carries data identification information of data to be retrieved;
determining a data storage pool corresponding to the data to be retrieved in the first mapping relation based on the data identification information of the data to be retrieved;
determining the identifier of each data object corresponding to the data to be retrieved in the second mapping relation based on the data identification information of the data to be retrieved;
obtaining each random group corresponding to the data to be retrieved based on a preset random algorithm for the identifier of the data object corresponding to the data to be retrieved, and further determining the storage group in which each data object of the data to be retrieved is located;
searching each data object based on the determined data storage pool corresponding to the data to be retrieved and the storage group in which each data object is positioned;
and integrating the data objects to obtain the data to be retrieved.
10. A distributed storage system comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps of the method of any one of claims 1 to 9 when executing the program.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
CN202011560049.4A 2020-12-25 2020-12-25 Data storage method, distributed storage system and storage medium Pending CN114756620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011560049.4A CN114756620A (en) 2020-12-25 2020-12-25 Data storage method, distributed storage system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011560049.4A CN114756620A (en) 2020-12-25 2020-12-25 Data storage method, distributed storage system and storage medium

Publications (1)

Publication Number Publication Date
CN114756620A (en) 2022-07-15

Family

ID=82324662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011560049.4A Pending CN114756620A (en) 2020-12-25 2020-12-25 Data storage method, distributed storage system and storage medium

Country Status (1)

Country Link
CN (1) CN114756620A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202589A (en) * 2022-09-14 2022-10-18 浪潮电子信息产业股份有限公司 Placement group member selection method, device, equipment and readable storage medium
CN115202589B (en) * 2022-09-14 2023-02-24 浪潮电子信息产业股份有限公司 Placement group member selection method, device and equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN103036597B (en) Method and device of sharing resources among devices in close range
JP2020523700A (en) Distributed search and index update method, system, server, and computer device
US11768706B2 (en) Method, storage medium storing instructions, and apparatus for implementing hardware resource allocation according to user-requested resource quantity
CN114039947B (en) Terminal address allocation method, UPF, system and storage medium
CN106790552B (en) A kind of content providing system based on content distributing network
CN107786758B (en) Agent distribution method and device
CN105245500A (en) Multimedia resource sharing method and device
CN117008818A (en) Data processing method, apparatus, computer device, and computer readable storage medium
JP2019510435A (en) Network access method, related device and system
CN114756620A (en) Data storage method, distributed storage system and storage medium
CN110764688A (en) Method and device for processing data
CN104092754A (en) File storage system and method
US20180302404A1 (en) Method for processing data request and system therefor, access device, and storage device
CN111200640B (en) Uploading method based on client and client
CN111666265B (en) Data management method, device, server and storage medium
CN107181773A (en) Data storage and data managing method, the equipment of distributed memory system
US20130260804A1 (en) Apparatus and method for wireless network connection
WO2002037769A1 (en) Communication control apparatus and method
CN106649528A (en) Picture writing and reading methods and devices
CN108874798B (en) Big data sorting method and system
CN114238264A (en) Data processing method, data processing device, computer equipment and storage medium
CN110110004B (en) Data operation method, device and storage medium
CN117539949B (en) Processing method and device of database access request, electronic equipment and storage medium
CN112685613A (en) Resource packet query method and device and information processing system
CN104537081A (en) File management system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination