WO2018000993A1 - Method and system for distributed storage - Google Patents

Method and system for distributed storage

Info

Publication number
WO2018000993A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage node
storage
content data
node
migration
Prior art date
Application number
PCT/CN2017/085383
Other languages
English (en)
French (fr)
Inventor
林灿榕
李耀辉
沈剑刚
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018000993A1 publication Critical patent/WO2018000993A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04Network management architectures or arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1036Load balancing of requests to servers for services different from user content provisioning, e.g. load balancing across domain name servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/563Data redirection of data network streams

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and system for distributed storage.
  • a distributed storage system consists of multiple servers with storage capability. The servers are interconnected through a network and, as a whole, provide storage services externally.
  • distributed storage systems generally follow one of two designs. One is a centralized design, which uses a centrally deployed central server to allocate and manage the data distribution; before accessing data, a client queries the central server for the data location and, once the location is determined, sends a read request to the server to which the data belongs. The other is a decentralized design, which uses a distributed algorithm, such as the Distributed Hash Table (DHT) algorithm, to compute data locations and thereby manage the data distribution.
  • Embodiments of the present invention provide a method and system for implementing distributed storage to solve the problem of waste of storage resources in the prior art.
  • an embodiment of the present invention provides a distributed storage method, in which a first storage node of a distributed storage system receives a request from an application server to write content data; the first storage node may determine, according to the capacity load and traffic load of each storage node in the distributed storage system, a second storage node for writing the content data, then notify the application server to write the content data to the determined second storage node, and locally create the management data of the content data.
  • the storage location of the content data is recorded in the management data, that is, the storage node where the content data is located.
  • the content data is thus distributed with full consideration of each storage node's capacity and outflow conditions, thereby avoiding the resource waste caused by distributing content data with a single uniform algorithm.
  • the storage node calculated by the application server 101 through the distributed algorithm is used as the access node of the application server 101.
  • the storage node that is the access node determines the storage node that stores the content data according to the capacity load and the traffic load of each storage node. In this way, bottlenecks in centralized design can be avoided.
  • the capacity load of each storage node is calculated according to the storage capacity supported by each storage node and the respective used storage capacity.
  • the traffic load of each storage node is calculated according to the outflow capability supported by each storage node and its average outbound traffic.
  • the capacity load and the traffic load can be considered jointly by taking an intersection of candidate node sets.
  • the first storage node finds a storage node whose capacity load is within a preset capacity range, and forms a first node set.
  • the first storage node also finds a storage node whose traffic load is within a preset traffic range, and forms a second node set. Then, a second storage node for writing the data is selected from the intersection of the first node set and the second node set.
  • selecting the second storage node for writing the data from the intersection of the first node set and the second node set specifically includes: the first storage node determines whether the data to be written is hot data or cold data; if it is hot data, the first storage node selects, from the intersection of the first node set and the second node set, the storage node with the smallest traffic load as the second storage node for writing the data; if it is cold data, the first storage node selects, from the intersection, the storage node with the smallest capacity load as the second storage node for writing the data.
  • when selecting the second storage node for writing the content data from the intersection, it may further be determined whether the first storage node is in the intersection; if the first storage node is in the intersection, the first storage node is preferred as the second storage node for writing the content data.
  • the method also includes an access process to the content data.
  • the first storage node receives a request for accessing the content data, determines from the management data that the content data is stored on the second storage node, and forwards the request for accessing the content data to the second storage node.
  • the embodiment of the present invention may store hot content data on storage nodes with strong outflow capability, and cold content data on storage nodes with large storage capacity. The process is specifically described below.
  • each storage node may determine the access heat of content data according to the number of times the content data is accessed, and record the access heat in the management data of the content data, so that the content data can subsequently be migrated according to its access heat.
  • the first storage node identifies hot content data according to the access heat of the content data it stores, migrates the hot content data to a storage node whose traffic load is smaller than that of the first storage node, and notifies the storage node storing the management data of the hot content data to update the storage location of the hot content data.
  • migrating the hot content data to a storage node whose traffic load is smaller than that of the first storage node specifically includes: the storage nodes of the distributed storage system are sorted by traffic load, and a first migration relationship between the storage nodes is determined according to the principle that storage nodes with large traffic load migrate hot content data to storage nodes with small traffic load; the first migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • the first storage node migrates the hot data to the migrate-in storage node that forms a migration pair with the first storage node in the first migration relationship.
  • the first storage node also identifies cold content data according to the access heat of the content data it stores.
  • the first storage node migrates the cold content data to a storage node whose capacity load is smaller than that of the first storage node, and notifies the storage node storing the management data of the cold content data to update the storage location of the cold content data.
  • migrating the cold content data to a storage node whose capacity load is smaller than that of the first storage node specifically includes: the storage nodes of the distributed storage system are sorted by capacity load, and a second migration relationship between the storage nodes is determined according to the principle that storage nodes with large capacity load migrate cold content data to storage nodes with small capacity load; the second migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • the first storage node migrates the cold content data to the migrate-in storage node that forms a migration pair with the first storage node in the second migration relationship.
  • through the foregoing migration process, the storage resources of storage nodes with strong outflow capability can be released, and hot content data can be stored on storage nodes with strong outflow capability, thereby improving the performance of the entire distributed storage system.
  • the embodiment of the present invention provides a distributed storage method, which can be applied to a scenario in which a distributed storage system is expanded.
  • the distributed system stores content data and management data for each piece of content data, and each piece of management data includes the storage location of the corresponding content data; the management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm.
  • the content data is distributed according to the capacity load and the traffic load of each storage node.
  • the distributed storage system recalculates the distribution of each content data through a distributed algorithm.
  • the content data that should be attributed to the expansion storage node is retained in the original storage node that stores the content data, and the calculated management data of the content data is migrated to the expansion storage node.
  • by creating management data for each piece of content data, only the management data is migrated during expansion; the content data itself is not moved. Because the management data is much smaller than the content data, the migration amount is very small, which greatly shortens the migration time and enables the distributed storage system to provide services quickly after the expansion.
  • the embodiment of the present invention further optimizes the distribution of the content data: hot content data is migrated to storage nodes with strong outflow capability, and cold content data is migrated to storage nodes with large storage capacity. The migration process is described below.
  • the distributed storage system sorts the storage nodes by traffic load, and determines a first migration relationship between the storage nodes according to the principle that storage nodes with large traffic load migrate hot content data to storage nodes with small traffic load.
  • the first migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • Each storage node of the distributed storage system performs hot content data migration according to the first migration relationship and, in the management data of the hot content data, updates the storage location of the hot content data to the post-migration storage location.
  • the distributed storage system also sorts the storage nodes of the distributed storage system by capacity load, and determines a second migration relationship between the storage nodes according to the principle that storage nodes with large capacity load migrate cold content data to storage nodes with small capacity load; the second migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • Each storage node of the distributed storage system performs cold content data migration according to the second migration relationship, and updates the storage location of the cold content data to the migrated storage location in the management data of the cold content data.
  • performing hot content data migration according to the first migration relationship specifically includes: a migrate-out storage node in the first migration relationship identifies the hot content data among the content data stored on the local node, and migrates the identified hot content data to the migrate-in storage node that forms a migration pair with it in the first migration relationship.
  • performing cold content data migration according to the second migration relationship specifically includes: a migrate-out storage node in the second migration relationship identifies the cold content data among the content data it stores, and migrates the identified cold data to the migrate-in storage node that forms a migration pair with it in the second migration relationship.
  • an embodiment of the present invention provides a storage node, where the storage node has a function of implementing behavior of a first storage node in the foregoing method embodiment.
  • the functions may be implemented by hardware or by corresponding software implemented by hardware.
  • the hardware or software includes one or more components corresponding to the functions described above (e.g., determining the distribution of content data based on the capacity load and traffic load of each storage node of the distributed storage system).
  • an embodiment of the present invention provides a distributed storage system, which has the function of implementing the behavior of the distributed storage system in the foregoing method embodiments (including the behavior of each storage node in the distributed storage system).
  • the functions may be implemented by hardware or by corresponding software implemented by hardware.
  • the hardware or software includes one or more components corresponding to the above functions (e.g., capacity-expansion migration, hot and cold data migration, etc.).
  • an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the first storage node, including a program designed to execute the foregoing aspects for the first storage node.
  • an embodiment of the present invention provides a computer storage medium for storing computer software instructions for use in the distributed storage system, including a program designed to execute the above aspects for a distributed storage system.
  • the management data of the content data is created by using the storage node calculated by the distributed algorithm, so that the management of the content data can be distributed to the storage nodes of the distributed storage system, thereby avoiding adopting the central node. Performance bottlenecks caused by management.
  • the distribution of content data is performed according to the hardware capabilities (such as storage capacity and outflow capability) of each storage node, thereby avoiding the resource waste caused by using a single distributed algorithm in a heterogeneous environment.
  • FIG. 1 is a network architecture diagram of implementing distributed storage according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a computer device according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for implementing distributed storage according to an embodiment of the present invention.
  • FIG. 3-1 is a schematic diagram of a number space according to an embodiment of the present invention;
  • FIG. 3-2 is a schematic diagram of content data mapping according to an embodiment of the present invention;
  • FIG. 3-3 is a schematic diagram of storage node mapping according to an embodiment of the present invention;
  • FIG. 4 is a flowchart of accessing content data according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of capacity expansion of a distributed storage system according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of migration after capacity expansion according to an embodiment of the present invention.
  • FIG. 7 is a structural diagram of a storage node according to an embodiment of the present invention.
  • the network architecture and the service scenario described in the embodiments of the present invention are used to more clearly illustrate the technical solutions of the embodiments of the present invention, and do not constitute a limitation of the technical solutions provided by the embodiments of the present invention.
  • the technical solutions provided by the embodiments of the present invention are equally applicable to similar technical problems.
  • FIG. 1 is a schematic diagram of a network architecture for implementing distributed storage according to an embodiment of the present invention.
  • the network architecture includes a distributed storage system 102 and at least one application server 101. Two or more storage nodes are included in the distributed storage system 102 (only three are shown in FIG. 1 as an example).
  • the storage node may be a server with storage capability. Each storage node is interconnected through a network to provide storage services as a whole.
  • both the application server 101 and the storage nodes of the distributed storage system 102 can be connected to the network for communication over the network.
  • the network can be the Internet (Internet) or other type of network such as a local area network or a wireless network.
  • the application server 101 can access the storage nodes in the distributed storage system 102 to perform operations such as writing or reading of data.
  • the application server 101 may adopt a distributed algorithm, for example, a distributed hash table (DHT) algorithm, to calculate the storage node to which the content data to be written or read belongs, and access that storage node to perform the write or read operation on the content data.
  • the client that interacts with the storage node may be deployed on the application server 101.
  • the computing process is performed by the client, shielding the applications on the application server 101 from the internal structure of the distributed storage system 102.
  • when a storage node of the distributed storage system 102 receives a request from the application server 101 to write content data, the storage node may determine, according to the capacity load and traffic load of each storage node in the distributed storage system 102, the storage node for writing the content data, then notify the application server 101 to write the content data to the determined storage node, and locally create the management data of the content data. The management data records the storage location of the content data, that is, the storage node where the content data is located. When accessing the content data, the application server 101 calculates, using the distributed algorithm used when the content data was written, the storage node to which the content data belongs, and sends a request to access the content data to the calculated storage node.
  • the storage node that receives the request learns the storage node where the content data is located by looking up the management data of the content data, and then forwards the request to the storage node storing the content data, which provides the content data to the application server 101.
  • Embodiments of the present invention can be applied to heterogeneous distributed storage systems.
  • different storage nodes may use different storage media, such as SATA disks, SAS disks, SSD disks, memory, and the like.
  • Different storage media have different storage capabilities (e.g., capacity) and outflow capabilities.
  • the outflow capability depends on the hardware capability of the storage medium, for example, the outbound traffic per unit time that the hardware of the storage medium can support.
  • therefore, in this embodiment of the present invention, the capacity load of each storage node may be calculated based on the storage capacity supported by each storage node and its used storage capacity, and the traffic load of each storage node may be calculated based on the outflow capability supported by each storage node and its average outbound traffic, so that the content data is distributed with full consideration of each storage node's capacity and outflow conditions, avoiding the resource waste caused by distributing content data with a single uniform algorithm.
  • moreover, in this embodiment of the present invention, the storage node calculated by the application server 101 through the distributed algorithm is used as the access node of the application server 101; the storage node acting as the access node then determines, according to the capacity load and traffic load of each storage node, the storage node that stores the content data. In this way, the bottleneck of the centralized design can be avoided.
  • FIG. 2 is a schematic diagram of a computer device according to an embodiment of the present invention.
  • the computer device 200 includes at least one processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
  • the processor 201 can be a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
  • Communication bus 202 can include a path for communicating information between the components described above.
  • the communication interface 204 may be any apparatus such as a transceiver for communicating with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).
  • the communication interface 204 can be used to communicate with an application server and with other storage nodes in the distributed storage system.
  • the memory 203 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it may also be an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto.
  • the memory can exist independently and be connected to the processor via a bus.
  • the memory can also be integrated with the processor.
  • the memory 203 is used to store application code for executing the solution of the present invention, and is controlled by the processor 201 for execution.
  • the processor 201 is configured to execute the application code stored in the memory 203 (such as program code implementing a data manager, program code implementing a migration manager, and the like).
  • the memory is further configured to store content data and management data of the content data.
  • processor 201 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 2.
  • computer device 200 may include multiple processors, such as the two processors 201 shown in FIG. 2. Each of these processors may be a single-core processor or a multi-core processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
  • the computer device 200 described above can be a general purpose computer device or a special purpose computer device.
  • computer device 200 may be a network server, a communication device, an embedded device, or a device having a structure similar to that in FIG. 2. Embodiments of the present invention do not limit the type of the computer device 200.
  • FIG. 3 is a flowchart of a method for implementing distributed storage according to an embodiment of the present invention. As shown in FIG. 3, this embodiment is a process of writing content data, and the process includes:
  • the application server sends a request for writing content data to a first storage node in the distributed system.
  • the request message may carry description information of the content data, for example, identifier, size or type of the content data.
  • before sending the request, a storage node to receive it must be determined in the distributed system.
  • specifically, the application server may use a distributed algorithm (for example, the DHT algorithm) to calculate a storage node for writing the content data.
  • in the embodiments of the present invention, the storage node determined by the application server is referred to as the first storage node.
  • the DHT algorithm is taken as an example to describe the principle of using distributed algorithms to calculate the distribution of content data.
  • in the DHT algorithm, the distributed storage system hashes each key, using a common hash algorithm, into a space of 2^32 buckets, that is, the numbers 0 to 2^32 - 1. These numbers can be joined end to end to form a closed ring. See FIG. 3-1 below.
  • the distributed storage system can process the content data with a given hash algorithm and map it onto the ring shown in FIG. 3-1.
  • the mapping process is described by taking the four content data of object1, object2, object3, and object4 as an example.
  • the key values corresponding to the four content data of object1, object2, object3, and object4 are calculated by a specific hash function, and the key values are as follows:
  • Hash(object1) = key1
  • Hash(object2) = key2
  • Hash(object3) = key3
  • Hash(object4) = key4
  • the calculated key value is then hashed onto the Hash ring. See Figure 3-2 below.
  • the distributed storage system then maps the storage nodes onto the ring through a hash algorithm. Specifically, the distributed storage system maps each storage node onto the ring using the same hash algorithm as for the content data (generally, the hash calculation for a storage node may take the node's IP address or a unique alias of the node as the hash input); each piece of content data is then stored on the first storage node found from it in the clockwise direction.
  • Hash(NODE1) = KEY1
  • Hash(NODE2) = KEY2
  • Hash(NODE3) = KEY3
  • the obtained KEY value is mapped into the ring, and its schematic diagram is shown in Figure 3-3.
  • the content data and the storage nodes share the same hash space; searching clockwise, object1 is stored on NODE1, object3 on NODE2, and object2 and object4 on NODE3. Therefore, by computing the hash value of a piece of content data, the storage node to which it belongs can be quickly located.
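  • as a concrete illustration of the ring just described, the following is a minimal Python sketch of the DHT placement, reusing the names object1-object4 and NODE1-NODE3 from the example; the choice of MD5 truncated to 32 bits as the hash function is an assumption of this sketch, not a requirement of the embodiment.

```python
import hashlib

def hash_key(name: str) -> int:
    # Map a name into the 0..2**32-1 number space (illustrative hash choice).
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes):
        # Each node is placed on the ring by hashing its IP or unique alias.
        self.ring = sorted((hash_key(n), n) for n in nodes)

    def locate(self, object_name: str) -> str:
        # Walk clockwise from the object's hash to the nearest storage node.
        key = hash_key(object_name)
        for node_key, node in self.ring:
            if node_key >= key:
                return node
        return self.ring[0][1]  # wrap around the closed ring

ring = HashRing(["NODE1", "NODE2", "NODE3"])
for obj in ["object1", "object2", "object3", "object4"]:
    print(obj, "->", ring.locate(obj))
```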
  • the first storage node acquires a capacity load and a traffic load of each storage node in the distributed system.
  • Each storage node in a distributed system can calculate its own capacity load and traffic load periodically or in real time.
  • Each storage node in a distributed system can periodically synchronize its respective capacity load and traffic load to other storage nodes.
  • alternatively, real-time querying may be employed: each time a request to write content data is received, the storage node that received the request queries the other storage nodes. With periodic synchronization, the first storage node obtains the capacity load and traffic load of each storage node from the synchronized data; with real-time querying, the first storage node obtains them by sending a query request to each storage node.
  • the capacity load of each storage node may be calculated according to the storage capacity supported by each storage node and the used storage capacity.
  • the used storage capacity can be divided by the supported storage capacity to obtain the used-capacity ratio, and the used-capacity ratio is used to represent the capacity load.
  • alternatively, the remaining storage capacity may be calculated first and divided by the supported storage capacity to obtain the remaining-capacity ratio, which then represents the capacity load. The difference is that with the used-capacity ratio, a larger ratio means a larger capacity load, whereas with the remaining-capacity ratio, a larger ratio means a smaller capacity load.
  • the traffic load of each storage node can be calculated according to the outflow capability supported by each storage node and its average outbound traffic.
  • the outflow capability may be the outbound traffic per unit time that the hardware of the storage node can support.
  • the average outbound traffic may be the average outbound traffic per unit time within the most recent statistical period. The duration of the statistical period can be pre-configured.
  • the average outbound traffic can be divided by the outflow capability of the storage node to obtain the outflow ratio, and the traffic load is represented by the outflow ratio. The larger the outflow ratio, the greater the traffic load.
  • alternatively, the remaining outbound traffic may be obtained by subtracting the average outbound traffic from the outflow capability, and the remaining outbound traffic is then divided by the outflow capability of the storage node to obtain the remaining-traffic ratio; the traffic load is represented by the remaining-traffic ratio. The larger the remaining-traffic ratio, the smaller the traffic load.
  • the calculation algorithm of the foregoing capacity load and traffic load is only an example, and other algorithms may be used for calculation, and no limitation is made herein.
  • in the following description, the capacity load is represented by the used-capacity ratio, and the traffic load is represented by the outflow ratio.
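  • expressed as code, the two load definitions used below reduce to two ratios; this is a minimal sketch in which the function and parameter names are illustrative rather than taken from the embodiment.

```python
def capacity_load(used_capacity: float, supported_capacity: float) -> float:
    # Used-capacity ratio: the larger the ratio, the larger the capacity load.
    return used_capacity / supported_capacity

def traffic_load(avg_outbound_traffic: float, outflow_capability: float) -> float:
    # Outflow ratio: average outbound traffic per unit time divided by the
    # outbound traffic per unit time the node's hardware can support.
    return avg_outbound_traffic / outflow_capability
```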
  • the first storage node determines a second storage node for writing content data based on the acquired capacity load and the traffic load.
  • a storage node for writing content data determined based on a capacity load and a traffic load is referred to as a second storage node.
  • the following describes how to determine the second storage node.
  • the first storage node finds, from the set Set1 of all storage nodes in the distributed system, the storage nodes whose capacity load is within a preset capacity range (for example, it selects the storage nodes whose used-capacity ratio is less than 70%, where the used-capacity ratio represents the capacity load and "less than 70%" is the preset capacity range, which is configurable), forming the first node set Set2.
  • the preset capacity range need not be a fixed value; it may be an expression.
  • for example, the preset capacity range can be expressed as: used-capacity ratio < (1 - content data size / node capacity), where the used-capacity ratio represents the current capacity load and the content data size is the size of the content data to be written this time.
  • in that case, the set of eligible storage nodes differs with the size of the content data written each time.
  • the first storage node finds, from Set1, the storage nodes whose traffic load is within a preset traffic range (for example, it selects the storage nodes whose outflow ratio is less than 80%, where the outflow ratio represents the traffic load and "less than 80%" is the preset traffic range, which is configurable), forming the second node set Set3.
  • the first storage node selects a second storage node for writing the content data from the intersection Set4 of Set2 and Set3.
  • the selection may be random, or other factors may be considered for optimization. The following uses reducing cross-node access and improving access performance as examples.
  • to reduce cross-node access, when selecting the second storage node from the intersection, it may further be determined whether the first storage node is in the intersection; if so, the first storage node is preferred as the second storage node for writing the content data.
  • to improve access performance, if the content data to be written is hot content data, the storage node with the smallest traffic load is selected from the intersection as the second storage node that writes the content data.
  • if the content data to be written is cold content data, the storage node with the smallest capacity load is selected from the intersection as the second storage node that writes the content data.
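  • the selection logic of this step can be summarized in the following sketch, which combines both optimizations (preferring the first storage node, then choosing by smallest load); the per-node dict layout, the combination of the fixed and expression forms of the capacity range, and the handling of an empty intersection are assumptions made for illustration.

```python
def pick_second_node(nodes, first_node, content_size, is_hot,
                     cap_limit=0.70, traffic_limit=0.80):
    # nodes maps node id -> {"cap_load", "traffic_load", "capacity"}.
    set2 = {n for n, s in nodes.items()            # preset capacity range
            if s["cap_load"] < cap_limit
            and s["cap_load"] < 1 - content_size / s["capacity"]}
    set3 = {n for n, s in nodes.items()            # preset traffic range
            if s["traffic_load"] < traffic_limit}
    set4 = set2 & set3                             # the intersection Set4
    if not set4:
        raise RuntimeError("no storage node satisfies both load ranges")
    if first_node in set4:    # prefer the access node: avoids cross-node access
        return first_node
    key = "traffic_load" if is_hot else "cap_load"
    return min(set4, key=lambda n: nodes[n][key])
```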
  • the first storage node notifies the application server to write the content data to the second storage node.
  • the application server writes the content data to the second storage node.
  • after the content data is written, the second storage node notifies the first storage node to create the management data of the content data.
  • the second storage node may calculate a first storage node for storing management data for the content data using a distributed algorithm consistent with the application server.
  • alternatively, the identifier of the first storage node may be sent by the application server to the second storage node, so that the second storage node learns which first storage node stores the management data of the content data.
  • the first storage node creates the management data of the content data and records the storage location of the content data in the management data (that is, records that the content data is stored on the second storage node).
  • step S306 may be optional.
  • in that case, the first storage node may directly create the management data of the content data after notifying the application server in step S304 to write the content data to the second storage node.
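  • combining the foregoing steps, the full write path can be sketched as follows, reusing HashRing and pick_second_node from the earlier sketches; the per-node state dict holding a "store" map, a "management" map and the load fields, as well as the initial heat counter and the step labels in the comments, are assumptions of this sketch.

```python
def write_content(ring, nodes, content_id, content, is_hot=False):
    # S301: the application server computes the access (first) node via the DHT.
    first = ring.locate(content_id)
    # S302-S303: the first node picks the second node from the load intersection.
    second = pick_second_node(nodes, first, len(content), is_hot)
    # S304-S305: the application server writes the content to the second node.
    nodes[second]["store"][content_id] = content
    # S306-S307: management data is created on the first node, recording the
    # storage location of the content data and an initial access heat.
    nodes[first]["management"][content_id] = {"location": second, "heat": 0}
    return second
```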
  • the management data of the content data is created by using the storage node calculated by the distributed algorithm, so that the management of the content data can be distributed to the storage nodes of the distributed storage system, thereby avoiding adopting the central node. Performance bottlenecks caused by management.
  • the distribution of content data is performed according to the hardware capabilities (such as storage capacity and outflow capability) of each storage node, thereby avoiding the resource waste caused by using a single distributed algorithm in a heterogeneous environment.
  • FIG. 4 is a flowchart of accessing content data according to an embodiment of the present invention. As shown in Figure 4, the access process includes:
  • the application server initiates a request for accessing content data to the first storage node.
  • the application server may calculate, using a distributed algorithm consistent with that used when the content data was written, the first storage node storing the management data of the content data, and initiate an access request to the calculated first storage node; the access request carries the identifier of the content data, the access operation type, and the application server identifier.
  • the access operation type includes reading content data and the like.
  • the first storage node determines a storage location of the accessed content data.
  • the first storage node searches for management data of the content data according to the identifier of the content data, and obtains a storage location of the content data from the management data.
  • the first storage node forwards the request for accessing the content data to the second storage node that stores the content data.
  • the second storage node sends the content data requested to be accessed to the application server.
  • the second storage node may locally retrieve the content data according to the identifier of the content data carried in the access request, and send the retrieved content data to the application server according to the application server identifier carried in the access request.
  • the second storage node may return the content data to the first storage node, and the first storage node forwards the content data to the application server.
  • in this way, the application server can find the management data of the content data through the distributed algorithm, and thereby find the content data through the management data. That is to say, in the embodiment of the present invention, access to content data is decoupled from the constraints of the distributed algorithm, and the data is distributed according to the respective capabilities of each storage node, thereby improving the utilization of resources.
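  • the corresponding read path is sketched below under the same assumed data layout; the heat counter update stands in for the access-heat recording described earlier, and the step labels in the comments follow the flow above.

```python
def read_content(ring, nodes, content_id):
    # S401: the application server locates the first node with the same DHT
    # algorithm that was used when the content data was written.
    first = ring.locate(content_id)
    # S402: the first node looks up the management data for the storage location.
    meta = nodes[first]["management"][content_id]
    meta["heat"] += 1   # record the access, for later hot/cold migration
    # S403-S404: the request is forwarded to the second node, which returns
    # the content data to the application server.
    return nodes[meta["location"]]["store"][content_id]
```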
  • the solution of the embodiment of the present invention can further optimize the expansion of the distributed storage system.
  • the distributed storage system stores content data and management data of each piece of content data.
  • each piece of management data includes the storage location of the content data corresponding to it.
  • the management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm.
  • the content data is distributed according to the capacity load and traffic load of each storage node.
  • during expansion, the embodiment of the present invention may migrate only the management data, without migrating the content data.
  • the distributed storage system can recalculate the distribution of each content data through a distributed algorithm.
  • the distributed algorithm used in capacity expansion can be consistent with the distributed algorithm used when writing content data.
  • after calculating which content data should belong to the expansion storage node, the distributed system retains that content data on the original storage nodes that store it, and migrates the management data of that content data to the expansion storage node.
  • FIG. 5 is a schematic diagram of a capacity expansion of a distributed storage system according to an embodiment of the present invention.
  • the distributed storage system includes a storage node identified as node1 and a storage node identified as node2.
  • the content data C1, C3, Cn, Cn+3, and Cm are stored on node1, and node2 stores C2, Cn+1, and Cm+1.
  • the storage node identified as node3 is added.
  • the content data C2, Cn+2, Cn+3, Cm should belong to node3. Therefore, the distributed storage system migrates the management data of the content data C2, Cn+2, Cn+3, Cm to node3.
  • the DHT algorithm is taken as an example to describe how the distribution of content data changes after capacity expansion.
  • assume a new storage node NODE4 is added; KEY4 is obtained through the corresponding hash algorithm and mapped onto the ring, as shown in FIG. 6.
  • after the remapping, object2 should belong to NODE4, while the other content data remains in its original storage locations.
  • the core value of the distributed algorithm is that the storage node to which the content data belongs can be calculated according to the information of the content data and the information of the storage node in the distributed storage system.
  • when a storage node is added, the results calculated by the distributed algorithm change. Therefore, after a new storage node is added, some content data on the original storage nodes would ordinarily need to be migrated to the new storage node before the newly added node can serve online.
  • because the storage space of a storage node is very large, the amount of content data calculated for migration is usually also very large, so the migration takes a long time and can severely exceed the time window of the expansion operation.
  • in the embodiment of the present invention, by contrast, management data is created for each piece of content data, and only the management data is migrated.
  • because the management data is much smaller than the content data itself, the migration amount is very small, which greatly shortens the migration time and enables the distributed storage system to provide services quickly after the expansion.
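  • under the same assumed data layout as the earlier sketches, the expansion step can be sketched as follows: the DHT distribution is recomputed, and only the management records that now hash to the new node are moved, never the content data itself.

```python
def expand(nodes, new_node):
    # Add the expansion node with empty stores.
    nodes[new_node] = {"store": {}, "management": {}}
    new_ring = HashRing(list(nodes))   # recompute the DHT distribution
    for node_id, state in nodes.items():
        if node_id == new_node:
            continue
        for cid in list(state["management"]):
            if new_ring.locate(cid) == new_node:
                # The content data stays on its original storage node; only
                # the small management record moves to the expansion node.
                nodes[new_node]["management"][cid] = state["management"].pop(cid)
    return new_ring
```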
  • after expansion, the distribution of the content data can be further optimized. Specifically, hot content data can be migrated to storage nodes with strong outflow capability, and cold content data can be migrated to storage nodes with large storage capacity. The migration process is described in detail below.
  • the distributed storage system analyzes the migration relationship of each storage node.
  • the storage nodes in the distributed storage system may elect a decision node from among themselves by using an election algorithm, and the decision node analyzes the migration relationships of the storage nodes.
  • the election process can be implemented using existing election algorithms, and will not be described here.
  • the decision node can sort the storage nodes according to the traffic load, and determine the first migration relationship between the storage nodes according to the principle that the storage node with large traffic load migrates the hot content data to the storage node with small traffic load.
  • the first migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • the first migration relationship can be represented by a set of migration pairs. Assume the distributed storage system has five storage nodes, Node1-Node5, and that sorting them by traffic load from large to small yields the sequence {Node1, Node2, Node3, Node4, Node5}; this sequence is denoted O-Hot in this embodiment.
  • based on O-Hot, Node1 and Node5 can form a migration pair, and Node2 and Node4 can form a migration pair.
  • the migration pair formed by Node1 and Node5 can be expressed as Pair1 ⁇ Node1-->Node5 ⁇
  • the migration pair formed by Node2 and Node4 can be expressed as Pair2 ⁇ Node2-->Node4 ⁇ .
  • the arrow symbol represents the direction of migration.
  • the first migration relationship can be expressed as Set ⁇ Pair1, Pair2 ⁇ .
  • the decision node may further sort the storage nodes according to the capacity load, and determine the second migration relationship between the storage nodes according to the principle that the storage nodes with large capacity load migrate cold content data to the storage nodes with small capacity load.
  • the second migration relationship includes the migrate-in storage nodes and migrate-out storage nodes that form migration pairs.
  • the second migration relationship can also be represented by a set of migration pairs. Assume that sorting the same five storage nodes Node1-Node5 by capacity load from small to large yields the sequence {Node5, Node4, Node3, Node2, Node1}; this sequence is denoted O-Space in this embodiment.
  • based on O-Space, Node1 and Node5 again form a migration pair, and Node2 and Node4 form a migration pair.
  • the migration pair formed by Node1 and Node5 can be expressed as Pair1' ⁇ Node1-->Node5 ⁇
  • the migration pair formed by Node2 and Node4 can be expressed as Pair2' ⁇ Node2-->Node4 ⁇
  • the arrow symbol represents the direction of migration.
  • the second migration relationship can be expressed as Set{Pair1', Pair2'}.
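  • a hedged sketch of how the decision node could derive both migration relationships is shown below; the pairing-by-extremes rule (largest load paired with smallest load, then moving inward) is inferred from the Node1/Node5 and Node2/Node4 example rather than stated as an exact algorithm in the embodiment.

```python
def build_migration_pairs(nodes, load_key):
    # Sort node ids by the given load, small -> large.
    order = sorted(nodes, key=lambda n: nodes[n][load_key])
    pairs = []
    while len(order) >= 2:
        sink = order.pop(0)    # migrate-in node: smallest load
        source = order.pop()   # migrate-out node: largest load
        pairs.append((source, sink))
    return pairs

# First migration relationship (O-Hot) and second (O-Space):
hot_pairs = build_migration_pairs(nodes, "traffic_load")
cold_pairs = build_migration_pairs(nodes, "cap_load")
```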
  • the distributed storage system sends the migration relationships to the migrate-out storage nodes in those relationships.
  • specifically, the decision node in the distributed storage system may send each migration pair in a migration relationship to the migrate-out node of that pair.
  • the migrate-out node identifies the content data to be migrated according to the migration relationship.
  • the migrate-out node sorts the content data stored on the node by access heat. After the sorting, a migrate-out node in the first migration relationship selects the n (n configurable) pieces of content data with the highest access heat as the hot content data to be migrated; this set of hot content data is denoted Setn-H. A migrate-out node in the second migration relationship selects the m (m configurable) pieces of content data with the lowest access heat as the cold content data to be migrated; this set of cold content data is denoted Setn-C.
  • the migrate-out node notifies the migrate-in node paired with it of the content data to be migrated.
  • the migrate-in node generates a move-in list according to the content data notified by the migrate-out node.
  • the move-in list can consist of multiple records in the format {NodeN, Cn}, where NodeN denotes the migrate-out node and Cn denotes the content data to be migrated.
  • the migrate-in node may further determine, according to its remaining capacity, which of the content data to be migrated it can accept, and generate the move-in list from the acceptable content data.
  • the migrate-in node relocates the content data according to the move-in list. For example, assume the move-in list of Node4 contains the record {Node2, C1}:
  • Node4 requests the content data of C1 from Node2.
  • after obtaining the content data of C1, Node4 writes it to the local node.
  • after Node4 finishes writing the content data of C1, it notifies Node3 (which stores the management data of C1) to modify the management data of C1, changing the storage location of C1 to Node4.
  • Node4 then notifies Node2 to delete the content data of C1.
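  • the relocation steps above (request, local write, management-data update, source-side delete) can be sketched end to end; looking up the access heat via the management-data holder and the fixed per-pair item count are assumptions of this sketch.

```python
def migrate(ring, nodes, pairs, hot, count=10):
    # For each (migrate-out, migrate-in) pair, move the `count` hottest
    # (hot=True) or coldest (hot=False) items, mirroring the C1 example.
    for source, sink in pairs:
        def heat(cid):
            # The access heat is recorded in the management data, whose
            # holder is located with the same DHT used everywhere else.
            return nodes[ring.locate(cid)]["management"][cid]["heat"]
        items = sorted(nodes[source]["store"], key=heat, reverse=hot)
        for cid in items[:count]:
            nodes[sink]["store"][cid] = nodes[source]["store"][cid]   # move in
            mgr = ring.locate(cid)                  # management-data holder
            nodes[mgr]["management"][cid]["location"] = sink  # update location
            del nodes[source]["store"][cid]         # migrate-out node deletes
    # (A real implementation would also respect the migrate-in node's
    # remaining capacity when accepting items, as described above.)
```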
  • the migration of hot content data and the migration of cold content data may be performed individually as needed, or both may be performed; no limitation is imposed here.
  • hot content data can be migrated to storage nodes with strong outflow capability to improve access performance.
  • cold content data can be migrated to storage nodes with large capacity, releasing the capacity of the storage nodes with strong outflow capability for hot content data, thereby improving the performance and resource utilization of the entire distributed storage system.
  • FIG. 7 shows a possible structural diagram of the storage node involved in the foregoing embodiments. As shown in FIG. 7, this embodiment is described by taking as an example the storage node that receives an application server's request to write content data.
  • the storage node is referred to as a first storage node.
  • the first storage node includes: a communication interface 701, a data manager 702, a migration manager 703, and a memory 704.
  • the communication interface 701 is used to interact with an application server and/or other storage nodes.
  • the communication interface 701 can receive a request for writing content data sent by the application server, notify the application server to write the content data to the second storage node determined by the data manager 702, and receive an access request of the application server for the content data.
  • the data manager 702 is configured to perform distributed management on the content data to be written, and perform scheduling management on the content data to be accessed.
  • For the distribution management process of the content data, refer to the parts about determining, according to the capacity load and traffic load of the storage nodes, the second storage node for writing the content data, and about creating the management data of the content data; details are not described herein again.
  • the memory 704 is used to store content data as well as management data.
  • the migration manager 703 is used to manage the migration of hot content data and cold content data, and is also used to manage the migration of management data when the distributed system is expanded.
  • for the migration manager 703, refer to the migration process parts of the foregoing embodiments and to the expansion embodiment shown in FIG. 5; details are not described herein again.
  • the disclosed systems and methods can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division; in actual implementation, there may be other division manners. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing unit, or each module may exist physically separately, or two or more modules may be integrated into one unit.
  • the above integrated modules can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional units described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform portions of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a random access memory (RAM), a magnetic disk, or an optical disc.
  • Another embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the foregoing storage node, including a program designed to execute the foregoing method embodiments.
  • Another embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the foregoing distributed storage system, including a program designed to execute the expansion embodiment shown in FIG. 5 and the foregoing migration method embodiments. By executing the stored program, capacity-expansion migration and migration of hot and cold content data can be achieved.
  • embodiments of the present invention can be provided as a method, apparatus (device), or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program is stored/distributed in a suitable medium, provided with other hardware or as part of the hardware, or in other distributed forms, such as over the Internet or other wired or wireless telecommunication systems.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed storage method, in which, when a first storage node of a distributed storage system receives a request from an application server to write content data, the first storage node may determine, according to the capacity load and traffic load of each storage node in the distributed storage system, a second storage node for writing the content data, then notify the application server to write the content data to the determined second storage node, and locally create management data of the content data. The management data records the storage location of the content data, that is, the storage node on which the content data resides. Content data is thus distributed with full consideration of each storage node's capacity and outflow conditions, avoiding the resource waste caused by distributing content data with a single uniform algorithm.

Description

一种分布式存储的方法和系统
本申请要求于2016年6月29日提交中国专利局、申请号为201610507128.6、发明名称为“一种分布式存储的方法和系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及通信技术领域,特别涉及一种分布式存储的方法和系统。
背景技术
分布式存储系统是由多台具有有存储能力的服务器组成。其中,各服务器通过网络互联,对外作为一个整体提供存储服务。通常情况下,分布式存储系统有两种设计,一种是中心化设计,此设计采用集中部署的中心服务器进行数据分布的分配和管理,客户端访问数据之前向中心服务器询问数据位置,确定数据位置后再向数据归属的服务器发起读取数据的请求。另一种是去中心化设计,此设计是采用分布式算法,如,分布式哈希(Distributed Hash Table,DHT)算法,计算数据位置的方式进行数据分布的管理,客户端请求数据时,根据请求的数据的信息计算出数据归属的服务器,直接向数据归属的服务器发起请求。然而,中心化设计所有数据管理都在中心节点进行,性能受中心节点的能力限制,存在性能瓶颈。而采用分布式设计,虽然解决了中心化设计中的性能瓶颈问题,但是去中心化设计的数据路由采用计算方式,对于异构的分布式存储系统,由于各服务器的存储介质多样化,例如,SATA盘、SAS盘、SSD盘、内存等,不同存储介质间的存储能力和岀流能力差异非常大。而且岀流能力越强价格越高,所能配置的容量越小。也就是说存储能力和岀流能力存在非常大的矛盾,采用单一的算法无法兼顾这种差异性和矛盾性,往往出现容量越大的服务器存储的数据很少的情况,导致资源的浪费。
Summary
Embodiments of the present invention provide a method and system for implementing distributed storage, to solve the problem of wasted storage resources in the prior art.
To achieve the foregoing objective, the present invention adopts the following technical solutions:
In one aspect, an embodiment of the present invention provides a distributed storage method. In the method, when a first storage node of a distributed storage system receives a request from an application server to write content data, the first storage node may determine, according to the capacity load and traffic load of each storage node in the distributed storage system, a second storage node to which the content data is to be written, then notify the application server to write the content data to the determined second storage node, and locally create management data for the content data. The management data records the storage location of the content data, that is, the storage node on which the content data resides. Content data is thus distributed on the basis of full consideration of each storage node's own capacity and outflow conditions, avoiding the waste of resources caused by distributing content data with a single uniform algorithm. Moreover, in this embodiment of the present invention, the storage node computed by the application server 101 through the distributed algorithm serves as the access node of the application server 101, and the storage node serving as the access node then determines, according to the capacity load and traffic load of each storage node, the storage node that stores the content data. In this way, the bottleneck of the centralized design can be avoided.
In a possible design, the capacity load of each storage node is calculated from the storage capacity each storage node supports and its used storage capacity, and the traffic load of each storage node is calculated from the outflow capability each storage node supports and its average outflow.
In a possible design, the capacity load and the traffic load may be jointly considered by taking an intersection. Specifically, the first storage node finds the storage nodes whose capacity load falls within a preset capacity range, forming a first node set; the first storage node also finds the storage nodes whose traffic load falls within a preset traffic range, forming a second node set; and then selects, from the intersection of the first node set and the second node set, the second storage node to which the data is to be written.
The selection may be random, or other factors may be considered for preference. The following uses reducing cross-node access and improving access performance as examples.
In a possible design, to improve access performance, selecting the second storage node from the intersection of the first node set and the second node set specifically includes: the first storage node determines whether the data to be written is hot data or cold data; if it is hot data, the first storage node selects, from the intersection of the first node set and the second node set, the storage node with the smallest traffic load as the second storage node to which the data is written; if it is cold data, the first storage node selects, from the intersection, the storage node with the smallest capacity load as the second storage node to which the data is written.
In a possible design, to reduce cross-node access, when selecting the second storage node from the intersection, it may be further determined whether the intersection contains the first storage node; if it does, the first storage node is preferred as the second storage node to which the content data is written.
In a possible design, the method further includes a process of accessing the content data: the first storage node receives a request to access the content data, determines from the management data that the content data is stored on the second storage node, and forwards the request to access the content data to the second storage node.
To improve system performance, embodiments of the present invention may store hot content data on storage nodes with strong outflow capability and cold content data on storage nodes with large storage capacity. This process is described in detail below.
In a possible design, each storage node may determine the access heat of content data according to the number of times the content data is accessed, and record the access heat in the content data's management data, so that content data can later be migrated according to the access heat.
In a possible design, the first storage node identifies hot content data according to the access heat of the content data it stores, migrates the hot content data to a storage node whose traffic load is smaller than that of the first storage node, and notifies the storage node holding the hot content data's management data to update the hot content data's storage location.
In a possible design, migrating the hot content data to a storage node with a smaller traffic load specifically includes: sorting the storage nodes of the distributed storage system by traffic load, and determining a first migration relationship among the storage nodes according to the principle that storage nodes with a large traffic load migrate hot content data to storage nodes with a small traffic load, where the first migration relationship includes move-in storage nodes and move-out storage nodes that form migration pairs; the first storage node migrates the hot data to the move-in storage node that forms a migration pair with the first storage node in the first migration relationship.
In a possible design, the first storage node also identifies cold content data according to the access heat of the content data it stores, migrates the cold content data to a storage node whose capacity load is smaller than that of the first storage node, and notifies the storage node holding the cold content data's management data to update the cold content data's storage location.
In a possible design, migrating the cold content data to a storage node with a smaller capacity load specifically includes: sorting the storage nodes of the distributed storage system by capacity load, and determining a second migration relationship among the storage nodes according to the principle that storage nodes with a large capacity load migrate cold content data to storage nodes with a small capacity load, where the second migration relationship includes move-in storage nodes and move-out storage nodes that form migration pairs; the first storage node migrates the cold content data to the move-in storage node that forms a migration pair with the first storage node in the second migration relationship.
Through the foregoing migration process, storage resources on nodes with high outflow capability can be freed, and hot content data can be stored on nodes with strong outflow capability, improving the performance of the entire distributed storage system.
In another aspect, an embodiment of the present invention provides a distributed storage method applicable to the scenario of expanding a distributed storage system. The distributed system stores content data and management data of each piece of content data, where each piece of management data includes the storage location of its corresponding content data; the management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm, and the content data is distributed according to the capacity load and traffic load of each storage node. When the distributed storage system is expanded, the distributed storage system recomputes the distribution of each piece of content data by the distributed algorithm; content data computed as belonging to the expansion storage node is kept on the original storage node storing that content data, while the computed content data's management data is migrated to the expansion storage node. In this embodiment of the present invention, by creating management data for each piece of content data, only the management data is migrated during expansion, and the content data itself is not actually moved. Because the management data is far smaller than the content data itself, the amount migrated is very small, which greatly shortens the migration time and allows the distributed storage system to provide service quickly after expansion.
In addition, because different storage media differ in storage capability and outflow capability, there are bound to be nodes with relatively stronger storage capability or relatively stronger outflow capability. Therefore, to improve the efficiency of operations on content data, embodiments of the present invention further optimize the distribution of content data: hot content data is migrated to storage nodes with strong outflow capability, and cold content data is migrated to storage nodes with strong storage capability. The migration process is described below.
In a possible design, the distributed storage system sorts the storage nodes by traffic load and determines a first migration relationship among them according to the principle that storage nodes with a large traffic load migrate hot content data to storage nodes with a small traffic load, where the first migration relationship includes move-in and move-out storage nodes that form migration pairs. Each storage node of the distributed storage system migrates hot content data according to the first migration relationship, and updates, in the hot content data's management data, the storage location of the hot content data to the post-migration storage location.
In a possible design, the distributed storage system also sorts the storage nodes of the distributed storage system by capacity load and determines a second migration relationship among them according to the principle that storage nodes with a large capacity load migrate cold content data to storage nodes with a small capacity load, where the second migration relationship includes move-in and move-out storage nodes that form migration pairs. Each storage node of the distributed storage system migrates cold content data according to the second migration relationship, and updates, in the cold content data's management data, the storage location of the cold content data to the post-migration storage location.
In a possible design, migrating hot content data according to the first migration relationship specifically includes: a move-out storage node in the first migration relationship identifies the hot content data among the content data stored on the node, and migrates the identified hot content data to the move-in storage node that forms a migration pair with the move-out storage node in the first migration relationship.
In a possible design, migrating cold content data according to the second migration relationship specifically includes: a move-out storage node in the second migration relationship identifies the cold content data among the content data it stores, and migrates the identified cold data to the move-in storage node that forms a migration pair with the move-out storage node in the second migration relationship.
In yet another aspect, an embodiment of the present invention provides a storage node that has the function of implementing the behavior of the first storage node in the foregoing method embodiments. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more components corresponding to the foregoing function (for example, determining the distribution of content data based on the capacity load and traffic load of each storage node of the distributed storage system).
In yet another aspect, an embodiment of the present invention provides a distributed storage system that has the function of implementing the behavior of the distributed storage system in the foregoing method embodiments (including the behavior of each storage node in the system). The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more components corresponding to the foregoing function (for example, expansion migration and hot/cold data migration).
In still another aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the foregoing first storage node, including a program designed to execute the foregoing aspects for the first storage node.
In still another aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the foregoing distributed storage system, including a program designed to execute the foregoing aspects for the distributed storage system.
In the foregoing embodiments, the management data of content data is created on the storage node computed by the distributed algorithm, so that the management of content data is distributed across the storage nodes of the distributed storage system, avoiding the performance bottleneck caused by management on a central node. The content data itself is distributed according to each storage node's own hardware capability (for example, storage capability and outflow capability), which avoids the waste of resources caused by a single distributed algorithm in a heterogeneous environment.
Brief Description of the Drawings
FIG. 1 is a network architecture diagram for implementing distributed storage according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for implementing distributed storage according to an embodiment of the present invention;
FIG. 3-1 is a schematic diagram of a number space according to an embodiment of the present invention;
FIG. 3-2 is a schematic diagram of content data mapping according to an embodiment of the present invention;
FIG. 3-3 is a schematic diagram of storage node mapping according to an embodiment of the present invention;
FIG. 4 is a flowchart of accessing content data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of expanding a distributed storage system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of migration after expansion according to an embodiment of the present invention;
FIG. 7 is a structural diagram of a storage node according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The network architecture and service scenarios described in the embodiments of the present invention are intended to illustrate the technical solutions of the embodiments more clearly and do not constitute a limitation on them. A person of ordinary skill in the art may know that, as the network architecture evolves and new service scenarios emerge, the technical solutions provided in the embodiments of the present invention are also applicable to similar technical problems.
Referring to FIG. 1, a network architecture diagram for implementing distributed storage according to an embodiment of the present invention, the network architecture includes a distributed storage system 102 and at least one application server 101. The distributed storage system 102 includes two or more storage nodes (FIG. 1 shows only three as an example). A storage node may be a server with storage capability. The storage nodes are interconnected through a network and provide storage services to the outside as a whole.
In the embodiment shown in FIG. 1, the application server 101 and the storage nodes of the distributed storage system 102 can all connect to a network and communicate through it. The network may be the Internet, a local area network, a wireless network, or another type of network.
The application server 101 can access the storage nodes of the distributed storage system 102 to perform operations such as writing or reading data. In a specific implementation, the application server 101 may use a distributed algorithm, for example, a distributed hash table (DHT) algorithm, to compute the storage node to which the content data to be written or read should belong, and access that storage node to perform the write or read operation. A client that interacts with the storage nodes may be deployed on the application server 101; the client performs the computation, thereby shielding the applications on the application server 101 from the internal networking structure of the distributed storage system 102.
When a storage node of the distributed storage system 102 receives a request from the application server 101 to write content data, the storage node may determine, according to the capacity load and traffic load of each storage node in the distributed storage system 102, the storage node to which the content data is to be written, then notify the application server 101 to write the content data to the determined storage node, and locally create management data for the content data. The management data records the storage location of the content data, that is, the storage node on which the content data resides. When accessing the content data, the application server 101 uses the same distributed algorithm used when writing it to compute the storage node to which the content data should belong, and sends an access request to the computed storage node. The storage node that receives the request looks up the content data's management data to learn the storage node on which the content data resides, and forwards the request to that storage node, which then provides the content data to the application server 101.
Embodiments of the present invention can be applied to heterogeneous distributed storage systems. In a heterogeneous distributed system, different storage nodes may use different storage media, for example, SATA disks, SAS disks, SSD disks, and memory. Different storage media differ in storage capability (for example, capacity) and outflow capability, where the outflow capability depends on the hardware capability of the storage medium, for example, the outflow per unit time that the medium's hardware can support. In the process of distributing content data, embodiments of the present invention may calculate each storage node's capacity load based on the storage capacity each node supports and its used storage capacity, and calculate each storage node's traffic load based on the outflow capability each node supports and its average outflow, thereby distributing content data on the basis of full consideration of each storage node's own capacity and outflow conditions and avoiding the waste of resources caused by a single uniform distribution algorithm. Moreover, in embodiments of the present invention, the storage node computed by the application server 101 through the distributed algorithm serves as the access node of the application server 101, and the storage node serving as the access node then determines, according to the capacity load and traffic load of each storage node, the storage node that stores the content data. In this way, the bottleneck of the centralized design can be avoided.
It should be noted that each storage node shown in FIG. 1 may be implemented by the computer device in FIG. 2. FIG. 2 is a schematic diagram of a computer device according to an embodiment of the present invention. The computer device 200 includes at least one processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
The processor 201 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the solution of the present invention.
The communication bus 202 may include a path for transferring information between the foregoing components. The communication interface 204, which may be any transceiver-like apparatus, is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN). In this embodiment of the present invention, the communication interface 204 may be used to communicate with the application server and with the other storage nodes in the distributed storage system.
The memory 203 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compressed discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or may be integrated with the processor.
The memory 203 is configured to store the application program code for executing the solution of the present invention, and execution is controlled by the processor 201. The processor 201 is configured to execute the application program code stored in the memory 203 (for example, program code implementing a data manager or a migration manager). In this embodiment of the present invention, the memory also stores content data and the content data's management data.
In a specific implementation, as an embodiment, the processor 201 may include one or more CPUs, for example, CPU0 and CPU1 in FIG. 2.
In a specific implementation, as an embodiment, the computer device 200 may include multiple processors, for example, the two processors 201 shown in FIG. 2. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
The computer device 200 may be a general-purpose computer device or a dedicated computer device. In a specific implementation, the computer device 200 may be a network server, a communication device, an embedded device, or a device with a structure similar to that in FIG. 2. This embodiment of the present invention does not limit the type of the computer device 200.
FIG. 3 is a flowchart of a method for implementing distributed storage according to an embodiment of the present invention. As shown in FIG. 3, this embodiment covers the process of writing content data, which includes:
S301. The application server sends a request to write content data to a first storage node in the distributed system. The request message may carry description information of the content data, for example, the identifier, size, or type of the content data.
The distributed system includes at least two storage nodes. When the application server needs to write content data, it may determine, in the distributed system, a storage node for writing the content data. The application server may use a distributed algorithm (for example, the DHT algorithm) to compute the storage node for writing the content data. In this embodiment, the storage node determined by the application server is called the first storage node.
The following uses the DHT algorithm as an example to describe in detail the principle of computing the distribution of content data with a distributed algorithm.
The distributed storage system uses a common hash algorithm to hash the corresponding keys into a space with 2^32 buckets, that is, the number space 0 to (2^32)-1. These numbers can be joined end to end to form a closed ring, as shown in FIG. 3-1.
The distributed storage system can map content data onto the ring shown in FIG. 3-1 after processing it with a certain hash algorithm. The mapping process is now described using four pieces of content data, object1, object2, object3, and object4, as an example. First, the key values of the four pieces of content data are computed with a specific hash function, as follows:
Hash(object1)=key1;
Hash(object2)=key2;
Hash(object3)=key3;
Hash(object4)=key4;
The computed key values are then scattered onto the hash ring, as shown in FIG. 3-2.
The distributed storage system then maps the storage nodes onto the ring with a hash algorithm. Specifically, the distributed storage system maps the storage nodes onto the ring using the same hash algorithm used to map the content data (generally, the hash computation for a storage node may take the node's IP address or unique alias as input); then, rotating clockwise, each piece of content data is stored on the nearest storage node.
Suppose there are now three storage nodes, NODE1, NODE2, and NODE3, whose KEY values obtained through the hash algorithm are as follows:
Hash(NODE1)=KEY1;
Hash(NODE2)=KEY2;
Hash(NODE3)=KEY3;
The obtained KEY values are mapped onto the ring, as shown in FIG. 3-3.
As can be seen from FIG. 3-3, the content data and the storage nodes are in the same hash space. Rotating clockwise, object1 is stored on NODE1, object3 is stored on NODE2, and object2 and object4 are stored on NODE3. Therefore, by computing a piece of content data's hash value, the storage node to which that content data should belong can be located quickly.
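The clockwise lookup described above can be illustrated with a short sketch. The following minimal consistent-hashing example assumes MD5 as the hash function and reuses the NODE/object names from FIGS. 3-1 to 3-3; the embodiment does not prescribe a particular hash function, so this is an illustration only.

```python
import hashlib
from bisect import bisect

def ring_hash(key: str) -> int:
    # Hash a key into the 0 .. 2^32 - 1 number space described above.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, nodes):
        # Storage nodes are hashed onto the ring by IP or unique alias.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def locate(self, content_key: str) -> str:
        # Rotate clockwise: the first node at or after the content's hash
        # owns the content; wrap around past the end of the ring.
        idx = bisect(self.ring, (ring_hash(content_key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["NODE1", "NODE2", "NODE3"])
print(ring.locate("object1"))  # the storage node object1 belongs to
```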
S302. The first storage node obtains the capacity load and traffic load of each storage node in the distributed system.
Each storage node in the distributed system may calculate its own capacity load and traffic load periodically or in real time. The storage nodes may periodically synchronize their capacity loads and traffic loads to the other storage nodes. Alternatively, a real-time request mode may be used, that is, each time a request to write content data is received, the storage node that receives the request queries the other storage nodes. With periodic synchronization, the first storage node obtains each storage node's capacity load and traffic load from the synchronized data; with real-time requests, the first storage node obtains them by sending query requests to the storage nodes.
The capacity load of each storage node may be calculated from the storage capacity it supports and its used storage capacity. For example, the used storage capacity divided by the supported storage capacity gives a used-capacity ratio, which can represent the capacity load. Alternatively, the remaining storage capacity may be computed first, and the remaining storage capacity divided by the supported storage capacity gives a remaining-capacity ratio, which can also represent the capacity load. The difference is that with the used-capacity ratio, a larger ratio means a larger capacity load, whereas with the remaining-capacity ratio, a larger ratio means a smaller capacity load.
The traffic load of each storage node may be calculated from the outflow capability it supports and its average outflow. The outflow capability may be the outflow per unit time that the storage node's hardware can support. The average outflow may be the average outflow per unit time over the most recent statistics period, whose duration can be preconfigured. When calculating the traffic load, the average outflow divided by the storage node's outflow capability gives an outflow ratio, which can represent the traffic load; a larger outflow ratio means a larger traffic load. Of course, the remaining outflow may also be obtained by subtracting the average outflow from the outflow capability, and the remaining outflow divided by the outflow capability gives a remaining-traffic ratio, which can also represent the traffic load; a larger remaining-traffic ratio means a smaller traffic load.
It should be noted that the foregoing algorithms for calculating the capacity load and the traffic load are only examples; other algorithms may also be used, which is not limited here. In the following embodiments of the present invention, the used-capacity ratio is used to represent the capacity load and the outflow ratio is used to represent the traffic load as an example.
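As an illustration of the two ratios adopted below, the following sketch computes a used-capacity ratio and an outflow ratio; the function and variable names are assumptions, not terms fixed by this embodiment.

```python
def capacity_load(used_capacity: float, supported_capacity: float) -> float:
    # Used-capacity ratio: used storage capacity over supported storage
    # capacity. A larger value means a larger capacity load.
    return used_capacity / supported_capacity

def traffic_load(average_outflow: float, outflow_capability: float) -> float:
    # Outflow ratio: average outflow per unit time in the last statistics
    # period, over the outflow the node's hardware can support per unit
    # time. A larger value means a larger traffic load.
    return average_outflow / outflow_capability

# Example: 7 TB used of 10 TB, averaging 400 MB/s on hardware rated 1 GB/s.
print(capacity_load(7e12, 10e12))  # 0.7
print(traffic_load(4e8, 1e9))      # 0.4
```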
S303. The first storage node determines, based on the obtained capacity loads and traffic loads, a second storage node to which the content data is to be written.
In this embodiment of the present invention, the storage node determined for writing the content data based on the capacity loads and traffic loads is called the second storage node. How the second storage node is determined is described below.
Suppose the set of storage nodes in the distributed storage system is Set1.
The first storage node finds, from Set1, the storage nodes whose capacity load is within a preset capacity range (for example, selecting the storage nodes whose capacity ratio is below 70%, where the capacity ratio represents the capacity load and below 70% is the preset capacity range, which is configurable), forming a first node set Set2. It should be noted that the preset capacity range need not be a fixed value; it may also be an expression. For example, the preset capacity range may be expressed as: capacity ratio < (1 - content data size / capacity size), where the capacity ratio represents the current capacity load and the content data size is the size of the content data to be written this time. In this way, the qualifying storage nodes differ depending on the size of the content data written each time.
The first storage node finds, from Set1, the storage nodes whose traffic load is within a preset traffic range (for example, selecting the storage nodes whose outflow ratio is below 80%, where the outflow ratio represents the traffic load and below 80% is the preset traffic range, which is configurable), forming a second node set Set3.
The intersection of Set2 and Set3 gives the set Set4. The first storage node selects, from the intersection Set4, the second storage node to which the content data is to be written. The selection may be random, or other factors may be considered for preference. The following uses reducing cross-node access and improving access performance as examples.
To reduce cross-node access, when selecting the second storage node from the intersection, it may be further determined whether the intersection contains the first storage node; if it does, the first storage node is preferred as the second storage node to which the content data is written.
To improve access performance, whether the content data to be written is hot content data or cold content data may be determined in advance. If the content data to be written is hot content data, the storage node with the smallest traffic load is selected from the intersection as the second storage node to which the content data is written. If the content data to be written is cold content data, the storage node with the smallest capacity load is selected from the intersection as the second storage node. Whether content data is hot or cold may be judged according to the type of the content data to be written; each storage node may be preconfigured with which types of content data are hot and which are cold.
It should be noted that there can be many algorithms for dynamically computing the storage node to which content data should be written based on each node's capacity load and traffic load; the foregoing intersection-based selection is merely an example.
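For illustration, the sketch below shows one such intersection-based selection, combining the preset ranges with the hot/cold preference described above; the 70%/80% thresholds follow the examples given, and the Node record type is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity_load: float   # used-capacity ratio
    traffic_load: float    # outflow ratio

def pick_second_node(nodes, is_hot, cap_range=0.7, traffic_range=0.8):
    # Set2: nodes whose capacity load is within the preset capacity range.
    set2 = [n for n in nodes if n.capacity_load < cap_range]
    # Set3: nodes whose traffic load is within the preset traffic range.
    set3 = {n.name for n in nodes if n.traffic_load < traffic_range}
    # Set4: the intersection of Set2 and Set3.
    set4 = [n for n in set2 if n.name in set3]
    if not set4:
        return None
    # Hot content data prefers the smallest traffic load; cold content
    # data prefers the smallest capacity load.
    if is_hot:
        return min(set4, key=lambda n: n.traffic_load)
    return min(set4, key=lambda n: n.capacity_load)

nodes = [Node("NodeA", 0.5, 0.3), Node("NodeB", 0.2, 0.6)]
print(pick_second_node(nodes, is_hot=True).name)   # NodeA (lowest traffic load)
print(pick_second_node(nodes, is_hot=False).name)  # NodeB (lowest capacity load)
```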
S304. The first storage node notifies the application server to write the content data to the second storage node.
S305. The application server writes the content data to the second storage node.
The process of writing content data may use the prior art and is not described here again.
S306. After the content data has been written, the second storage node notifies the first storage node to create the content data's management data.
In one embodiment, the second storage node may use the same distributed algorithm as the application server to compute the first storage node used to store the content data's management data.
In another embodiment, in step S305, the application server may send the identifier of the first storage node to the second storage node, so that the second storage node learns the first storage node used to store the content data's management data.
S307. The first storage node creates the content data's management data, recording in it the storage location of the content data (that is, recording that the content data is stored on the second storage node).
It should be noted that step S306 may be optional; the first storage node may create the content data's management data directly after notifying the application server, in step S304, to write the content data to the second storage node.
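The embodiment does not fix a layout for the management data. As a sketch, a record might carry the fields used throughout this description, following the {C1, heat, Node2} notation of the migration example further below; the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ManagementData:
    content_id: str       # identifier of the content data, e.g. "C1"
    location: str         # storage node holding the content data, e.g. "Node2"
    access_heat: int = 0  # access count, recorded for later hot/cold migration

# Created on the first storage node in S307 after the write completes:
record = ManagementData(content_id="C1", location="Node2")
```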
In the foregoing embodiment, the management data of content data is created on the storage node computed by the distributed algorithm, so that the management of content data is distributed across the storage nodes of the distributed storage system, avoiding the performance bottleneck caused by management on a central node. The content data itself is distributed according to each storage node's own hardware capability (for example, storage capability and outflow capability), avoiding the waste of resources caused by a single distributed algorithm in a heterogeneous environment.
The process of accessing the content data written in FIG. 3 is described in detail below. FIG. 4 is a flowchart of accessing content data according to an embodiment of the present invention. As shown in FIG. 4, the access process includes:
S401. The application server initiates a request to the first storage node to access content data.
The application server may use the same distributed algorithm used when writing the content data to compute the first storage node storing the content data's management data, and initiate an access request to the computed first storage node. The access request carries the content data's identifier, the access operation type, and the application server's identifier, where the access operation type includes reading content data and the like.
S402. The first storage node determines the storage location of the accessed content data.
The first storage node looks up the content data's management data according to the content data's identifier, and obtains the content data's storage location from the management data.
S403. The first storage node forwards the request to access the content data to the second storage node storing the content data.
S404. The second storage node sends the requested content data to the application server.
The second storage node may extract the content data locally according to the content data identifier carried in the access request, and send the extracted content data to the application server according to the application server identifier carried in the access request.
It should be noted that, in step S404, the second storage node may alternatively return the content data to the first storage node, which forwards it to the application server.
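A minimal sketch of S402 to S404 with in-memory stand-ins; the dict layout is an assumption that mirrors the management-data sketch above.

```python
# Management data held by the first storage node (the DHT-computed node).
management_data = {"C1": {"location": "Node2", "heat": 0}}
# Content data actually stored per node.
content_store = {"Node2": {"C1": b"...content bytes..."}}

def handle_access(content_id: str) -> bytes:
    # S402: resolve the storage location from the management data.
    record = management_data[content_id]
    record["heat"] += 1  # count accesses to track access heat
    # S403/S404: forward to the node holding the content data and
    # return the content data to the application server.
    return content_store[record["location"]][content_id]

print(handle_access("C1"))
```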
In the access process shown in FIG. 4, the application server can find the content data's management data through the distributed algorithm, and thereby find the content data through the management data. That is, in this embodiment of the present invention, the storage and retrieval of content data can break free of the constraints of the distributed algorithm and be distributed according to each storage node's own capability, improving resource availability.
In addition, with the solutions of the embodiments of the present invention, the expansion of the distributed storage system can be further optimized. As can be seen from the foregoing embodiments, the distributed system stores content data and management data of each piece of content data. Each piece of management data includes the storage location of its corresponding content data. The management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm, and the content data is distributed according to each storage node's capacity load and traffic load. When the distributed storage system is expanded, this embodiment of the present invention may migrate only the management data, not the content data. Specifically, the distributed storage system may recompute the distribution of each piece of content data with the distributed algorithm; the distributed algorithm used during expansion may be the same as that used when writing the content data. After computing the content data that should belong to the expansion storage node, the distributed system keeps that content data on the original storage node storing it, and migrates the computed content data's management data to the expansion storage node.
FIG. 5 is a schematic diagram of expanding a distributed storage system according to an embodiment of the present invention. As shown in FIG. 5, before expansion, the distributed storage system includes a storage node identified as node1 and a storage node identified as node2, where node1 stores content data C1, C3, Cn, Cn+3, and Cm, and node2 stores C2, Cn+1, and Cm+1. During expansion, a storage node identified as node3 is added. After expansion, according to the distributed algorithm, content data C2, Cn+2, Cn+3, and Cm should belong to node3. Therefore, the distributed storage system migrates the management data of C2, Cn+2, Cn+3, and Cm to node3.
The following uses the DHT algorithm as an example to describe in detail how the distribution of content data changes after expansion. For example, in the embodiment shown in FIG. 3-3, a new storage node NODE4 is added; KEY4 is obtained through the corresponding hash algorithm and mapped onto the ring, as shown in FIG. 6. By the clockwise-rotation rule, object2 should now belong to NODE4, while the other content data keeps its original storage location.
As can be seen from the foregoing embodiments, the core value of the distributed algorithm is that the storage node owning a piece of content data can be computed from the content data's information and the storage nodes' information. When a storage node is added, the results computed by the distributed algorithm change. Therefore, after adding a storage node, part of the content data on the original storage nodes would normally have to be migrated to the new node before it can come online and provide service. However, because the storage space of a storage node is very large, the computed amount of content data to migrate is usually also very large, making the migration time-consuming and seriously exceeding the time window of the expansion operation. In this embodiment of the present invention, by creating management data for each piece of content data, only the management data is migrated during expansion, and the content data itself is not actually moved. Because the management data is far smaller than the content data itself, the amount migrated is very small, which greatly shortens the migration time and allows the distributed storage system to provide service quickly after expansion.
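A sketch of this expansion step, reusing the HashRing sketch from the FIG. 3-1 discussion (the table layout is an assumption): ownership is recomputed on the enlarged ring, and only the management records whose owner changes are moved, while each record's location field keeps pointing at the original content-holding node.

```python
def expand(node_names, new_node, mgmt_tables):
    # mgmt_tables maps node name -> {content_id: management record}.
    new_ring = HashRing(node_names + [new_node])
    mgmt_tables.setdefault(new_node, {})
    for node, table in mgmt_tables.items():
        if node == new_node:
            continue
        for cid in list(table):
            if new_ring.locate(cid) == new_node:
                # Only the small management record moves; the content
                # data itself stays on its original storage node.
                mgmt_tables[new_node][cid] = table.pop(cid)
    return new_ring
```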
In addition, because different storage media differ in storage capability and outflow capability, there are bound to be nodes with relatively stronger storage capability or relatively stronger outflow capability. Therefore, to improve the efficiency of operations on content data, another embodiment of the present invention further optimizes the distribution of content data. Specifically, hot content data may be migrated to storage nodes with strong outflow capability, and cold content data to storage nodes with strong storage capability. The migration process is described in detail below.
A. The distributed storage system analyzes the migration relationships among the storage nodes.
The storage nodes in the distributed storage system may use an election algorithm to select a decision node from among themselves, and the decision node analyzes the migration relationships among the storage nodes. The election process may use an existing election algorithm and is not described here again.
The decision node may sort the storage nodes by traffic load and determine a first migration relationship among them according to the principle that storage nodes with a large traffic load migrate hot content data to storage nodes with a small traffic load. The first migration relationship includes move-in and move-out storage nodes that form migration pairs, and may be represented as a set of migration pairs. Suppose the distributed storage system has five storage nodes Node1 to Node5, sorted by traffic load from large to small as {Node1>Node2>Node3>Node4>Node5}; this embodiment denotes the sequence as O-Hot. According to the sequence O-Hot, Node1 can be paired with Node5, and Node2 with Node4. The pair of Node1 and Node5 may be denoted Pair1{Node1-->Node5}, and the pair of Node2 and Node4 Pair2{Node2-->Node4}, where the arrow indicates the migration direction. The first migration relationship may be represented as Set{Pair1, Pair2}.
The decision node may further sort the storage nodes by capacity load and determine a second migration relationship among them according to the principle that storage nodes with a large capacity load migrate cold content data to storage nodes with a small capacity load. The second migration relationship likewise includes move-in and move-out storage nodes that form migration pairs and may be represented as a set of migration pairs. Suppose the same five storage nodes Node1 to Node5 are sorted by capacity load from small to large as {Node5, Node4, Node3, Node2, Node1} (that is, Node1 carries the largest capacity load); this embodiment denotes the sequence as O-Space. According to the sequence O-Space, Node1 can be paired with Node5, and Node2 with Node4. The pair of Node1 and Node5 may be denoted Pair1'{Node1-->Node5}, and the pair of Node2 and Node4 Pair2'{Node2-->Node4}, where the arrow indicates the migration direction. The second migration relationship may be represented as Set{Pair1', Pair2'}.
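As an illustration of how a decision node might derive such pairs, the sketch below sorts by the given load and pairs the two ends of the sequence inward; this is only one possible pairing strategy, and the node loads are made-up values.

```python
def build_migration_pairs(nodes, load_of):
    # Sort from the largest load to the smallest, then pair the
    # most-loaded node (move-out) with the least-loaded node (move-in),
    # the second-most with the second-least, and so on.
    ordered = sorted(nodes, key=load_of, reverse=True)
    i, j, pairs = 0, len(ordered) - 1, []
    while i < j:
        pairs.append((ordered[i], ordered[j]))  # (move-out, move-in)
        i, j = i + 1, j - 1
    return pairs

# O-Hot example: sorted by traffic load, the middle node stays unpaired.
traffic = {"Node1": 0.9, "Node2": 0.7, "Node3": 0.5, "Node4": 0.3, "Node5": 0.1}
print(build_migration_pairs(list(traffic), traffic.get))
# -> [('Node1', 'Node5'), ('Node2', 'Node4')], i.e. Set{Pair1, Pair2}
```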
B. The distributed storage system sends the migration relationships to the move-out nodes in them.
The decision node in the distributed storage system may send each migration pair in a migration relationship to the move-out node of that pair.
C. The move-out node identifies, according to the migration relationship, the content data to be migrated.
After receiving the migration pair sent by the decision node, the move-out node sorts the content data it stores by access heat. A move-out node in the first migration relationship, after sorting its content data, selects the n (the value of n is configurable) pieces of content data with the highest access heat as the hot content data to be migrated; the set of hot content data to be migrated is denoted Setn-H. A move-out node in the second migration relationship, after sorting its content data, selects the m (the value of m is configurable) pieces of content data with the lowest access heat as the cold content data to be migrated; the set of cold content data to be migrated is denoted Setn-C.
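A sketch of this selection, assuming each record carries the access-heat field from the management-data sketch above; n and m are configurable as just stated.

```python
import heapq

def hot_set(records, n):
    # Setn-H: the n records with the highest access heat.
    return heapq.nlargest(n, records, key=lambda r: r["heat"])

def cold_set(records, m):
    # Setn-C: the m records with the lowest access heat.
    return heapq.nsmallest(m, records, key=lambda r: r["heat"])

stored = [{"id": "C1", "heat": 12}, {"id": "C2", "heat": 3}, {"id": "C3", "heat": 7}]
print([r["id"] for r in hot_set(stored, 1)])   # ['C1']
print([r["id"] for r in cold_set(stored, 1)])  # ['C2']
```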
D. The move-out node notifies the move-in node paired with it of the content data to be migrated.
E. The move-in node generates a move-in list according to the content data notified by the move-out node.
The move-in list may consist of multiple records in the format {NodeN, Cn}, where NodeN denotes the move-out node and Cn the content data to be migrated.
In another embodiment, the move-in node may also determine, according to its own remaining capacity, which of the content data to be migrated from the move-out node it can accept, and generate the move-in list from the acceptable content data.
F. The move-in node moves the content data according to the move-in list.
Suppose that, in the migration pair Pair2{Node2-->Node4}, the move-in list generated by Node4 is {Node2, C1}, and Node4 computes via DHT that the storage node owning C1's management data is Node3. The process of moving C1 is then as follows (a sketch of this flow is given after the numbered steps):
1. Node4 requests C1's content data from Node2.
2. After obtaining C1's content data, Node4 writes it locally.
3. After finishing the write of C1's content data, Node4 notifies Node3 to modify C1's management data, changing C1's storage location to Node4. For example:
{C1, heat, Node2} ----> {C1, heat, Node4}
4. Node4 notifies Node2 to delete C1's content data.
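The four steps can be sketched with the same in-memory stand-ins used earlier; the node names follow the example, and the dict structures are assumptions.

```python
def move_content(content_id, src, dst, mgmt_owner, content_store, mgmt_tables):
    # 1. The move-in node (Node4) requests the content data from the
    #    move-out node (Node2).
    data = content_store[src][content_id]
    # 2. The move-in node writes the content data locally.
    content_store.setdefault(dst, {})[content_id] = data
    # 3. The move-in node notifies the management-data owner (Node3,
    #    located via DHT) to update the storage location, e.g.
    #    {C1, heat, Node2} ----> {C1, heat, Node4}.
    mgmt_tables[mgmt_owner][content_id]["location"] = dst
    # 4. The move-in node notifies the move-out node to delete its copy.
    del content_store[src][content_id]

content_store = {"Node2": {"C1": b"..."}, "Node4": {}}
mgmt_tables = {"Node3": {"C1": {"heat": 12, "location": "Node2"}}}
move_content("C1", "Node2", "Node4", "Node3", content_store, mgmt_tables)
print(mgmt_tables["Node3"]["C1"]["location"])  # Node4
```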
It should be noted that either the migration of hot content data or the migration of cold content data may be performed alone as needed, or both may be performed; this is not limited here.
Through the foregoing migration process, hot content data can be migrated to storage nodes with strong outflow capability, improving access performance. Cold content data, in turn, can be migrated to storage nodes with large capacity, freeing the capacity of the strong-outflow storage nodes for hot content data and improving the performance and utilization of the entire distributed storage system.
FIG. 7 shows a possible schematic structure of the storage node involved in the foregoing embodiments. As shown in FIG. 7, this embodiment is described using the storage node that receives the application server's request to write content data as an example; in this embodiment, that storage node is called the first storage node. Specifically, the first storage node includes a communication interface 701, a data manager 702, a migration manager 703, and a memory 704.
The communication interface 701 is configured to interact with the application server and/or other storage nodes. For example, the communication interface 701 may receive a request sent by the application server to write content data, notify the application server to write the content data to the second storage node determined by the data manager 702, and receive the application server's requests to access content data.
The data manager 702 is configured to manage the distribution of content data to be written and to schedule access to content data. For the distribution management process, refer to the parts of the method embodiments on determining, based on the storage nodes' capacity loads and traffic loads, the second storage node for writing content data and on creating the content data's management data; details are not repeated here.
The memory 704 is configured to store content data and management data.
The migration manager 703 is configured to manage the migration of hot and cold content data, and also to manage the migration of management data when the distributed system is expanded. For specific implementations, refer to the migration process part of the method embodiments and the expansion embodiment shown in FIG. 5; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the described apparatus embodiments are merely illustrative. For example, the division into modules is merely a division by logical function; there may be other divisions in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one unit. The integrated module may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes media capable of storing data, such as a USB flash drive, a removable hard disk, a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention further provides another computer storage medium for storing computer software instructions used by the foregoing storage node, including a program designed to execute the method embodiment shown in FIG. 3.
An embodiment of the present invention further provides another computer storage medium for storing computer software instructions used by the foregoing distributed storage system, including a program designed to execute the embodiment of FIG. 5 and the migration method embodiments described above. By executing the stored program, expansion migration and migration of hot and cold content data can be achieved.
A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code. The computer program may be stored in or distributed on a suitable medium, provided together with other hardware or as part of the hardware, or distributed in other forms, such as over the Internet or other wired or wireless telecommunication systems.
The present invention is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the present invention has been described with reference to specific features and embodiments, it is apparent that various modifications and combinations can be made without departing from the spirit and scope of the present invention. Accordingly, the specification and drawings are merely exemplary descriptions of the present invention as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (28)

  1. A distributed storage method, applied to a distributed storage system comprising at least two storage nodes,
    the method comprising:
    receiving, by a first storage node in the distributed storage system, a request sent by an application server to write content data, wherein the first storage node is the storage node computed by a distributed algorithm for managing the content data;
    obtaining, by the first storage node, the capacity load and the traffic load of each storage node in the distributed storage system, and determining, based on the obtained capacity loads and traffic loads, a second storage node to which the content data is to be written;
    notifying, by the first storage node, the application server to write the content data to the second storage node; and
    creating, by the first storage node, management data of the content data on the first storage node, wherein the management data records that the content data is stored on the second storage node.
  2. The method according to claim 1, wherein the capacity load of each storage node is calculated from the storage capacity supported by each storage node and its used storage capacity; and
    the traffic load of each storage node is calculated from the outflow capability supported by each storage node and its average outflow.
  3. The method according to claim 1 or 2, wherein the determining, based on the obtained capacity loads and traffic loads, a second storage node to which the data object is to be written specifically comprises:
    finding, by the first storage node, storage nodes whose capacity load is within a preset capacity range, to form a first node set;
    finding, by the first storage node, storage nodes whose traffic load is within a preset traffic range, to form a second node set; and
    selecting, from the intersection of the first node set and the second node set, the second storage node to which the data is to be written.
  4. The method according to claim 3, wherein the selecting, from the intersection of the first node set and the second node set, the second storage node to which the data is to be written specifically comprises:
    determining, by the first storage node, whether the data to be written is hot data or cold data; if the data is hot data, selecting, by the first storage node from the intersection of the first node set and the second node set, the storage node with the smallest traffic load as the second storage node to which the data is written; and if the data is cold data, selecting, by the first storage node from the intersection of the first node set and the second node set, the storage node with the smallest capacity load as the second storage node to which the data is written.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    receiving, by the first storage node, a request to access the content data; and
    determining, by the first storage node from the management data, that the content data is stored on the second storage node, and forwarding the request to access the content data to the second storage node.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    determining, by the first storage node, the access heat of the content data according to the number of times the content data is accessed, and recording the access heat of the content data in the management data of the content data.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    identifying, by the first storage node, hot content data according to the access heat of the content data stored on the first storage node; and
    migrating, by the first storage node, the hot content data to a storage node whose traffic load is smaller than the traffic load of the first storage node, and notifying the storage node storing the management data of the hot content data to update the storage location of the hot content data.
  8. The method according to claim 7, wherein the migrating, by the first storage node, the hot content data to a storage node whose traffic load is smaller than the traffic load of the first storage node specifically comprises:
    sorting the storage nodes of the distributed storage system by traffic load, and determining a first migration relationship among the storage nodes according to the principle that a storage node with a large traffic load migrates hot content data to a storage node with a small traffic load, wherein the first migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and
    migrating, by the first storage node, the hot data to the move-in storage node that forms a migration pair with the first storage node in the first migration relationship.
  9. The method according to any one of claims 1 to 7, wherein the method further comprises:
    identifying, by the first storage node, cold content data according to the access heat of the content data stored on the first storage node; and
    migrating, by the first storage node, the cold content data to a storage node whose capacity load is smaller than the capacity load of the first storage node, and notifying the storage node storing the management data of the cold content data to update the storage location of the cold content data.
  10. The method according to claim 9, wherein the migrating, by the first storage node, the cold content data to a storage node whose capacity load is smaller than the capacity load of the first storage node specifically comprises:
    sorting the storage nodes of the distributed storage system by capacity load, and determining a second migration relationship among the storage nodes according to the principle that a storage node with a large capacity load migrates cold content data to a storage node with a small capacity load, wherein the second migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and
    migrating, by the first storage node, the cold content data to the move-in storage node that forms a migration pair with the first storage node in the second migration relationship.
  11. A distributed storage method, applied to a distributed storage system comprising at least two storage nodes, wherein the distributed system stores content data and management data of each piece of content data, each piece of management data comprises the storage location of the content data corresponding to the management data, the management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm, and the content data is distributed according to the capacity load and traffic load of each storage node, the method comprising:
    when the distributed storage system is expanded, recomputing, by the distributed storage system, the distribution of each piece of content data by the distributed algorithm; and
    keeping the content data computed as belonging to an expansion storage node on the original storage node storing the content data, and migrating the computed management data of the content data to the expansion storage node.
  12. The method according to claim 11, wherein the method further comprises:
    sorting, by the distributed storage system, the storage nodes by traffic load, and determining a first migration relationship among the storage nodes according to the principle that a storage node with a large traffic load migrates hot content data to a storage node with a small traffic load, wherein the first migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs;
    migrating, by each storage node of the distributed storage system, hot content data according to the first migration relationship; and
    updating, in the management data of the hot content data, the storage location of the hot content data to the post-migration storage location.
  13. The method according to claim 11 or 12, wherein the method further comprises:
    sorting, by the distributed storage system, the storage nodes of the distributed storage system by capacity load, and determining a second migration relationship among the storage nodes according to the principle that a storage node with a large capacity load migrates cold content data to a storage node with a small capacity load, wherein the second migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs;
    migrating, by each storage node of the distributed storage system, cold content data according to the second migration relationship; and
    updating, in the management data of the cold content data, the storage location of the cold content data to the post-migration storage location.
  14. The method according to claim 12, wherein the migrating, by each storage node of the distributed storage system, hot content data according to the first migration relationship specifically comprises:
    identifying, by a move-out storage node in the first migration relationship, the hot content data among the content data stored on the node; and
    migrating the identified hot content data to the move-in storage node that forms a migration pair with the move-out storage node in the first migration relationship.
  15. The method according to claim 13, wherein the migrating, by each storage node of the distributed storage system, cold content data according to the second migration relationship specifically comprises:
    identifying, by a move-out storage node in the second migration relationship, the cold content data among the content data stored on the node; and
    migrating the identified cold data to the move-in storage node that forms a migration pair with the move-out storage node in the second migration relationship.
  16. A storage node for implementing distributed storage, wherein the storage node is a first storage node in a distributed storage system, and the first storage node is the storage node computed by a distributed algorithm for managing content data; the first storage node comprises:
    a communication interface, configured to receive a request sent by an application server to write content data;
    a data manager, configured to obtain the capacity load and traffic load of each storage node in the distributed storage system, and determine, based on the obtained capacity loads and traffic loads, a second storage node to which the content data is to be written;
    wherein the communication interface is further configured to notify the application server to write the content data to the second storage node determined by the data manager; and
    the data manager is further configured to create management data of the content data, wherein the management data records that the content data is stored on the second storage node.
  17. The storage node according to claim 16, wherein the capacity load of each storage node is calculated from the storage capacity supported by each storage node and its used storage capacity; and
    the traffic load of each storage node is calculated from the outflow capability supported by each storage node and its average outflow.
  18. The storage node according to claim 16 or 17, wherein the determining, by the data manager based on the obtained capacity loads and traffic loads, a second storage node to which the content data is to be written specifically comprises:
    finding, by the data manager, storage nodes whose capacity load is within a preset capacity range to form a first node set; finding storage nodes whose traffic load is within a preset traffic range to form a second node set; and selecting, from the intersection of the first node set and the second node set, the second storage node to which the data is to be written.
  19. The storage node according to claim 18, wherein the selecting, by the data manager from the intersection of the first node set and the second node set, the second storage node to which the data is to be written specifically comprises:
    the data manager is configured to determine whether the data to be written is hot data or cold data; if the data is hot data, select, from the intersection of the first node set and the second node set, the storage node with the smallest traffic load as the second storage node to which the data is written; and if the data is cold data, select, from the intersection of the first node set and the second node set, the storage node with the smallest capacity load as the second storage node to which the data is written.
  20. The storage node according to any one of claims 16 to 19, wherein the communication interface is further configured to receive a request to access the content data; and
    the data manager is further configured to determine, from the management data, that the content data is stored on the second storage node, and forward the request to access the content data to the second storage node.
  21. The storage node according to any one of claims 16 to 20, wherein the data manager is further configured to determine the access heat of the content data according to the number of times the content data is accessed, and record the access heat of the content data in the management data of the content data.
  22. The storage node according to any one of claims 16 to 21, wherein the first storage node further comprises:
    a migration manager, configured to identify hot content data according to the access heat of the content data stored on the first storage node, migrate the hot content data to a storage node whose traffic load is smaller than the traffic load of the first storage node, and notify the storage node storing the management data of the hot content data to update the storage location of the hot content data.
  23. The storage node according to claim 22, wherein the migrating, by the migration manager, the hot content data to a storage node whose traffic load is smaller than the traffic load of the first storage node specifically comprises:
    the migration manager is configured to sort the storage nodes of the distributed storage system by traffic load, and determine a first migration relationship among the storage nodes according to the principle that a storage node with a large traffic load migrates hot content data to a storage node with a small traffic load, wherein the first migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and migrate the hot content data to the move-in storage node that forms a migration pair with the first storage node in the first migration relationship.
  24. The storage node according to any one of claims 16 to 22, wherein the migration manager is further configured to identify cold content data according to the access heat of the content data stored on this storage node, migrate the cold content data to a storage node whose capacity load is smaller than the capacity load of the first storage node, and notify the storage node storing the management data of the cold content data to update the storage location of the cold content data.
  25. The storage node according to claim 24, wherein the migrating, by the migration manager, the cold content data to a storage node whose capacity load is smaller than the capacity load of the first storage node specifically comprises:
    the migration manager is configured to sort the storage nodes of the distributed storage system by capacity load, and determine a second migration relationship among the storage nodes according to the principle that a storage node with a large capacity load migrates cold content data to a storage node with a small capacity load, wherein the second migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and migrate the cold content data to the move-in storage node that forms a migration pair with the first storage node in the second migration relationship.
  26. A distributed storage system, wherein the distributed storage system comprises at least two storage nodes; the distributed system stores content data and management data of each piece of content data, and each piece of management data comprises the storage location of the content data corresponding to the management data; the management data is distributed among the storage nodes of the distributed storage system by a distributed algorithm, and the content data is distributed according to the capacity load and traffic load of each storage node;
    the distributed system further comprises an expansion storage node;
    the at least two storage nodes of the distributed storage system are configured to recompute, by the distributed algorithm, the distribution of the content data they each store, keep the content data computed as belonging to the expansion storage node on the original storage node storing the content data, and migrate the computed management data of the content data to the expansion storage node; and
    the expansion storage node is configured to store the migrated management data.
  27. The distributed system according to claim 26, wherein one storage node in the distributed storage system is further configured to sort the storage nodes in the distributed storage system by traffic load, and determine a first migration relationship among the storage nodes according to the principle that a storage node with a large traffic load migrates hot content data to a storage node with a small traffic load, wherein the first migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and
    each storage node of the distributed storage system is further configured to migrate hot content data according to the first migration relationship, and update, in the management data of the hot content data, the storage location of the hot content data to the post-migration storage location.
  28. The distributed system according to claim 26 or 27, wherein one storage node in the distributed storage system is further configured to sort the storage nodes of the distributed storage system by capacity load, and determine a second migration relationship among the storage nodes according to the principle that a storage node with a large capacity load migrates cold content data to a storage node with a small capacity load, wherein the second migration relationship comprises move-in storage nodes and move-out storage nodes that form migration pairs; and
    each storage node of the distributed storage system is further configured to migrate cold content data according to the second migration relationship, and update, in the management data of the cold content data, the storage location of the cold content data to the post-migration storage location.
PCT/CN2017/085383 2016-06-29 2017-05-22 Distributed storage method and system WO2018000993A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610507128.6A CN106161610A (zh) 2016-06-29 Distributed storage method and system
CN201610507128.6 2016-06-29

Publications (1)

Publication Number Publication Date
WO2018000993A1 true WO2018000993A1 (zh) 2018-01-04

Family

ID=57350814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085383 WO2018000993A1 (zh) 2016-06-29 2017-05-22 Distributed storage method and system

Country Status (2)

Country Link
CN (1) CN106161610A (zh)
WO (1) WO2018000993A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008199A (zh) * 2019-03-25 2019-07-12 华南理工大学 Data migration and deployment method based on access heat
CN110058822A (zh) * 2019-04-26 2019-07-26 北京计算机技术及应用研究所 Disk array horizontal expansion method
CN111459914A (zh) * 2020-03-31 2020-07-28 北京金山云网络技术有限公司 Optimization method and apparatus for a distributed graph database, and electronic device
CN112181309A (zh) * 2020-10-14 2021-01-05 上海德拓信息技术股份有限公司 Online capacity-expansion method for massive object storage
CN112637327A (zh) * 2020-12-21 2021-04-09 北京奇艺世纪科技有限公司 Data processing method, apparatus and system

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106161610A (zh) 2016-06-29 2016-11-23 华为技术有限公司 Distributed storage method and system
CN106814971B (zh) * 2016-12-20 2020-09-29 中国银联股份有限公司 Heterogeneous storage method and heterogeneous storage platform
CN109002448B (zh) * 2017-06-07 2020-12-08 中国移动通信集团甘肃有限公司 Report statistics method, apparatus and system
CN107277144B (zh) * 2017-06-22 2021-02-09 浙江力石科技股份有限公司 Distributed high-concurrency cloud storage database system and load balancing method therefor
CN107566505A (zh) * 2017-09-15 2018-01-09 郑州云海信息技术有限公司 Data storage resource management method, master node, system, apparatus and storage medium
CN108595108A (zh) * 2017-12-29 2018-09-28 北京奇虎科技有限公司 Data migration method and apparatus
CN110244901B (zh) * 2018-03-07 2021-03-26 杭州海康威视系统技术有限公司 Task allocation method and apparatus, and distributed storage system
CN108763577A (zh) * 2018-06-05 2018-11-06 平安科技(深圳)有限公司 Node processing method and apparatus, storage medium and electronic device
CN111078126B (zh) * 2018-10-19 2023-09-15 阿里巴巴集团控股有限公司 Distributed storage system and storage method therefor
CN109542352B (zh) * 2018-11-22 2020-05-08 北京百度网讯科技有限公司 Method and apparatus for storing data
CN109885533A (zh) * 2019-02-22 2019-06-14 深圳市网心科技有限公司 DHT-network-based data deployment method, node device, data deployment system and storage medium
CN109960587A (zh) * 2019-02-27 2019-07-02 厦门市世纪网通网络服务有限公司 Storage resource allocation method and apparatus for a hyper-converged cloud computing system
CN110162270B (zh) * 2019-04-29 2020-08-25 平安国际智慧城市科技股份有限公司 Data storage method based on a distributed storage system, storage node, and medium
CN110531938A (zh) * 2019-09-02 2019-12-03 广东紫晶信息存储技术股份有限公司 Multi-dimension-based hot and cold data migration method and system
CN112749004B (zh) * 2019-10-30 2023-09-05 中国移动通信集团安徽有限公司 Data storage method and apparatus based on node access heat
CN113032137A (zh) * 2019-12-25 2021-06-25 中科寒武纪科技股份有限公司 Task allocation method and apparatus, computer device, and readable storage medium
CN111245842B (zh) * 2020-01-14 2021-02-05 深圳市恒悦创客空间有限公司 Campus information processing method
CN111309732B (zh) * 2020-02-19 2024-03-08 杭州网易数之帆科技有限公司 Data processing method, apparatus, medium, and computing device
CN114281256A (zh) * 2021-12-20 2022-04-05 广州炒米信息科技有限公司 Data synchronization method, apparatus, device and medium based on a distributed storage system
CN117370275A (zh) * 2022-07-01 2024-01-09 中兴通讯股份有限公司 File method, server, storage node, file storage system, and client

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101610287A (zh) * 2009-06-16 2009-12-23 浙江大学 Load balancing method applied to a distributed mass storage system
CN102055650A (zh) * 2009-10-29 2011-05-11 华为技术有限公司 Load balancing method and system, and management server
CN103106207A (zh) * 2011-11-10 2013-05-15 中国移动通信集团公司 Method and device for metadata distribution in an object storage system
US20150242150A1 (en) * 2011-03-22 2015-08-27 Amazon Technologies, Inc. Methods and apparatus for optimizing resource utilization in distributed storage systems
CN105025053A (zh) * 2014-04-24 2015-11-04 苏宁云商集团股份有限公司 Distributed file upload method based on cloud storage technology and system therefor
CN106161610A (zh) * 2016-06-29 2016-11-23 华为技术有限公司 Distributed storage method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692239B (zh) * 2009-10-19 2012-10-03 浙江大学 Distributed file system metadata allocation method
CN102739622A (zh) * 2011-04-15 2012-10-17 北京兴宇中科科技开发股份有限公司 Scalable data storage system
CN104378447B (zh) * 2014-12-03 2017-10-31 深圳市鼎元科技开发有限公司 Non-migration distributed storage method and system based on a hash ring


Also Published As

Publication number Publication date
CN106161610A (zh) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2018000993A1 (zh) Distributed storage method and system
US10997211B2 (en) Systems and methods for database zone sharding and API integration
US10977277B2 (en) Systems and methods for database zone sharding and API integration
US11431791B2 (en) Content delivery method, virtual server management method, cloud platform, and system
CN112840322B (zh) Single-node and multi-node datastore systems in a network routing environment
EP2901308B1 (en) Load distribution in data networks
KR101585146B1 (ko) Distributed storage system that stores objects based on the locations of a plurality of data nodes, location-based distributed storage method therefor, and computer-readable storage medium
JP4681615B2 (ja) Partitioning a node's workload
KR101544480B1 (ko) Distributed storage system including a plurality of proxy servers, object management method therefor, and computer-readable storage medium
US20140101300A1 (en) Method and apparatus for automated deployment of geographically distributed applications within a cloud
US11327688B2 (en) Master data placement in distributed storage systems
US10320905B2 (en) Highly available network filer super cluster
Narendra et al. Towards cloud-based decentralized storage for internet of things data
WO2010025653A1 (zh) Method, system and apparatus for searching information, and method for registering with a vertical search engine
Yang et al. A reinforcement learning based data storage and traffic management in information-centric data center networks
US11606415B2 (en) Method, apparatus and system for processing an access request in a content delivery system
JP2022528726A (ja) Distributed in-memory spatial data store for K-nearest-neighbor search
CN107408058A (zh) Virtual resource deployment method, apparatus and system
JP2024514467A (ja) Geographically distributed hybrid cloud cluster
Vijayakumar et al. FIR3: A fuzzy inference based reliable replica replacement strategy for cloud Data Centre
US11971902B1 (en) Data retrieval latency management system
WO2024140698A1 (zh) Session processing method, system, apparatus and storage medium
WO2021110176A1 (zh) Edge system and method for processing data operation requests
Dou et al. Industrial-Metadata Intelligent Service for Geo-Distributed File System
Gupta et al. Efficient data replication algorithm for mobile Ad-hoc networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17818990

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17818990

Country of ref document: EP

Kind code of ref document: A1