Detailed Description
The present disclosure is described in more detail below with reference to the accompanying drawings, in which embodiments of the present disclosure are shown. These embodiments may, however, be implemented in many different forms and should not be construed as limited to the embodiments described herein. Rather, these examples are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
Various embodiments of the present invention are described in detail below, by way of example, with reference to the accompanying drawings.
Fig. 3A shows an architecture diagram of a storage system according to an embodiment of the present invention. The storage system includes a storage network; storage nodes connected to the storage network; and storage devices likewise connected to the storage network. Each storage device includes at least one storage medium. For example, a storage device commonly used by the inventors can hold 45 storage media. The storage network is configured so that each storage node can access all storage media without going through other storage nodes. In Fig. 3A the storage network is illustrated as a SAS switch, but it should be understood that the storage network may also be a SAS set or take other forms discussed below. Fig. 3A schematically shows three storage nodes, namely storage node S1, storage node S2 and storage node S3, each directly connected to the SAS switch. The storage system shown in Fig. 3A includes physical servers 31, 32 and 33, each of which is connected to the storage devices through the storage network. Physical server 31 includes compute nodes C11 and C12 and storage node S1, all located on it; physical server 32 includes compute nodes C21 and C22 and storage node S2; and physical server 33 includes compute nodes C31 and C32 and storage node S3. The storage system shown in Fig. 3A also includes storage devices 34, 35 and 36, each of which contains storage medium 1, storage medium 2 and storage medium 3.
With the storage system provided by the embodiments of the present invention, each storage node can access all storage media without going through other storage nodes, so that all storage media are in effect shared by all storage nodes, achieving the effect of a global storage pool. That is, the storage network is configured so that each storage node can access all storage media without going through other storage nodes. Further, the storage network is configured so that each storage node manages only a fixed set of storage media at any one time, and a storage medium is never written by multiple storage nodes at the same time, which would corrupt data. In this way each storage node can access the storage media it manages without going through other storage nodes, and the integrity of the data stored in the storage system is guaranteed. In addition, the constructed storage pool can be divided into at least two storage regions, and each storage node is responsible for managing zero or more storage regions. Fig. 3A uses different background patterns to illustrate which storage regions each storage node manages: the storage media belonging to the same storage region and the storage node responsible for managing them are shown with the same background pattern. Specifically, storage node S1 manages the first storage region, which includes storage medium 1 in storage device 34, storage medium 1 in storage device 35 and storage medium 1 in storage device 36; storage node S2 manages the second storage region, which includes storage medium 2 in storage device 34, storage medium 2 in storage device 35 and storage medium 2 in storage device 36; storage node S3 manages the third storage region, which includes storage medium 3 in storage device 34, storage medium 3 in storage device 35 and storage medium 3 in storage device 36.
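The region layout described for Fig. 3A can be written down as a small ownership table. The following Python sketch is illustrative only: the names `REGIONS`, `REGION_OWNER`, `owner_of` and `single_owner_invariant` are hypothetical and do not come from the disclosure; only the device numbers, medium numbers and node labels are taken from the text.

```python
# Hypothetical model of the Fig. 3A layout; all identifiers are illustrative.
DEVICES = (34, 35, 36)          # storage devices named in the text
MEDIA_PER_DEVICE = (1, 2, 3)    # storage media 1..3 in each device

# Storage region k contains storage medium k of every storage device.
REGIONS = {k: {(dev, k) for dev in DEVICES} for k in MEDIA_PER_DEVICE}

# Each storage node manages zero or more regions (here exactly one each).
REGION_OWNER = {1: "S1", 2: "S2", 3: "S3"}


def owner_of(device, medium):
    """Return the storage node that manages the given storage medium."""
    for region, members in REGIONS.items():
        if (device, medium) in members:
            return REGION_OWNER[region]
    raise KeyError((device, medium))


def single_owner_invariant():
    """Check that every medium belongs to exactly one region (one writer)."""
    seen = [m for members in REGIONS.values() for m in members]
    expected = len(DEVICES) * len(MEDIA_PER_DEVICE)
    return len(seen) == len(set(seen)) == expected
```

Because every medium appears in exactly one region and every region has exactly one owner, no medium can have two writers in this model.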
Meanwhile it can be seen from the above description that compared with the prior art (wherein memory node is located at storage medium side,
Or strictly speaking, storage medium is the built-in disk of physical machine where memory node), in the embodiment of the present invention, memory node institute
Physical machine independently of storage equipment, storage equipment more as connection storage medium with storage network a channel.
Such mode so that when needing to carry out dynamic equilibrium, without by physical data in different storage mediums
It is migrated, it is only necessary to by configuring the storage region (or storage medium) for balancing different memory nodes and being managed.
In another embodiment of the present invention, the storage-node side further includes compute nodes, and a compute node and a storage node are arranged in the same physical server, which is connected to the storage devices through the storage network. A converged storage system constructed according to this embodiment, in which the compute node and the storage node reside on the same physical machine, reduces the number of physical devices required and therefore the cost. The compute node can also access the storage resources it needs locally. In addition, because the compute node and the storage node are converged on the same physical server, data exchange between the two can be as simple as sharing memory, which gives particularly good performance.
In the storage system provided by the embodiments of the present invention, the I/O data path between a compute node and a storage medium consists of: (1) storage medium to storage node; and (2) storage node to the compute node converged on the same physical server (CPU bus access). By contrast, in the prior-art storage system shown in Fig. 1, the I/O data path between a compute node and a storage medium consists of: (1) storage medium to storage node; (2) storage node to the access switch of the storage network; (3) access switch of the storage network to the core network switch; (4) core network switch to the access switch of the compute network; and (5) access switch of the compute network to the compute node. Clearly, the total data path of the storage system of the embodiments is close to item (1) alone of the conventional storage system. That is, by compressing the I/O data path length to the minimum, the storage system provided by the embodiments can greatly improve I/O channel performance, and in practice its behavior is very close to that of the I/O channel for reading and writing a local hard disk.
In an embodiment of the present invention, a storage node may be a virtual machine on the physical server, a container, or a module running directly on the physical operating system of the server; a compute node may likewise be a virtual machine on the same physical server, a container, or a module running directly on the physical operating system of that server. In one embodiment, each storage node may correspond to one or more compute nodes.
Specifically, a physical server may be divided into multiple virtual machines, one of which serves as the storage node while the others serve as compute nodes; alternatively, a module on the physical OS may serve as the storage node in order to achieve better performance.
In an embodiment of the present invention, the virtualization technology used to form the virtual machines may be KVM, Xen, VMware or Hyper-V, and the container technology used to form the containers may be Docker, Rocket, Odin, Chef, LXC, Vagrant, Ansible, Zone, Jail or Hyper-V containers.
In an embodiment of the present invention, each storage node manages only a fixed set of storage media at any one time, and a storage medium is never written by multiple storage nodes at the same time, which avoids data conflicts. Each storage node can thus access the storage media it manages without going through other storage nodes, and the integrity of the data stored in the storage system is guaranteed.
In an embodiment of the present invention, all storage media in the system may be divided according to storage logic. Specifically, the storage pool of the whole system may be organized into a logical storage hierarchy of storage regions, storage groups and storage blocks, where the storage block is the smallest storage unit. In an embodiment of the present invention, the storage pool may be divided into at least two storage regions.
In an embodiment of the present invention, each storage region may be divided into at least one storage group. In a preferred embodiment, each storage region is divided into at least two storage groups.
In some embodiments, storage regions and storage groups may be merged, so that one level of the storage hierarchy is omitted.
In an embodiment of the present invention, each storage region (or storage group) may consist of at least one storage block, where a storage block may be a complete storage medium or a part of a storage medium. To build redundant storage inside a storage region, each storage region (or storage group) may consist of at least two storage blocks, such that when any one storage block fails, the complete stored data can be computed from the remaining storage blocks in the group. The redundant storage mode may be multi-copy (replication), redundant array of independent disks (RAID), or erasure coding. In an embodiment of the present invention, redundant storage may be established through the ZFS file system. In an embodiment of the present invention, to withstand hardware failures of storage devices or storage media, the storage blocks included in a storage region (or storage group) are not located in the same storage medium, or even in the same storage device. In an embodiment of the present invention, no two storage blocks included in a storage region (or storage group) are located in the same storage medium or storage device. In another embodiment of the present invention, the number of storage blocks of one storage region (or storage group) that are located in the same storage medium or storage device is preferably less than or equal to the redundancy of the redundant storage. For example, when the redundant storage uses RAID 5, the redundancy is 1, so at most one storage block of a given storage group may be located in the same storage device; for RAID 6, the redundancy is 2, so at most two storage blocks of a given storage group may be located in the same storage device.
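The placement rule above (blocks per device not exceeding the redundancy) can be checked mechanically. The sketch below is a minimal illustration; the function name `placement_ok` and the device labels are assumptions made for the example, not part of the disclosure.

```python
from collections import Counter


def placement_ok(block_devices, redundancy):
    """Check the placement rule for one storage group.

    block_devices: one entry per storage block, naming the storage device
                   (or storage medium) that holds the block.
    redundancy:    number of block failures the group tolerates,
                   e.g. 1 for RAID 5 and 2 for RAID 6.
    """
    per_device = Counter(block_devices)
    # Losing one device must not remove more blocks than can be rebuilt.
    return all(count <= redundancy for count in per_device.values())
```

For example, a RAID 5 group with two blocks in the same enclosure fails the check, while the same layout is acceptable for RAID 6.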
In an embodiment of the present invention, each storage node can read and write only the storage regions it manages. Because read operations by multiple storage nodes on the same storage block do not conflict with one another, whereas simultaneous writes by multiple storage nodes to one storage block easily conflict, in another embodiment each storage node can write only the storage regions it manages but can read both the storage regions it manages and those managed by other storage nodes; that is, write operations are local, while read operations can be global.
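The two access policies described above (strictly local access, or local writes with global reads) can be expressed as a single check. This is a sketch under assumed names: `may_access`, the `owner` mapping and the `global_reads` flag are hypothetical.

```python
def may_access(node, region, op, owner, global_reads=True):
    """Decide whether a storage node may perform an operation on a region.

    owner:        mapping from region id to the storage node managing it.
    global_reads: True models the embodiment in which any node may read any
                  region; False models the embodiment in which a node may
                  only read and write its own regions.
    """
    if op == "write":
        # Writes are always local to the managing node.
        return owner[region] == node
    if op == "read":
        return global_reads or owner[region] == node
    raise ValueError(f"unknown operation: {op}")
```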
In one embodiment, the storage system may further include a storage control node, connected to the storage network, for determining the storage regions managed by each storage node. In another embodiment, each storage node may include a storage allocation module for determining the storage regions managed by that storage node; this can be implemented through communication and a coordination algorithm among the storage allocation modules of the storage nodes.
In one embodiment, when a storage node is detected to have failed, some or all of the other storage nodes may be configured to take over the storage regions previously managed by the failed storage node. For example, the storage regions managed by the failed storage node may be taken over by a single storage node, or by at least two other storage nodes, each of which takes over part of the storage regions managed by the failed storage node; for instance, the at least two other storage nodes may take over different storage groups of a storage region.
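One possible way to split a failed node's storage groups among the surviving nodes is a simple round-robin assignment. The disclosure does not prescribe a distribution policy, so the policy, the function name `take_over` and the data layout below are assumptions for illustration.

```python
from itertools import cycle


def take_over(failed, region_owner, groups_of, survivors):
    """Reassign the storage groups of a failed node to surviving nodes.

    region_owner: mapping region id -> managing storage node.
    groups_of:    mapping region id -> list of storage group ids.
    survivors:    non-empty list of healthy storage nodes.
    Returns a mapping storage group id -> node that takes it over.
    """
    if not survivors:
        raise ValueError("no surviving storage node")
    targets = cycle(survivors)  # round-robin policy (an assumption)
    assignment = {}
    for region, node in sorted(region_owner.items()):
        if node != failed:
            continue
        for group in groups_of[region]:
            assignment[group] = next(targets)
    return assignment
```

Passing a single survivor models takeover by one storage node; passing several models the case where different storage groups of a region go to different nodes.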
In one embodiment, the storage media may include, but are not limited to, hard disks, flash memory, SRAM, DRAM, NVMe or other forms, and the access interface of the storage media may include, but is not limited to, a SAS interface, SATA interface, PCI/e interface, DIMM interface, NVMe interface, SCSI interface or AHCI interface.
In an embodiment of the present invention, the storage network may include at least one storage switching device, and the storage nodes access the storage media through data exchange between the storage switching devices included. Specifically, the storage nodes and the storage media are each connected to the storage switching device through storage channels.
In an embodiment of the present invention, the storage switching device may be a SAS switch or a PCI/e switch; correspondingly, the storage channel may be a SAS (Serial Attached SCSI) channel or a PCI/e channel.
Taking the SAS channel as an example, a solution based on SAS switching offers higher performance, higher bandwidth and more disks per device than conventional storage solutions based on the IP protocol. When used together with a host bus adapter (HBA) or the SAS interface on a server motherboard, the storage provided by the SAS system can easily be connected to and accessed by multiple servers simultaneously.
Specifically, a SAS switch is connected to a storage device by a SAS cable, and the storage device is likewise connected to its storage media through SAS interfaces; for example, the SAS channel is routed to each storage medium inside the storage device (a SAS switching chip may be provided inside the storage device). The bandwidth of a SAS network can reach 24 Gb or 48 Gb, which is tens of times that of Gigabit Ethernet and several times that of expensive 10-Gigabit Ethernet. At the link layer, SAS offers an improvement of about an order of magnitude over IP networks. At the transport layer, the three-way handshake and four-way connection close of TCP carry a high overhead, and TCP's delayed-acknowledgment mechanism and slow start can sometimes cause delays on the order of 100 milliseconds, whereas the latency of the SAS protocol is tens of times lower than that of TCP, giving an even greater performance gain. In short, a SAS network has a large advantage over Ethernet-based TCP/IP in bandwidth and latency. Those skilled in the art will appreciate that the performance of a PCI/e channel can also meet the needs of the system.
In an embodiment of the present invention, the storage network may include at least two storage switching devices, and each storage node can connect to any storage device, and hence to its storage media, through any one of the storage switching devices. When any storage switching device, or a storage channel connected to a storage switching device, fails, the storage nodes read and write data on the storage devices through the other storage switching devices.
Fig. 3B shows a specific storage system 30 constructed according to an embodiment of the present invention. The storage devices in storage system 30 are built as multiple JBODs 307-310, each connected by SAS data cables to two SAS switches 305 and 306, which form the switching core of the storage network of the storage system. At the front end are at least two servers 301 and 302, each connected to the two SAS switches 305 and 306 through an HBA device (not shown) or a SAS interface on the motherboard. A basic network connection exists between the servers for monitoring and communication. Each server runs a storage node which, using information obtained from the SAS links, manages some or all of the disks in all the JBODs. Specifically, the JBOD disks can be divided into different storage groups using the storage regions, storage groups and storage blocks described above. Each storage node manages one or more such storage groups. When redundant storage is used inside each storage group, the metadata of the redundant storage can be kept on the disks, so that the redundant storage can be identified directly from the disks by other storage nodes.
In the exemplary storage system 30 shown, each storage node may install a monitoring and management module responsible for monitoring the state of local storage and of the other servers. When a JBOD fails as a whole, or a disk in a JBOD fails, data reliability is ensured by the redundant storage. When a server fails, the management module in the storage node on another, pre-designated server identifies locally, from the data on the disks, the disks originally managed by the storage node of the failed server and takes them over. The storage services originally provided by the storage node of the failed server are continued by the storage node on the new server. A new, highly available global storage pool structure is thus achieved.
As can be seen, the exemplary storage system 30 provides a storage pool with multi-point control and global access. On the hardware side, multiple servers provide services externally and JBODs hold the disks. Each JBOD is connected to both SAS switches, and the two switches are in turn connected to the HBA cards of the servers, ensuring that all disks in the JBODs can be accessed by all servers. The redundant SAS links also ensure high availability at the link level.
Locally on each server, redundant storage technology is used: disks are selected from each JBOD to form the redundant storage, so that the loss of a single JBOD does not make data unavailable. When a server fails, the module that monitors overall state schedules another server to access, over the SAS channels, the disks managed by the storage node of the failed server and to quickly take over those disks, achieving highly available global storage.
Although Fig. 3B is described using JBODs to hold the disks, it should be understood that the embodiment shown in Fig. 3B also supports storage devices other than JBODs. In addition, although the above description treats one whole storage medium as one storage block, it applies equally to the case where a part of a storage medium serves as a storage block.
Fig. 4 shows a flowchart of an access control method 40 for the exemplary storage system according to an embodiment of the present invention.
In step S401, the load state among the at least two storage nodes included in the storage system is monitored.
In step S402, when the load of a storage node is detected to exceed a predetermined threshold, the storage regions managed by the relevant storage nodes among the at least two storage nodes are adjusted. The relevant storage nodes may be the storage nodes causing the load imbalance, and may also depend on the adjustment strategy for the storage regions. Adjusting the storage regions may mean redistributing the storage blocks involved among storage nodes, or adding, merging or deleting storage regions. The configuration table of the storage regions managed by the relevant storage nodes may be adjusted, and the at least two storage nodes determine the storage regions they manage according to this configuration table. The adjustment of the configuration table may be performed by the storage control node included in the storage system, or by the storage allocation modules included in the storage nodes, as described above.
In one embodiment, the load state among the at least two storage nodes may be monitored with respect to one or more of the following performance parameters: the number of read/write operations per second (IOPS) of a storage node, the throughput of a storage node, the CPU usage of a storage node, the memory usage of a storage node, and the occupancy of the storage media managed by a storage node.
In one embodiment, each node may periodically monitor its own performance parameters and periodically query the data of the other nodes, then generate a globally unified rebalancing scheme, either from a predefined rebalancing scheme or dynamically by an algorithm, which each node finally executes. In another embodiment, the storage system includes a monitoring node independent of storage node S1, storage node S2 and storage node S3, or the aforementioned storage control node or storage allocation module, to monitor the performance parameters of each storage node.
In one embodiment, imbalance may be determined by a predefined (configurable) threshold; for example, the rebalancing mechanism is triggered when the deviation in IOPS between nodes exceeds a certain range. For IOPS, for example, the IOPS of the storage node with the highest IOPS may be compared with that of the storage node with the lowest IOPS, and adjustment of the storage regions is triggered when the deviation between the two is greater than 30% of the latter. For example, one storage medium managed by the storage node with the highest IOPS is exchanged with one storage medium managed by the storage node with the lowest IOPS, such as the storage medium with the highest occupancy managed by the former and the storage medium with the highest occupancy managed by the latter.
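The IOPS comparison and the choice of media to exchange can be sketched as follows. The function names `needs_rebalance` and `pick_swap`, the data layout, and the handling of a zero minimum are assumptions for illustration; the 30% figure is the exemplary threshold from the text.

```python
def needs_rebalance(iops_by_node, threshold=0.30):
    """Trigger when (max IOPS - min IOPS) exceeds threshold * min IOPS."""
    highest = max(iops_by_node.values())
    lowest = min(iops_by_node.values())
    if lowest == 0:
        # Assumption: any activity next to an idle node counts as imbalance.
        return highest > 0
    return (highest - lowest) > threshold * lowest


def pick_swap(occupancy, managed_by, busiest, idlest):
    """Pick the fullest medium of each of the two nodes as swap candidates.

    occupancy:  mapping medium id -> occupancy ratio (0.0 to 1.0).
    managed_by: mapping medium id -> managing storage node.
    """
    def fullest(node):
        own = [m for m, n in managed_by.items() if n == node]
        return max(own, key=lambda m: occupancy[m])

    return fullest(busiest), fullest(idlest)
```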
Alternatively, the IOPS of the storage node with the highest IOPS may be compared with the average IOPS of all storage nodes, and adjustment of the storage regions is triggered when the deviation between the two is greater than 20% of the latter, such that the adjusted storage region allocation does not immediately trigger rebalancing again.
It should be understood that the predetermined thresholds of 20% and 30% described above for indicating load imbalance are merely exemplary, and other thresholds may be defined depending on the application and requirements. Similarly, for the other performance parameters, such as the throughput of a storage node, the CPU usage of a storage node, the memory usage of a storage node and the occupancy of the storage media managed by a storage node, thresholds for triggering load rebalancing between storage nodes may also be predefined.
It should also be understood that although the predetermined threshold for determining imbalance has been discussed above as a single specified threshold of one of several performance parameters, such as IOPS, it is conceivable that the predetermined threshold may also be expressed as a combination of the specified thresholds of several performance parameters. For example, load rebalancing of a storage node is triggered only when the IOPS of the storage node reaches its specified threshold and the throughput of the storage node reaches its specified threshold.
In one embodiment, adjusting (rebalancing) the storage regions may mean assigning storage media managed by a heavily loaded storage node to a storage region managed by a lightly loaded storage node. This may include, for example, exchanging storage media, or deleting a storage medium from a storage region managed by the heavily loaded storage node and adding it to a storage region managed by the lightly loaded storage node; evenly adding new storage media or new storage regions that join the storage network to the at least two storage regions (for example, when the storage system is expanded); or merging some of the at least two storage regions (for example, when a storage node fails). In one embodiment, the adjustment (rebalancing) of the storage regions may also use a dynamic algorithm: for example, the various load data of each storage medium and each storage node are weighted to obtain a single load index, and a rebalancing scheme is then computed that moves the smallest number of disk groups needed to bring the system back within the predetermined threshold.
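A weighted load index and a planner that tries to move few disk groups could look like the sketch below. The disclosure does not specify the weights or the planning algorithm; the greedy heuristic here is an assumption and is not guaranteed to find the true minimum number of moves. All names (`load_index`, `plan_moves`, the metric keys) are hypothetical.

```python
def load_index(metrics, weights):
    """Combine normalized metrics (0.0 to 1.0) into one weighted load index."""
    return sum(weights[name] * metrics[name] for name in weights)


def plan_moves(nodes, group_load, owner, limit, max_moves=32):
    """Greedy heuristic: move large groups off the busiest node until no
    node's total load index exceeds the limit.

    group_load: mapping disk group id -> load index of that group.
    owner:      mapping disk group id -> managing storage node.
    Returns a list of (group, from_node, to_node) moves.
    """
    owner = dict(owner)  # plan on a copy; the caller applies the moves
    moves = []
    for _ in range(max_moves):
        totals = {n: 0.0 for n in nodes}
        for group, node in owner.items():
            totals[node] += group_load[group]
        busiest = max(nodes, key=totals.get)
        idlest = min(nodes, key=totals.get)
        if totals[busiest] <= limit:
            break  # system is back within the threshold
        # Only consider groups that do not overload the receiving node.
        fitting = [g for g, n in owner.items()
                   if n == busiest and totals[idlest] + group_load[g] <= limit]
        if not fitting:
            break  # no safe move exists under this heuristic
        group = max(fitting, key=group_load.get)
        owner[group] = idlest
        moves.append((group, busiest, idlest))
    return moves
```

Picking the largest group that still fits tends to reach the threshold in few moves, which is the reason for that choice here.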
In one embodiment, each storage node may periodically monitor the performance parameters of the storage media it manages and periodically query the performance parameters of the storage media managed by the other nodes, and thresholds indicating load imbalance are defined on the performance parameters of the storage media. For example, the threshold may be that the storage space utilization of any storage medium is 0% (a new disk has been added), that the storage space utilization of any storage medium is 90% (a disk is about to fill up), or that the difference between the highest and lowest storage space utilization among the storage media in the storage system is greater than 20% of the latter. It should be understood that the predetermined thresholds of 0%, 90% and 20% described above for indicating load imbalance are merely exemplary.
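The three media-level conditions can be evaluated together. In this sketch the function name `media_triggers`, the reason labels, and the treatment of "is 90%" as "at least 90%" are assumptions; the numeric defaults are the exemplary values from the text.

```python
def media_triggers(usage, full=0.90, spread=0.20):
    """Return the reasons, if any, for triggering rebalancing.

    usage: mapping medium id -> storage space utilization (0.0 to 1.0).
    """
    reasons = []
    values = list(usage.values())
    if any(u == 0.0 for u in values):
        reasons.append("new_medium")    # an empty disk has been added
    if any(u >= full for u in values):
        reasons.append("nearly_full")   # a disk is about to fill up
    highest, lowest = max(values), min(values)
    if highest - lowest > spread * lowest:
        reasons.append("spread")        # utilization gap too large
    return reasons
```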
Fig. 5 is a schematic diagram of load rebalancing in the storage system shown in Fig. 3A according to an embodiment of the present invention. Suppose that at some moment the load of storage node S1 in the storage system is very high: the storage media it manages include storage medium 1 at storage device 34, storage medium 1 at storage device 35 and storage medium 1 at storage device 36 (as shown in Fig. 3A), and its total storage space will soon be used up. Meanwhile, the load of storage node S3 is very low, and the storage media it manages have ample free space.
In a conventional storage network, each storage node can access only the storage regions directly attached to it. During rebalancing, the data on the heavily loaded storage node therefore has to be copied to the lightly loaded node. This involves a large number of data copy operations, which place additional load on the storage regions and the network and affect I/O access for normal business data. For example, data has to be read from one or more storage media managed by storage node S1, then written to one or more storage media managed by storage node S3, and finally the disk space holding that data on the storage media managed by storage node S1 is released to achieve load balancing.
According to the embodiments of the present invention, however, because each of the storage nodes S1, S2 and S3 included in the storage system can access all storage regions through the storage network, storage regions can be migrated between storage nodes by transferring access rights to storage media; that is, the storage regions managed by the relevant storage nodes are regrouped. During rebalancing, the data in each storage region no longer needs to be copied. For example, as shown in Fig. 5, storage medium 2 at storage device 34, originally managed by storage node S3, is assigned to storage node S1, while storage medium 1 at storage device 34, originally managed by storage node S1, is assigned to storage node S3, thereby balancing the remaining storage space between storage node S1 and storage node S3. This process only requires modifying the configuration of storage node S1 and storage node S3, can be completed in a very short time, and does not affect the read/write performance of users' business data.
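Because every storage node can reach every medium, rebalancing reduces to editing an ownership table; no stored data is read or written. The sketch below shows that idea on an illustrative table. The function name `swap_ownership` and the table contents are assumptions and are not meant to reproduce Fig. 5 exactly.

```python
def swap_ownership(table, medium_a, medium_b):
    """Return a new configuration table with the owners of two media swapped.

    table: mapping (device, medium) -> managing storage node.
    Only the table changes; the data on the media is not copied.
    """
    new_table = dict(table)
    new_table[medium_a] = table[medium_b]
    new_table[medium_b] = table[medium_a]
    return new_table


# Illustrative values: exchange one medium between S1 and S3.
before = {(34, 1): "S1", (35, 1): "S1", (34, 3): "S3", (35, 3): "S3"}
after = swap_ownership(before, (34, 1), (34, 3))
```

Returning a new table rather than mutating in place lets a control node validate the result before the storage nodes reload their configuration.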
Fig. 6 is a schematic diagram of load rebalancing in the storage system shown in Fig. 3A according to another embodiment of the present invention. Unlike Fig. 5, in Fig. 6, when the load of storage node S1 is detected to be high and the load of storage node S2 to be lower, storage medium 2 at storage device 35, originally managed by storage node S2, may be assigned to storage node S1, while storage medium 1 at storage device 34, originally managed by storage node S1, is assigned to storage node S2, thereby balancing the remaining storage space between storage node S1 and storage node S2.
In another embodiment, when expansion of the storage media is detected, the newly added storage media may, for example, be distributed evenly among the storage nodes and managed by them, for example in the order in which they were added, thereby maintaining load balance between the storage nodes.
It should be understood that although the two embodiments above schedule storage media between different storage nodes to achieve load rebalancing, the same approach can be applied to scheduling storage regions between storage nodes. For example, in the case of storage expansion, when what is detected as being added is a storage region, the added storage regions may be assigned to the storage nodes in the order in which they were added.
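Assigning newly added storage media or storage regions to the storage nodes in order of addition amounts to a round-robin distribution. A minimal sketch, with the function name `assign_in_order` and the `start` parameter as assumptions:

```python
def assign_in_order(new_items, nodes, start=0):
    """Distribute newly added media or regions across nodes in arrival order.

    new_items: ids of the new media or regions, in the order they were added.
    nodes:     list of storage nodes.
    start:     index of the node that receives the first item.
    """
    return {item: nodes[(start + i) % len(nodes)]
            for i, item in enumerate(new_items)}
```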
In addition, as shown in Fig. 5 and Fig. 6, when the load of storage node S1 is detected to be very high, the configuration between compute nodes and storage nodes in the storage system may also be modified, so that one or more of the compute nodes that originally stored data through storage node S1, such as C12, can store data through another storage node, such as storage node S2. In that case the compute node may need to access a storage node outside the physical server on which it resides in order to store data. The compute node may be left physically in place and access the storage region on the remote storage node through a remote access protocol such as iSCSI (as shown in Fig. 5); alternatively, the compute node may be migrated while the storage regions managed by the relevant storage nodes are adjusted (as shown in Fig. 6), which may require first shutting down the compute node to be migrated.
It should be understood that the numbers of storage nodes, storage devices, storage media and storage regions in the storage systems discussed above with reference to Figs. 3-6 are merely illustrative. A storage system according to the embodiments of the present invention may include at least two storage nodes, a storage network, and at least one storage device connected to the at least two storage nodes through the storage network; each of the at least one storage device may include at least one storage medium, and the storage network may be configured so that each storage node can access all storage media without going through other storage nodes.
According to embodiments of the invention, each storage region is managed by one of the multiple storage nodes. After a storage node starts, it automatically connects to the storage regions it manages and then imports them; once this is complete, it can provide storage services to the upper-layer compute nodes.
When a load imbalance between storage nodes is detected, it is necessary to determine, for the more heavily loaded storage node, the portion of its storage regions that needs to be migrated, and the storage node to which that portion is to be migrated.
The portion of a storage region that needs to be migrated may be determined in a number of ways. In one embodiment, an administrator may decide manually which storage region needs to be migrated. In one embodiment, a configuration file may be used: a migration priority is configured in advance for each storage region, and when migration is needed, the one or more storage blocks, storage groups or storage media with the highest priority among the storage regions currently managed by the storage node are selected for migration. In one embodiment, migration may be based on the load of the storage blocks, storage groups or storage media included in a storage region. For example, each storage node may monitor the load of the storage blocks, storage groups or storage media included in the storage regions it manages, for instance by collecting information such as IOPS, throughput and I/O latency, and combine all of this information in a weighted manner to select the portion of the storage region that needs to be migrated.
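As a minimal sketch of the weighted combination described above, the following example normalizes each metric against the largest observed value and selects the part with the highest combined score. The weights, the normalization, and the choice to migrate the most heavily loaded part are assumptions made for illustration; the disclosure does not specify them. The names `PartLoad`, `WEIGHTS`, `load_score` and `pick_part_to_migrate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartLoad:
    """Load metrics for one storage block, storage group or storage medium."""
    name: str
    iops: float
    throughput_mbps: float
    io_latency_ms: float


# Hypothetical weights; a real deployment would tune these.
WEIGHTS = {"iops": 0.5, "throughput": 0.3, "latency": 0.2}


def load_score(part, max_iops, max_tp, max_lat):
    """Weighted sum of the metrics, each normalized to the range [0, 1]."""
    return (WEIGHTS["iops"] * part.iops / max_iops
            + WEIGHTS["throughput"] * part.throughput_mbps / max_tp
            + WEIGHTS["latency"] * part.io_latency_ms / max_lat)


def pick_part_to_migrate(parts):
    """Return the part with the highest weighted load score."""
    # Fall back to 1.0 so that an all-zero metric does not divide by zero.
    max_iops = max(p.iops for p in parts) or 1.0
    max_tp = max(p.throughput_mbps for p in parts) or 1.0
    max_lat = max(p.io_latency_ms for p in parts) or 1.0
    return max(parts, key=lambda p: load_score(p, max_iops, max_tp, max_lat))
```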
The storage node to which the storage region is to be migrated may likewise be determined in a number of ways. In one embodiment, an administrator may decide manually which storage node to migrate to. In one embodiment, a configuration file may be used: a list of migration targets, such as a list of storage nodes arranged by priority, is configured in advance for each storage region, and once it is determined that the storage region (or a portion of it) needs to be migrated, the migration destination is selected from the target list in order. It should be noted that, with this approach, care must be taken that the migration does not leave the target storage node overloaded. In one embodiment, the target storage node may be selected according to the load of the storage nodes: the load of each storage node may be monitored, for example by collecting information such as CPU usage, memory usage and network bandwidth utilization, and all of this information may be combined in a weighted manner to select the storage node to which the storage region is to be migrated. For example, each storage node may report its own load to the other storage nodes periodically or irregularly; when migration is needed, the storage node that needs to migrate data preferentially selects the least loaded of the other storage nodes as the target storage node.
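The load-based selection of a target storage node can be sketched as follows, assuming each node reports CPU usage, memory usage and network bandwidth utilization as fractions between 0 and 1. The report format, the weights and the function names `node_load` and `pick_target_node` are hypothetical.

```python
def node_load(report, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of CPU, memory and network utilization.

    The weights are illustrative assumptions, not values from the disclosure.
    """
    w_cpu, w_mem, w_net = weights
    return w_cpu * report["cpu"] + w_mem * report["mem"] + w_net * report["net"]


def pick_target_node(reports, source_node):
    """Pick the least loaded storage node other than the source node."""
    candidates = {n: r for n, r in reports.items() if n != source_node}
    if not candidates:
        raise ValueError("no other storage node available as a target")
    return min(candidates, key=lambda n: node_load(candidates[n]))
```

In this sketch `reports` stands for the load information that the storage nodes report to one another periodically or irregularly.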
After the storage region (or portion of it) to be migrated and the target storage node to which its management right is to be transferred have been determined, the migration process itself may be confirmed and started by an administrator of the storage system, or may be started by a program. It should be noted that the migration process should minimize the impact on the upper-layer compute nodes. For example, migration may be performed when the application load is lowest, such as at midnight (assuming the load is lowest in that period). Where it is determined that a compute node must be shut down during migration, this should be done, as far as possible, while utilization of that compute node is low. A migration policy may be configured in advance to control the migration order and the degree of concurrency when multiple storage regions, or multiple portions of one storage region, need to be migrated. When migration of a storage region begins, the write or read operations of the relevant storage node on the relevant storage region may be configured as necessary to guarantee data integrity, for example by writing all cached data to disk. After the storage region has been migrated to the target storage node, that storage node needs to perform the necessary initialization before the storage region can be accessed by the upper-layer compute nodes. After the migration process is complete, the load should be monitored again to confirm whether it is balanced.
As mentioned above, the storage system may include a storage control node, connected to the storage network, for determining the storage regions managed by each of the at least two storage nodes. Alternatively, each storage node may further include a storage allocation module for determining the storage regions managed by that storage node, and data may be shared between the storage allocation modules.
In one embodiment, the storage control node or the storage allocation module records the list of storage regions for which each storage node is responsible. After a storage node starts, it queries the storage control node or the storage allocation module for the storage regions it manages, then scans these storage regions and completes initialization. When it is determined that a storage region needs to be migrated, the storage control node or the storage allocation module modifies the storage region lists of the relevant storage nodes and then notifies the storage nodes to complete the actual switchover as required.
For example, assuming that storage region 1 needs to be migrated from storage node A to storage node B in the SAS storage system 30, the migration process may include the following steps:
1) delete storage region 1 from the managed storage region list of storage node A;
2) on storage node A, force all cached data to be flushed to storage region 1;
3) on storage node A, close (or reset) by SAS command the SAS links between storage node A and all storage media in storage region 1;
4) add storage region 1 to the managed storage region list on storage node B;
5) on storage node B, open (or reset) by SAS command the SAS links between storage node B and all storage media in storage region 1;
6) storage node B scans all storage media in storage region 1 and completes initialization; and
7) applications access the data in storage region 1 through storage node B.
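The sequence of steps above can be sketched in simplified form as follows. The class `StorageNode` and its methods are hypothetical stand-ins: the cache flush, SAS link operations and media scan are only recorded in a log here, whereas a real system would issue the corresponding commands to the hardware.

```python
class StorageNode:
    """Simplified stand-in for a storage node (all names are hypothetical)."""

    def __init__(self, name, managed=None):
        self.name = name
        self.managed = list(managed or [])  # managed storage region list
        self.log = []                       # records the simulated operations

    def flush_cache(self, region):
        self.log.append(f"flush {region}")

    def close_links(self, region):
        self.log.append(f"close-links {region}")

    def open_links(self, region):
        self.log.append(f"open-links {region}")

    def scan_and_init(self, region):
        self.log.append(f"scan-init {region}")


def migrate_region(region, src, dst):
    """Transfer management of a storage region from src to dst."""
    src.managed.remove(region)   # step 1: drop from the source node's list
    src.flush_cache(region)      # step 2: flush cached data to the region
    src.close_links(region)      # step 3: close (or reset) SAS links on source
    dst.managed.append(region)   # step 4: add to the target node's list
    dst.open_links(region)       # step 5: open (or reset) SAS links on target
    dst.scan_and_init(region)    # step 6: scan media and initialize
    # Step 7: applications now access the region through dst.
```

No data is copied in this sketch, which reflects the point that only the management right over the storage region changes hands.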
It should be noted that, although for simplicity of explanation the methods of the invention are shown and described as a series of actions, it should be understood and appreciated that the claimed subject matter is not limited by the order in which these actions are performed, since some actions may occur in a different order from that shown and described herein, or concurrently with other actions. Some actions may also include several sub-steps, and these sub-steps may be interleaved in time. In addition, not all of the illustrated actions may be required to implement a method according to the appended claims. Furthermore, the description of the foregoing steps does not exclude the method from including additional steps that may achieve additional effects. It should also be understood that the method steps described in different embodiments or processes may be combined with or substituted for one another.
Fig. 7 shows a block diagram of a load rebalancing apparatus 70 for a storage system according to an embodiment of the invention. The load rebalancing apparatus 70 may include: a monitoring module 701 for monitoring the load state among the at least two storage nodes; and an adjustment module 702 for adjusting, when the monitored load imbalance exceeds a predetermined threshold, the storage regions managed by the relevant storage nodes among the at least two storage nodes.
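A minimal sketch of how a monitoring module and an adjustment module might cooperate is given below. It assumes that imbalance is measured as the difference between the highest and lowest node load and that one storage region at a time is moved from the most loaded node to the least loaded node; both are assumptions of this sketch, since the disclosure leaves the metric and the adjustment strategy open. The class and function names are hypothetical.

```python
class MonitoringModule:
    """Monitors the load state among storage nodes (cf. module 701)."""

    def __init__(self, threshold):
        self.threshold = threshold  # predetermined imbalance threshold

    def imbalance(self, loads):
        # Assumed metric: spread between the most and least loaded node.
        return max(loads.values()) - min(loads.values())

    def is_unbalanced(self, loads):
        return self.imbalance(loads) > self.threshold


class AdjustmentModule:
    """Adjusts which node manages which storage regions (cf. module 702)."""

    def adjust(self, loads, regions):
        src = max(loads, key=loads.get)  # most loaded node
        dst = min(loads, key=loads.get)  # least loaded node
        if not regions[src]:
            return None                  # nothing to move
        region = regions[src].pop()
        regions[dst].append(region)
        return region, src, dst


def rebalance_once(monitor, adjuster, loads, regions):
    """Run one monitor-then-adjust cycle; return the move made, if any."""
    if monitor.is_unbalanced(loads):
        return adjuster.adjust(loads, regions)
    return None
```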
It should be understood that each module recited in the apparatus 70 corresponds to a step in the method 40 described with reference to Fig. 4. The operations and features described above with respect to Fig. 4 therefore apply equally to the apparatus 70 and the modules it contains, and the repeated content is not described again here.
According to embodiments of the invention, the apparatus 70 may be implemented at each storage node, or may be implemented in a scheduling apparatus for the multiple storage nodes.
The teachings of the invention may also be implemented as a computer program product on a computer-readable storage medium, including computer program code which, when executed by a processor, enables the processor to implement, in accordance with the methods of embodiments of the invention, the load rebalancing scheme for a storage system described in the embodiments herein. The computer storage medium may be any tangible medium, such as a floppy disk, CD-ROM, DVD, hard disk drive, or even a network medium.
Embodiment according to the present invention provides a kind of storage section of migration for supporting storage medium or storage region
Point loads equalization scheme, the directly control by redistributing storage medium or storage region between each memory node again
Power is balanced to realize again, avoids the influence in transition process to regular traffic data, improves memory node load significantly
Balanced efficiency again.
It should be understood that, although one form of implementation of embodiments of the invention described above is a computer program product, the methods or apparatus of embodiments of the invention may be implemented in software, hardware, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the methods and devices described above may be implemented using computer-executable instructions and/or processor control code, with such code provided, for example, on a carrier medium such as a disk, CD or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The methods and apparatus of the invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, or by programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they may also be implemented by software executed by various types of processors, or by a combination of such hardware circuits and software, for example firmware.
It should be understood that, although several modules or sub-modules of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to exemplary embodiments of the invention, the features and functions of two or more modules described above may be implemented in a single module. Conversely, the features and functions of one module described above may be further divided and implemented by multiple modules.
It should also be understood that, in order not to obscure the embodiments of the invention, the specification describes only some key, and not necessarily essential, techniques and features, and may not explain certain features that those skilled in the art are able to implement.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.