CN103384267A

CN103384267A - High availability method for the Parastor200 parallel storage management node based on a distributed block device

Info

Publication number: CN103384267A
Application number: CN2013102262108A
Authority: CN
Inventors: 刘冠川; 秦东明; 杨亮; 曹振南; 王勇; 何牧君; 张新风; 陈飞; 刘超; 龚超; 明立波; 王慧; 吕永安
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Dawning Information Industry Beijing Co Ltd; Dawning Information Industry Co Ltd
Priority date: 2013-06-07
Filing date: 2013-06-07
Publication date: 2013-11-06
Anticipated expiration: 2033-06-07
Also published as: CN103384267B

Abstract

The invention relates to a high availability method for the management node of the Parastor200 parallel storage system, based on a distributed block device. The method is realized through two mechanisms: synchronization of the management node's storage system information, and management node failover. Making the Parastor200 management node highly available achieves a truly fully redundant design, so that the failure of any single component in the system does not affect use of the storage system. When any component of the management node fails, its services can be switched to the standby node within a few seconds. Normal operation is therefore unaffected, and administrators have ample time to repair the fault.

Description

High availability method for the Parastor200 parallel storage management node based on a distributed block device

Technical field

The present invention relates to a high availability method for the Parastor200 parallel storage management node based on a distributed block device.

Background technology

The ParaStor200 parallel storage system adopts a parallel architecture that reflects the development direction of storage technology, network communication technology, and data management technology. It is a high-end storage system with independent intellectual property rights, designed for processing massive amounts of unstructured data. It can provide bandwidth at the TB/s level and storage capacity at the EB level, meeting the needs of applications with high storage capacity and I/O performance requirements in fields such as aircraft, automobile and ship design, biological gene research, materials science research, weather forecasting, seismic monitoring, environmental monitoring and analysis, energy exploration, e-commerce, online gaming, social networking and video sharing web hosting, animation rendering, and video editing. It can be widely used in industries such as government, education, scientific research, manufacturing, enterprise, healthcare, oil, broadcasting and television, and the Internet.

MGR denotes the Parastor200 management node. It provides a unified control and management interface, through which the administrator manages the entire storage system.

OPara denotes the Parastor200 metadata node. It manages all index data and the namespace of the storage system, presents a single global image externally, and supports multiple nodes working together in an Active-Active cluster mode.

OStor denotes the Parastor200 data node. It provides data storage space, embeds a high-performance data access engine that processes the data access requests of all clients in parallel, and supports fault tolerance across multiple oStor nodes through replication (1 to 3 copies).

The Parastor200 management node provides the unified control and management interface and holds the important topology and configuration information of the whole system; the administrator manages the entire storage system through this node. The management node is used relatively infrequently: it is needed only for management operations such as mounting clients, checking storage system status, adding storage units, and deleting storage units. In small clusters, management is usually simple and management operations are few, so the management node is less critical. Even if it fails, there is ample time to repair it, and even permanent damage to its disk is unlikely to be catastrophic, because the important information on the management node can be reconstructed from the configuration information on the metadata nodes and data nodes. Only some historical data and client authorization information are lost, which does not seriously affect the storage system.

The current solution to this problem is to back up the management node's configuration information on a schedule through the management interface. When the management node fails, the management node graphical interface program is installed on a standby node and the backed-up information is imported. Another existing technique mounts a shared disk array on both the active and standby management nodes through a Fibre Channel switch. When the primary management node fails, the standby management node obtains all of the storage management node's information by mounting the partition that stores the storage system information.

The existing schemes carry several potential risks. First, however frequent the backups, the system configuration may still be modified between two backups. In particular, after operations such as adding or removing storage units or changing client authorization information, the restored information differs from the actual information, which disrupts normal operation of the system. Second, even without any information loss, rebuilding a management node still takes a long time, which is clearly unacceptable for larger systems with many users and frequent management operations. A shared disk array overcomes these problems, but its cost is too high.

Summary of the invention

To address these shortcomings of the prior art, the invention provides a high availability method for the Parastor200 parallel storage management node based on a distributed block device. By making the Parastor200 management node highly available, the invention gives Parastor200 a truly fully redundant design: the failure of any component in the system does not affect use of the storage system. If any component of the management node fails, its services can be switched to the standby management node within a few seconds, so normal use is unaffected and there is ample time to repair the fault. Distributed block device technology achieves true real-time synchronization at very low cost, ensuring that the storage system information on the active and standby management nodes is fully consistent.

The objective of the invention is achieved by the following technical solution:

A high availability method for the Parastor200 parallel storage management node based on a distributed block device, the improvement being that the method is realized through the following two aspects:

(1) Synchronization of the management node's storage system information files, implemented with a distributed block device.

(2) management node failover.

In (1), synchronization of the management node's storage system information means that whenever the storage system information on the management node changes, the information in the corresponding directory on the primary management node stays consistent with that on the standby management node.

The distributed block device is a software-implemented, shared-nothing storage replication solution that mirrors block device contents between servers.

When data is written to the file system on the local host's distributed block device, the data is simultaneously sent to another remote host on the network and recorded in a file system of identical form; the creation of that file system is itself replicated through the distributed block device.

A data write returns success only when both the remote host and the local host have reported the write as successful. Consequently, when the primary management node fails, an identical copy of the data remains on the standby management node.
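The write rule above (a write is acknowledged only after both the local and the remote copy report success) can be sketched as a small in-memory simulation. This is a minimal illustration under stated assumptions, not the product's implementation: `BlockDevice` and `MirroredDevice` are hypothetical names, the two "devices" are byte arrays, and the network transfer is modelled as a direct call. The behaviour resembles synchronous replication modes such as DRBD's protocol C, but the source does not name the software it uses.

```python
from dataclasses import dataclass, field


@dataclass
class BlockDevice:
    """In-memory stand-in for one node's block device (hypothetical)."""
    size: int
    online: bool = True
    data: bytearray = field(init=False)

    def __post_init__(self) -> None:
        self.data = bytearray(self.size)

    def write(self, offset: int, payload: bytes) -> bool:
        """Store payload at offset; return True only if the write was acknowledged."""
        if not self.online or offset < 0 or offset + len(payload) > self.size:
            return False
        self.data[offset:offset + len(payload)] = payload
        return True


@dataclass
class MirroredDevice:
    """Synchronous mirror: success is reported only when BOTH copies acknowledge."""
    local: BlockDevice
    remote: BlockDevice

    def write(self, offset: int, payload: bytes) -> bool:
        local_ok = self.local.write(offset, payload)
        # A real replicator would ship the block over the replication network,
        # typically in parallel with the local write; here it is a direct call.
        remote_ok = self.remote.write(offset, payload)
        return local_ok and remote_ok


if __name__ == "__main__":
    primary, standby = BlockDevice(16), BlockDevice(16)
    mirror = MirroredDevice(primary, standby)
    print(mirror.write(0, b"cluster.conf"))   # both copies acknowledged
    print(primary.data == standby.data)       # identical blocks on both nodes
    standby.online = False
    print(mirror.write(0, b"x"))              # remote down: write not reported as successful
```

Because the caller only sees success when both copies hold the data, a standby that has acknowledged every write holds the same blocks as the primary at the moment the primary fails. How a real device recovers after a failed remote write (for example by resynchronizing later) is outside the scope of this sketch.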

In (2), a heartbeat mechanism is used to detect a failed management node: the management nodes exchange heartbeat messages over a connection between them, each monitoring and answering the other. In addition, a failed management node is also identified by pinging a third-party node, and failover is then performed automatically.
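The detection rule above (heartbeat between the two management nodes, with a ping to a third-party node as an extra check) can be sketched as follows. It is a hedged illustration under stated assumptions: the timeout value, the `FailureDetector` class, and the `ping_arbitrator` hook are hypothetical, and the source does not specify how heartbeat and ping results are combined. Here the standby takes over only if the peer has been silent longer than the timeout and the third-party node is still reachable, on the assumption that losing both means the fault is on the standby's own side (for example its network link), which is a common way to avoid split-brain.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailureDetector:
    """Runs on the standby node and decides whether to take over (hypothetical)."""
    timeout_s: float                       # assumed: silence longer than this marks the peer as suspect
    ping_arbitrator: Callable[[], bool]    # assumed hook: True if the third-party node answers a ping
    clock: Callable[[], float] = time.monotonic
    last_heartbeat: float = field(init=False)

    def __post_init__(self) -> None:
        # Treat start-up as a fresh heartbeat so the peer is not declared dead immediately.
        self.last_heartbeat = self.clock()

    def on_heartbeat(self) -> None:
        """Call whenever a heartbeat message from the peer is received."""
        self.last_heartbeat = self.clock()

    def should_take_over(self) -> bool:
        peer_silent = self.clock() - self.last_heartbeat > self.timeout_s
        if not peer_silent:
            return False
        # Peer is silent. If the third-party node is also unreachable, the fault is
        # more likely on this node's own network, so it must not take over.
        return self.ping_arbitrator()
```

In practice `should_take_over()` would be polled periodically, and `ping_arbitrator` could wrap a call to the system `ping` command; injecting `clock` keeps the logic testable without waiting in real time.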

Failover is performed together with the migration of resources and services, which comprise:

1) the management node's storage system information files;

2) the management node's management IP;

3) the Parastor200 management service and the Parastor200 graphical interface service;

4) data synchronization service.
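One way to migrate the four resources listed above is to treat them as an ordered group: start them one by one on the node taking over, and if any step fails, stop the ones already started in reverse order. The ordering (files, then IP, then services, then reverse synchronization), the rollback behaviour, and the resource names are assumptions made for this sketch; the source lists the resources but not a start order. Start and stop actions are passed in as callables so the sketch runs without touching a real system.

```python
from typing import Callable, List, Tuple

# (name, start, stop): start/stop return True on success.
Resource = Tuple[str, Callable[[], bool], Callable[[], bool]]


def take_over(resources: List[Resource], log: List[str]) -> bool:
    """Start resources in order; on the first failure, roll back in reverse order."""
    started: List[Resource] = []
    for res in resources:
        name, start, _stop = res
        if not start():
            log.append(f"failed {name}")
            for done_name, _start, done_stop in reversed(started):
                done_stop()
                log.append(f"stopped {done_name}")
            return False
        log.append(f"started {name}")
        started.append(res)
    return True


def release(resources: List[Resource], log: List[str]) -> None:
    """Stop a running group in reverse order, e.g. on the node giving up the role."""
    for name, _start, stop in reversed(resources):
        stop()
        log.append(f"stopped {name}")


if __name__ == "__main__":
    ok = lambda: True
    # Hypothetical resource names mirroring items 1) to 4) above.
    group: List[Resource] = [
        ("storage-info-files", ok, ok),     # 1) e.g. mount the replicated file system (assumption)
        ("management-ip", ok, ok),          # 2) bring up the management IP
        ("mgmt-and-gui-services", ok, ok),  # 3) start management and GUI services
        ("data-sync", ok, ok),              # 4) start syncing back to the old primary
    ]
    events: List[str] = []
    print(take_over(group, events), events)
```

Starting services only after their files and IP are in place, and tearing down in the opposite order, avoids a service briefly running without its configuration or its address.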

In 1), the management node's storage system information files are synchronously backed up to the standby management node.

In 2), the management IP is the IP address over which the management node sends management commands to the metadata nodes and data nodes; during failover it moves from the online management node to the standby management node.
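The source does not say how the management IP is moved. On Linux, a common approach for a floating IPv4 address is to add it to an interface on the new node with `ip addr add` and then send unsolicited (gratuitous) ARP with iputils `arping -U`, so that the metadata and data nodes update their ARP caches. The sketch below only builds those command lines and does not run them; the function names, the address, the interface name, and the announce count are illustrative assumptions.

```python
import ipaddress
from typing import List


def ip_takeover_commands(cidr: str, iface: str, announce_count: int = 3) -> List[List[str]]:
    """Commands the new primary could run to claim the floating management IP."""
    addr = ipaddress.ip_interface(cidr)   # validates input such as "192.168.10.100/24"
    if addr.version != 4:
        raise ValueError("this sketch only covers IPv4 (IPv6 uses neighbour discovery, not ARP)")
    return [
        # Attach the management IP to the local interface.
        ["ip", "addr", "add", addr.with_prefixlen, "dev", iface],
        # Unsolicited ARP so peers update their IP-to-MAC mapping.
        ["arping", "-U", "-c", str(announce_count), "-I", iface, str(addr.ip)],
    ]


def ip_release_commands(cidr: str, iface: str) -> List[List[str]]:
    """Command the old primary could run, if still reachable, to drop the IP."""
    addr = ipaddress.ip_interface(cidr)
    return [["ip", "addr", "del", addr.with_prefixlen, "dev", iface]]
```

To apply them, each argument list could be passed to `subprocess.run(cmd, check=True)` with root privileges.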

In 3), the Parastor200 management service and the Parastor200 graphical interface service are switched from the online management node to the standby management node during failover.

In 4), after the switchover, the standby management node becomes the primary management node ("primary" and "standby" are relative roles, and the online management node is the primary), and the information on this new primary is in turn backed up to the original primary management node.
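The role reversal in 4) can be modelled as a two-node pair whose replication direction always runs from the current primary to the current standby. The `ManagementPair` class and the rule that a standby must finish resynchronizing before it may be promoted again are illustrative assumptions, not details given in the source; the rule reflects the point that the former primary missed the writes made while it was down.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ManagementPair:
    """Roles of two management nodes; replication always runs primary -> standby."""
    primary: str
    standby: str
    standby_in_sync: bool = True   # assumed: False until the standby has caught up

    def sync_direction(self) -> Tuple[str, str]:
        return (self.primary, self.standby)

    def fail_over(self) -> None:
        if not self.standby_in_sync:
            # Promoting a stale copy would lose the writes it has not yet received.
            raise RuntimeError(f"{self.standby} is not in sync; refusing to promote it")
        self.primary, self.standby = self.standby, self.primary
        # The old primary (now standby) missed writes made after the switch,
        # so it must be resynchronized from the new primary.
        self.standby_in_sync = False

    def resync_complete(self) -> None:
        """Mark the reverse synchronization onto the former primary as finished."""
        self.standby_in_sync = True
```

After one failover, `sync_direction()` reports the former standby as the source, matching the reverse backup the text describes.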

Compared with the prior art, the invention achieves the following beneficial effects:

The invention provides a high availability method for the Parastor200 parallel storage management node based on a distributed block device. Making the Parastor200 management node highly available gives Parastor200 a truly fully redundant design, in which the failure of any component does not affect use of the storage system. If any component of the management node fails, services can be switched to the standby management node within a few seconds, so normal use is unaffected and there is ample time to repair the fault. The distributed block device achieves true real-time synchronization at very low cost, ensuring that the storage system information on the active and standby management nodes is fully consistent.

Embodiment

The specific embodiment of the invention is described in further detail below.

The invention makes the Parastor200 management node highly available. Analysis of the problems in the prior art shows that two problems must be solved: (1) synchronization of the management node's storage system information; and (2) management node failover.

Synchronizing the management node's storage system information files means that whenever any change is made to the storage information files on the management node, the storage system information files in the corresponding directory on the standby management node change at the same time, so that the information in the corresponding directories on the primary and standby nodes is fully consistent. This patent solves the problem with a distributed block device, a software-implemented, shared-nothing storage replication solution that mirrors block device contents between servers. When data is written to the file system on the local distributed block device, it is simultaneously sent to another host on the network and recorded in a file system of identical form (the creation of the file system is itself replicated through the distributed block device). The write returns success only when both the remote host and the local host have reported success. The local node therefore stays synchronized with the remote node in real time, and I/O consistency is guaranteed. When the primary management node fails, an identical copy of the data remains on the standby management node and can continue to be used, achieving high availability.

For management node failover, the first problem to solve is how to detect a failure. A heartbeat mechanism is used here: the management nodes exchange heartbeat messages over a connection between them, each monitoring and answering the other. In addition, the failed node is identified by methods such as pinging a third-party node, and failover is then performed automatically. The other important problem in failover is migrating services and resources. In the present invention, these comprise:

1) the management node's storage system information files, which are synchronously backed up to the standby node;

2) the management node's management IP, which is different from the IP of the network used to synchronize files between the two nodes. It is the IP over which the management node sends management commands to the metadata nodes and data nodes, and it must move from the primary management node to the standby management node during failover;

3) the Parastor200 management service and the Parastor200 graphical interface service, which are likewise switched from the primary management node to the standby node during failover;

4) the data synchronization service: after the switchover, the standby management node becomes the primary management node and must in turn back up its information to the original primary management node.
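The embodiment adds one detail not stated in the summary: the management IP is distinct from the addresses on the network used to synchronize files between the two nodes. A configuration check enforcing that separation might look like the sketch below. The `HaConfig` class and its field names are hypothetical, and treating an overlapping subnet as an error is an assumption (the source only says the IPs differ).

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HaConfig:
    """Hypothetical HA settings for a pair of management nodes."""
    management_ip: str          # floating IP used to send commands to metadata/data nodes
    sync_ips: Tuple[str, str]   # each node's address on the file-replication network

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config passes."""
        problems: List[str] = []
        mgmt = ipaddress.ip_interface(self.management_ip)
        sync = [ipaddress.ip_interface(a) for a in self.sync_ips]
        if sync[0].ip == sync[1].ip:
            problems.append("the two nodes need distinct replication addresses")
        for iface in sync:
            if iface.ip == mgmt.ip:
                problems.append(f"{iface.ip} is used both for replication and management")
            elif iface.network.overlaps(mgmt.network):
                # Assumption: keep management and replication traffic on separate subnets.
                problems.append(f"{iface.network} overlaps management subnet {mgmt.network}")
        return problems
```

For example, `HaConfig("192.168.10.100/24", ("10.0.0.1/30", "10.0.0.2/30")).validate()` finds no problems, because the replication link sits on its own point-to-point subnet.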

Finally, it should be noted that the above embodiment only illustrates the technical solution of the invention and does not limit it. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the specific embodiment may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.

Claims

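1. A high availability method for the Parastor200 parallel storage management node based on a distributed block device, characterized in that the method is realized through the following two aspects:

(1) synchronization of the management node's storage system information files, implemented with a distributed block device;

(2) management node failover.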

2. The high availability method for the Parastor200 parallel storage management node according to claim 1, characterized in that, in (1), synchronization of the management node's storage system information means that whenever the storage system information on the management node changes, the information in the corresponding directories on the primary and standby management nodes is kept consistent.

3. The high availability method for the Parastor200 parallel storage management node according to claim 1, characterized in that the distributed block device is a software-implemented, shared-nothing storage replication solution that mirrors block device contents between servers;

when data is written to the file system on the local host's distributed block device, the data is simultaneously sent to another remote host on the network and recorded in a file system of identical form; the creation of that file system is itself replicated through the distributed block device.

4. The high availability method for the Parastor200 parallel storage management node according to claim 3, characterized in that a data write returns success only when both the remote host and the local host have reported the write as successful, so that when the primary management node fails, an identical copy of the data remains on the standby management node.

5. The high availability method for the Parastor200 parallel storage management node according to claim 1, characterized in that, in (2), a heartbeat mechanism is used to detect a failed management node, in which the management nodes exchange heartbeat messages over a connection between them, each monitoring and answering the other; a failed management node is also identified by pinging a third-party node, and failover is performed automatically.

6. The high availability method for the Parastor200 parallel storage management node according to claim 5, characterized in that failover is performed together with the migration of resources and services, which comprise:

1) the management node's storage system information files;

2) the management node's management IP;

3) the Parastor200 management service and the Parastor200 graphical interface service;

4) data synchronization service.

7. The high availability method for the Parastor200 parallel storage management node according to claim 6, characterized in that, in 1), the management node's storage system information files are synchronously backed up to the standby management node.

8. The high availability method for the Parastor200 parallel storage management node according to claim 6, characterized in that, in 2), the management IP is the IP over which the management node sends management commands to the metadata nodes and data nodes, and it moves from the online management node to the standby management node during failover.

9. The high availability method for the Parastor200 parallel storage management node according to claim 6, characterized in that, in 3), the Parastor200 management service and the Parastor200 graphical interface service are switched from the online management node to the standby management node during failover.

10. The high availability method for the Parastor200 parallel storage management node according to claim 6, characterized in that, in 4), after the switchover, the standby management node becomes the primary management node, and the information on it is in turn backed up to the original primary management node.