CN105490847A - Real-time detecting and processing method of node failure in private cloud storage system - Google Patents

Real-time detecting and processing method of node failure in private cloud storage system

Info

Publication number
CN105490847A
Authority
CN
China
Prior art keywords
data
storage node
services
network
storage
Prior art date
Legal status
Granted
Application number
CN201510897964.5A
Other languages
Chinese (zh)
Other versions
CN105490847B (en)
Inventor
刘树发
温晋英
杨连群
王莹
宋津旭
王鹏
李翔宇
卢鑫刚
Current Assignee
TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Original Assignee
TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Priority date
Filing date
Publication date
Application filed by TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Priority to CN201510897964.5A
Publication of CN105490847A
Application granted
Publication of CN105490847B
Expired - Fee Related
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention relates to a method for real-time detection and handling of node failures in a private cloud storage system, characterized in that the storage nodes are connected to one another by a data synchronization network and to a cloud computing server through a data service network; a management terminal arranged on one of the storage nodes checks the working state of all storage nodes, and each storage node also performs a self-check. The method manages the various data services in the private cloud storage system effectively: when a server breaks down, the data services are restored automatically, which simplifies operation for the user and reduces labor costs on the user side. Automatic restoration avoids the data service interruptions that individual equipment failures would otherwise cause, thereby reducing the losses incurred when application business that depends on those services is interrupted.

Description

Method for real-time detection and handling of node failures in a private cloud storage system
Technical field
The invention belongs to the technical field of fault handling in cloud storage systems, and in particular relates to a method for real-time detection and handling of node failures in a private cloud storage system.
Background art
Cloud storage is a new concept extended and developed from cloud computing. It is an emerging network storage technology in which functions such as cluster applications, network technology and distributed file systems are used to aggregate, through application software, a large number of storage devices of different types across a network into one cooperating system that jointly provides data storage and business access. The core of such a system is the combination of application software with the storage devices: the application software transforms the storage devices into storage services. Compared with a conventional storage device, a cloud storage system therefore involves not just hardware but a complex system of many parts, such as network equipment, storage devices, servers, application software and a public access interface; each part is centered on the storage devices, and together they provide data storage and business access services through the application software. In places such as schools, enterprises, government offices, information centers and data centers, dependence on data deepens by the day, and data has become the foundation on which numerous business activities rely.
A structure that provides storage services only to a limited set of users is called a private cloud storage system. It is a cloud storage service scheme customized for government departments or enterprise clients; it not only delivers closely tailored, high-quality service, but also reduces security risks to a certain degree. However, for data service faults and equipment faults, expecting users to locate the fault manually and handle it accordingly is impractical. For a private cloud storage system, how to locate and handle data service faults and equipment faults in a user-friendly way has therefore become a problem in need of a solution.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method for real-time detection and handling of node failures in a private cloud storage system that monitors in real time and applies different handling measures to different failure conditions.
The technical solution adopted by the present invention is described in the embodiments and claims below.
The advantages and positive effects of the present invention are:
In the present invention, the storage nodes are connected by a data synchronization network and are connected to the cloud computing servers by a data service network; a management end is set up on one of the storage nodes and checks the working state of all storage nodes, while each storage node checks its own storage status, data service network state, data synchronization network state, data service state and independent IP state. Detection thus combines a whole-system view with per-node self-checks, and a handling method is provided for every state that can arise at each step. The method manages the various data services in the private cloud storage system effectively; when a server fails, the data services are restored automatically, which simplifies operation for the user and reduces labor costs on the user side. Automatic restoration avoids the data service interruptions that individual equipment failures would otherwise cause, and thereby reduces the losses incurred when application business that depends on those services is interrupted.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the present invention.
Detailed description
The present invention is further described below with reference to embodiments. The following embodiments are illustrative, not restrictive, and do not limit the protection scope of the present invention.
A method for real-time detection and handling of node failures in a private cloud storage system, as shown in Fig. 1. The innovation of the present invention lies in the following: the system comprises multiple storage nodes capable of providing various data services and multiple cloud computing servers; internal data exchange between the storage nodes is carried out over a data synchronization network; the storage nodes provide data services to the cloud computing servers over a data service network; a management end is set up on one of the storage nodes. The method comprises an initialization procedure, a management-end detection and handling procedure, and a storage-node detection and handling procedure.
The initialization procedure comprises the following steps (a sketch of the resulting configuration follows this list):
(1) the management end saves in advance the storage configuration, network configuration and data-service configuration of all storage nodes;
(2) each storage node saves only its own storage configuration, network configuration and data-service configuration;
(3) for each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) the detection intervals of the management end and the storage nodes are set.
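To make the initialization concrete, here is a minimal data-model sketch in Python. All names (ServiceConfig, NodeConfig, MANAGEMENT_VIEW) and the 30-second interval are illustrative assumptions, not taken from the patent.

```python
# Hypothetical data model for the initialization steps above.
from dataclasses import dataclass, field

@dataclass
class ServiceConfig:
    name: str             # data service identifier
    primary_node: str     # storage node originally hosting the service
    mirror_node: str      # node mirroring the service (step 3)
    independent_ip: str   # independent IP address assigned to the service

@dataclass
class NodeConfig:
    node_id: str
    storage_config: dict  # this node's storage configuration
    network_config: dict  # this node's network configuration
    services: list[ServiceConfig] = field(default_factory=list)

# Step (1): the management end holds the configuration of all nodes;
# step (2): each node would hold only its own NodeConfig;
# step (4): the detection interval is set here (30 s is an assumed value).
MANAGEMENT_VIEW = {
    "detect_interval_s": 30,
    "nodes": {},      # node_id -> NodeConfig, for every storage node
    "services": {},   # service name -> ServiceConfig (mirror pair + IP)
}
```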
The management-end detection and handling procedure comprises the following steps:
(1) the connection status of each storage node is checked automatically in turn at the set detection interval;
(2) when a storage node does not respond, it is marked unavailable, which indicates that its host machine is down or its network connection is broken; the data services originally configured on that node are provided instead by the corresponding mirrored storage nodes; proceed to step (4); all data services of the node may be configured on one other storage node, the two mirroring each other, or distributed across several other storage nodes, each service mirrored on another node;
When a storage node responds normally, proceed to the next step;
Whether a node responds is determined either by pinging the storage node directly or by checking whether the corresponding program on the node is running normally; the corresponding program is the one that performs the state checks in the storage-node detection and handling procedure described below;
(3) the storage status of the node is obtained; the storage status here refers to the feedback records collected by the management end itself, which originate from the reports a storage node sends to the management end when it encounters abnormal states during its own detection and handling procedure;
When the storage status is abnormal, the node is marked unavailable, the data services on it are stopped, and the data services originally configured on it are provided instead by the corresponding mirrored storage nodes; at the same time, proceed to step (4); the same mirroring options as in step (2) apply;
(4) continue by detecting the next storage node, until all storage nodes have been checked;
(5) after the management end records that a storage node is unavailable, it notifies the system administrator by mail or another known means; the administrator may attempt recovery or contact technicians, and once the node is restored to an available state, the data services on it are started; self-recovery may mean restarting the equipment of a downed node or, when the network connection is broken, checking the network cable, the network card, or network devices such as switches and routers. A sketch of this detection loop follows.
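The following Python sketch shows one way the management-end loop could be implemented under the assumptions above. ping() shells out to the system ping command; notify_admin, failover_to_mirror and stop_services are illustrative stubs, not anything specified by the patent.

```python
import subprocess
import time

def notify_admin(message: str) -> None:
    """Stand-in for the mail notification of step (5)."""
    print(f"[mail to admin] {message}")

def failover_to_mirror(node_id: str, service: str) -> None:
    """Stub: the mirrored storage node takes over this service."""
    print(f"start service '{service}' on the mirror of {node_id}")

def stop_services(node_id: str) -> None:
    """Stub: stop every data service on the given node."""
    print(f"stop all services on {node_id}")

def ping(host: str) -> bool:
    """Step (2): one ICMP echo decides whether the node responds."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def management_loop(nodes: list, storage_feedback: dict, interval_s: int = 30):
    while True:
        for node in nodes:                              # step (1): poll in turn
            if not ping(node["address"]):               # step (2): no response
                node["available"] = False
                for svc in node["services"]:
                    failover_to_mirror(node["id"], svc)
                notify_admin(f"{node['id']} unavailable: down or unreachable")
            elif storage_feedback.get(node["id"]) == "abnormal":  # step (3)
                node["available"] = False
                stop_services(node["id"])
                for svc in node["services"]:
                    failover_to_mirror(node["id"], svc)
                notify_admin(f"{node['id']} reports abnormal storage")
        time.sleep(interval_s)  # wait for the next detection round
```

The design point visible here is that the management end never repairs a node itself: it only marks the node unavailable, moves its services to the mirrors, and escalates to a human, as in step (5).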
The storage-node detection and handling procedure comprises the following steps:
(1) the storage status of this node is checked at the set detection interval;
(2) when the storage device of this node does not respond, information about the storage device is fed back to the management end, and this detection round ends;
The storage device may be an ordinary hard disk, a disk array, or other equipment used to store data;
When the storage device responds normally, proceed to step (3);
A non-response falls into one of three cases (a code sketch follows this list):
(1) the disk label is missing when the system scans: the system attempts to remount the storage (the system carries a remount program which, when run, tries to reconnect the storage device) and feeds back to the management end if the remount fails;
(2) disk fault: fed back to the management end directly;
(3) inconsistent partitions: fed back to the management end directly.
Partition inconsistency covers two cases:
1. the storage device does not respond or its partition has been deleted;
2. the partition has been modified.
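A sketch of how the three non-response cases might be classified on a Linux storage node follows. The /dev/disk/by-label path, the mount -a remount attempt, and the input flags are assumptions for illustration, not details given by the patent.

```python
import os
import subprocess

def classify_storage_state(label: str, disk_healthy: bool,
                           seen_parts: list, expected_parts: list) -> str:
    """Classify the non-response cases of step (2)."""
    # Case (1): the disk label is missing on scan -> try one remount.
    if not os.path.exists(f"/dev/disk/by-label/{label}"):
        if subprocess.call(["mount", "-a"]) == 0:
            return "ok"                        # remount reconnected the device
        return "feedback: remount failed"      # report to the management end
    # Case (2): disk fault -> report directly.
    if not disk_healthy:
        return "feedback: disk fault"
    # Case (3): inconsistent partitions -> report directly
    # (covers a deleted partition as well as a modified one).
    if seen_parts != expected_parts:
        return "feedback: partitions inconsistent"
    return "ok"
```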
(3) the data service network of this node is checked; when the data service network is disconnected, all data services on this node are suspended, and this detection round ends;
When the data service network of this node is normal, proceed to step (4);
The data service network is considered disconnected when several preset storage nodes are probed over it and none of them can be reached.
(4) the data synchronization network of this node is checked; when the data synchronization network is disconnected, this detection and handling procedure ends immediately without any further action;
When the data synchronization network is normal, proceed to step (5);
The data synchronization network is likewise considered disconnected when several preset storage nodes are probed over it and none of them can be reached; a sketch of this reachability test follows.
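Both network checks reduce to the same reachability test: probe a preset list of storage nodes and treat the network as disconnected only if none answers. A minimal sketch, assuming a TCP connect on an arbitrary port (22 here) stands in for the probe:

```python
import socket

def network_up(probe_hosts: list, port: int = 22, timeout: float = 2.0) -> bool:
    """Treat the network as up if any preset storage node is reachable."""
    for host in probe_hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True      # one reachable peer is enough
        except OSError:
            continue             # this peer unreachable, try the next
    return False                 # none reachable: consider disconnected
```

Requiring every probe target to fail distinguishes "my own link is down" from "one peer is down", which is exactly the distinction steps (3) and (4) need.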
(5) the state of a data service on this node is checked; when the data service is in a disabled state (the disabled state covers: 1. the data service is set as unused, its data being abandoned legacy data, or the service was shut down normally; 2. the non-response state referred to in step (2)), proceed to step (7);
When the data service is found stopped, the state of the mirrored copy of the service is checked: if the mirrored copy is running, proceed to step (7); if the mirrored copy is not running, start the data service on this node and proceed to step (6);
When the data service is running, proceed to step (6);
(6) the independent IP of this data service is checked; when the independent IP has been lost, it is restored, then proceed to the next step;
When the independent IP is normal, proceed to the next step;
(7) return to step (5) to check the next data service, until the data services of all storage devices have been checked; steps (5) to (7) are sketched in code below.
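The per-service logic of steps (5) to (7) amounts to a small state machine. The sketch below assumes hypothetical helpers (service_state, start_service, ip_configured, restore_ip) and distinguishes a deliberately disabled service from one that merely stopped.

```python
def service_state(node: str, service: str) -> str:
    """Stub: would query the real service; returns disabled/stopped/running."""
    return "running"

def start_service(node: str, service: str) -> None:
    print(f"start service '{service}' on {node}")

def ip_configured(ip: str) -> bool:
    """Stub: would check whether the address is up on an interface."""
    return True

def restore_ip(ip: str) -> None:
    print(f"re-add independent IP {ip}")

def check_data_services(node: str, services: list) -> None:
    for svc in services:                                  # step (7): iterate
        state = service_state(node, svc["name"])
        if state == "disabled":                           # step (5): deliberate
            continue
        if state == "stopped":
            if service_state(svc["mirror_node"], svc["name"]) == "running":
                continue                                  # mirror already serves
            start_service(node, svc["name"])              # take over from mirror
        if not ip_configured(svc["independent_ip"]):      # step (6)
            restore_ip(svc["independent_ip"])
```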
Embodiment 1
In a school laboratory the number of servers is limited: only two serve as storage nodes, and large-capacity hard disks installed directly in the servers are used as the storage media.
In this situation the management end is set up on one of the two storage nodes. If the storage node that is currently providing a data service goes down while the other storage node keeps running, the storage-node part of the detection proceeds as follows:
The downed storage node cannot run and therefore cannot perform any detection.
The normal storage node detects in turn that its storage device responds, its data service network is normal and its data synchronization network is normal. When it detects that a data service is in the stopped state, it checks the state of that service on the mirrored storage node; because that node is down, the mirrored service is not running, so this node starts the data service itself, ensuring that the service continues to be provided.
If the management-end program is running on the normal node, it also marks the downed storage node as unavailable and notifies the system administrator.
Embodiment 2
In a data center, dedicated storage devices are used as the storage media and are connected to the servers. If the connection between a storage device and its storage node fails, the storage-node part of the detection proceeds as follows:
On the faulty node, the storage device is detected as non-responsive; this is fed back to the management end, which stops all data services on that node.
The normal storage node detects in turn that its storage device responds, its data service network is normal and its data synchronization network is normal. When it detects that a data service is in the stopped state, it checks the state of that service on the mirrored storage node; because the service there has been stopped, this node starts the data service itself, ensuring that the service continues to be provided.

Claims (5)

1. A method for real-time detection and handling of node failures in a private cloud storage system, characterized in that: the system comprises multiple storage nodes capable of providing various data services and multiple cloud computing servers; internal data exchange between the storage nodes is carried out over a data synchronization network; the storage nodes provide data services to the cloud computing servers over a data service network; a management end is set up on one of the storage nodes; the method comprises an initialization procedure, a management-end detection and handling procedure, and a storage-node detection and handling procedure;
The initialization procedure comprises the following steps:
(1) the management end saves in advance the storage configuration, network configuration and data-service configuration of all storage nodes;
(2) each storage node saves only its own storage configuration, network configuration and data-service configuration;
(3) for each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) the detection intervals of the management end and the storage nodes are set;
The management-end detection and handling procedure comprises the following steps:
(1) the connection status of each storage node is checked automatically in turn at the set detection interval;
(2) when a storage node does not respond, it is marked unavailable, which indicates that its host machine is down or its network connection is broken; the data services originally configured on that node are provided instead by the corresponding mirrored storage nodes; proceed to step (4); all data services of the node may be configured on one other storage node, the two mirroring each other, or distributed across several other storage nodes, each service mirrored on another node;
When a storage node responds normally, proceed to the next step;
(3) the storage status of the node is obtained; the storage status here refers to the feedback records collected by the management end itself, which originate from the reports a storage node sends to the management end when it encounters abnormal states during its own detection and handling procedure;
When the storage status is abnormal, the node is marked unavailable, the data services on it are stopped, and the data services originally configured on it are provided instead by the corresponding mirrored storage nodes; at the same time, proceed to step (4); all data services of the node may be configured on one other storage node, the two mirroring each other, or distributed across several other storage nodes, each service mirrored on another node;
(4) continue by detecting the next storage node, until all storage nodes have been checked;
(5) after the management end records that a storage node is unavailable, it notifies the system administrator by mail or another known means; the administrator may attempt recovery or contact technicians, and once the node is restored to an available state, the data services on it are started; self-recovery may mean restarting the equipment of a downed node or, when the network connection is broken, checking the network cable, the network card, or network devices such as switches and routers;
The storage-node detection and handling procedure comprises the following steps:
(1) the storage status of this node is checked at the set detection interval;
(2) when the storage device of this node does not respond, information about the storage device is fed back to the management end, and this detection round ends; the storage device may be an ordinary hard disk, a disk array, or other equipment used to store data;
When the storage device responds normally, proceed to step (3);
(3) the data service network of this node is checked; when the data service network is disconnected, all data services on this node are suspended, and this detection round ends;
When the data service network of this node is normal, proceed to step (4);
(4) the data synchronization network of this node is checked; when the data synchronization network is disconnected, this detection and handling procedure ends immediately without any further action;
When the data synchronization network is normal, proceed to step (5);
(5) the state of a data service on this node is checked; when the data service is in a disabled state (the disabled state covers: 1. the data service is set as unused, its data being abandoned legacy data, or the service was shut down normally; 2. the non-response state referred to in step (2)), proceed to step (7);
When the data service is found stopped, the state of the mirrored copy of the service is checked: if the mirrored copy is running, proceed to step (7); if the mirrored copy is not running, start the data service on this node and proceed to step (6);
When the data service is running, proceed to step (6);
(6) the independent IP of this data service is checked; when the independent IP has been lost, it is restored, then proceed to the next step;
When the independent IP is normal, proceed to the next step;
(7) return to step (5) to check the next data service, until the data services of all storage devices have been checked.
2. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in the management-end detection and handling procedure, whether a node responds in step (2) is determined either by pinging the storage node directly or by checking whether the corresponding program on the node is running normally.
3. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in the storage-node detection and handling procedure, a non-response in step (2) falls into one of three cases:
(1) the disk label is missing when the system scans: the system attempts to remount the storage and feeds back to the management end if the remount fails;
(2) disk fault: fed back to the management end directly;
(3) inconsistent partitions: fed back to the management end directly;
Partition inconsistency covers two cases:
1. the storage device does not respond or its partition has been deleted;
2. the partition has been modified.
4. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in the storage-node detection and handling procedure, the data service network in step (3) is considered disconnected when several preset storage nodes are probed over the data service network and none of them can be reached.
5. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in the storage-node detection and handling procedure, the data synchronization network in step (4) is considered disconnected when several preset storage nodes are probed over the data synchronization network and none of them can be reached.
CN201510897964.5A 2015-12-08 2015-12-08 Method for real-time detection and handling of node failures in a private cloud storage system Expired - Fee Related CN105490847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510897964.5A CN105490847B (en) 2015-12-08 2015-12-08 Method for real-time detection and handling of node failures in a private cloud storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510897964.5A CN105490847B (en) 2015-12-08 2015-12-08 Method for real-time detection and handling of node failures in a private cloud storage system

Publications (2)

Publication Number Publication Date
CN105490847A true CN105490847A (en) 2016-04-13
CN105490847B CN105490847B (en) 2019-03-29

Family

ID=55677591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510897964.5A Expired - Fee Related CN105490847B (en) 2015-12-08 2015-12-08 A kind of private cloud storage system interior joint failure real-time detection and processing method

Country Status (1)

Country Link
CN (1) CN105490847B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331642A (en) * 2016-08-31 2017-01-11 浙江大华技术股份有限公司 Method and device for processing data in video cloud system
WO2018214887A1 (en) * 2017-05-23 2018-11-29 杭州海康威视数字技术股份有限公司 Data storage method, storage server, storage medium and system
CN109361777A (en) * 2018-12-18 2019-02-19 广东浪潮大数据研究有限公司 Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state
CN111866054A (en) * 2019-12-16 2020-10-30 北京小桔科技有限公司 Cloud host building method and device, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529426A (en) * 2003-10-10 2004-09-15 清华大学 SAN dual-node mirroring method and system based on FCP protocol
CN101022363A (en) * 2007-03-23 2007-08-22 杭州华为三康技术有限公司 Network storage equipment fault protecting method and device
KR101280754B1 (en) * 2010-01-04 2013-07-05 아바야 인코포레이티드 Packet mirroring between primary and secondary virtualized software images for improved system failover performance
CN103354503A (en) * 2013-05-23 2013-10-16 浙江闪龙科技有限公司 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof
CN103685481A (en) * 2013-11-29 2014-03-26 深圳市安云信息科技有限公司 Cloud storage clustering system and cloud storage method
CN104699566A (en) * 2013-12-16 2015-06-10 杭州海康威视数字技术股份有限公司 Data redundant backup method, data redundant backup system and storage node server based on cloud storage

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529426A (en) * 2003-10-10 2004-09-15 清华大学 SAN dual-node mirroring method and system based on FCP protocol
CN101022363A (en) * 2007-03-23 2007-08-22 杭州华为三康技术有限公司 Network storage equipment fault protecting method and device
KR101280754B1 (en) * 2010-01-04 2013-07-05 아바야 인코포레이티드 Packet mirroring between primary and secondary virtualized software images for improved system failover performance
CN103354503A (en) * 2013-05-23 2013-10-16 浙江闪龙科技有限公司 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof
CN103685481A (en) * 2013-11-29 2014-03-26 深圳市安云信息科技有限公司 Cloud storage clustering system and cloud storage method
CN104699566A (en) * 2013-12-16 2015-06-10 杭州海康威视数字技术股份有限公司 Data redundant backup method, data redundant backup system and storage node server based on cloud storage

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331642A (en) * 2016-08-31 2017-01-11 浙江大华技术股份有限公司 Method and device for processing data in video cloud system
CN106331642B (en) * 2016-08-31 2020-05-26 浙江大华技术股份有限公司 Data processing method and device in video cloud system
WO2018214887A1 (en) * 2017-05-23 2018-11-29 杭州海康威视数字技术股份有限公司 Data storage method, storage server, storage medium and system
CN108933798A (en) * 2017-05-23 2018-12-04 杭州海康威视数字技术股份有限公司 Data storage method, storage server and system
US11218541B2 (en) 2017-05-23 2022-01-04 Hangzhou Hikvision Digital Technology Co., Ltd. Data storage method, storage server, and storage medium and system
CN109361777A (en) * 2018-12-18 2019-02-19 广东浪潮大数据研究有限公司 Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state
CN109361777B (en) * 2018-12-18 2021-08-10 广东浪潮大数据研究有限公司 Synchronization method, synchronization system and related device for distributed cluster node states
CN111866054A (en) * 2019-12-16 2020-10-30 北京小桔科技有限公司 Cloud host building method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN105490847B (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN107465721B (en) Global load balancing method and system based on double-active architecture and scheduling server
US7278055B2 (en) System and method for virtual router failover in a network routing system
US6952766B2 (en) Automated node restart in clustered computer system
CN102710457B (en) A kind of N+1 backup method of cross-network segment and device
CN105302661A (en) System and method for implementing virtualization management platform high availability
US20030158933A1 (en) Failover clustering based on input/output processors
CN102394914A (en) Cluster brain-split processing method and device
GB2407887A (en) Automatically modifying fail-over configuration of back-up devices
JP2005301975A (en) Heartbeat apparatus via remote mirroring link on multi-site and its use method
CN111176888B (en) Disaster recovery method, device and system for cloud storage
CN103729280A (en) High availability mechanism for virtual machine
CN111949444A (en) Data backup and recovery system and method based on distributed service cluster
CN109286529A (en) A kind of method and system for restoring RabbitMQ network partition
CN105490847A (en) Real-time detecting and processing method of node failure in private cloud storage system
CN103780417A (en) Database failure transfer method based on cloud hard disk and device thereof
CN110333986B (en) Method for guaranteeing availability of redis cluster
CN110677282B (en) Hot backup method of distributed system and distributed system
CN106850255A (en) A kind of implementation method of multi-computer back-up
CN104506372A (en) Method and system for realizing host-backup server switching
CN110971662A (en) Two-node high-availability implementation method and device based on Ceph
CN115878384A (en) Distributed cluster based on backup disaster recovery system and construction method
US10721135B1 (en) Edge computing system for monitoring and maintaining data center operations
CN109189854B (en) Method and node equipment for providing continuous service
JPH0728667A (en) Fault-tolerant computer system
CN111767166A (en) Data backup method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190329

Termination date: 20191208