CN105490847B

CN105490847B - Real-time detection and handling method for node failures in a private cloud storage system

Info

Publication number: CN105490847B
Application number: CN201510897964.5A
Authority: CN
Inventors: 刘树发; 温晋英; 杨连群; 王莹; 宋津旭; 王鹏; 李翔宇; 卢鑫刚
Original assignee: TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Current assignee: TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Priority date: 2015-12-08
Filing date: 2015-12-08
Publication date: 2019-03-29
Anticipated expiration: 2035-12-08
Also published as: CN105490847A

Abstract

The present invention relates to a real-time detection and handling method for node failures in a private cloud storage system. Storage nodes are interconnected by a data synchronization network and connected to cloud computing servers by a data service network. A management end is deployed on a storage node: the management end checks the working state of all storage nodes, and each storage node also checks itself. The invention can effectively manage the various data services in a private cloud storage system. When a server fails, data services are restored automatically, which simplifies operation for users and reduces their labor costs. Automatic restoration avoids the data-service interruptions caused by individual equipment failures, and so reduces the losses incurred when applications that depend on those data services are interrupted.

Description

Real-time detection and handling method for node failures in a private cloud storage system

Technical field

The invention belongs to the field of fault-handling technology for cloud storage systems, and in particular relates to a real-time detection and handling method for node failures in a private cloud storage system.

Background art

Cloud storage is a concept extended and developed from cloud computing, and it is an emerging network storage technology. It refers to a system that uses functions such as cluster applications, network technology, or distributed file systems, together with application software, to bring together large numbers of storage devices of different types across a network. These devices work cooperatively and jointly provide data storage and business access functions to the outside. The core of such a system is the combination of application software with storage devices: the software turns storage devices into storage services. Unlike conventional storage devices, a cloud storage system is not merely hardware. It is a complex system composed of network devices, storage devices, servers, application software, public access interfaces, and other parts. The storage devices are the core of each part, and data storage and business access services are provided externally through application software. Organizations such as schools, enterprises, government agencies, information centers, and data centers depend increasingly on data, which has become the foundation on which many business activities rely for their development.

A structure that provides storage services to a limited set of users is called a private cloud storage system. It is a cloud storage service scheme customized for a government department or an enterprise client. It can provide the client with high-quality, closely tailored service and can also reduce security risks to a certain extent. However, it is impractical to require users to locate and handle data-service failures and equipment failures manually. For private cloud storage systems, how to locate and handle data-service failures and equipment failures in a user-friendly way has therefore become a problem to be solved.

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art. It provides a real-time detection and handling method for node failures in a private cloud storage system that monitors the system in real time and applies different handling measures accordingly.

The technical solution adopted by the present invention is that:

The advantages and positive effects of the present invention are:

In the present invention, the storage nodes are interconnected by a data synchronization network and connected to the cloud computing servers by a data service network, and a management end is deployed on a storage node. The management end checks the working state of all storage nodes. Each storage node checks its own storage state, data service network state, data synchronization network state, data service state, and independent IP state. The invention combines this system-wide inspection with per-node inspection and defines a handling method for each condition that can arise at every step. As a result, it can effectively manage the various data services in a private cloud storage system. When a server fails, data services are restored automatically, which simplifies operation for users and reduces their labor costs. Automatic restoration avoids the data-service interruptions caused by individual equipment failures, and so reduces the losses incurred when applications that depend on those data services are interrupted.

Description of the drawings

Fig. 1 is a structural schematic diagram of the invention.

Specific embodiment

The present invention is further described below with reference to embodiments. The following embodiments are illustrative rather than restrictive and shall not be used to limit the scope of protection of the present invention.

The method is a real-time detection and handling method for node failures in a private cloud storage system, as shown in Fig. 1. The innovation of the invention is that the system includes multiple storage nodes capable of providing multiple kinds of data services, together with multiple cloud computing servers. The storage nodes exchange data internally over a data synchronization network. They provide data services to the cloud computing servers over a data service network. A management end is deployed on a storage node. The method comprises an initialization process, a management-end detection and handling process, and a storage-node detection and handling process;

The initialization process includes the following steps:

(1) The management end pre-stores the storage configuration, network configuration, and data-service preparation of all storage nodes;

(2) Each storage node stores only its own storage preparation, network configuration, and data-service preparation;

(3) For each data service, select any two storage nodes to serve as mirrors of each other, and assign the data service an independent IP address;

(4) Set the detection time of the management end and the storage nodes;
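The initialization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The round-robin choice of the two mirror nodes, the drawing of independent IPs in order from one CIDR pool, the reading of the detection time as an interval in seconds, and the names `initialize` and `ServiceAssignment` are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceAssignment:
    """Hypothetical record for one data service and its mirror pair."""
    name: str
    primary: str      # storage node that normally runs the service
    mirror: str       # storage node holding the mirror copy
    service_ip: str   # independent IP that follows the service


def initialize(nodes: List[str], services: List[str], ip_pool: str,
               detection_interval_s: int = 30) -> Dict[str, object]:
    """Initialization steps (1)-(4) under assumed policies.

    Assumptions (not from the patent): mirror pairs are chosen round-robin,
    IPs come in order from one CIDR block, and the detection time is an
    interval in seconds.
    """
    if len(nodes) < 2:
        raise ValueError("mirroring needs at least two storage nodes")
    hosts = list(ipaddress.ip_network(ip_pool).hosts())
    if len(hosts) < len(services):
        raise ValueError("IP pool too small for the number of services")
    # step (3): two distinct nodes per service plus a dedicated IP
    plan = [
        ServiceAssignment(svc, nodes[i % len(nodes)],
                          nodes[(i + 1) % len(nodes)], str(hosts[i]))
        for i, svc in enumerate(services)
    ]
    return {
        "detection_interval_s": detection_interval_s,  # step (4)
        "management_view": plan,                        # step (1): everything
        # step (2): each node keeps only the services it hosts or mirrors
        "node_view": {n: [a for a in plan if n in (a.primary, a.mirror)]
                      for n in nodes},
    }
```

With two nodes, as in Embodiment 1 below, every service ends up with one node as primary and the other as mirror, so each node's own view contains all services.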

The management-end detection and handling process includes the following steps:

(1) According to the detection time, automatically check the connection status of each storage node in turn;

(2) When a storage node does not respond, mark it as unavailable. This indicates that the device has crashed or its network connection is down. The data services on the mirror storage nodes corresponding to all data services originally configured on this storage node then provide service, and the process proceeds to step (4). All data services of this storage node may be configured on one other storage node as its mirrors. Alternatively, multiple data services may be distributed across several other storage nodes, each with its own mirror;

When a storage node responds normally, proceed to the next step;

Whether a node responds is determined as follows: ping the storage node directly and check whether the corresponding program on it is running normally. The corresponding program is the pre-started program that checks the state of the storage device in the storage-node detection and handling process described below;

(3) Obtain the storage state of the storage node. Here, the storage state refers to the feedback records collected by the management end itself. These records are the feedback that storage nodes send to the management end when the various conditions arise in the storage-node detection and handling process described below;

When the storage state is abnormal, mark the storage node as unavailable and stop the data services on it. The data services on the mirror storage nodes corresponding to all data services originally configured on this storage node then provide service, and the process proceeds to step (4). All data services of this storage node may be configured on one other storage node as its mirrors. Alternatively, multiple data services may be distributed across several other storage nodes, each with its own mirror;

(4) Continue with the next storage node until all storage nodes have been checked;

(5) After the management end receives information that a storage node is unavailable, it notifies the system administrator by email or another known means. The administrator may attempt recovery personally or contact technical staff. Once the storage node returns to an available state, the data services on it are started. Recovery by the administrator may consist of restarting the storage node's device when the machine has crashed. When the network connection is down, it may consist of checking the cable, network card, switch, or router;
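The management-end steps (1) to (5) above can be summarized as one detection round in Python. This is a hedged sketch, not the patented program. The hooks `ping`, `agent_running`, `failover`, and `notify` are hypothetical stand-ins for an ICMP ping, a check of the node-side detection program, switching a node's data services to their mirrors, and alerting the administrator. Skipping nodes that are already unavailable until they are recovered is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    available: bool = True


def management_round(
    nodes: List[Node],
    ping: Callable[[str], bool],
    agent_running: Callable[[str], bool],
    feedback: Dict[str, List[str]],
    failover: Callable[[str], None],
    notify: Callable[[str, str], None],
) -> List[str]:
    """One management-end round, steps (1)-(5); returns nodes marked unavailable.

    All callables are hypothetical hooks. failover is expected to stop the
    node's own services and let the mirror nodes serve them.
    """
    failed = []
    for node in nodes:                      # steps (1) and (4): every node in turn
        if not node.available:
            continue  # assumption: already handled, waiting for manual recovery
        if not (ping(node.name) and agent_running(node.name)):
            reason = "no response"          # step (2)
        elif feedback.get(node.name):       # step (3): any feedback = abnormal
            reason = "abnormal storage state: " + feedback[node.name][-1]
        else:
            continue
        node.available = False
        failover(node.name)
        notify(node.name, reason)           # step (5): alert the administrator
        failed.append(node.name)
    return failed


def mark_recovered(node: Node, start_services: Callable[[str], None]) -> None:
    """Step (5) after manual recovery: make the node available, restart its services."""
    node.available = True
    start_services(node.name)
```

In a deployment, `notify` could build an `email.message.EmailMessage` and send it with `smtplib.SMTP.send_message`. The sketch leaves that choice open, as the patent does with "email or another known means".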

The storage-node detection and handling process includes the following steps:

(1) According to the detection time, check the storage state of this storage node;

(2) When the storage device of the storage node does not respond, report the storage device's information to the management end; this detection round is complete;

The storage device may be any device used for storing data, such as a common hard disk or a disk array;

When the storage device of the storage node responds normally, proceed to step (3);

No response falls into three cases:

(1) First case: scan the system. If the disk volume is not found, attempt to reload the storage. The system includes its own reload program, which attempts to reconnect the storage device when run. If reloading fails, report to the management end;

(2) Second case: disk failure, reported directly to the management end;

(3) Third case: partition inconsistency, reported directly to the management end;

Partition inconsistency covers two situations:

1. First: the storage device does not respond, or the partition has been deleted;

2. Second: the partition has been modified.

(3) Check the data service network of the storage node. When the data service network is disconnected, suspend all data services on the storage node; this detection round is complete;

When the data service network of the storage node is normal, proceed to step (4);

The data service network is judged disconnected as follows: several preset storage nodes are accessed over the data service network, and if they cannot be reached, the network is considered disconnected.

(4) Check the data synchronization network of the storage node. When the data synchronization network is disconnected, end this detection and handling process immediately without performing any operation; this detection round is complete;

When the data synchronization network is normal, proceed to step (5);

The data synchronization network is judged disconnected as follows: several preset storage nodes are accessed over the data synchronization network, and if they cannot be reached, the network is considered disconnected.

(5) Check the data service state of the storage node. When the data service on this storage node is in one of the following halted states, proceed to step (7): 1. the current data service has been set as not in use, its data is discarded legacy data, or the data service was shut down normally; 2. the no-response state involved in step (2);

When the data service on this storage node is otherwise halted, check the state of the mirrored data service. If the mirrored data service has been started, proceed to step (7). If it has not been started, start the data service on this storage node and proceed to step (6);

When the data service has been started, proceed to step (6);

(6) Check the independent IP state of the storage node's data service. When the independent IP has been lost, restore it and proceed to the next step;

When the independent IP is normal, proceed to the next step;

(7) Return to step (5) to check the next data service, until the data services of all storage devices have been checked.
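The storage-node steps (2) to (7) above can be sketched as one Python round. This sketch makes several assumptions:

- A network counts as up when at least one preset peer accepts a TCP connection through `socket.create_connection`.
- The partition check is a comparison of expected and actual partition tables.
- The halted states of step (5) are split into `disabled` (the states listed in that step) and `stopped` (a standby copy that may take over).

These choices and all function names are illustrative, not the patent's own.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Peer = Tuple[str, int]


def tcp_probe(peer: Peer, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection(peer, timeout=timeout):
            return True
    except OSError:
        return False


def network_up(peers: Iterable[Peer], probe: Callable[[Peer], bool] = tcp_probe) -> bool:
    """Assumption: a network counts as up if at least one preset peer answers."""
    return any(probe(p) for p in peers)


def storage_fault(volume_present: bool, try_reload: Callable[[], bool],
                  disk_healthy: bool, expected_parts: dict,
                  actual_parts: dict) -> Optional[str]:
    """Step (2): return the feedback for the management end, or None if storage is fine."""
    if not volume_present and not try_reload():  # case 1: volume missing, reload failed
        return "volume missing"
    if not disk_healthy:                         # case 2: disk failure
        return "disk failure"
    if actual_parts != expected_parts:           # case 3: partition deleted or modified
        return "partition inconsistent"
    return None


def storage_node_round(fault: Optional[str], service_net_up: bool, sync_net_up: bool,
                       services: Dict[str, dict], mirror_running: Callable[[str], bool],
                       report: Callable[[str], None]) -> List[str]:
    """Steps (2)-(7) for one node; returns a log of the actions taken.

    services maps a service name to {"state": "running" | "stopped" | "disabled",
    "ip_present": bool}; mirror_running and report are hypothetical hooks.
    """
    if fault is not None:                        # step (2): report, end the round
        report(fault)
        return ["reported " + fault]
    log: List[str] = []
    if not service_net_up:                       # step (3): suspend all services
        for name, svc in services.items():
            if svc["state"] == "running":
                svc["state"] = "stopped"
                log.append("suspended " + name)
        return log
    if not sync_net_up:                          # step (4): end the round, change nothing
        return log
    for name, svc in services.items():           # step (7) loops back to step (5)
        if svc["state"] == "disabled":
            continue
        if svc["state"] == "stopped":            # step (5): take over only if mirror is down
            if mirror_running(name):
                continue
            svc["state"] = "running"
            log.append("started " + name)
        if not svc["ip_present"]:                # step (6): restore the independent IP
            svc["ip_present"] = True             # placeholder for re-binding the address
            log.append("restored ip " + name)
    return log
```

Under this model, a node would compute `fault` with `storage_fault`. It would obtain the two network flags by calling `network_up` with the preset peers of each network, and then call `storage_node_round` once per detection time.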

Embodiment 1

In a laboratory at a school, the number of servers is limited. Only two servers serve as storage nodes, and large-capacity hard disks installed directly in the servers serve as the storage medium.

In this case, the management end is installed on one of the storage nodes. Suppose the storage node currently running the data service crashes while the other storage node continues to run normally. The detection process on the storage-node side is then as follows:

The crashed storage node can no longer run, so it cannot perform any detection.

The normal storage node detects in turn that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it finds that its data service is halted, it checks the data service state on the other storage node, which mirrors it. Because that storage node has crashed, its data service is not started. The normal node therefore starts its own data service, which ensures that data service continues to be provided.

If the management-end program runs on the normal node, it marks the crashed storage node as unavailable and notifies the system administrator.
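Embodiment 1 can be replayed with a small two-node model. This is a hypothetical simulation, not the patent's code. `Node`, `node_round`, and `management_round` are invented names, and a crashed machine is modelled simply as `alive = False`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    alive: bool = True       # False models a crashed machine
    available: bool = True   # set by the management end
    services: Dict[str, str] = field(default_factory=dict)  # name -> "running" | "stopped"


def node_round(me: Node, peer: Node) -> List[str]:
    """Storage-node step (5) on `me`: start each stopped service whose mirror
    on `peer` is not running. A crashed peer runs nothing."""
    started = []
    for svc, state in me.services.items():
        mirror_running = peer.alive and peer.services.get(svc) == "running"
        if state == "stopped" and not mirror_running:
            me.services[svc] = "running"
            started.append(svc)
    return started


def management_round(nodes: List[Node], notify: Callable[[str], None]) -> None:
    """Management-end steps (2) and (5): a crashed node does not respond, so it
    is marked unavailable and the administrator is notified once."""
    for n in nodes:
        if n.available and not n.alive:
            n.available = False
            notify(n.name)


a = Node("A", services={"nfs": "running"})   # currently serving
b = Node("B", services={"nfs": "stopped"})   # mirror on standby
a.alive = False                              # node A crashes
print(node_round(b, a))                      # prints ['nfs']: B takes over
```

The management end only has to run on a surviving node for the notification half of the embodiment. The takeover itself is driven by the healthy storage node's own check.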

Embodiment 2

A data center has dedicated storage equipment, which is connected to the servers and used as the storage medium. Suppose a connection fault occurs between the storage equipment and a storage node. The detection process on the storage-node side is then as follows:

On the failed node, the storage device is detected as not responding, and this is reported to the management end. The management end then stops all data services on that storage node.

The normal storage node detects in turn that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it finds that its data service is halted, it checks the data service state on the other storage node, which mirrors it. Because that data service has been stopped, the normal node starts its own data service, which ensures that data service continues to be provided.
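Embodiment 2 differs from Embodiment 1 in that the failed node is still running and reports its own storage fault. The following hypothetical Python sketch replays that sequence. The failed node reports, the management end stops its data services, and the healthy mirror takes over. All names are illustrative assumptions.

```python
from typing import Dict, List

Services = Dict[str, str]  # service name -> "running" | "stopped"


def failed_node_check(storage_responds: bool, feedback: List[str]) -> None:
    """Storage-node step (2) on the failed node: report a non-responding device."""
    if not storage_responds:
        feedback.append("storage device not responding")


def management_handle(feedback: List[str], services: Services) -> bool:
    """Management-end step (3): abnormal storage state -> stop the node's services.
    Returns True if the node was treated as unavailable."""
    if not feedback:
        return False
    for name in services:
        services[name] = "stopped"
    return True


def standby_takeover(own: Services, mirror: Services) -> List[str]:
    """Storage-node step (5) on the healthy node: start services whose mirror is stopped."""
    started = []
    for name, state in own.items():
        if state == "stopped" and mirror.get(name) != "running":
            own[name] = "running"
            started.append(name)
    return started


feedback: List[str] = []
node_a = {"db": "running"}   # node with the storage connection fault
node_b = {"db": "stopped"}   # healthy mirror
failed_node_check(storage_responds=False, feedback=feedback)
management_handle(feedback, node_a)
print(standby_takeover(node_b, node_a))  # prints ['db']
```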

Claims

1. A real-time detection and handling method for node failures in a private cloud storage system, characterized in that: it involves multiple storage nodes capable of providing multiple kinds of data services and multiple cloud computing servers; the storage nodes exchange data internally over a data synchronization network and provide data services to the cloud computing servers over a data service network; a management end is deployed on a storage node; and the method comprises an initialization process, a management-end detection and handling process, and a storage-node detection and handling process;

The initialization process includes the following steps:

(4) Set the detection time of the management end and the storage nodes;

(2) When a storage node does not respond, mark it as unavailable, which indicates that the device has crashed or the network connection is down; the data services on the mirror storage nodes corresponding to all data services originally configured on this storage node provide service, and the process proceeds to step (4);

All data services of the storage node are configured on another storage node, as mirrors of each other;

When a storage node responds normally, proceed to the next step;

(3) Obtain the storage state of the storage node; here, the storage state refers to the feedback records collected by the management end itself, and these records are the feedback sent to the management end when the various conditions arise in the storage-node detection and handling process described below;

When the storage state is abnormal, mark the storage node as unavailable and stop the data services on it; the data services on the mirror storage nodes corresponding to all data services originally configured on this storage node provide service, and the process proceeds to step (4); all data services of the storage node may be configured on another storage node as mirrors of each other, or multiple data services may be configured separately on several other storage nodes as mirrors of each other;

(5) After the management end receives information that a storage node is unavailable, it notifies the system administrator by email or another known means; the administrator may attempt recovery personally or contact technical staff, and once the storage node returns to an available state, the data services on it are started; such recovery consists of restarting the storage node's device when the machine has crashed, or checking the cable, network card, switch, or router when the network connection is down;

The storage-node detection and handling process includes the following steps:

(2) When the storage device of the storage node does not respond, report the storage device's information to the management end; this detection round is complete; the storage device is a common hard disk or disk array used for storing data;

(3) Check the data service network of the storage node; when the data service network is disconnected, suspend all data services on the storage node; this detection round is complete;

When the data service network of the storage node is normal, proceed to step (4);

(4) Check the data synchronization network of the storage node; when the data synchronization network is disconnected, end this detection and handling process immediately without performing any operation; this detection round is complete;

When the data synchronization network is normal, proceed to step (5);

(5) Check the data service state of the storage node; when the data service on this storage node is in one of the halted states defined below, proceed to step (7);

These halted states are: 1. the current data service has been set as not in use, its data is discarded legacy data, or the data service was shut down normally; 2. the no-response state involved in step (2);

When the data service on this storage node is otherwise halted, check the state of the mirrored data service: if the mirrored data service has been started, proceed to step (7); if it has not been started, start the data service on this storage node and proceed to step (6);

When the data service has been started, proceed to step (6);

(6) Check the independent IP state of the storage node's data service; when the independent IP has been lost, restore it and proceed to the next step;

When the independent IP is normal, proceed to the next step;

(7) Return to step (5) to check the next data service, until the data services of all storage devices have been checked.

2. The real-time detection and handling method for node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the management-end detection and handling process, whether a node responds is determined by pinging the storage node directly and checking whether the corresponding program on the storage node is running normally.

3. The real-time detection and handling method for node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the storage-node detection and handling process, no response falls into three cases:

(1) First case: scan the system; if the disk volume is not found, attempt to reload the storage, and report to the management end if reloading fails;

(2) Second case: disk failure, reported directly to the management end;

Partition inconsistency covers two situations:

2. Second: the partition has been modified.

4. The real-time detection and handling method for node failures in a private cloud storage system according to claim 1, characterized in that: in step (3) of the storage-node detection and handling process, the data service network is judged disconnected as follows: several preset storage nodes are accessed over the data service network, and if they cannot be reached, the network is considered disconnected.

5. The real-time detection and handling method for node failures in a private cloud storage system according to claim 1, characterized in that: in step (4) of the storage-node detection and handling process, the data synchronization network is judged disconnected as follows: several preset storage nodes are accessed over the data synchronization network, and if they cannot be reached, the network is considered disconnected.