CN105490847B - Method for real-time detection and handling of node failures in a private cloud storage system - Google Patents
- Publication number
- CN105490847B (application CN201510897964.5A)
- Authority
- CN
- China
- Prior art keywords
- memory node
- data
- detection
- data service
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Hardware Redundancy (AREA)
- Computer And Data Communications (AREA)
Abstract
The present invention relates to a method for real-time detection and handling of node failures in a private cloud storage system. Storage nodes are connected to one another through a data synchronization network and to cloud computing servers through a data service network. A management component is set up on a storage node and checks the working state of all storage nodes, while each storage node also checks its own state. The invention can effectively manage the various data services in a private cloud storage system; when a server fails, data services are restored automatically, which simplifies operation for the user and reduces the user's labor costs. Automatic recovery avoids the interruption of data services that each equipment failure would otherwise cause, thereby reducing the losses incurred when application services that depend on those data services are interrupted.
Description
Technical field
The invention belongs to the technical field of error correction in cloud storage systems, and in particular relates to a method for real-time detection and handling of node failures in a private cloud storage system.
Background art
Cloud storage is a concept extended and developed from the concept of cloud computing. It is an emerging network storage technology: through functions such as cluster applications, network technology, or distributed file systems, application software brings a large number of storage devices of different types in a network together to work cooperatively as a single system that jointly provides data storage and service access to the outside. The core of such a system is the combination of application software with storage devices, with the application software turning storage devices into storage services. Compared with a conventional storage device, a cloud storage system is not merely hardware but a complex system made up of network equipment, storage devices, servers, application software, public access interfaces, and other parts. Each part is centered on the storage devices and provides data storage and service access to the outside through application software. Schools, enterprises, governments, information centers, data centers, and similar organizations depend on data more and more, and data has become the foundation on which many business activities rely.
A structure that provides storage services to a limited set of users is called a private cloud storage system. It is a cloud storage solution customized for government departments or enterprise customers; it can provide customers with high-quality, closely tailored service and can also reduce security risks to a certain extent. However, it is unrealistic to expect users to locate and handle data service failures and equipment failures manually. For a private cloud storage system, how to locate and handle data service failures and equipment failures in a way that is convenient for the user has therefore become a problem to be solved.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method for real-time detection and handling of node failures in a private cloud storage system that monitors the system in real time and applies a different handling procedure to each kind of fault.
The technical solution adopted by the present invention is that:
The advantages and positive effects of the present invention are as follows:
In the present invention, the storage nodes are connected to one another through a data synchronization network and connected to the cloud computing servers through a data service network, and a management component is set up on a storage node. The management component checks the working state of all storage nodes, while each storage node checks its own storage state, data service network state, data synchronization network state, data service state, and independent IP state, so that the overall check and the local checks are combined. A handling procedure is also provided for each of the different states that can arise in each step. The invention can therefore effectively manage the various data services in a private cloud storage system; when a server fails, data services are restored automatically, which simplifies operation for the user and reduces the user's labor costs. Automatic recovery avoids the interruption of data services that each equipment failure would otherwise cause, thereby reducing the losses incurred when application services that depend on those data services are interrupted.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the invention.
Detailed description of the embodiments
The present invention is further described below with reference to embodiments. The following embodiments are illustrative, not restrictive, and do not limit the scope of protection of the present invention.
A method for real-time detection and handling of node failures in a private cloud storage system is shown in Fig. 1. The innovation of the invention is as follows: the system includes multiple storage nodes that can provide multiple kinds of data services and multiple cloud computing servers; the storage nodes exchange internal data with one another through a data synchronization network and provide data services to the cloud computing servers through a data service network; a management component is set up on a storage node; and the method comprises an initialization process, a management-component detection and handling process, and a storage-node detection and handling process.
The initialization process comprises the following steps:
(1) The management component saves in advance the storage configuration, network configuration, and data service configuration of all storage nodes;
(2) Each storage node saves only its own storage configuration, network configuration, and data service configuration;
(3) For each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) The detection time of the management component and of the storage nodes is set.
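The initialization steps can be pictured as building two views of the same configuration: a full one for the management component and a reduced one for each storage node. The following is a minimal sketch under stated assumptions, not the patented implementation; the names `DataService`, `NodeConfig`, `initialize`, and `check_interval_s` are hypothetical, and the detection time is modeled here as a simple interval in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class DataService:
    name: str
    service_ip: str      # independent IP address assigned to this data service
    mirror_nodes: tuple  # the two storage nodes that mirror each other


@dataclass
class NodeConfig:
    node_id: str
    storage: dict        # storage configuration (contents are an assumption)
    network: dict        # network configuration (contents are an assumption)
    services: list = field(default_factory=list)  # data services hosted here


def initialize(nodes, services, check_interval_s):
    """Build the manager's full view and each node's own view (steps 1-4)."""
    for svc in services:
        a, b = svc.mirror_nodes
        if a == b or a not in nodes or b not in nodes:
            raise ValueError(f"{svc.name}: need two distinct known nodes")
        for node_id in (a, b):
            nodes[node_id].services.append(svc.name)
    # Step (1): the manager keeps the configuration of every node.
    manager_view = {
        "nodes": nodes,
        "services": {s.name: s for s in services},
        "check_interval_s": check_interval_s,
    }
    # Step (2): each node keeps only its own configuration.
    per_node_views = {
        node_id: {"config": cfg, "check_interval_s": check_interval_s}
        for node_id, cfg in nodes.items()
    }
    return manager_view, per_node_views
```

Keeping the per-node view separate from the manager's view mirrors the split in steps (1) and (2): a node can run its own checks without knowing anything about nodes it does not mirror.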
The management-component detection and handling process comprises the following steps:
(1) The connection state of each storage node is checked in turn, automatically, according to the detection time;
(2) When a storage node does not respond, the node is set to unavailable. No response here indicates that the machine is down or that its network connection is broken. The data services on the mirror storage nodes originally configured for all data services of the current storage node then provide service, and the process goes to step (4). All data services of the storage node may be configured on one other storage node, the two nodes mirroring each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service.
When the storage node responds normally, the process goes to the next step.
Whether a node responds is determined as follows: the storage node is pinged directly; whether the corresponding program on the storage node is running normally is checked. The corresponding program is the program, run beforehand, that checks the state of the storage device in the storage-node detection and handling process described below;
(3) The storage state of the storage node is obtained. The storage state here refers to the feedback records collected by the management component itself; these records originate from the feedback that storage nodes send to the management component for the various states in the storage-node detection and handling process described below.
When the storage state is abnormal, the storage node is set to unavailable and the data services on it are stopped at the same time. The data services on the mirror storage nodes originally configured for all data services of the current storage node then provide service, and the process goes to step (4). All data services of the storage node may be configured on one other storage node, the two nodes mirroring each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service;
(4) Detection continues with the next storage node until all storage nodes have been checked;
(5) After the management component receives information that a storage node is unavailable, it notifies the system administrator by e-mail or another known means. The system administrator can attempt recovery personally or contact technical staff. After the storage node returns to the available state, the data services on it are started. Recovery by the administrator may consist of restarting the equipment in the storage node when the machine is down, or checking network equipment such as cables, network cards, switches, or routers when the network connection is broken.
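One pass of the management-side process can be sketched as a loop over the storage nodes. This is an illustrative sketch only: `manager_pass` and all of its parameters are hypothetical names, the probes (`is_responding`, `storage_status`) and the notifier are injected as plain callables rather than real ping or e-mail code, and each node is assumed to have a single mirror node, which is only one of the two layouts the text allows.

```python
def manager_pass(node_ids, is_responding, storage_status, mirror_of, notify):
    """Run one detection pass and return a decision for every faulty node.

    is_responding(node)  -> bool, stands in for the ping / program check
    storage_status(node) -> "ok" or a fault description from feedback records
    mirror_of            -> dict mapping a node to its mirror node
    notify(node, reason) -> tells the administrator about an unavailable node
    """
    decisions = {}
    for node in node_ids:                      # steps (1) and (4): each node in turn
        if not is_responding(node):            # step (2): machine down or link broken
            reason, stop_local = "no response", False
        else:
            status = storage_status(node)      # step (3): latest feedback record
            if status == "ok":
                continue                       # healthy node, nothing to do
            reason, stop_local = f"storage abnormal: {status}", True
        decisions[node] = {
            "available": False,
            "stop_services_on_node": stop_local,
            "serve_from": mirror_of.get(node),  # mirror takes over the services
            "reason": reason,
        }
        notify(node, reason)                   # step (5)
    return decisions
```

Services are stopped on the faulty node only in the abnormal-storage branch, because a node that does not respond cannot be told to stop anything; in both branches the mirror node is the one expected to serve.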
The storage-node detection and handling process comprises the following steps:
(1) The storage state of this storage node is checked according to the detection time;
(2) When the storage device of the storage node does not respond, information about the storage device is fed back to the management component, and this round of detection is complete.
The storage device may be any device for storing data, such as an ordinary hard disk or a disk array.
When the storage device of the storage node responds normally, the process goes to step (3).
No response falls into three cases:
Case (1): the system is scanned; when a disk volume is found to be missing, an attempt is made to reload the storage (the system's own reload program is run, after which it tries to reconnect the storage device); if the storage cannot be reloaded, this is fed back to the management component;
Case (2): disk failure, which is fed back to the management component directly;
Case (3): partition inconsistency, which is fed back to the management component directly.
Partition inconsistency covers two situations:
Situation 1: the storage device does not respond or the partition has been deleted;
Situation 2: the partition has been modified.
(3) The data service network of the storage node is checked. When the data service network is disconnected, all data services on the storage node are suspended, and this round of detection is complete.
When the data service network of the storage node is normal, the process goes to step (4).
The data service network is considered disconnected when several preset storage nodes are accessed through the data service network and cannot be reached.
(4) The data synchronization network of the storage node is checked. When the data synchronization network is disconnected, this round of the detection and handling process ends directly without any further operation.
When the data synchronization network is normal, the process goes to step (5).
The data synchronization network is considered disconnected when several preset storage nodes are accessed through the data synchronization network and cannot be reached.
(5) The data service state of the storage node is checked. When the data service state is the disabled state, the process goes to step (7). The disabled state includes: 1. the current data service has been set to not in use, its data is obsolete data that has been discarded, or the data service has been shut down normally; 2. the no-response state referred to in step (2).
When the data service state is the stopped state, the state of the mirror data service is checked: if the mirror data service has been started, the process goes to step (7); if the mirror data service has not been started, the data service of this storage node is started and the process goes to step (6).
When the data service has been started, the process goes to step (6).
(6) The independent IP state of the data service being checked is examined. When the independent IP has been lost, it is restored, and the process goes to the next step.
When the independent IP is normal, the process goes to the next step.
(7) The process returns to step (5) to check the state of the next data service, until the data services of all storage devices have been checked.
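The storage-node process above is essentially an ordered chain of checks with early exits. The sketch below models it as a pure function that returns the actions a node would take; it is an assumption-laden illustration, not the patented code. The names `network_up`, `node_self_check`, the action labels, and the dictionary keys are hypothetical. Two interpretations are assumed and should be read as such: a network counts as disconnected only when none of the preset peers can be reached, and the state that skips to step (7) is called `"disabled"` to distinguish it from the `"stopped"` state that triggers the mirror check.

```python
def network_up(can_reach, preset_peers):
    """Assumption: the network is up if at least one preset peer is reachable."""
    return any(can_reach(peer) for peer in preset_peers)


def node_self_check(storage_ok, service_net_ok, sync_net_ok, services):
    """Return the list of actions for one detection round on this node.

    services: list of dicts with keys
      name, state ("disabled" | "stopped" | "started"),
      mirror_started (bool), ip_ok (bool)
    """
    if not storage_ok:                 # step (2): report and finish this round
        return [("report_storage_fault", None)]
    if not service_net_ok:             # step (3): suspend everything and finish
        return [("suspend_all_services", None)]
    if not sync_net_ok:                # step (4): finish without doing anything
        return []
    actions = []
    for svc in services:               # steps (5) to (7), one service at a time
        if svc["state"] == "disabled":
            continue                   # skip straight to the next service
        if svc["state"] == "stopped":
            if svc["mirror_started"]:
                continue               # the mirror is serving, leave it alone
            actions.append(("start_service", svc["name"]))
        if not svc["ip_ok"]:           # step (6): restore a lost independent IP
            actions.append(("restore_ip", svc["name"]))
    return actions
```

Returning actions instead of performing them keeps the decision logic testable without real disks or networks; the booleans passed in would come from checks such as `network_up`.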
Embodiment 1
In a school laboratory the number of servers is limited: only two servers are used as storage nodes, with large-capacity hard disks installed directly in the servers as the storage medium.
In this situation the management component is installed on one of the storage nodes. Suppose the storage node that is currently running the data service goes down while the other storage node keeps running normally. The detection process on the storage-node side is then as follows:
The storage node that is down can no longer run, so it cannot perform any detection.
The normal storage node detects in turn that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it finds that its data service is in the stopped state, it checks the data service state of the other storage node, its mirror. Because that storage node is down, its data service is not started, so the normal node starts its own data service to ensure that the data service continues to be provided.
If the management component runs on the normal node, it sets the storage node that is down to unavailable and notifies the system administrator.
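The takeover in this two-node scenario hinges on one rule: a mirror node that is down counts as not running its data service, whatever state it last reported. A small sketch of that decision follows; the function names and state strings are hypothetical and chosen only for illustration.

```python
def mirror_service_started(mirror_alive, last_known_state):
    """A node that is down cannot be running its service."""
    return mirror_alive and last_known_state == "started"


def surviving_node_action(local_state, mirror_alive, mirror_last_state):
    """Decide what the healthy node does with its own copy of the service."""
    if local_state == "started":
        return "keep running"
    if mirror_service_started(mirror_alive, mirror_last_state):
        return "stay stopped"          # the mirror is already serving
    return "start local service"       # nobody is serving, so take over


# Embodiment 1: the serving node has gone down, the local copy is stopped.
print(surviving_node_action("stopped", False, "started"))
```

With both nodes healthy and the mirror serving, the same function returns `"stay stopped"`, so the two copies of a data service are not started at the same time.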
Embodiment 2
In a data center there is dedicated storage equipment, which is connected to the servers and used as the storage medium. Suppose the connection between the storage equipment and a storage node fails. The detection process on the storage-node side is then as follows:
On the failed node, the storage device is detected as not responding; this is fed back to the management component, which stops all data services of that storage node.
The normal storage node detects in turn that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it finds that its data service is in the stopped state, it checks the data service state of the other storage node, its mirror. Because that data service has been stopped, the normal node starts its own data service to ensure that the data service continues to be provided.
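What the failed node feeds back in a scenario like this depends on which of the three no-response cases applies. The following sketch classifies a storage fault under simplifying assumptions: the individual checks are passed in as booleans, the reload attempt is an injected callable, and the function name and returned strings are hypothetical rather than taken from the patent.

```python
def classify_storage_fault(volume_present, try_reload, disk_healthy, partitions_match):
    """Return the feedback for the management component, or None if storage is fine.

    try_reload is called only when the disk volume is missing, matching the
    rule that a reload is attempted before anything is reported.
    """
    if not volume_present and not try_reload():
        return "volume missing, reload failed"   # case (1)
    if not disk_healthy:
        return "disk failure"                    # case (2): reported directly
    if not partitions_match:
        return "partition inconsistent"          # case (3): reported directly
    return None
```

For a broken connection to dedicated storage equipment, the volume would typically be missing and the reload would fail, so the first branch produces the feedback that leads the management component to stop the node's data services.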
Claims (5)
1. A method for real-time detection and handling of node failures in a private cloud storage system, characterized in that: the system includes multiple storage nodes that can provide multiple kinds of data services and multiple cloud computing servers; the storage nodes exchange internal data with one another through a data synchronization network; the storage nodes provide data services to the cloud computing servers through a data service network; a management component is set up on a storage node; and the method comprises an initialization process, a management-component detection and handling process, and a storage-node detection and handling process;
the initialization process comprises the following steps:
(1) the management component saves in advance the storage configuration, network configuration, and data service configuration of all storage nodes;
(2) each storage node saves only its own storage configuration, network configuration, and data service configuration;
(3) for each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) the detection time of the management component and of the storage nodes is set;
the management-component detection and handling process comprises the following steps:
(1) the connection state of each storage node is checked in turn, automatically, according to the detection time;
(2) when a storage node does not respond, the node is set to unavailable, no response here indicating that the machine is down or that its network connection is broken; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the process goes to step (4);
all data services of the storage node are configured on another storage node, the two nodes mirroring each other;
when the storage node responds normally, the process goes to the next step;
(3) the storage state of the storage node is obtained; the storage state here refers to the feedback records collected by the management component itself, these records originating from the feedback that storage nodes send to the management component for the various states in the storage-node detection and handling process described below;
when the storage state is abnormal, the storage node is set to unavailable and the data services on it are stopped at the same time; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the process goes to step (4); all data services of the storage node may be configured on one other storage node, the two nodes mirroring each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service;
(4) detection continues with the next storage node until all storage nodes have been checked;
(5) after the management component receives information that a storage node is unavailable, it notifies the system administrator by e-mail or another known means; the system administrator can attempt recovery personally or contact technical staff; after the storage node returns to the available state, the data services on it are started; said recovery by the administrator consists of restarting the equipment in the storage node when the machine is down, and checking cables, network cards, switches, or routers when the network connection is broken;
the storage-node detection and handling process comprises the following steps:
(1) the storage state of this storage node is checked according to the detection time;
(2) when the storage device of the storage node does not respond, information about the storage device is fed back to the management component, and this round of detection is complete; the storage device is an ordinary hard disk or disk array used for storing data;
when the storage device of the storage node responds normally, the process goes to step (3);
(3) the data service network of the storage node is checked; when the data service network is disconnected, all data services on the storage node are suspended, and this round of detection is complete;
when the data service network of the storage node is normal, the process goes to step (4);
(4) the data synchronization network of the storage node is checked; when the data synchronization network is disconnected, this round of the detection and handling process ends directly without any further operation;
when the data synchronization network is normal, the process goes to step (5);
(5) the data service state of the storage node is checked; when the data service state is the disabled state, the process goes to step (7);
the disabled state includes: 1. the current data service has been set to not in use, its data is obsolete data that has been discarded, or the data service has been shut down normally; 2. the no-response state referred to in step (2);
when the data service state is the stopped state, the state of the mirror data service is checked: if the mirror data service has been started, the process goes to step (7); if the mirror data service has not been started, the data service of this storage node is started and the process goes to step (6);
when the data service has been started, the process goes to step (6);
(6) the independent IP state of the data service being checked is examined; when the independent IP has been lost, it is restored, and the process goes to the next step;
when the independent IP is normal, the process goes to the next step;
(7) the process returns to step (5) to check the state of the next data service, until the data services of all storage devices have been checked.
2. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the management-component detection and handling process, whether a node responds is determined as follows: the storage node is pinged directly; whether the corresponding program on the storage node is running normally is checked.
3. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the storage-node detection and handling process, no response falls into three cases:
Case (1): the system is scanned; when a disk volume is found to be missing, an attempt is made to reload the storage; if the storage cannot be reloaded, this is fed back to the management component;
Case (2): disk failure, which is fed back to the management component directly;
Case (3): partition inconsistency, which is fed back to the management component directly;
partition inconsistency covers two situations:
Situation 1: the storage device does not respond or the partition has been deleted;
Situation 2: the partition has been modified.
4. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (3) of the storage-node detection and handling process, the data service network is considered disconnected when several preset storage nodes are accessed through the data service network and cannot be reached.
5. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (4) of the storage-node detection and handling process, the data synchronization network is considered disconnected when several preset storage nodes are accessed through the data synchronization network and cannot be reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510897964.5A CN105490847B (en) | 2015-12-08 | 2015-12-08 | Method for real-time detection and handling of node failures in a private cloud storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510897964.5A CN105490847B (en) | 2015-12-08 | 2015-12-08 | Method for real-time detection and handling of node failures in a private cloud storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105490847A CN105490847A (en) | 2016-04-13 |
CN105490847B true CN105490847B (en) | 2019-03-29 |
Family
ID=55677591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510897964.5A Expired - Fee Related CN105490847B (en) | 2015-12-08 | 2015-12-08 | Method for real-time detection and handling of node failures in a private cloud storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105490847B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106331642B (en) * | 2016-08-31 | 2020-05-26 | 浙江大华技术股份有限公司 | Data processing method and device in video cloud system |
CN108933798B (en) * | 2017-05-23 | 2022-02-18 | 杭州海康威视数字技术股份有限公司 | Data storage method, storage server and system |
CN109361777B (en) * | 2018-12-18 | 2021-08-10 | 广东浪潮大数据研究有限公司 | Synchronization method, synchronization system and related device for distributed cluster node states |
CN111866054A (en) * | 2019-12-16 | 2020-10-30 | 北京小桔科技有限公司 | Cloud host building method and device, electronic equipment and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1529426A (en) * | 2003-10-10 | 2004-09-15 | 清华大学 | SAN dual-node image schooling method and system based on FCP protocol |
CN101022363A (en) * | 2007-03-23 | 2007-08-22 | 杭州华为三康技术有限公司 | Network storage equipment fault protecting method and device |
CN103354503A (en) * | 2013-05-23 | 2013-10-16 | 浙江闪龙科技有限公司 | Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof |
CN103685481A (en) * | 2013-11-29 | 2014-03-26 | 深圳市安云信息科技有限公司 | Cloud storage clustering system and cloud storage method |
CN104699566A (en) * | 2013-12-16 | 2015-06-10 | 杭州海康威视数字技术股份有限公司 | Data redundant backup method, data redundant backup system and storage node server based on cloud storage |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8145945B2 (en) * | 2010-01-04 | 2012-03-27 | Avaya Inc. | Packet mirroring between primary and secondary virtualized software images for improved system failover performance |
Also Published As
Publication number | Publication date |
---|---|
CN105490847A (en) | 2016-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11323307B2 (en) | Method and system of a dynamic high-availability mode based on current wide area network connectivity | |
US7278055B2 (en) | System and method for virtual router failover in a network routing system | |
WO2018036148A1 (en) | Server cluster system | |
WO2020147331A1 (en) | Micro-service monitoring method and system | |
JP5102901B2 (en) | Method and system for maintaining data integrity between multiple data servers across a data center | |
CN109151045B (en) | Distributed cloud system and monitoring method | |
CN105490847B (en) | A kind of private cloud storage system interior joint failure real-time detection and processing method | |
CN106850260A (en) | A kind of dispositions method and device of virtual resources management platform | |
WO2017050254A1 (en) | Hot backup method, device and system | |
CN102710457B (en) | A kind of N+1 backup method of cross-network segment and device | |
CN103973424B (en) | Failure in caching system solves method and apparatus | |
CN111949444A (en) | Data backup and recovery system and method based on distributed service cluster | |
CN111176888B (en) | Disaster recovery method, device and system for cloud storage | |
WO2017107827A1 (en) | Method and apparatus for isolating environment | |
CN103905247B (en) | Two-unit standby method and system based on multi-client judgment | |
CN109286529A (en) | A kind of method and system for restoring RabbitMQ network partition | |
CN110677282B (en) | Hot backup method of distributed system and distributed system | |
CN105553783A (en) | Automated testing method for switching of configuration two-computer resources | |
CN107404394A (en) | A kind of IPTV system disaster recovery method and IPTV disaster tolerance systems | |
WO2021185169A1 (en) | Switching method and apparatus, and device and storage medium | |
CN104506372A (en) | Method and system for realizing host-backup server switching | |
TW201824826A (en) | Method and apparatus for realizing message mirror image of dynamic flow in cloud network environment | |
US10721135B1 (en) | Edge computing system for monitoring and maintaining data center operations | |
CN114301763A (en) | Distributed cluster fault processing method and system, electronic device and storage medium | |
CN110569303B (en) | MySQL application layer high-availability system and method suitable for various cloud environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190329; Termination date: 20191208 |