CN105490847B - Method for real-time detection and handling of node failures in a private cloud storage system - Google Patents

Info

Publication number
CN105490847B
Authority
CN
China
Prior art keywords
storage node
data
detection
data service
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510897964.5A
Other languages
Chinese (zh)
Other versions
CN105490847A (en)
Inventor
刘树发
温晋英
杨连群
王莹
宋津旭
王鹏
李翔宇
卢鑫刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Original Assignee
TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN CITY CHUZHI TECHNOLOGY Co Ltd
Priority to CN201510897964.5A
Publication of CN105490847A
Application granted
Publication of CN105490847B
Legal status: Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention relates to a method for real-time detection and handling of node failures in a private cloud storage system. Storage nodes are connected to one another through a data synchronization network and to cloud computing servers through a data service network. A management end is set up on one of the storage nodes and checks the working state of all storage nodes, while each storage node checks its own state. The invention can effectively manage the various data services in a private cloud storage system. When a server fails, data services are restored automatically, which simplifies operation for the user and reduces the user's labor cost. Automatic recovery avoids the interruption of data services that each equipment failure would otherwise cause, and thereby reduces the losses incurred when applications that depend on those data services are interrupted.

Description

Method for real-time detection and handling of node failures in a private cloud storage system
Technical field
The invention belongs to the technical field of fault handling in cloud storage systems, and in particular relates to a method for real-time detection and handling of node failures in a private cloud storage system.
Background technique
Cloud storage is a new concept that extends and develops the concept of cloud computing. It is an emerging network storage technology: through functions such as cluster applications, network technology or distributed file systems, application software brings together a large number of storage devices of different types in a network so that they work cooperatively and jointly provide data storage and business access functions to the outside as one system. The core of such a system is the combination of application software with storage devices, with the application software turning storage devices into storage services. Compared with conventional storage devices, a cloud storage system is not merely hardware but a complex system composed of network equipment, storage devices, servers, application software, public access interfaces and other parts. Each part is centered on the storage devices and provides data storage and business access services to the outside through application software. Schools, enterprises, governments, information centers, data centers and the like depend on data ever more heavily, and data have become the basis on which numerous business activities rely for their development.
A structure that provides storage services to a limited set of users is called a private cloud storage system. It is a cloud storage service scheme customized for government departments or corporate clients; it can not only provide the client with high-quality, closely tailored service but also reduce security risks to a certain extent. However, it is unrealistic to expect users to locate and handle data service failures and equipment faults manually. For a private cloud storage system, how to locate and handle data service failures and equipment faults in a way that is convenient for users has therefore become a problem to be solved.
Summary of the invention
It is an object of the invention to overcome the deficiencies of the prior art and to provide a method for real-time detection and handling of node failures in a private cloud storage system that monitors the system in real time and applies a different handling procedure to each kind of fault.
The technical solution adopted by the present invention is that:
The advantages and positive effects of the present invention are:
In the present invention, the storage nodes are connected to one another through a data synchronization network and to the cloud computing servers through a data service network, and a management end is set up on one of the storage nodes. The management end checks the working state of all storage nodes, and each storage node checks its own storage state, data service network state, data synchronization network state, data service state and independent IP state. The overall check and the per-node checks are thus combined, and a handling method is provided for each of the different states that can occur in each step. The invention can therefore effectively manage the various data services in a private cloud storage system. When a server fails, data services are restored automatically, which simplifies operation for the user and reduces the user's labor cost. Automatic recovery avoids the interruption of data services that each equipment failure would otherwise cause, and thereby reduces the losses incurred when applications that depend on those data services are interrupted.
Description of the drawings
Fig. 1 is a structural schematic diagram of the invention.
Specific embodiments
The present invention is further described below with reference to embodiments. The following embodiments are illustrative, not restrictive, and do not limit the scope of protection of the present invention.
A method for real-time detection and handling of node failures in a private cloud storage system, as shown in Fig. 1. The innovation of the invention is as follows: the system comprises multiple storage nodes that can provide a variety of data services and multiple cloud computing servers; the storage nodes exchange internal data with one another over a data synchronization network and provide data services to the cloud computing servers over a data service network; a management end is set up on one of the storage nodes; and the method comprises an initialization procedure, a management end detection and handling procedure, and a storage node detection and handling procedure;
The initialization procedure comprises the following steps:
(1) the management end stores in advance the storage configuration, network configuration and data service configuration of all storage nodes;
(2) each storage node stores only its own storage configuration, network configuration and data service configuration;
(3) for each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) the detection time of the management end and of the storage nodes is set;
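The initialization steps above can be sketched as a small configuration model. This is only an illustration under stated assumptions: the class names, the `initialize` helper, the round-robin choice of mirror pairs, the example IP addresses and the 30-second default are all hypothetical and do not come from the patent, which only requires that any two storage nodes be chosen per data service.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class DataService:
    name: str
    independent_ip: str   # independent IP address assigned to this service
    mirror_nodes: tuple   # the two storage nodes that mirror each other


@dataclass
class NodeConfig:
    node_id: str
    storage: dict = field(default_factory=dict)   # storage configuration
    network: dict = field(default_factory=dict)   # network configuration
    services: list = field(default_factory=list)  # data service configuration


def initialize(node_ids, service_ips, detection_interval_s=30):
    """Build the management-end view: configuration of all storage nodes."""
    nodes = {n: NodeConfig(n) for n in node_ids}
    pairs = list(combinations(node_ids, 2))
    for i, (name, ip) in enumerate(service_ips.items()):
        # Any two nodes may be chosen; round-robin over pairs is an assumption.
        pair = pairs[i % len(pairs)]
        service = DataService(name, ip, pair)
        for node_id in pair:
            nodes[node_id].services.append(service)
    return {"nodes": nodes, "detection_interval_s": detection_interval_s}


cfg = initialize(
    ["node-a", "node-b", "node-c"],
    {"nfs-share": "10.0.0.50", "iscsi-lun": "10.0.0.51"},
)
# A storage node would keep only its own entry, e.g. cfg["nodes"]["node-a"].
```

The management end holds the whole `cfg` structure, while each storage node would hold only its own `NodeConfig`, which matches steps (1) and (2).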
The management end detection and handling procedure comprises the following steps:
(1) the connection status of each storage node is checked in turn, automatically, according to the detection time;
(2) when a storage node does not respond, the node is set to unavailable, which indicates that the equipment has crashed or the network connection has been lost; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the procedure goes to step (4). All data services of the storage node may be configured on one other storage node, so that the two nodes mirror each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service;
when the storage node responds normally, the procedure goes to the next step;
Whether a node responds is determined as follows: the storage node is pinged directly, and the corresponding program on the storage node is checked to confirm that it is running normally. The corresponding program is the program, started in advance, that checks the state of the storage device in the storage node detection and handling procedure described below;
(3) the storage state of the storage node is obtained. The storage state here refers to the feedback records held by the management end itself; these records are the feedback sent to the management end for the various states in the storage node detection and handling procedure described below;
when the storage state is abnormal, the storage node is set to unavailable and the data services on it are stopped; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the procedure goes to step (4). All data services of the storage node may be configured on one other storage node, so that the two nodes mirror each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service;
(4) detection continues with the next storage node until all storage nodes have been checked;
(5) after the management end receives the information that a storage node is unavailable, it notifies the system administrator by e-mail or another known means. The system administrator can attempt recovery personally or contact technical staff; once the storage node has been restored to the available state, the data services on that node are started. Recovery by the administrator may consist of restarting the equipment of the storage node when the machine has crashed, or checking network equipment such as cables, network cards, switches or routers when the network connection has been lost;
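One detection pass of the management end, as described in steps (1) to (5), might look roughly like the sketch below. The `Node` class, the `manager_pass` function and the callables `responds`, `storage_ok` and `notify` are hypothetical stand-ins: `responds` represents the ping plus program check, `storage_ok` the feedback records held by the management end, and `notify` the e-mail or other notification. For brevity the sketch assumes each node has a single mirror node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    mirror_id: str                 # assumption: one mirror node per node
    available: bool = True
    services_running: bool = True


def manager_pass(nodes, responds, storage_ok, notify):
    """Run one detection pass over all storage nodes."""
    for node in nodes.values():                 # steps (1) and (4): each node in turn
        if not responds(node.node_id):          # step (2): crash or lost connection
            node.available = False
        elif not storage_ok(node.node_id):      # step (3): abnormal storage state
            node.available = False
            node.services_running = False       # stop services on the faulty node
        else:
            continue                            # healthy node, move on
        # The mirror node's data services take over.
        nodes[node.mirror_id].services_running = True
        notify(f"storage node {node.node_id} is unavailable")   # step (5)


nodes = {
    "a": Node("a", mirror_id="b"),
    "b": Node("b", mirror_id="a", services_running=False),
}
alerts = []
manager_pass(
    nodes,
    responds=lambda node_id: node_id != "a",    # simulate node "a" not answering
    storage_ok=lambda node_id: True,
    notify=alerts.append,
)
```

In a real deployment the pass would be repeated at the configured detection time; the scheduling is left out here.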
The storage node detection and handling procedure comprises the following steps:
(1) the storage state of this storage node is checked according to the detection time;
(2) when the storage device of the storage node does not respond, the information about the storage device is fed back to the management end, and this detection is complete;
The storage device may be any device for storing data, such as an ordinary hard disk or a disk array;
when the storage device of the storage node responds normally, the procedure goes to step (3);
No response is divided into three cases:
Case (1): the system is scanned; if the disk volume is found to be missing, an attempt is made to reload the storage (the system ships with a reload program that, once run, attempts to reconnect the storage device); if reloading fails, this is fed back to the management end;
Case (2): disk failure, which is fed back to the management end directly;
Case (3): inconsistent partition, which is fed back to the management end directly;
An inconsistent partition covers two situations:
Situation 1: the storage device does not respond or the partition has been deleted;
Situation 2: the partition has been modified.
(3) the data service network of the storage node is checked; when the data service network is disconnected, all data services on the storage node are suspended, and this detection is complete;
when the data service network of the storage node is normal, the procedure goes to step (4);
The data service network is judged to be disconnected as follows: several preset storage nodes are accessed over the data service network; if they cannot be accessed, the network is considered disconnected.
(4) the data synchronization network of the storage node is checked; when the data synchronization network is disconnected, this detection and handling procedure ends immediately without any further operation, and this detection is complete;
when the data synchronization network is normal, the procedure goes to step (5);
The data synchronization network is judged to be disconnected as follows: several preset storage nodes are accessed over the data synchronization network; if they cannot be accessed, the network is considered disconnected.
(5) the data service state of the storage node is checked; when a data service of the storage node is in the halted state (the halted state covers: 1. the data service has been set to not in use, its data are obsolete data that have been discarded, or the data service was shut down normally; 2. the no-response state referred to in step (2)), the procedure goes to step (7);
when a data service of the storage node is in the halted state, the state of the mirror data service is checked: if the mirror data service has been started, the procedure goes to step (7); if the mirror data service has not been started, the data service on this storage node is started and the procedure goes to step (6);
when the data service has been started, the procedure goes to step (6);
(6) the independent IP state of the data service is checked; when the independent IP has been lost, it is restored, and the procedure goes to the next step;
when the independent IP is normal, the procedure goes to the next step;
(7) the procedure returns to step (5) to check the state of the next data service, until the data services of all storage devices have been checked.
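The storage node self-check can be sketched as a function that returns the actions one detection pass would take. Several points here are assumptions rather than statements from the patent. The text uses "halted state" for both branches of step (5); the sketch assumes the first refers to a service that was deliberately disabled (`DISABLED`) and the second to a service that is simply not running (`STOPPED`). It also assumes a network counts as disconnected only when none of the preset nodes can be reached. All names are hypothetical.

```python
from enum import Enum


class Svc(Enum):
    DISABLED = "disabled"   # assumption: turned off on purpose or retired
    STOPPED = "stopped"     # assumption: not running, may need to be started
    STARTED = "started"


def network_up(reachable_peers):
    """Assumption: disconnected only if no preset node can be reached."""
    return any(reachable_peers)


def node_self_check(storage_responds, service_net_peers, sync_net_peers,
                    services, mirror_started, ip_present):
    """Return the list of actions for one detection pass on this node."""
    if not storage_responds:                     # step (2)
        return ["report storage device to management end"]
    if not network_up(service_net_peers):        # step (3)
        return ["suspend all data services"]
    if not network_up(sync_net_peers):           # step (4): end without any action
        return []
    actions = []
    for name, state in services.items():         # steps (5) to (7)
        if state is Svc.DISABLED:
            continue                             # skip to the next service
        if state is Svc.STOPPED:
            if mirror_started(name):
                continue                         # the mirror already serves it
            actions.append(f"start {name}")
        if not ip_present(name):                 # step (6)
            actions.append(f"restore IP for {name}")
    return actions


result = node_self_check(
    storage_responds=True,
    service_net_peers=[True, False],
    sync_net_peers=[True],
    services={"nfs": Svc.STOPPED, "old": Svc.DISABLED, "iscsi": Svc.STARTED},
    mirror_started=lambda name: False,
    ip_present=lambda name: name != "iscsi",
)
print(result)
```

Returning a list of actions instead of performing them keeps the decision logic separate from the system calls that would start services or reassign IP addresses.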
Embodiment 1
In a school laboratory, the number of servers is limited: only two are used as storage nodes, and large-capacity hard disks installed directly in the servers serve as the storage medium.
In this situation, the management end is installed on one of the storage nodes. If the storage node currently running the data services crashes while the other storage node keeps running normally, the detection process on the storage node side is as follows:
The crashed storage node can no longer run, so no detection takes place on it.
The normal storage node detects, in turn, that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it detects that a data service is in the halted state, it checks the state of that data service on the other storage node, its mirror. Because that node has crashed, the data service there is not started, so the normal node starts its own data service to ensure that the service continues to be provided.
If the management end program is running on the normal node, it sets the crashed storage node to unavailable and notifies the system administrator.
Embodiment 2
In a data center, a dedicated storage device is available; it is connected to the servers and used as the storage medium. If the connection between the storage device and a storage node fails, the detection process on the storage node side is as follows:
On the failed node, the storage device is detected as not responding. This is fed back to the management end, and the management end stops all data services of this storage node.
The normal storage node detects, in turn, that its storage device responds normally, that the data service network is normal, and that the data synchronization network is normal. When it detects that a data service is in the halted state, it checks the state of that data service on the other storage node, its mirror. Because the data service there has been stopped, the normal node starts its own data service to ensure that the service continues to be provided.
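The "no response" detected on the failed node in this embodiment corresponds to one of the three cases listed in step (2) of the storage node procedure. The classification can be sketched as below. The function name, its parameters and the returned strings are hypothetical; how each condition is actually probed (volume scan, disk health, partition comparison) is not specified in the text and is passed in here as plain values or a callable.

```python
def classify_no_response(volume_present, try_reload, disk_healthy, partition_matches):
    """Return the reason to report to the management end, or None if all is well."""
    # Case (1): the disk volume is missing; try to reload before reporting.
    if not volume_present and not try_reload():
        return "volume missing, reload failed"
    # Case (2): disk failure is reported directly.
    if not disk_healthy:
        return "disk failure"
    # Case (3): partition deleted or modified compared with the saved configuration.
    if not partition_matches:
        return "partition inconsistent"
    return None


# Embodiment 2: the link to the dedicated storage device is broken,
# so the volume is missing and reloading does not help.
reason = classify_no_response(
    volume_present=False,
    try_reload=lambda: False,
    disk_healthy=True,
    partition_matches=True,
)
print(reason)
```

Only the first case attempts a repair before reporting; the other two are reported immediately, as the text describes.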

Claims (5)

1. A method for real-time detection and handling of node failures in a private cloud storage system, characterized in that: the system comprises multiple storage nodes that can provide a variety of data services and multiple cloud computing servers; the storage nodes exchange internal data with one another over a data synchronization network and provide data services to the cloud computing servers over a data service network; a management end is set up on one of the storage nodes; and the method comprises an initialization procedure, a management end detection and handling procedure, and a storage node detection and handling procedure;
The initialization procedure comprises the following steps:
(1) the management end stores in advance the storage configuration, network configuration and data service configuration of all storage nodes;
(2) each storage node stores only its own storage configuration, network configuration and data service configuration;
(3) for each data service, any two storage nodes are selected to mirror each other, and an independent IP address is assigned;
(4) the detection time of the management end and of the storage nodes is set;
The management end detection and handling procedure comprises the following steps:
(1) the connection status of each storage node is checked in turn, automatically, according to the detection time;
(2) when a storage node does not respond, the node is set to unavailable, which indicates that the equipment has crashed or the network connection has been lost; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the procedure goes to step (4);
all data services of the storage node are configured on another storage node, the two nodes mirroring each other;
when the storage node responds normally, the procedure goes to the next step;
(3) the storage state of the storage node is obtained; the storage state here refers to the feedback records held by the management end itself, these records being the feedback sent to the management end for the various states in the storage node detection and handling procedure described below;
when the storage state is abnormal, the storage node is set to unavailable and the data services on it are stopped; the data services on the mirror storage nodes originally configured for all data services of the current storage node provide service, and the procedure goes to step (4); all data services of the storage node may be configured on one other storage node, so that the two nodes mirror each other; alternatively, the data services may be configured separately on several other storage nodes, each mirroring the corresponding service;
(4) detection continues with the next storage node until all storage nodes have been checked;
(5) after the management end receives the information that a storage node is unavailable, it notifies the system administrator by e-mail or another known means; the system administrator can attempt recovery personally or contact technical staff; once the storage node has been restored to the available state, the data services on that node are started; recovery by the administrator consists of restarting the equipment of the storage node when the machine has crashed, or checking cables, network cards, switches or routers when the network connection has been lost;
The storage node detection and handling procedure comprises the following steps:
(1) the storage state of this storage node is checked according to the detection time;
(2) when the storage device of the storage node does not respond, the information about the storage device is fed back to the management end, and this detection is complete; the storage device is a device for storing data, namely an ordinary hard disk or a disk array;
when the storage device of the storage node responds normally, the procedure goes to step (3);
(3) the data service network of the storage node is checked; when the data service network is disconnected, all data services on the storage node are suspended, and this detection is complete;
when the data service network of the storage node is normal, the procedure goes to step (4);
(4) the data synchronization network of the storage node is checked; when the data synchronization network is disconnected, this detection and handling procedure ends immediately without any further operation, and this detection is complete;
when the data synchronization network is normal, the procedure goes to step (5);
(5) the data service state of the storage node is checked; when a data service of the storage node is in the halted state, the procedure goes to step (7);
the halted state covers: 1. the data service has been set to not in use, its data are obsolete data that have been discarded, or the data service was shut down normally; 2. the no-response state referred to in step (2);
when a data service of the storage node is in the halted state, the state of the mirror data service is checked: if the mirror data service has been started, the procedure goes to step (7); if the mirror data service has not been started, the data service on this storage node is started and the procedure goes to step (6);
when the data service has been started, the procedure goes to step (6);
(6) the independent IP state of the data service is checked; when the independent IP has been lost, it is restored, and the procedure goes to the next step;
when the independent IP is normal, the procedure goes to the next step;
(7) the procedure returns to step (5) to check the state of the next data service, until the data services of all storage devices have been checked.
2. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the management end detection and handling procedure, whether a node responds is determined as follows: the storage node is pinged directly, and the corresponding program on the storage node is checked to confirm that it is running normally.
3. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (2) of the storage node detection and handling procedure, no response is divided into three cases:
Case (1): the system is scanned; if the disk volume is found to be missing, an attempt is made to reload the storage; if reloading fails, this is fed back to the management end;
Case (2): disk failure, which is fed back to the management end directly;
Case (3): inconsistent partition, which is fed back to the management end directly;
An inconsistent partition covers two situations:
Situation 1: the storage device does not respond or the partition has been deleted;
Situation 2: the partition has been modified.
4. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (3) of the storage node detection and handling procedure, the data service network is judged to be disconnected as follows: several preset storage nodes are accessed over the data service network; if they cannot be accessed, the network is considered disconnected.
5. The method for real-time detection and handling of node failures in a private cloud storage system according to claim 1, characterized in that: in step (4) of the storage node detection and handling procedure, the data synchronization network is judged to be disconnected as follows: several preset storage nodes are accessed over the data synchronization network; if they cannot be accessed, the network is considered disconnected.
CN201510897964.5A 2015-12-08 2015-12-08 A kind of private cloud storage system interior joint failure real-time detection and processing method Expired - Fee Related CN105490847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510897964.5A CN105490847B (en) 2015-12-08 2015-12-08 A kind of private cloud storage system interior joint failure real-time detection and processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510897964.5A CN105490847B (en) 2015-12-08 2015-12-08 A kind of private cloud storage system interior joint failure real-time detection and processing method

Publications (2)

Publication Number Publication Date
CN105490847A CN105490847A (en) 2016-04-13
CN105490847B true CN105490847B (en) 2019-03-29

Family

ID=55677591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510897964.5A Expired - Fee Related CN105490847B (en) 2015-12-08 2015-12-08 A kind of private cloud storage system interior joint failure real-time detection and processing method

Country Status (1)

Country Link
CN (1) CN105490847B (en)

Also Published As

Publication number Publication date
CN105490847A (en) 2016-04-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190329

Termination date: 20191208