CN103354503A - Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof - Google Patents

Info

Publication number
CN103354503A
Authority
CN
China
Prior art keywords
server
standby
storage system
cloud storage
monitoring management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101937604A
Other languages
Chinese (zh)
Inventor
陈清华
杜国娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHEJIANG SANLOGIC TECHNOLOGY Co Ltd
Original Assignee
ZHEJIANG SANLOGIC TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHEJIANG SANLOGIC TECHNOLOGY Co Ltd filed Critical ZHEJIANG SANLOGIC TECHNOLOGY Co Ltd
Priority to CN2013101937604A priority Critical patent/CN103354503A/en
Publication of CN103354503A publication Critical patent/CN103354503A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a cloud storage system capable of automatically detecting and replacing failed nodes, and a method thereof, and aims to provide a cloud storage system with self-repairing capability. The cloud storage system comprises a storage cluster, a monitoring and management server and a plurality of standby servers. The storage cluster comprises a plurality of storage servers; the monitoring and management server is connected to all of the standby servers and to all of the storage servers in the storage cluster, and is provided with an input/output interface for communicating with the outside. Users store data in or read data from the storage servers through the input/output interface of the monitoring and management server. Meanwhile, the monitoring and management server monitors the health of each storage server, and if a storage server fails, it is replaced with a standby server, thereby ensuring normal operation of the cloud storage system. The cloud storage system disclosed by the invention is applicable to all cloud storage architectures.

Description

A cloud storage system capable of automatically detecting and replacing failed nodes, and a method thereof
 
Technical field
The present invention relates to a cloud storage system, and in particular to a cloud storage system capable of automatically detecting and replacing failed nodes, and a method thereof.
Background art
Cloud storage is a new concept extended and developed from the concept of cloud computing. It refers to a system in which, through functions such as cluster applications, grid technology or distributed file systems, application software makes a large number of storage devices of different types in a network work together to jointly provide data storage and service access functions to the outside. Cloud storage technology is a direction of future IT development.
Because a cloud storage system is large in scale and has many nodes, storage node failures inevitably occur.
On June 23, 2010, the State Intellectual Property Office of the People's Republic of China published patent document CN101753617A, titled "A cloud storage system and method". That system comprises an overall scheduling layer and a cloud storage layer. The overall scheduling layer, which consists of one or more servers, locates the position in the cloud storage layer of the resource named in a received access request. The cloud storage layer consists of at least one cloud storage node. Using both layers makes it possible to keep the advantages of a conventional storage architecture, provided by the overall scheduling layer, while also benefiting from the strong scalability and low cost of the cloud storage layer. However, when a node (server) in the cloud storage layer fails, the failure is difficult to handle effectively; it can affect subsequent use and may even cause irreparable loss.
Summary of the invention
The present invention mainly addresses the technical problem in the prior art that failed nodes are difficult to handle and affect subsequent use. It provides a cloud storage system, and a method thereof, that can automatically detect and replace failed nodes and thereby ensure normal operation of the cloud storage system.
The above technical problem is mainly solved by the following technical solution: a cloud storage system capable of automatically detecting and replacing failed nodes comprises a storage cluster and a monitoring and management server, and further comprises several standby servers. The storage cluster comprises several storage servers; the monitoring and management server is connected to each of the standby servers and to each of the storage servers in the storage cluster, and is provided with an input/output interface for external communication.
Each storage server is a storage node. Users store data in or read data from the storage servers through the input/output interface of the monitoring and management server. At the same time, the monitoring and management server monitors the health of each storage server; if a storage server fails, a standby server replaces it, ensuring normal operation of the cloud storage system.
Preferably, the cloud storage system further comprises an alarm server connected to the monitoring and management server. After a storage server fails, the monitoring and management server raises an alarm through the alarm server to notify the administrator to repair the failed server.
Preferably, the alarm server comprises a wireless communication unit. With the wireless communication unit, the alarm server is not constrained by cables and can raise alarms remotely. The wireless communication unit may support a mobile communication network and/or a wireless local area network (WLAN).
Preferably, each standby server comprises a power control module connected to the monitoring and management server. The power control module controls whether the standby server is in a sleep state or an awake state. While a standby server has not been activated as a replacement, it stays in the sleep state, in which only a minimal current flows, so power consumption is low and energy is saved. When a standby server is needed to take over from a failed server, the power control module wakes it and supplies the current required for normal operation, so that storage work proceeds normally.
Preferably, the cloud storage system further comprises a cache server connected to the monitoring and management server. When a user stores a file in the cloud storage system, the file is first staged on the cache server and, once it has been received completely, is sent to the designated storage server for storage. If the storage server receiving the data fails during the storage process, the whole file can then be stored again, intact, on the standby server that replaces it, which reduces the risk of losing all or part of the file.
Preferably, the cloud storage system further comprises a firewall connected in series with the input/output interface of the monitoring and management server. The firewall protects the cloud storage system against external attacks.
A method by which a cloud storage system automatically detects and replaces failed nodes comprises the following steps:
Step 1: the monitoring and management server checks the state of each storage server; when it finds that a storage server has failed, it proceeds to step 2. The storage server that has failed is the failed server.
Step 2: the monitoring and management server wakes a standby server and raises the level of the awakened standby server to the same level as the failed server.
Step 3: the monitoring and management server lowers the level of the failed server to the fault level, which is lower than the level of every normally operating storage server.
Step 4: the monitoring and management server checks the state of the awakened standby server. If it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server. If it is not normal, the awakened standby server is set as the new failed server and steps 2 to 4 are repeated.
Preferably, after the failed server has been replaced with a standby server, an alarm is raised through the alarm server.
Preferably, when the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the cache server and the firewall are normal. Once all are normal, it puts the standby servers into the sleep state and proceeds to step 1; if any checked device is not operating normally, the system enters a holding state.
The substantial effects of the present invention are that a failed server can be replaced promptly, ensuring that the cloud storage system operates normally; failures can be reported to the administrator promptly; and the risk of losing all or part of a file is reduced.
Description of drawings
Fig. 1 is a schematic structural diagram of a cloud storage system of the present invention;
Fig. 2 is a flow chart of a method of the present invention for detecting and replacing a failed server;
In the figures: 1, monitoring and management server; 2, storage server; 3, standby server; 4, alarm server; 5, firewall; 6, cache server.
Embodiment
The technical solution of the present invention is described in further detail below by way of an embodiment and with reference to the accompanying drawings.
Embodiment: as shown in Fig. 1, the cloud storage system of this embodiment, which can automatically detect and replace failed nodes, comprises a storage cluster, a monitoring and management server 1, an alarm server 4, a firewall 5, a cache server 6 and two standby servers 3. The storage cluster comprises several storage servers 2. The monitoring and management server 1 is connected to every storage server 2 and every standby server 3, and also to the alarm server 4 and the cache server 6. The firewall 5 is connected in series with the input/output interface of the monitoring and management server 1. All external data entering or leaving the cloud storage system must first be filtered by the firewall 5, which prevents external attacks from damaging the cloud storage system.
The storage servers 2, standby servers 3 and cache server 6 are collectively referred to as storage-class servers. The connections between the monitoring and management server 1 and each storage server 2, each standby server 3 and the cache server 6 carry both data signals and control signals. Data signals are the file data stored in or read from the cloud storage system; control signals are the signals that control the operation of each server and the status signals fed back by each storage-class server. The signals that each storage-class server feeds back to the monitoring and management server 1 include a heartbeat signal, from which the monitoring and management server 1 can obtain the health of that storage-class server.
Each standby server 3 comprises a power control module connected to the monitoring and management server 1. The power control module controls whether the standby server 3 is in a sleep state or an awake state. While a standby server 3 has not been activated as a replacement, it stays in the sleep state, in which only a minimal current flows, so power consumption is low and energy is saved. When a standby server 3 is needed to take over from a failed server, the power control module wakes that standby server 3 and supplies the current required for normal operation, so that storage work proceeds normally.
When a user stores a file in the cloud storage system, the file is first staged on the cache server 6 and, once it has been received completely, is sent to the designated storage server 2 for storage. If the storage server 2 receiving the data fails during the storage process, the file can then be stored again, intact, on the standby server 3 that replaces it, which reduces the risk of losing all or part of the file.
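The staging behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `store_via_cache`, the exception `NodeFailure` and the two write callbacks are hypothetical names introduced only for this example.

```python
class NodeFailure(Exception):
    """Hypothetical error raised when a storage node fails during a write."""


def store_via_cache(data, write_to_target, write_to_standby):
    """Stage the whole file first, then write it; fall back to the standby.

    write_to_target / write_to_standby are assumed callables that accept
    the complete file contents as bytes.
    """
    staged = bytes(data)  # complete copy held on the cache server
    try:
        write_to_target(staged)
        return "target"
    except NodeFailure:
        # The target failed mid-write: resend the intact staged copy
        # to the standby server that replaces it.
        write_to_standby(staged)
        return "standby"
```

Because the cache holds the complete file until the write succeeds, a failure of the target never leaves the system with only a partial copy.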
A method by which a cloud storage system automatically detects and replaces failed nodes is as follows:
The automatic replacement module is divided into two parts: a monitoring server side and a data server side (data servers include standby data servers).
Data server side:
The module on the data server side periodically checks the running status of the data server, periodically sends heartbeat messages to the monitoring server, and reports to the monitoring server the role that this server plays.
The running status of a data server covers key information such as system and CPU temperature, disk array status, hard disk S.M.A.R.T. information and network status.
The system temperature and CPU temperature are read from the sensors on the mainboard. If a temperature exceeds the configured threshold, the abnormal condition is reported to the monitoring server, which decides how to handle it.
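A minimal sketch of the threshold check described above. The sensor names and threshold values used in the usage example are assumptions chosen for illustration; the patent does not specify them, and reading the mainboard sensors themselves is outside this sketch.

```python
def check_temperatures(readings, thresholds):
    """Return one alert message per sensor whose reading exceeds its threshold.

    readings and thresholds are dicts keyed by a sensor name; sensors
    without a configured threshold are ignored.
    """
    alerts = []
    for sensor, value in readings.items():
        limit = thresholds.get(sensor)
        if limit is not None and value > limit:
            alerts.append(f"{sensor} temperature {value} exceeds threshold {limit}")
    return alerts


# Example with assumed values: only the CPU reading is over its limit.
alerts = check_temperatures({"cpu": 91.0, "system": 40.0},
                            {"cpu": 85.0, "system": 60.0})
```

In the scheme described in the text, each alert would be sent to the monitoring server, which decides how to respond.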
The S.M.A.R.T. information of the hard disks in the system is checked at a configured frequency. The health of a hard disk can be judged from its S.M.A.R.T. information. When the health of a disk is very low and there is a risk of damage, the administrator is notified to replace the disk.
Array status detection: each data node builds a disk array from its hard disks in RAID 5 or RAID 6 mode and configures a redundant hot spare in the array. In this mode the array keeps working if one disk fails (RAID 6 tolerates two failed disks). The system uses the hot spare to replace the failed disk and, by sending a message to the management node, notifies the administrator to replace the damaged disk. The system automatically adds the new disk to the array as a hot spare. After the hot spare replaces the damaged disk, the array enters degraded mode and rebuilds the data of the replaced disk onto the replacement disk. In this situation the administrator may be advised to reduce the load on the node, in order to speed up the rebuild and reduce the risk of array damage. If another disk fails during this process, the array stops working entirely. The node is then judged to have failed, and the monitoring node starts the node replacement operation to keep the whole storage cluster running normally.
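The array states mentioned above can be summarized in a small sketch. It relies only on the standard fault tolerance of the two RAID levels named in the text (one failed disk for RAID 5, two for RAID 6); the function name and the state labels are illustrative assumptions.

```python
# Number of simultaneous disk failures each level survives.
TOLERATED_FAILURES = {"raid5": 1, "raid6": 2}


def array_state(level, failed_disks):
    """Classify an array as healthy, degraded (still serving data) or failed."""
    tolerated = TOLERATED_FAILURES[level]
    if failed_disks == 0:
        return "healthy"
    if failed_disks <= tolerated:
        return "degraded"
    # More failures than the level tolerates: the array stops working,
    # and the node would be judged failed and replaced.
    return "failed"
```

A node whose array reaches the "failed" state corresponds to the case in which the monitoring node starts the node replacement operation.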
The role that a node server plays in the storage cluster must be saved on the monitoring server. If a node fails, the monitoring server uses this role information to have a standby node take over the failed node's role. The role information includes how the disk array in the node is built and which logical volumes the node serves as a brick in. Whenever this information changes on a data node, it is immediately synchronized to the monitoring node and saved there.
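One way to picture the role information kept on the monitoring server is the sketch below. The class names `RoleInfo` and `MonitorRoleStore`, the field names and the identifiers in the usage example are all hypothetical; the text only states that the array layout and the brick assignments are stored and kept in sync.

```python
from dataclasses import dataclass, field


@dataclass
class RoleInfo:
    array_mode: str                               # e.g. "raid5" or "raid6"
    bricks: dict = field(default_factory=dict)    # logical volume -> brick name


class MonitorRoleStore:
    """Role records held by the monitoring server, keyed by node id."""

    def __init__(self):
        self._roles = {}

    def sync(self, node_id, info):
        # Called whenever a data node reports a change of its role information.
        self._roles[node_id] = info

    def role_of(self, node_id):
        return self._roles.get(node_id)

    def takeover(self, failed_id, standby_id):
        # Hand the failed node's recorded role to the standby node.
        info = self._roles.pop(failed_id)
        self._roles[standby_id] = info
        return info


store = MonitorRoleStore()
store.sync("node-1", RoleInfo("raid5", {"vol0": "brick-1"}))
inherited = store.takeover("node-1", "spare-1")
```

Keeping the record on the monitoring server, rather than only on the node, is what allows the role to be reassigned after the node itself has become unreachable.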
When a network failure occurs, a data node cannot communicate with the monitoring node to report its system status. The node server then raises a local alarm, for example by flashing an indicator light. If the monitoring node receives no heartbeat message from the data node server within the time limit, it considers the node to have failed and starts the node replacement procedure.
Monitoring server side:
The monitoring server receives the heartbeat messages sent by the data node servers and records the time of each heartbeat message. A heartbeat message is sent once every 2 minutes and contains information such as the running status of the data node server. The heartbeat information of all nodes is stored together in a status file on the monitoring server.
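The heartbeat bookkeeping can be sketched as follows. The 2-minute interval comes from the text; the timeout of three missed intervals is an assumption made for this example, since the text only says that a node is considered failed when no heartbeat arrives "within the time limit". Times are plain seconds so the sketch stays deterministic.

```python
HEARTBEAT_INTERVAL_S = 120                  # from the text: one heartbeat every 2 minutes
DEFAULT_TIMEOUT_S = 3 * HEARTBEAT_INTERVAL_S  # assumed time limit, not from the text


class HeartbeatTable:
    """Last heartbeat time and reported status for each data node."""

    def __init__(self, timeout_s=DEFAULT_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}
        self.status = {}

    def record(self, node_id, now, status=None):
        # Store the arrival time and the running status carried by the heartbeat.
        self.last_seen[node_id] = now
        self.status[node_id] = status

    def overdue(self, now):
        # Nodes whose last heartbeat is older than the time limit.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


table = HeartbeatTable()
table.record("node-a", now=0)
table.record("node-b", now=300)
late = table.overdue(now=400)   # node-a has been silent for 400 s
```

Each node returned by `overdue` would be treated as failed and passed to the node replacement procedure.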
The monitoring server stores the role information of all servers in the cluster. A role takes one of four states: normal operation, standby, fault, and role not set. In a newly built storage cluster, all data nodes send heartbeat messages to the monitoring node. The administrator registers the role of each node server, working server or standby server, according to the allocation plan. A server set as standby enters the standby state. A working server creates the corresponding disk array from the node's disks as required and is configured into the cluster's logical storage volumes. The array information of each data node and its allocation in the cluster's logical storage volumes are also synchronized to the monitoring server.
The monitoring server analyzes the heartbeat messages sent by the nodes and, according to the status of each data node reported in the heartbeat messages and the configured alert levels, reports the state of the cluster to the administrator by e-mail.
If a data node is detected to have failed and cannot work, the monitoring node starts the node replacement procedure. As shown in Fig. 2, the steps of the replacement procedure are as follows:
Step 1: the monitoring and management server checks the state of each storage server; when it finds that a storage server has failed, it proceeds to step 2. The storage server that has failed is the failed server.
Step 2: the monitoring and management server wakes a standby server and raises the level of the awakened standby server to the same level as the failed server.
Step 3: the monitoring and management server lowers the level of the failed server to the fault level, which is lower than the level of every normally operating storage server.
Step 4: the monitoring and management server checks the state of the awakened standby server. If it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server. If it is not normal, the awakened standby server is set as the new failed server and steps 2 to 4 are repeated.
The monitoring server reconfigures the array of the replacement server according to the disk array information of the replaced node, so that it matches the array mode of the failed node, and the replacement server takes over the role that the failed node played in the cluster's logical volumes. The cluster rebuilds the data of the replaced node according to its algorithm, restoring the data to its state before the failure. At the same time, the monitoring server notifies the administrator so that the failed node can be repaired promptly and a new standby server can be added, keeping the automatic fault replacement mechanism working.
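Steps 2 to 4 of the replacement procedure can be sketched as a loop over the available standby servers. This is an illustration under stated assumptions: the `Role` enum stands in for the levels described in the text, `is_healthy` is a hypothetical health-check callback, and trying standbys in sorted order is a choice made only to keep the example deterministic.

```python
from enum import Enum


class Role(Enum):
    WORKING = "working"
    STANDBY = "standby"
    FAULT = "fault"
    UNSET = "unset"


def replace_failed(roles, failed_id, is_healthy):
    """Promote standbys until one passes the health check.

    roles maps node id -> Role and is updated in place.
    Returns the id of the replacement, or None if no standby is usable.
    """
    current_failed = failed_id
    standbys = [n for n in sorted(roles) if roles[n] is Role.STANDBY]
    for standby in standbys:
        roles[standby] = Role.WORKING        # step 2: wake and promote
        roles[current_failed] = Role.FAULT   # step 3: demote the failed server
        if is_healthy(standby):              # step 4: check the replacement
            return standby
        current_failed = standby             # replacement is the new failed server
    roles[current_failed] = Role.FAULT       # nothing usable: still mark the fault
    return None


roles = {"s1": Role.WORKING, "b1": Role.STANDBY, "b2": Role.STANDBY}
chosen = replace_failed(roles, "s1", is_healthy=lambda node: node == "b2")
```

In this example the first standby fails its check, so it is marked as faulty and the second standby takes over, mirroring the "repeat steps 2 to 4" branch.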
While the cloud storage system is running, the status of the storage servers is checked continuously, so that a failed server can be replaced immediately and losses and impact are kept to a minimum.
When the system detects that several storage servers have failed, it first counts the failed storage servers, then wakes a corresponding number of standby servers and loads the storage node program on them, so that they join the system in the role of storage servers and take the place of the failed ones. It then checks the status of the storage servers again to confirm that they are healthy.
The basic steps are as follows:
1. The system detects a storage server failure;
2. The system counts the failed storage servers;
3. The system wakes a corresponding number of standby servers and loads the storage node program on them;
4. The failed storage servers are replaced;
5. The status of the storage servers is checked again;
6. Once they are confirmed healthy, the system continues to run.
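The counting and pairing in steps 2 to 4 above can be sketched as follows. The function name and the handling of the case where there are fewer standbys than failures are assumptions; the text does not say what happens when the standby pool is exhausted.

```python
def plan_replacements(failed, standbys):
    """Pair each failed server with one standby server.

    Returns (pairs, uncovered): pairs maps failed id -> standby id, and
    uncovered lists failed servers for which no standby was available.
    """
    pairs = dict(zip(failed, standbys))   # zip stops at the shorter list
    uncovered = failed[len(pairs):]
    return pairs, uncovered


# Example with three failures and the two standbys of the embodiment.
pairs, uncovered = plan_replacements(["s1", "s2", "s3"], ["b1", "b2"])
```

Each standby in `pairs` would then be woken, loaded with the storage node program and health-checked before the system continues.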
After the failed server has been replaced with a standby server, an alarm is raised through the alarm server. The alarm server comprises a wireless communication unit. With the wireless communication unit, the alarm server is not constrained by cables and can raise alarms remotely. The wireless communication unit may support a mobile communication network and/or a wireless local area network (WLAN).
When the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the cache server and the firewall are normal. Once all are normal, it puts the standby servers into the sleep state and proceeds to step 1; if any checked device is not operating normally, the system enters a holding state and raises an alarm through the alarm server.
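The first-start check can be sketched as a single decision. The device names in the usage example and the returned labels are illustrative assumptions; how each device is actually probed is not covered here.

```python
def first_start(device_ok):
    """Decide what to do after the initial self-check.

    device_ok maps a device name to True (normal) or False (abnormal).
    Returns ("monitoring", []) when everything is normal, which is the point
    at which standbys are put to sleep and step 1 begins, or
    ("hold", faulty_devices) when the system must wait and raise an alarm.
    """
    faulty = sorted(name for name, ok in device_ok.items() if not ok)
    if faulty:
        return "hold", faulty
    return "monitoring", []


decision = first_start({"storage-1": True, "standby-1": True,
                        "cache": False, "firewall": True})
```

Returning the list of faulty devices gives the alarm server something concrete to report to the administrator.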
The specific embodiment described herein merely illustrates the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute it in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Although terms such as cloud storage, standby server and monitoring and management server are used frequently herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the essence of the present invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the present invention.

Claims (9)

1. A cloud storage system capable of automatically detecting and replacing failed nodes, comprising a storage cluster and a monitoring and management server, characterized in that it further comprises several standby servers; the storage cluster comprises several storage servers; the monitoring and management server is connected to each of the standby servers and to each of the storage servers in the storage cluster; and the monitoring and management server is provided with an input/output interface for external communication.
2. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1, characterized in that it further comprises an alarm server connected to the monitoring and management server.
3. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 2, characterized in that the alarm server comprises a wireless communication unit.
4. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1 or 2, characterized in that each standby server comprises a power control module connected to the monitoring and management server.
5. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 3, characterized in that it further comprises a cache server connected to the monitoring and management server.
6. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1, characterized in that it further comprises a firewall connected in series with the input/output interface of the monitoring and management server.
7. A method by which a cloud storage system automatically detects and replaces failed nodes, characterized by comprising the following steps:
Step 1: the monitoring and management server checks the state of each storage server; when it finds that a storage server has failed, it proceeds to step 2, the storage server that has failed being the failed server;
Step 2: the monitoring and management server wakes a standby server and raises the level of the awakened standby server to the same level as the failed server;
Step 3: the monitoring and management server lowers the level of the failed server to the fault level, which is lower than the level of every normally operating storage server;
Step 4: the monitoring and management server checks the state of the awakened standby server; if it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server; if it is not normal, the awakened standby server is set as the new failed server and steps 2 to 4 are repeated.
8. The method by which a cloud storage system automatically detects and replaces failed nodes according to claim 7, characterized in that, after the failed server has been replaced with a standby server, an alarm is raised through an alarm server.
9. The method by which a cloud storage system automatically detects and replaces failed nodes according to claim 7, characterized in that, when the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the cache server and the firewall are normal; once all are normal, it puts the standby servers into the sleep state and proceeds to step 1; if any checked device is not operating normally, the system enters a holding state.
CN2013101937604A 2013-05-23 2013-05-23 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof Pending CN103354503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101937604A CN103354503A (en) 2013-05-23 2013-05-23 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013101937604A CN103354503A (en) 2013-05-23 2013-05-23 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof

Publications (1)

Publication Number Publication Date
CN103354503A true CN103354503A (en) 2013-10-16

Family

ID=49310819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101937604A Pending CN103354503A (en) 2013-05-23 2013-05-23 Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof

Country Status (1)

Country Link
CN (1) CN103354503A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104639358A (en) * 2013-11-13 2015-05-20 中国石油化工股份有限公司 Batched network port switching method and system
CN105187482A (en) * 2015-07-20 2015-12-23 深圳供电局有限公司 PaaS platform fault self-healing realization method and message server
CN105490847A (en) * 2015-12-08 2016-04-13 天津市初志科技有限公司 Real-time detecting and processing method of node failure in private cloud storage system
CN105808391A (en) * 2016-04-05 2016-07-27 浪潮电子信息产业股份有限公司 Method and device for hot replacing CPU nodes
CN106331642A (en) * 2016-08-31 2017-01-11 浙江大华技术股份有限公司 Method and device for processing data in video cloud system
CN106407081A (en) * 2016-09-30 2017-02-15 郑州云海信息技术有限公司 Chassis management system and server
CN106789257A (en) * 2016-12-23 2017-05-31 航天星图科技(北京)有限公司 A kind of cloud system server state visual management method
CN108156040A (en) * 2018-01-30 2018-06-12 北京交通大学 A kind of central control node in distribution cloud storage system
CN108319523A (en) * 2017-12-13 2018-07-24 创新科存储技术(深圳)有限公司 A kind of adding method of storage HotSpare disk
CN108369503A (en) * 2015-12-15 2018-08-03 微软技术许可有限责任公司 Automatic system response to external field replaceable units (FRU) process
CN108509143A (en) * 2017-02-23 2018-09-07 杭州海康威视数字技术股份有限公司 A kind of data detection method and device based on cloud storage
CN109361560A (en) * 2018-01-24 2019-02-19 广州Tcl智能家居科技有限公司 A kind of clustered node Communication processing method, system, storage medium and server
CN109460194A (en) * 2018-11-16 2019-03-12 郑州云海信息技术有限公司 A kind of storage array monitoring system and method
CN110445811A (en) * 2019-09-16 2019-11-12 秒针信息技术有限公司 For the data management system of non-cloud storage, method, server and storage medium
CN111045845A (en) * 2019-11-29 2020-04-21 苏州浪潮智能科技有限公司 Data returning method, device, equipment and computer readable storage medium
CN111176888A (en) * 2018-11-13 2020-05-19 浙江宇视科技有限公司 Cloud storage disaster recovery method, device and system
CN111352773A (en) * 2020-02-28 2020-06-30 佛山科学技术学院 Cloud computing server monitoring control method and system based on big data
CN111787284A (en) * 2020-07-16 2020-10-16 济南浪潮数据技术有限公司 Data acquisition system
CN111897697A (en) * 2020-08-11 2020-11-06 腾讯科技(深圳)有限公司 Server hardware fault repairing method and device
CN112256498A (en) * 2020-11-17 2021-01-22 珠海大横琴科技发展有限公司 Fault processing method and device
CN112945304A (en) * 2021-02-05 2021-06-11 广东海洋大学 Aquaculture sea area environmental information acquisition system
CN114189429A (en) * 2021-11-25 2022-03-15 山东云海国创云计算装备产业创新中心有限公司 System, method, device and medium for monitoring server cluster faults

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101022363A (en) * 2007-03-23 2007-08-22 杭州华为三康技术有限公司 Network storage equipment fault protecting method and device
US8145945B2 (en) * 2010-01-04 2012-03-27 Avaya Inc. Packet mirroring between primary and secondary virtualized software images for improved system failover performance
CN102710766A (en) * 2012-05-30 2012-10-03 浪潮电子信息产业股份有限公司 Real-time access load evaluation-based cluster storage interface node configuration method
CN103067740A (en) * 2012-12-31 2013-04-24 浙江元亨通信技术股份有限公司 Trouble intelligent detecting method for video surveillance device and detecting system thereof
CN203289491U (en) * 2013-05-23 2013-11-13 浙江闪龙科技有限公司 Cluster storage system capable of automatically repairing fault node

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104639358A (en) * 2013-11-13 2015-05-20 中国石油化工股份有限公司 Batched network port switching method and system
CN104639358B (en) * 2013-11-13 2018-03-09 中国石油化工股份有限公司 Batch network port switching method and switching system
CN105187482A (en) * 2015-07-20 2015-12-23 深圳供电局有限公司 PaaS platform fault self-healing realization method and message server
CN105187482B (en) * 2015-07-20 2018-09-28 深圳供电局有限公司 PaaS platform fault self-healing realization method and message server
CN105490847A (en) * 2015-12-08 2016-04-13 天津市初志科技有限公司 Real-time detecting and processing method of node failure in private cloud storage system
CN105490847B (en) * 2015-12-08 2019-03-29 天津市初志科技有限公司 Real-time detection and processing method for node failure in a private cloud storage system
CN108369503A (en) * 2015-12-15 2018-08-03 微软技术许可有限责任公司 Automatic system response to external field replaceable units (FRU) process
CN105808391A (en) * 2016-04-05 2016-07-27 浪潮电子信息产业股份有限公司 Method and device for hot replacing CPU nodes
CN106331642A (en) * 2016-08-31 2017-01-11 浙江大华技术股份有限公司 Method and device for processing data in video cloud system
CN106331642B (en) * 2016-08-31 2020-05-26 浙江大华技术股份有限公司 Data processing method and device in video cloud system
CN106407081B (en) * 2016-09-30 2020-05-26 苏州浪潮智能科技有限公司 Chassis management system and server
CN106407081A (en) * 2016-09-30 2017-02-15 郑州云海信息技术有限公司 Chassis management system and server
CN106789257B (en) * 2016-12-23 2019-03-05 中科星图股份有限公司 Visual management method for cloud system server state
CN106789257A (en) * 2016-12-23 2017-05-31 航天星图科技(北京)有限公司 Visual management method for cloud system server state
CN108509143A (en) * 2017-02-23 2018-09-07 杭州海康威视数字技术股份有限公司 Data detection method and device based on cloud storage
CN108509143B (en) * 2017-02-23 2020-11-06 杭州海康威视数字技术股份有限公司 Data detection method and device based on cloud storage
CN108319523A (en) * 2017-12-13 2018-07-24 创新科存储技术(深圳)有限公司 Method for adding a storage hot spare disk
CN109361560A (en) * 2018-01-24 2019-02-19 广州Tcl智能家居科技有限公司 Cluster node communication processing method, system, storage medium and server
CN108156040A (en) * 2018-01-30 2018-06-12 北京交通大学 Central control node in a distributed cloud storage system
CN111176888B (en) * 2018-11-13 2023-09-15 浙江宇视科技有限公司 Disaster recovery method, device and system for cloud storage
CN111176888A (en) * 2018-11-13 2020-05-19 浙江宇视科技有限公司 Cloud storage disaster recovery method, device and system
CN109460194A (en) * 2018-11-16 2019-03-12 郑州云海信息技术有限公司 Storage array monitoring system and method
CN110445811A (en) * 2019-09-16 2019-11-12 秒针信息技术有限公司 Data management system, method, server and storage medium for non-cloud storage
CN111045845A (en) * 2019-11-29 2020-04-21 苏州浪潮智能科技有限公司 Data returning method, device, equipment and computer readable storage medium
CN111352773A (en) * 2020-02-28 2020-06-30 佛山科学技术学院 Cloud computing server monitoring control method and system based on big data
CN111787284A (en) * 2020-07-16 2020-10-16 济南浪潮数据技术有限公司 Data acquisition system
CN111897697A (en) * 2020-08-11 2020-11-06 腾讯科技(深圳)有限公司 Server hardware fault repairing method and device
CN112256498A (en) * 2020-11-17 2021-01-22 珠海大横琴科技发展有限公司 Fault processing method and device
CN112945304A (en) * 2021-02-05 2021-06-11 广东海洋大学 Aquaculture sea area environmental information acquisition system
CN114189429A (en) * 2021-11-25 2022-03-15 山东云海国创云计算装备产业创新中心有限公司 System, method, device and medium for monitoring server cluster faults

Similar Documents

Publication Publication Date Title
CN103354503A (en) Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof
US5875290A (en) Method and program product for synchronizing operator initiated commands with a failover process in a distributed processing system
EP2128766B1 (en) Electronic apparatus system having a plurality of rack-mounted electronic apparatuses, and a method for identifying electronic apparatus in electronic apparatus system
US6012150A (en) Apparatus for synchronizing operator initiated commands with a failover process in a distributed processing system
CN106802854B (en) Fault monitoring system of multi-controller system
CN102360324B (en) Failure recovery method and equipment for failure recovery
CN105302661A (en) System and method for implementing virtualization management platform high availability
CN107729185B (en) Fault processing method and device
US20120036387A1 (en) Storage system, control apparatus, and control method
CN107508694B (en) Node management method and node equipment in cluster
CN203289491U (en) Cluster storage system capable of automatically repairing fault node
CN112181660A (en) High-availability method based on server cluster
CN107480014A (en) High-availability device switching method and apparatus
WO2013145325A1 (en) Information processing system, problem detection method and information processing device
CN111104283B (en) Fault detection method, device, equipment and medium of distributed storage system
CN112601216B (en) Zigbee-based trusted platform alarm method and system
CN104679623A (en) Server hard disk maintaining method, system and server monitoring equipment
US7428655B2 (en) Smart card for high-availability clustering
CN107071189A (en) Connection method for a communication device physical interface
CN108833189A (en) Memory node management system and method
JP2010244463A (en) Event detection control method and system
CN106027661A (en) Data cluster storage terminal
CN111309515B (en) Disaster recovery control method, device and system
US20110187404A1 (en) Method of detecting failure and monitoring apparatus
CN104158843A (en) Storage unit invalidation detecting method and device for distributed file storage system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20131016