CN103354503A - Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof - Google Patents
- Publication number
- CN103354503A (application number CN2013101937604A)
- Authority
- CN
- China
- Prior art keywords
- server
- standby
- storage system
- cloud storage
- monitoring management
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a cloud storage system capable of automatically detecting and replacing failed nodes, and a corresponding method, with the aim of providing a cloud storage system with a self-repairing capability. The cloud storage system comprises a storage cluster, a monitoring and management server, and a plurality of standby servers. The storage cluster comprises a plurality of storage servers; the monitoring and management server is connected to all of the standby servers and to all of the storage servers in the storage cluster, and is provided with an input/output interface for communicating with the outside. Users store data in, or read data from, the storage servers through this input/output interface. Meanwhile, the monitoring and management server monitors the health of each storage server, and if a storage server fails, it is replaced with a standby server, ensuring that the cloud storage system continues to operate normally. The disclosed cloud storage system is applicable to all cloud storage architectures.
Description
Technical field
The present invention relates to a cloud storage system, and in particular to a cloud storage system capable of automatically detecting and replacing failed nodes, and a method thereof.
Background art
Cloud storage is a new concept that extends and develops from the concept of cloud computing. It refers to a system that, through functions such as cluster applications, grid technology, or distributed file systems, uses application software to make a large number of storage devices of different types across a network work together, jointly providing data storage and service access functions to external users. Cloud storage technology is a direction of future IT development.
Because cloud storage systems are large in scale and contain many nodes, storage node failures inevitably occur.
On June 23, 2010, the State Intellectual Property Office of the People's Republic of China published patent document CN101753617A, entitled "A cloud storage system and method". That system comprises a global scheduling layer and a cloud storage layer. The global scheduling layer locates, according to a received access request and the resource it targets, the position in the cloud storage layer where that resource resides; the global scheduling layer consists of one or more servers, and the cloud storage layer consists of at least one cloud storage node. Combining the two layers gives the system both the advantages of a conventional storage architecture, provided by the global scheduling layer, and the strong scalability and low cost of the cloud storage layer. However, when a node (server) in the cloud storage layer fails, the failure is difficult to handle effectively; this affects subsequent use and can even cause irreparable losses.
Summary of the invention
The present invention mainly solves the technical problem in the prior art that failed nodes are difficult to handle and therefore affect subsequent use. It provides a cloud storage system capable of automatically detecting and replacing failed nodes, and a method thereof, which can replace a failed node and ensure the normal operation of the cloud storage system.
The present invention solves the above technical problem mainly through the following technical solution: a cloud storage system capable of automatically detecting and replacing failed nodes comprises a storage cluster and a monitoring management server, and further comprises a plurality of standby servers. The storage cluster comprises a plurality of storage servers; the monitoring management server is connected to every standby server and to every storage server in the storage cluster, and is provided with an input/output interface for communicating with external devices.
Each storage server is a storage node. Users store data in, or read data from, the storage servers through the input/output interface of the monitoring management server. At the same time, the monitoring management server monitors the health of each storage server; if a storage server fails, it is replaced with a standby server, ensuring the normal operation of the cloud storage system.
Preferably, the cloud storage system further comprises an alarm server connected to the monitoring management server. After a storage server fails, the monitoring management server raises an alarm through the alarm server to notify administrators to repair the failed server.
Preferably, the alarm server comprises a wireless communication unit. Through the wireless communication unit, the alarm server is not constrained by cabling and can raise alarms remotely. The wireless communication unit may support a mobile communication network and/or a wireless local area network (WLAN).
Preferably, each standby server comprises a power control module connected to the monitoring management server. The power control module can place the standby server in a sleep state or a wake state. While a standby server has not been activated as a replacement, it stays in the sleep state, drawing only a minimal current, which keeps power consumption low and saves energy. When a standby server is needed to take over from a failed server, the power control module wakes it up and supplies the current required for normal operation, so that storage operations can proceed properly.
Preferably, the cloud storage system further comprises a caching server connected to the monitoring management server. When a user stores a file in the cloud storage system, the file is first staged in the caching server and, once fully received, is forwarded to the designated storage server for storage. If the storage server receiving the data fails during storage, the complete file can be stored again in the standby server that replaces it, which reduces the risk of losing all or part of the file.
Preferably, the cloud storage system further comprises a firewall connected in series on the input/output interface of the monitoring management server. The firewall protects the cloud storage system from external attacks.
A method by which a cloud storage system automatically detects and replaces failed nodes comprises the following steps. Step 1: the monitoring management server detects the state of each storage server; upon finding that a storage server has failed, it designates that storage server as the failed server and proceeds to Step 2;
Step 2: the monitoring management server wakes up a standby server and raises the grade of the awakened standby server to that of the failed server;
Step 3: the monitoring management server lowers the grade of the failed server to the fault grade, which is lower than the grade held by any normally operating storage server;
Step 4: the monitoring management server checks the state of the awakened standby server. If it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server; if it is abnormal, the awakened standby server is designated as the new failed server and Steps 2 to 4 are repeated.
Preferably, after the failed server has been replaced with a standby server, an alarm is raised through the alarm server.
Preferably, when the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the caching server, and the firewall are operating normally. If all are normal, it places the standby servers in the sleep state and proceeds to Step 1; if any checked device is not operating normally, the system enters a holding state.
The substantive effects of the present invention are as follows: a failed server can be replaced promptly, ensuring the normal operation of the cloud storage system; failures can be reported to administrators in a timely manner; and the risk of losing all or part of a file is reduced.
Description of drawings
Fig. 1 is a schematic structural diagram of a cloud storage system according to the present invention;
Fig. 2 is a flowchart of a method for detecting and replacing a failed server according to the present invention;
In the figures: 1, monitoring management server; 2, storage server; 3, standby server; 4, alarm server; 5, firewall; 6, caching server.
Embodiment
The technical solution of the present invention is further described in detail below by way of an embodiment and with reference to the accompanying drawings.
Embodiment: as shown in Fig. 1, the cloud storage system of this embodiment, capable of automatically detecting and replacing failed nodes, comprises a storage cluster, a monitoring management server 1, an alarm server 4, a firewall 5, a caching server 6, and two standby servers 3. The storage cluster comprises a plurality of storage servers 2. The monitoring management server 1 is connected to every storage server 2 and every standby server 3, and also to the alarm server 4 and the caching server 6. The firewall 5 is connected in series on the input/output interface of the monitoring management server 1, so all external data entering or leaving the cloud storage system is first filtered by the firewall 5, preventing external attacks from damaging the system.
Storage servers 2, standby servers 3, and the caching server 6 are collectively referred to as storage-class servers. The connections between the monitoring management server 1 and each storage server 2, each standby server 3, and the caching server 6 carry both data signals and control signals. Data signals are the file data being stored in or read from the cloud storage system; control signals are the signals that control the operation of each server, together with the status signals fed back by each storage-class server. The signals fed back by each storage-class server to the monitoring management server 1 include a heartbeat signal, from which the monitoring management server 1 obtains the health status of that server.
Each standby server 3 comprises a power control module connected to the monitoring management server 1. The power control module can place the standby server 3 in a sleep state or a wake state. While a standby server 3 has not been activated as a replacement, it stays in the sleep state, drawing only a minimal current, which keeps power consumption low and saves energy. When a standby server 3 is needed to take over from a failed server, the power control module wakes it up and supplies the current required for normal operation, so that storage operations can proceed properly.
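The sleep/wake behaviour of the power control module can be modelled as a two-state machine. The sketch below is a hypothetical illustration: the class and method names are not from the patent, and it only tracks state in memory rather than driving real power hardware.

```python
from enum import Enum


class PowerState(Enum):
    SLEEP = "sleep"  # minimal current: standby not yet activated
    AWAKE = "awake"  # full operating current: serving as a replacement


class PowerControlModule:
    """Hypothetical model of the power control module in one standby server.

    The monitoring management server is assumed to call sleep() and wake();
    this class records the state only and does not touch real hardware.
    """

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.state = PowerState.SLEEP  # standbys start asleep to save energy

    def sleep(self) -> None:
        self.state = PowerState.SLEEP

    def wake(self) -> bool:
        """Wake the standby server; return True if it was previously asleep."""
        was_asleep = self.state is PowerState.SLEEP
        self.state = PowerState.AWAKE
        return was_asleep
```

Returning whether the server was asleep lets a caller distinguish a fresh activation from a repeated wake request for a server that is already in service.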
When a user stores a file in the cloud storage system, the file is first staged in the caching server 6 and, once fully received, is forwarded to the designated storage server 2 for storage. If the storage server 2 receiving the data fails during storage, the complete file can be stored again in the standby server 3 that replaces it, reducing the risk of losing all or part of the file.
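The staging behaviour of the caching server can be sketched as "assemble the whole file first, then forward it, and re-send it intact to the replacement if the target fails". In this sketch, `store` and `replace` are hypothetical callbacks standing in for the transfer to a storage server and for the replacement procedure, and the retry limit is an assumed parameter.

```python
from typing import Callable, List


class StorageFailure(Exception):
    """Raised by the (hypothetical) store callback when the target server fails."""


def store_via_cache(
    name: str,
    chunks: List[bytes],
    target: str,
    store: Callable[[str, str, bytes], None],
    replace: Callable[[str], str],
    max_attempts: int = 3,  # assumed limit; the patent does not specify one
) -> str:
    """Stage a file completely, then store it; return the server that holds it."""
    # Stage: the file is only forwarded once it has been fully received.
    staged = b"".join(chunks)
    for _ in range(max_attempts):
        try:
            store(target, name, staged)
            return target
        except StorageFailure:
            # The complete file is still in the cache, so it can be re-sent
            # intact to the standby server that replaces the failed target.
            target = replace(target)
    raise StorageFailure(f"could not store {name!r} after {max_attempts} attempts")
```

Because the cached copy is complete, a failure mid-transfer never leaves the retry with a partial file to work from.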
The method by which the cloud storage system automatically detects and replaces failed nodes is as follows.
The automatic replacement module is divided into two parts: a monitoring server side and a data server side (the latter including the standby data servers).
Data server side:
The module on the data server side has the following functions: periodically checking the running status of the data server; periodically sending heartbeat messages to the monitoring server; and reporting to the monitoring server the role that this server plays.
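A heartbeat message from a data server might bundle the three items listed above: node identity, current role, and the latest self-check result. The JSON layout and field names below are hypothetical; the patent does not define a wire format.

```python
import json
import time
from typing import Optional


def build_heartbeat(node_id: str, role: str, status: dict,
                    now: Optional[float] = None) -> bytes:
    """Serialize one heartbeat message (hypothetical JSON format)."""
    message = {
        "node": node_id,
        "role": role,        # role this server plays, reported to the monitor
        "status": status,    # result of the periodic running-status check
        "sent_at": time.time() if now is None else now,
    }
    # sort_keys makes the encoding deterministic, which simplifies comparison.
    return json.dumps(message, sort_keys=True).encode("utf-8")
```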
The running status of a data server covers key information such as system and CPU temperature, disk array status, hard disk S.M.A.R.T. information, and network condition.
The system temperature and CPU temperature are read from the sensors on the motherboard. If a temperature exceeds the configured threshold, abnormal information is sent to the monitoring server, which decides how to handle it.
The S.M.A.R.T. information of the hard disks in the system is checked at a configured frequency and indicates each disk's health. When a disk's health is very low and it is at risk of failure, the administrator is notified to replace it.
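The temperature and disk-health checks reduce to comparing readings against configured thresholds. In this hypothetical sketch the threshold values and the 0 to 100 disk health score are assumptions (the patent gives no numbers, and turning raw S.M.A.R.T. attributes into one score is left outside the example); reading sensors and querying disks are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds: the patent only says thresholds are configured.
CPU_TEMP_LIMIT_C = 80.0
SYSTEM_TEMP_LIMIT_C = 60.0
DISK_HEALTH_FLOOR = 20  # on an assumed 0-100 health score


@dataclass
class NodeReadings:
    cpu_temp_c: float
    system_temp_c: float
    disk_health: Dict[str, int]  # disk id -> health score derived from S.M.A.R.T. data


def check_readings(r: NodeReadings) -> List[str]:
    """Return the abnormal conditions to report to the monitoring server."""
    alerts: List[str] = []
    if r.cpu_temp_c > CPU_TEMP_LIMIT_C:
        alerts.append(f"CPU temperature {r.cpu_temp_c} C exceeds {CPU_TEMP_LIMIT_C} C")
    if r.system_temp_c > SYSTEM_TEMP_LIMIT_C:
        alerts.append(f"system temperature {r.system_temp_c} C exceeds {SYSTEM_TEMP_LIMIT_C} C")
    for disk, health in sorted(r.disk_health.items()):
        if health < DISK_HEALTH_FLOOR:
            alerts.append(f"disk {disk} health {health} is low: notify admin to replace it")
    return alerts
```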
Array status detection: each data node builds a disk array from its hard disks in RAID 5 or RAID 6 mode and configures a redundant hot-spare disk in the array. In this mode the array keeps working when one disk fails (two for RAID 6); the system replaces the failed disk with the hot-spare disk and sends a message to the management node so that the administrator is notified to replace the failed disk. The system can automatically add a new hard disk to the array as a hot spare. While the hot-spare disk takes the place of the failed disk, the array runs in degraded mode and uses its parity algorithm to reconstruct the failed disk's data onto the incoming disk. In this situation the administrator may be advised to reduce the load on the node, which speeds up the rebuild and lowers the risk of the array failing. If further disks fail during this process beyond what the array's redundancy can tolerate, the array stops working entirely; the node is then judged to have failed, and the monitor node starts the node replacement procedure to keep the whole storage cluster operating normally.
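The array states described above follow from how many member disks each RAID level can lose: one for RAID 5, two for RAID 6. The sketch below classifies an array from the number of members whose data is currently missing (failed and not yet fully rebuilt); the function names are hypothetical, and real arrays would be queried through tools such as mdadm rather than modelled like this.

```python
from enum import Enum


class ArrayState(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"  # redundancy reduced; data still available, rebuild pending or running
    FAILED = "failed"      # more members lost than the parity can reconstruct


# Simultaneous member losses each level survives (one parity block per stripe
# for RAID 5, two for RAID 6).
TOLERANCE = {"raid5": 1, "raid6": 2}


def array_state(level: str, missing_members: int) -> ArrayState:
    """Classify an array by how many members are failed and not yet rebuilt."""
    if missing_members > TOLERANCE[level]:
        return ArrayState.FAILED
    if missing_members > 0:
        return ArrayState.DEGRADED
    return ArrayState.NORMAL


def node_must_be_replaced(level: str, missing_members: int) -> bool:
    # A failed array makes the whole data node faulty, triggering replacement.
    return array_state(level, missing_members) is ArrayState.FAILED
```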
The role that each node server plays in the storage cluster must be saved on the monitoring server. If a node fails, the monitoring server uses this role information to have a standby node take over the failed node's role and tasks. The role information includes the organization (RAID mode) of the node's disk array and the brick role the node plays in each logical volume. Any change to this information on a data node is immediately synchronized to the monitor node and saved there.
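The monitor-side copy of role information can be sketched as a registry that each data node updates whenever its role changes. `RoleInfo` and `RoleRegistry` are hypothetical names; the fields mirror the two items the text lists (array mode and brick role per logical volume).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RoleInfo:
    raid_level: str  # disk array mode, e.g. "raid5" or "raid6"
    bricks: Dict[str, str] = field(default_factory=dict)  # logical volume -> brick path


class RoleRegistry:
    """Monitoring-server copy of every data node's role information."""

    def __init__(self) -> None:
        self._roles: Dict[str, RoleInfo] = {}

    def sync(self, node_id: str, info: RoleInfo) -> None:
        # Called whenever role information changes on the data node; store a
        # copy so later local edits on the node side do not alias this record.
        self._roles[node_id] = RoleInfo(info.raid_level, dict(info.bricks))

    def role_of(self, node_id: str) -> RoleInfo:
        return self._roles[node_id]
```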
When a data node suffers a network failure, it cannot communicate with the monitor node to report its system status; the node server then raises a local alarm by means such as a flashing indicator light. If the monitor node receives no heartbeat message from this data node server within the time limit, it considers the node failed and starts the node replacement procedure.
Monitoring server side:
The monitoring server receives the heartbeat messages sent by the data node servers and records the time of each one. A heartbeat message is sent every 2 minutes and contains information such as the running status of the data node server. The heartbeat messages of all nodes are stored together in a status file on the monitoring server.
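Recording heartbeat times and detecting overdue nodes can be sketched as follows. The 2-minute interval comes from the text; the timeout of three intervals (tolerating two missed heartbeats) is an assumption, since the patent only speaks of a time limit. The status file is represented here by in-memory dictionaries.

```python
import time
from typing import Dict, List, Optional

HEARTBEAT_INTERVAL_S = 120  # a heartbeat is sent every 2 minutes
# Assumed time limit: declare a node failed after two missed heartbeats.
HEARTBEAT_TIMEOUT_S = 3 * HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}   # node id -> time of last heartbeat
        self.last_status: Dict[str, dict] = {}  # node id -> last reported status

    def record(self, node_id: str, status: dict, now: Optional[float] = None) -> None:
        self.last_seen[node_id] = time.time() if now is None else now
        self.last_status[node_id] = status

    def overdue(self, now: Optional[float] = None) -> List[str]:
        """Nodes whose last heartbeat is older than the time limit."""
        now = time.time() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t > HEARTBEAT_TIMEOUT_S)
```

Each node returned by `overdue` would be treated as failed and handed to the replacement procedure.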
The monitoring server stores the role information of all servers in the cluster; each server is in one of four states: normal operation, standby, fault, or role not set. In a newly built storage cluster, all data nodes send heartbeat messages to the monitor node, and the administrator registers the role of each node server according to the allocation plan, either working server or standby server. Servers registered as standby enter the standby state. On each working server, disk arrays are created from the node's disks as required and configured to form the cluster's storage logical volumes. At the same time, the array information of each data node and its assignment within the cluster's logical storage volumes are synchronized to and saved on the monitoring server.
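The four role states and the registration step can be sketched as an enum plus a small cluster table. This is a hypothetical illustration: nodes first seen via heartbeat start as "role not set", and the administrator may only register them as working or standby, matching the allocation plan described above.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    UNSET = "role not set"
    WORKING = "normal operation"
    STANDBY = "standby"
    FAULT = "fault"


class Cluster:
    def __init__(self) -> None:
        self.roles: Dict[str, Role] = {}

    def on_heartbeat(self, node_id: str) -> None:
        # A node seen for the first time in a new cluster has no role yet.
        self.roles.setdefault(node_id, Role.UNSET)

    def register(self, node_id: str, role: Role) -> None:
        """Administrator assigns a role according to the allocation plan."""
        if role not in (Role.WORKING, Role.STANDBY):
            raise ValueError("nodes are registered as working or standby servers")
        self.roles[node_id] = role

    def nodes_with(self, role: Role) -> List[str]:
        return sorted(n for n, r in self.roles.items() if r is role)
```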
The monitoring server continuously monitors the heartbeat messages sent by the nodes and, based on the status of each data node reported in those heartbeats and on the configured alert level, emails the cluster's status to the administrator.
If a data node is detected to have failed and cannot work, the monitor node starts the node replacement procedure, shown in Fig. 2, whose steps are as follows:
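Filtering node status by the configured alert level and composing the administrator email could look like the sketch below. The three-level severity scale is an assumption (the patent only says an alert level is set), and actually sending the message, for example with `smtplib`, is left out.

```python
from email.message import EmailMessage
from typing import Dict, Tuple

# Hypothetical severity scale; the patent only says an alert level is configured.
LEVELS = {"info": 0, "warning": 1, "critical": 2}


def build_report(statuses: Dict[str, Tuple[str, str]], alert_level: str,
                 admin: str) -> EmailMessage:
    """statuses maps node id -> (severity, description)."""
    threshold = LEVELS[alert_level]
    lines = [
        f"{node}: [{sev}] {desc}"
        for node, (sev, desc) in sorted(statuses.items())
        if LEVELS[sev] >= threshold  # report only at or above the alert level
    ]
    msg = EmailMessage()
    msg["To"] = admin
    msg["Subject"] = f"Cluster status: {len(lines)} node(s) at or above {alert_level}"
    msg.set_content("\n".join(lines) or "No nodes at or above the alert level.")
    return msg
```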
Step 1: the monitoring management server detects the state of each storage server; upon finding that a storage server has failed, it designates that storage server as the failed server and proceeds to Step 2;
Step 2: the monitoring management server wakes up a standby server and raises the grade of the awakened standby server to that of the failed server;
Step 3: the monitoring management server lowers the grade of the failed server to the fault grade, which is lower than the grade held by any normally operating storage server;
Step 4: the monitoring management server checks the state of the awakened standby server. If it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server; if it is abnormal, the awakened standby server is designated as the new failed server and Steps 2 to 4 are repeated.
The monitoring server reconfigures the array of the replacement server according to the disk array information of the replaced node, so that its array mode matches that of the failed node, and the replacement server then takes over the role the failed node played in the cluster logical volumes. The cluster rebuilds the data of the replaced node according to its algorithm, restoring the data as it was stored before the failure. At the same time, the monitoring server notifies the administrator so that the failed node can be repaired promptly and a new standby server added, keeping the automatic failure replacement mechanism operational.
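The takeover step can be sketched as two updates to the monitoring server's records: the replacement gets the failed node's array mode, and every brick the failed node held in the cluster volumes is reassigned to the replacement. The data layout and function name are hypothetical, the brick paths are assumed to stay the same, and the data rebuild itself is left to the cluster as the text describes.

```python
from typing import Dict, List, Tuple

Brick = Tuple[str, str]  # (node id, brick path)


def take_over(
    volumes: Dict[str, List[Brick]],  # logical volume -> bricks
    raid_levels: Dict[str, str],      # node id -> disk array mode
    failed: str,
    replacement: str,
) -> None:
    """Give `replacement` the array mode and brick roles of `failed`, in place."""
    # Configure the replacement's array in the same mode as the failed node.
    raid_levels[replacement] = raid_levels[failed]
    # Reassign every brick role the failed node held in the cluster volumes.
    for name, bricks in volumes.items():
        volumes[name] = [
            (replacement if node == failed else node, path) for node, path in bricks
        ]
```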
While the cloud storage system is running, the state of the storage servers is checked continuously, so that a failed server can be replaced immediately and the resulting loss and impact are minimized.
When the system detects that multiple storage servers have failed, it first determines the number of failed storage servers, then wakes up the same number of standby servers and loads the storage node program onto them, so that they join the system in the storage server role and substitute for the failed storage servers. It then checks the state of these storage servers again to confirm that they are healthy.
The basic steps are as follows:
1. The system detects storage server failures;
2. The system determines the number of failed storage servers;
3. The system wakes up the same number of standby servers and loads the storage node program onto them;
4. The standby servers replace the failed storage servers;
5. The system checks the state of the storage servers again;
6. Once they are confirmed healthy, the system continues operating.
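The six basic steps for multiple failures can be sketched as a batch version of the replacement. The callbacks `wake_and_load` and `is_healthy` are hypothetical, and raising an error when there are too few standbys or a replacement fails its health check is an assumed policy; the patent does not say what happens in those cases.

```python
from typing import Callable, Dict, List


def replace_batch(
    failed: List[str],
    standbys: List[str],
    wake_and_load: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Dict[str, str]:
    """Return a mapping from each failed server to the standby that replaced it."""
    count = len(failed)                     # step 2: count the failed servers
    if count > len(standbys):               # assumed policy: not enough standbys
        raise RuntimeError(f"{count} failures but only {len(standbys)} standbys")
    chosen = standbys[:count]
    for s in chosen:                        # step 3: wake and load node program
        wake_and_load(s)
    mapping = dict(zip(failed, chosen))     # step 4: substitute
    unhealthy = [s for s in chosen if not is_healthy(s)]  # step 5: re-check
    if unhealthy:                           # assumed policy: surface the problem
        raise RuntimeError(f"replacement(s) unhealthy: {unhealthy}")
    return mapping                          # step 6: continue operating
```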
After a failed server has been replaced with a standby server, an alarm is raised through the alarm server. The alarm server comprises a wireless communication unit, which frees it from cabling constraints and allows remote alarms. The wireless communication unit may support a mobile communication network and/or a wireless local area network.
When the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the caching server, and the firewall are operating normally. If all are normal, it places the standby servers in the sleep state and proceeds to Step 1; if any checked device is not operating normally, the system enters a holding state and raises an alarm through the alarm server.
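The first-start self-check can be sketched as: test every device, raise an alarm and hold if anything is abnormal, otherwise put the standby servers to sleep and begin monitoring (Step 1). The device kinds, return strings, and callbacks are hypothetical names for illustration.

```python
from typing import Callable, Dict, List


def startup_check(
    devices: Dict[str, str],                # device id -> kind ("storage", "standby", ...)
    is_normal: Callable[[str], bool],
    sleep_standby: Callable[[str], None],
    alarm: Callable[[List[str]], None],
) -> str:
    """Return "holding" if any device is abnormal, else "monitoring" (Step 1)."""
    abnormal = sorted(d for d in devices if not is_normal(d))
    if abnormal:
        alarm(abnormal)                     # report through the alarm server
        return "holding"
    for dev, kind in devices.items():
        if kind == "standby":
            sleep_standby(dev)              # standbys wait asleep until needed
    return "monitoring"
```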
The specific embodiment described herein merely illustrates the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute it in a similar manner, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Although terms such as cloud storage, standby server, and monitoring management server are used extensively herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the essence of the present invention more conveniently; interpreting them as any additional limitation would be contrary to the spirit of the present invention.
Claims (9)
1. A cloud storage system capable of automatically detecting and replacing failed nodes, comprising a storage cluster and a monitoring management server, characterized by further comprising a plurality of standby servers, wherein the storage cluster comprises a plurality of storage servers, the monitoring management server is connected to all of the standby servers and to all of the storage servers in the storage cluster, and the monitoring management server is provided with an input/output interface for communicating with external devices.
2. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1, characterized by further comprising an alarm server connected to the monitoring management server.
3. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 2, characterized in that the alarm server comprises a wireless communication unit.
4. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1 or 2, characterized in that each standby server comprises a power control module connected to the monitoring management server.
5. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 3, characterized by further comprising a caching server connected to the monitoring management server.
6. The cloud storage system capable of automatically detecting and replacing failed nodes according to claim 1, characterized by further comprising a firewall connected in series on the input/output interface of the monitoring management server.
7. A method by which a cloud storage system automatically detects and replaces failed nodes, characterized by comprising the following steps: Step 1: the monitoring management server detects the state of each storage server; upon finding that a storage server has failed, it designates that storage server as the failed server and proceeds to Step 2;
Step 2: the monitoring management server wakes up a standby server and raises the grade of the awakened standby server to that of the failed server;
Step 3: the monitoring management server lowers the grade of the failed server to the fault grade, which is lower than the grade held by any normally operating storage server;
Step 4: the monitoring management server checks the state of the awakened standby server; if it is normal, data that would subsequently have been stored on the failed server is transferred to and stored on the awakened standby server; if it is abnormal, the awakened standby server is designated as the new failed server and Steps 2 to 4 are repeated.
8. The method by which a cloud storage system automatically detects and replaces failed nodes according to claim 7, characterized in that after the failed server has been replaced with a standby server, an alarm is raised through the alarm server.
9. The method by which a cloud storage system automatically detects and replaces failed nodes according to claim 7, characterized in that when the cloud storage system starts for the first time, the monitoring management module first checks whether all storage servers, standby servers, the caching server, and the firewall are normal; if all are normal, it places the standby servers in the sleep state and proceeds to Step 1; if any checked device is not operating normally, the system enters a holding state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013101937604A CN103354503A (en) | 2013-05-23 | 2013-05-23 | Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103354503A | 2013-10-16 |
Family ID: 49310819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2013101937604A Pending CN103354503A (en) | 2013-05-23 | 2013-05-23 | Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103354503A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104639358A (en) * | 2013-11-13 | 2015-05-20 | 中国石油化工股份有限公司 | Batched network port switching method and system |
CN105187482A (en) * | 2015-07-20 | 2015-12-23 | 深圳供电局有限公司 | PaaS platform fault self-healing realization method and message server |
CN105490847A (en) * | 2015-12-08 | 2016-04-13 | 天津市初志科技有限公司 | Real-time detecting and processing method of node failure in private cloud storage system |
CN105808391A (en) * | 2016-04-05 | 2016-07-27 | 浪潮电子信息产业股份有限公司 | Method and device for hot replacing CPU nodes |
CN106331642A (en) * | 2016-08-31 | 2017-01-11 | 浙江大华技术股份有限公司 | Method and device for processing data in video cloud system |
CN106407081A (en) * | 2016-09-30 | 2017-02-15 | 郑州云海信息技术有限公司 | Chassis management system and server |
CN106789257A (en) * | 2016-12-23 | 2017-05-31 | 航天星图科技(北京)有限公司 | A kind of cloud system server state visual management method |
CN108156040A (en) * | 2018-01-30 | 2018-06-12 | 北京交通大学 | A kind of central control node in distribution cloud storage system |
CN108319523A (en) * | 2017-12-13 | 2018-07-24 | 创新科存储技术(深圳)有限公司 | A kind of adding method of storage HotSpare disk |
CN108369503A (en) * | 2015-12-15 | 2018-08-03 | 微软技术许可有限责任公司 | Automatic system response to external field replaceable units (FRU) process |
CN108509143A (en) * | 2017-02-23 | 2018-09-07 | 杭州海康威视数字技术股份有限公司 | A kind of data detection method and device based on cloud storage |
CN109361560A (en) * | 2018-01-24 | 2019-02-19 | 广州Tcl智能家居科技有限公司 | A kind of clustered node Communication processing method, system, storage medium and server |
CN109460194A (en) * | 2018-11-16 | 2019-03-12 | 郑州云海信息技术有限公司 | A kind of storage array monitoring system and method |
CN110445811A (en) * | 2019-09-16 | 2019-11-12 | 秒针信息技术有限公司 | For the data management system of non-cloud storage, method, server and storage medium |
CN111045845A (en) * | 2019-11-29 | 2020-04-21 | 苏州浪潮智能科技有限公司 | Data returning method, device, equipment and computer readable storage medium |
CN111176888A (en) * | 2018-11-13 | 2020-05-19 | 浙江宇视科技有限公司 | Cloud storage disaster recovery method, device and system |
CN111352773A (en) * | 2020-02-28 | 2020-06-30 | 佛山科学技术学院 | Cloud computing server monitoring control method and system based on big data |
CN111787284A (en) * | 2020-07-16 | 2020-10-16 | 济南浪潮数据技术有限公司 | Data acquisition system |
CN111897697A (en) * | 2020-08-11 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Server hardware fault repairing method and device |
CN112256498A (en) * | 2020-11-17 | 2021-01-22 | 珠海大横琴科技发展有限公司 | Fault processing method and device |
CN112945304A (en) * | 2021-02-05 | 2021-06-11 | 广东海洋大学 | Aquaculture sea area environmental information acquisition system |
CN114189429A (en) * | 2021-11-25 | 2022-03-15 | 山东云海国创云计算装备产业创新中心有限公司 | System, method, device and medium for monitoring server cluster faults |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101022363A (en) * | 2007-03-23 | 2007-08-22 | 杭州华为三康技术有限公司 | Network storage equipment fault protecting method and device |
US8145945B2 (en) * | 2010-01-04 | 2012-03-27 | Avaya Inc. | Packet mirroring between primary and secondary virtualized software images for improved system failover performance |
CN102710766A (en) * | 2012-05-30 | 2012-10-03 | 浪潮电子信息产业股份有限公司 | Real-time access load evaluation-based cluster storage interface node configuration method |
CN103067740A (en) * | 2012-12-31 | 2013-04-24 | 浙江元亨通信技术股份有限公司 | Trouble intelligent detecting method for video surveillance device and detecting system thereof |
CN203289491U (en) * | 2013-05-23 | 2013-11-13 | 浙江闪龙科技有限公司 | Cluster storage system capable of automatically repairing fault node |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104639358A (en) * | 2013-11-13 | 2015-05-20 | 中国石油化工股份有限公司 | Batched network port switching method and system |
CN104639358B (en) * | 2013-11-13 | 2018-03-09 | 中国石油化工股份有限公司 | batch network port switching method and switching system |
CN105187482A (en) * | 2015-07-20 | 2015-12-23 | 深圳供电局有限公司 | PaaS platform fault self-healing realization method and message server |
CN105187482B (en) * | 2015-07-20 | 2018-09-28 | 深圳供电局有限公司 | PaaS platform fault self-healing realization method and message server |
CN105490847A (en) * | 2015-12-08 | 2016-04-13 | 天津市初志科技有限公司 | Real-time detecting and processing method of node failure in private cloud storage system |
CN105490847B (en) * | 2015-12-08 | 2019-03-29 | 天津市初志科技有限公司 | A kind of private cloud storage system interior joint failure real-time detection and processing method |
CN108369503A (en) * | 2015-12-15 | 2018-08-03 | 微软技术许可有限责任公司 | Automatic system response to external field replaceable units (FRU) process |
CN105808391A (en) * | 2016-04-05 | 2016-07-27 | 浪潮电子信息产业股份有限公司 | Method and device for hot replacing CPU nodes |
CN106331642A (en) * | 2016-08-31 | 2017-01-11 | 浙江大华技术股份有限公司 | Method and device for processing data in video cloud system |
CN106331642B (en) * | 2016-08-31 | 2020-05-26 | 浙江大华技术股份有限公司 | Data processing method and device in video cloud system |
CN106407081B (en) * | 2016-09-30 | 2020-05-26 | 苏州浪潮智能科技有限公司 | Case management system and server |
CN106407081A (en) * | 2016-09-30 | 2017-02-15 | 郑州云海信息技术有限公司 | Chassis management system and server |
CN106789257B (en) * | 2016-12-23 | 2019-03-05 | 中科星图股份有限公司 | A kind of cloud system server state visual management method |
CN106789257A (en) * | 2016-12-23 | 2017-05-31 | 航天星图科技(北京)有限公司 | A kind of cloud system server state visual management method |
CN108509143A (en) * | 2017-02-23 | 2018-09-07 | 杭州海康威视数字技术股份有限公司 | A kind of data detection method and device based on cloud storage |
CN108509143B (en) * | 2017-02-23 | 2020-11-06 | 杭州海康威视数字技术股份有限公司 | Data detection method and device based on cloud storage |
CN108319523A (en) * | 2017-12-13 | 2018-07-24 | 创新科存储技术(深圳)有限公司 | A kind of adding method of storage HotSpare disk |
CN109361560A (en) * | 2018-01-24 | 2019-02-19 | 广州Tcl智能家居科技有限公司 | A kind of clustered node Communication processing method, system, storage medium and server |
CN108156040A (en) * | 2018-01-30 | 2018-06-12 | 北京交通大学 | A kind of central control node in distribution cloud storage system |
CN111176888B (en) * | 2018-11-13 | 2023-09-15 | 浙江宇视科技有限公司 | Disaster recovery method, device and system for cloud storage |
CN111176888A (en) * | 2018-11-13 | 2020-05-19 | 浙江宇视科技有限公司 | Cloud storage disaster recovery method, device and system |
CN109460194A (en) * | 2018-11-16 | 2019-03-12 | 郑州云海信息技术有限公司 | A kind of storage array monitoring system and method |
CN110445811A (en) * | 2019-09-16 | 2019-11-12 | 秒针信息技术有限公司 | For the data management system of non-cloud storage, method, server and storage medium |
CN111045845A (en) * | 2019-11-29 | 2020-04-21 | 苏州浪潮智能科技有限公司 | Data returning method, device, equipment and computer readable storage medium |
CN111352773A (en) * | 2020-02-28 | 2020-06-30 | 佛山科学技术学院 | Cloud computing server monitoring control method and system based on big data |
CN111787284A (en) * | 2020-07-16 | 2020-10-16 | 济南浪潮数据技术有限公司 | Data acquisition system |
CN111897697A (en) * | 2020-08-11 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Server hardware fault repairing method and device |
CN112256498A (en) * | 2020-11-17 | 2021-01-22 | 珠海大横琴科技发展有限公司 | Fault processing method and device |
CN112945304A (en) * | 2021-02-05 | 2021-06-11 | 广东海洋大学 | Aquaculture sea area environmental information acquisition system |
CN114189429A (en) * | 2021-11-25 | 2022-03-15 | 山东云海国创云计算装备产业创新中心有限公司 | System, method, device and medium for monitoring server cluster faults |
Similar Documents
Publication | Title |
---|---|
CN103354503A (en) | Cloud storage system capable of automatically detecting and replacing failure nodes and method thereof |
US5875290A (en) | Method and program product for synchronizing operator initiated commands with a failover process in a distributed processing system |
EP2128766B1 (en) | Electronic apparatus system having a plurality of rack-mounted electronic apparatuses, and a method for identifying electronic apparatus in electronic apparatus system |
US6012150A (en) | Apparatus for synchronizing operator initiated commands with a failover process in a distributed processing system |
CN106802854B (en) | Fault monitoring system of multi-controller system |
CN102360324B (en) | Failure recovery method and equipment for failure recovery |
CN105302661A (en) | System and method for implementing virtualization management platform high availability |
CN107729185B (en) | Fault processing method and device |
US20120036387A1 (en) | Storage system, control apparatus, and control method |
CN107508694B (en) | Node management method and node equipment in cluster |
CN203289491U (en) | Cluster storage system capable of automatically repairing fault node |
CN112181660A (en) | High-availability method based on server cluster |
CN107480014A (en) | A kind of High Availability equipment switching method and device |
WO2013145325A1 (en) | Information processing system, problem detection method and information processing device |
CN111104283B (en) | Fault detection method, device, equipment and medium of distributed storage system |
CN112601216B (en) | Zigbee-based trusted platform alarm method and system |
CN104679623A (en) | Server hard disk maintaining method, system and server monitoring equipment |
US7428655B2 (en) | Smart card for high-availability clustering |
CN107071189A (en) | A kind of connection method of communication apparatus physical interface |
CN108833189A (en) | A kind of memory node management system and method |
JP2010244463A (en) | Event detection control method and system |
CN106027661A (en) | Data cluster storage terminal |
CN111309515B (en) | Disaster recovery control method, device and system |
US20110187404A1 (en) | Method of detecting failure and monitoring apparatus |
CN104158843A (en) | Storage unit invalidation detecting method and device for distributed file storage system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20131016 |