CN102420710A - Method for positioning fault of server cluster system - Google Patents
- Publication number
- CN102420710A
- Authority
- CN
- China
- Prior art keywords
- server
- signal
- cluster system
- cluster
- fault locating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method for positioning a fault of a server cluster system. The server cluster system comprises a first server, a transmission channel and a second server, wherein the first server and the second server respectively operate in a unified extensible firmware interface (UEFI) environment; and a signal which is sent to the second server by the first server through the transmission channel is different from the signal which is received by the second server. The method comprises the following steps that: 1, the first server sends a first signal to the second server with an Ethernet network card; and 2, the second server sends the first signal to the first server in reverse direction with an external loop-back function of the Ethernet network card. By adoption of the method for positioning the fault of the server cluster system in the UEFI environment, dependence of network diagnosis on an operating system environment can be eliminated, and interference of different server nodes in a cluster on fault positioning caused by difference in operating systems and operation of various kinds of application programs on the operating systems can be eliminated, so that the fault can be accurately and efficiently positioned. By debugging connection of a cluster network beforehand in the environment of independence of the operating systems, conditions can be provided for subsequent remote deployment and installation of the operating systems on each node of the whole cluster, so that the deployment speed of the whole cluster is greatly improved.
Description
Technical field
The present invention relates generally to the field of servers and, more specifically, to a fault locating method for a server cluster system.
Background art
Servers are core devices in an integrated network system, and their use is inseparable from the network environment. Present-day server cluster networks often consist of at least tens and at most thousands of servers. In an actual deployment, the operating systems cannot be installed by hand one machine at a time; instead, installation relies on a stable and reliable network environment and is performed centrally and automatically by machine-room management software. As a result, if a network fault occurs before the operating system is installed, the available means of locating and analyzing it are very limited.
When deploying server clusters in machine rooms, we frequently run into network problems of one kind or another. At present these problems are located essentially within an operating system environment, using the corresponding diagnostic and debugging tools. These tools all depend on the operating system. Although network protocols are standardized, the tools themselves differ from one operating system to another in how they process and parse messages. Together with the influence of other software running under the operating system, this frequently interferes with locating and analyzing the problem.
The prior art provides a method for remotely controlling a server power supply and diagnosing its faults: the server power module is switched on and off through a program interface of a remote management center, and the operating state, fan speed, temperature, current, and power data of the server power supply are inspected, so that damage to the power supply can be diagnosed effectively. This prior art improves efficiency to some extent.
However, the above prior art can only diagnose problems inside a server; it cannot be applied to diagnosing communication faults between servers in a server cluster system. Moreover, existing diagnostic methods all run under an operating system and cannot be used when no operating system is installed.
Summary of the invention
In view of the defects of the above prior art, the invention provides a fault locating method for a server cluster system. The method solves the technical problem of how to locate faults of servers in a server cluster system and, in particular, how to diagnose servers on which no operating system is installed.
According to one aspect of the invention, a fault locating method for a server cluster system is provided. The server cluster system comprises a first server, a transmission channel, and a second server; the first server and the second server both run in a UEFI environment; and the signal that the first server sends to the second server through the transmission channel differs from the signal that the second server receives. The method is characterized by comprising: step S1: the first server sends a first signal to the second server through an Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card; step S2: the first server receives a second signal; and step S3: the fault location in the server cluster system is determined by comparing the first signal with the second signal.
In this fault locating method, step S3 comprises: if the first signal is identical to the second signal, the fault is at the second server.
In this fault locating method, step S3 comprises: if the first signal differs from the second signal, the fault is at the first server or in the transmission channel.
In this fault locating method, no operating system or application software is installed on the first server or the second server.
In this fault locating method, step S1 comprises: step S11: a first signal is input to an administration module, or the administration module collects information from a functional module connected to it and uses that information as the first signal; and step S12: in the UEFI environment, the first server sends the first signal to the second server through the Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card.
In this fault locating method, step S1 further comprises at least one of the following steps: querying help information through the administration module; generating a first message through the administration module and sending the first message to the first server; generating a second message through the administration module and sending the second message to the second server; configuring parameters of the first server through the administration module; and configuring parameters of the second server through the administration module.
In this fault locating method, the administration module is a computer.
In this fault locating method, the functional module is the first server.
In this fault locating method, the transmission channel is an Ethernet.
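Because the transmission channel is an Ethernet, the first signal would in practice travel inside an Ethernet frame. The following is a minimal Python sketch of one way such a test frame might be laid out; it is an illustration, not the frame format used by the invention. The function names, the 2-byte length prefix, the example MAC addresses, and the choice of EtherType 0x88B5 (one of the values IEEE reserves for local experimental use) are all assumptions.

```python
import struct

# Assumption: 0x88B5 is an IEEE "local experimental" EtherType, chosen here
# so that the test frame cannot be mistaken for IP, ARP, or other traffic.
EXPERIMENTAL_ETHERTYPE = 0x88B5
MIN_PAYLOAD = 46  # minimum payload of an untagged Ethernet II frame, in bytes


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def build_test_frame(dst_mac: str, src_mac: str, first_signal: bytes) -> bytes:
    """Wrap the first signal in an Ethernet II frame (without the FCS).

    A hypothetical 2-byte length prefix lets the receiver strip the padding.
    The frame check sequence is left to the network card.
    """
    body = struct.pack("!H", len(first_signal)) + first_signal
    body = body.ljust(MIN_PAYLOAD, b"\x00")  # pad short payloads with zeros
    header = (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", EXPERIMENTAL_ETHERTYPE)
    )
    return header + body


def extract_signal(frame: bytes) -> bytes:
    """Recover the signal from a frame built by build_test_frame."""
    (length,) = struct.unpack("!H", frame[14:16])
    return frame[16:16 + length]
```

With this layout, the first server can compare `extract_signal` of the looped-back frame against the signal it originally sent, which is the comparison that step S3 calls for.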
With the above fault locating method, network diagnosis no longer depends on an operating system environment, and fault locating is no longer disturbed by differences between the operating systems on different server nodes in the cluster or by the various applications running on them, so faults can be located accurately and efficiently. In addition, debugging the connectivity of the cluster network in advance, in an environment that does not depend on an operating system, provides the conditions for the subsequent remote deployment and installation of operating systems on every node of the cluster, which greatly increases the deployment speed of the whole cluster.
Description of drawings
The accompanying drawings provide a further understanding of the invention and form part of the specification. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is an overall flowchart of the fault locating method for a server cluster system according to the invention;
Fig. 2 is a detailed flowchart of the fault locating method for a server cluster system according to the invention;
Fig. 3 is a detailed flowchart of an example of the fault locating method for a server cluster system according to the invention.
Embodiments
Preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and do not limit it.
Fig. 1 is an overall flowchart of the fault locating method for a server cluster system according to the invention. The server cluster system to which the method shown in Fig. 1 is applied comprises a first server, a transmission channel, and a second server, where the first server and the second server both run in a UEFI (Unified Extensible Firmware Interface) software environment and have not entered an operating system. The signal that the first server sends to the second server through the transmission channel differs from the signal that the second server receives; that is, the connection formed by the first server, the transmission channel, and the second server has a fault, and this fault needs to be located by the fault locating method. In Fig. 1:
Step S100: the first server sends a first signal to the second server through an Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card. In a real deployment, the software environment of a server cluster system is often very complex: leaving aside the various application programs, the operating systems alone may be of several types, and several versions of the same type may coexist. By contrast, the hardware environment of a server cluster is much simpler and more uniform. UEFI, which replaces the traditional BIOS as the system firmware closely tied to the hardware, also comes in far fewer and more uniform versions. The invention exploits the fact that UEFI is more powerful than the traditional BIOS while offering a "cleaner" runtime environment than an operating system, and locates faults of the server cluster system through the UEFI Shell environment.
Step S102: the first server receives a second signal.
Step S104: the fault location in the server cluster system is determined by comparing the first signal with the second signal. If the first signal is identical to the second signal, the fault is at the second server; if the first signal differs from the second signal, the fault is at the first server or in the transmission channel. Preferably, the transmission channel is an Ethernet.
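The comparison rule of the final step can be sketched in a few lines of Python. This is only an illustration of the decision logic as the text states it; the names `FaultLocation` and `locate_fault` are hypothetical, and the rule relies on the precondition given above that a fault is already known to exist between the two servers.

```python
from enum import Enum


class FaultLocation(Enum):
    SECOND_SERVER = "second server"
    FIRST_SERVER_OR_CHANNEL = "first server or transmission channel"


def locate_fault(first_signal: bytes, second_signal: bytes) -> FaultLocation:
    """Apply the comparison rule described in the text.

    Precondition (from the text): a fault is known to exist somewhere among
    the first server, the transmission channel, and the second server.
    """
    if first_signal == second_signal:
        # The loopback path is intact, so the known fault is attributed
        # to the second server.
        return FaultLocation.SECOND_SERVER
    # The signal was altered on the loopback path.
    return FaultLocation.FIRST_SERVER_OR_CHANNEL
```

Note that the rule narrows the fault to one of two regions rather than pinpointing a component: a mismatch does not distinguish the first server from the transmission channel.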
Fig. 2 is a detailed flowchart of the fault locating method for a server cluster system according to the invention. In Fig. 2:
Step S200: a first signal is input to the administration module, or the administration module collects information from the functional module connected to it and uses that information as the first signal. Preferably, the administration module is a computer, preferably a notebook computer, and the functional module is the first server. In one embodiment, an instruction can be entered into the notebook computer through a keyboard, and the notebook computer processes and converts it into the first signal. In another embodiment, the notebook computer can be connected to a blade server; the notebook computer collects information from the first server (for example, chassis information, blade information, power supply information, system fan information, low-speed switching module information, high-speed switching module information, memory module information, and so on) and then converts this information into the first signal.
Step S202: in the UEFI environment, the first server sends the first signal to the second server through the Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card. No operating system or application software is installed on the first server or the second server.
Step S204: the first server receives a second signal.
Step S206: it is determined whether the first signal is identical to the second signal. If they are identical, the method proceeds to step S208, that is, the fault is at the second server; if they differ, the method proceeds to step S210, that is, the fault is at the first server or in the transmission channel.
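The flow of Fig. 2 can be exercised without hardware by simulating the round trip. The sketch below is a toy model under stated assumptions, not the UEFI implementation: each direction of the path is modelled as a function from bytes to bytes, the external loopback is assumed to return the frame unchanged, and the names `run_loopback_test`, `intact`, and `flip_first_bit` are hypothetical.

```python
from typing import Callable

# A hop models one direction of the path (network card plus cable).
Hop = Callable[[bytes], bytes]


def intact(frame: bytes) -> bytes:
    """A healthy hop delivers the frame unchanged."""
    return frame


def flip_first_bit(frame: bytes) -> bytes:
    """A faulty hop, simulated by corrupting one bit of the first byte."""
    if not frame:
        return frame
    return bytes([frame[0] ^ 0x01]) + frame[1:]


def run_loopback_test(
    first_signal: bytes, outbound: Hop = intact, inbound: Hop = intact
) -> str:
    # S202: the first server sends the first signal to the second server.
    at_second_server = outbound(first_signal)
    # Assumption: the external loopback returns what arrived, unchanged.
    looped_back = at_second_server
    # S204: the first server receives the second signal.
    second_signal = inbound(looped_back)
    # S206: compare, then branch to S208 or S210.
    if second_signal == first_signal:
        return "S208: fault at second server"
    return "S210: fault at first server or transmission channel"
```

For example, `run_loopback_test(b"\x55\xaa", outbound=flip_first_bit)` takes the S210 branch, while a call with both hops intact takes the S208 branch.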
The method of the invention gives a server cluster system an effective means of fault diagnosis before the operating system is installed, or when the operating system cannot operate normally. This is of great value when deploying large server clusters. Take the Dawning 6000 (Nebulae) project as an example: if, in such a large server cluster, a means of network debugging is available before the operating system is installed and is used together with the network segments defined on networking devices such as switches, then network connectivity problems that would otherwise surface later can be discovered in time and the scope of the fault narrowed.
At the same time, the method provides the conditions for later centralized installation of the operating system, which saves the large amount of time that workers would otherwise spend in the early stage installing systems in small batches on the production line. The project cycle can thus be shortened considerably.
Fig. 3 is a detailed flowchart of an example of the fault locating method for a server cluster system according to the invention. As shown in Fig. 3, instructions are sent to the notebook computer (the administration module). After the notebook computer parses them, the following operations can be carried out: viewing help information, performing outloop (the external loopback described above), generating a message, sending or receiving a message, and configuring the server connected to the notebook computer.
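The instruction handling of Fig. 3 resembles a small command dispatcher. The Python sketch below shows one plausible shape for it. Only the operation name `outloop` comes from the text; the other command names (`help`, `genmsg`, `config`), the argument conventions, and the reply strings are assumptions made for illustration, and sending and receiving messages are omitted for brevity.

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def build_handlers() -> Dict[str, Handler]:
    """Create the command table of a hypothetical administration module."""
    handlers: Dict[str, Handler] = {}

    def show_help(args: List[str]) -> str:
        return "commands: " + ", ".join(sorted(handlers))

    def outloop(args: List[str]) -> str:
        # Placeholder: a real tool would switch the target network card
        # into external loopback mode here.
        target = args[0] if args else "second-server"
        return f"external loopback requested on {target}"

    def genmsg(args: List[str]) -> str:
        return "message generated: " + " ".join(args)

    def config(args: List[str]) -> str:
        return "configuration applied: " + " ".join(args)

    handlers.update(help=show_help, outloop=outloop, genmsg=genmsg, config=config)
    return handlers


def dispatch(line: str, handlers: Dict[str, Handler]) -> str:
    """Parse one instruction line and run the matching handler."""
    parts = line.split()
    if not parts:
        return handlers["help"]([])
    name, args = parts[0], parts[1:]
    handler = handlers.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)
```

Keeping the commands in a table means a new operation can be added by registering one more handler, without touching the parsing code.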
The above are merely preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may be subject to various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (9)
1. A fault locating method for a server cluster system, the server cluster system comprising a first server, a transmission channel, and a second server, the first server and the second server both running in a UEFI environment, and the signal that the first server sends to the second server through the transmission channel differing from the signal that the second server receives, characterized in that the method comprises:
Step S1: the first server sends a first signal to the second server through an Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card;
Step S2: the first server receives a second signal; and
Step S3: the fault location in the server cluster system is determined by comparing the first signal with the second signal.
2. The fault locating method for a server cluster system according to claim 1, characterized in that step S3 comprises: if the first signal is identical to the second signal, the fault is at the second server.
3. The fault locating method for a server cluster system according to claim 1, characterized in that step S3 comprises: if the first signal differs from the second signal, the fault is at the first server or in the transmission channel.
4. The fault locating method for a server cluster system according to claim 2 or 3, characterized in that no operating system or application software is installed on the first server or the second server.
5. The fault locating method for a server cluster system according to claim 4, characterized in that step S1 comprises:
Step S11: a first signal is input to an administration module, or the administration module collects information from a functional module connected to it and uses that information as the first signal; and
Step S12: in the UEFI environment, the first server sends the first signal to the second server through the Ethernet network card, and the second server sends the first signal back to the first server through the external loopback function of the Ethernet network card.
6. The fault locating method for a server cluster system according to claim 5, characterized in that step S1 further comprises at least one of the following steps:
querying help information through the administration module;
generating a first message through the administration module and sending the first message to the first server;
generating a second message through the administration module and sending the second message to the second server;
configuring parameters of the first server through the administration module; and
configuring parameters of the second server through the administration module.
7. The fault locating method for a server cluster system according to claim 6, characterized in that the administration module is a computer.
8. The fault locating method for a server cluster system according to claim 7, characterized in that the functional module is the first server.
9. The fault locating method for a server cluster system according to claim 8, characterized in that the transmission channel is an Ethernet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104600595A CN102420710A (en) | 2011-12-31 | 2011-12-31 | Method for positioning fault of server cluster system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104600595A CN102420710A (en) | 2011-12-31 | 2011-12-31 | Method for positioning fault of server cluster system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102420710A (en) | 2012-04-18 |
Family
ID=45944958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011104600595A (Pending, published as CN102420710A) | Method for positioning fault of server cluster system | 2011-12-31 | 2011-12-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102420710A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103973480A (en) * | 2014-04-09 | 2014-08-06 | 汉柏科技有限公司 | Device and method for increasing cloud computing system user response speed |
CN104468725A (en) * | 2014-11-06 | 2015-03-25 | 浪潮(北京)电子信息产业有限公司 | High-availability cluster software maintaining method, device and system |
CN104812066A (en) * | 2015-05-18 | 2015-07-29 | 百度在线网络技术(北京)有限公司 | Method and device for identifying and positioning faults and server |
CN110262968A (en) * | 2019-06-10 | 2019-09-20 | 天翼电子商务有限公司 | Promote method, system, medium and the electronic equipment of application failure location efficiency |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1859411A (en) * | 2006-03-18 | 2006-11-08 | 华为技术有限公司 | Method for detecting communication device link ringback and communication device |
CN101821990A (en) * | 2007-10-09 | 2010-09-01 | Lm爱立信电话有限公司 | Arrangement and method for handling failures in network |
US20100281295A1 (en) * | 2008-01-07 | 2010-11-04 | Abhay Karandikar | Method for Fast Connectivity Fault Management [CFM] of a Service-Network |
CN102075988A (en) * | 2009-11-24 | 2011-05-25 | 中国移动通信集团浙江有限公司 | System and method for locating end-to-end voice quality fault in mobile communication network |
CN102147763A (en) * | 2010-02-05 | 2011-08-10 | 中国长城计算机深圳股份有限公司 | Method, system and computer for recording weblog |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103973480A (en) * | 2014-04-09 | 2014-08-06 | 汉柏科技有限公司 | Device and method for increasing cloud computing system user response speed |
CN103973480B (en) * | 2014-04-09 | 2017-10-31 | 汉柏科技有限公司 | Improve the device and method of cloud computing system user reponding time |
CN104468725A (en) * | 2014-11-06 | 2015-03-25 | 浪潮(北京)电子信息产业有限公司 | High-availability cluster software maintaining method, device and system |
CN104468725B (en) * | 2014-11-06 | 2017-12-01 | 浪潮(北京)电子信息产业有限公司 | A kind of method, apparatus and system for realizing high-availability cluster software maintenance |
CN104812066A (en) * | 2015-05-18 | 2015-07-29 | 百度在线网络技术(北京)有限公司 | Method and device for identifying and positioning faults and server |
CN110262968A (en) * | 2019-06-10 | 2019-09-20 | 天翼电子商务有限公司 | Promote method, system, medium and the electronic equipment of application failure location efficiency |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110380907B (en) | Network fault diagnosis method and device, network equipment and storage medium | |
CN102047683B (en) | Dynamic fault analysis for a centrally managed network element in a telecommunications system | |
CN103457761B (en) | Cross-platform command line configuration interface implementation method | |
CN102571498A (en) | Fault injection control method and device | |
CN105656685A (en) | Automatic deployment and operation and maintenance monitoring method based on zabbix system oracle | |
CN111176939B (en) | Multi-node server management system and method based on CPLD | |
CN102420710A (en) | Method for positioning fault of server cluster system | |
EP3051750B1 (en) | Collection adaptor management method and system | |
WO2017193763A1 (en) | Testing method, apparatus and system | |
CN102325036A (en) | Fault diagnosis method for network system, system and device | |
US8816695B2 (en) | Method and system for interoperability testing | |
CN111052087A (en) | Control system, information processing device, and abnormality factor estimation program | |
CN101938369B (en) | Comprehensive network management access management system, management method and network management system applying same | |
CN103605592A (en) | Mechanism of detecting malfunctions of distributed computer system | |
CN111800299A (en) | Operation maintenance system and method of edge cloud | |
CN105404569A (en) | Method for testing remote Power Reset of server | |
CN112636960B (en) | Intranet collaborative maintenance method, system, device, server and storage medium of edge computing equipment | |
CN107707408B (en) | Remote monitoring method and system for digital broadcast transmitter | |
CN106411643B (en) | BMC detection method and device | |
CN109446002B (en) | Jig plate, system and method for grabbing SATA hard disk by server | |
CN116137603A (en) | Link fault detection method and device, storage medium and electronic device | |
US20160156501A1 (en) | Network apparatus with inserted management mechanism, system, and method for management and supervision | |
US11457374B2 (en) | Hub device with diagnostic function and diagnostic method using the same | |
CN112206453A (en) | Fire control system | |
CN106021649A (en) | A model configuration detector used for a virtual circuit verifying platform and a control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20120418 |