CN110427303A - Fault alarm method and device - Google Patents

Fault alarm method and device

Info

Publication number
CN110427303A
Authority
CN
China
Prior art keywords
failure
screenshot
server
log
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910695497.6A
Other languages
Chinese (zh)
Inventor
赵俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd Chengdu Branch
Original Assignee
New H3C Technologies Co Ltd Chengdu Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd Chengdu Branch
Priority to CN201910695497.6A
Publication of CN110427303A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a fault alarm method and device. When the BMC on a server detects a first-type fault, where the first-type fault includes a fault detected by the server's CPU, the BMC calls the screenshot function of the remote console to capture the server OS information shown on the display screen and obtain screenshot information. The BMC then raises a fault alarm that includes the screenshot information. Compared with the prior art, the disclosure captures the OS information through the remote console's screenshot function when the BMC detects a first-type fault, so it does not depend on the operating system and is applicable to multiple operating systems. In addition, because the BMC triggers the remote console's screenshot function when the server's CPU detects a fault, the screenshot is taken promptly when the fault occurs, which facilitates fault analysis, allows losses to be stopped in time, and improves the server's troubleshooting efficiency.

Description

Fault alarm method and device
Technical field
The disclosure relates to the field of communication technology, and in particular to a fault alarm method and device.
Background
With the rapid development of information technology, servers are widely used in major industries, including key sectors such as the internet, finance, government, education, and healthcare, and reliability is one of the most important criteria for selecting a server. A server operating system crash usually causes huge economic losses, and if the cause of the crash cannot be located quickly so that the server can be restored, the customer's losses become hard to estimate.
At present, a server running the Windows operating system can take a fault screenshot when the system crashes, and the fault can then be diagnosed from that screenshot. For example, when a fault causes a blue screen while the Windows operating system installed on the server is running, Windows automatically captures a blue-screen snapshot and saves the on-screen error code to the server's memory as a snapshot image. A user can log in to the server's BMC (Baseboard Management Controller) to read the blue-screen snapshot from memory, check the error information related to the system fault, and then handle the fault. However, this fault diagnosis method is supported only on the Windows operating system, so its applicability is poor.
Summary of the invention
In view of this, the disclosure provides a fault alarm method and device to solve the problem that fault diagnosis can only be performed on the Windows operating system.
Specifically, the disclosure is implemented through the following technical solutions:
The disclosure provides a fault alarm method applied to the BMC on a server, the method comprising:
when a first-type fault is detected, the first-type fault including a fault detected by the CPU of the server, calling the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information;
raising a fault alarm, the fault alarm including the screenshot information.
In one embodiment, the server further includes a CPLD provided with a register, wherein the status bit of the register is changed from a first value to a second value when the first-type fault occurs, the first value indicating normal and the second value indicating abnormal;
the BMC detecting the first-type fault comprises:
when the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value, determining that a first-type fault has occurred.
In one embodiment, the fault detected by the CPU includes at least one of the following:
an internal fault of the CPU;
an external connection fault of the CPU.
In one embodiment, when a first-type fault is detected, the method further comprises:
collecting first-type fault information from the CPU, generating a log from the collected first-type fault information, and storing the log;
raising the fault alarm comprises:
downloading the log by calling a log download interface;
sending the log together with the screenshot information to a designated client to raise the alarm.
In one embodiment, sending the log together with the alarm information to the designated client to raise the alarm comprises:
calling an SMTP interface to send the log and the alarm information in an email to a designated destination address, thereby alerting the designated client corresponding to that address, the SMTP interface being configured with the designated destination address.
Based on the same concept, the disclosure also provides a fault alarm device applied to the BMC on a server, the device comprising:
a screenshot unit, configured to, when a first-type fault is detected, the first-type fault including a fault detected by the CPU of the server, call the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information;
an alarm unit, configured to raise a fault alarm, the fault alarm including the screenshot information.
In one embodiment, the server further includes a CPLD provided with a register, wherein the status bit of the register is changed from a first value to a second value when the first-type fault occurs, the first value indicating normal and the second value indicating abnormal;
the screenshot unit is specifically configured to determine that a first-type fault has occurred when the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value.
In one embodiment, the fault detected by the CPU includes at least one of the following:
an internal fault of the CPU;
an external connection fault of the CPU.
In one embodiment, the device further comprises:
a collection unit, configured to, when a first-type fault is detected, collect first-type fault information from the CPU, generate a log from the collected first-type fault information, and store the log;
the alarm unit is specifically configured to download the log by calling a log download interface, and to send the log together with the screenshot information to a designated client to raise the alarm.
In one embodiment, the alarm unit is specifically configured to call an SMTP interface to send the log and the alarm information in an email to a designated destination address, thereby alerting the designated client corresponding to that address, the SMTP interface being configured with the designated destination address.
Based on the same concept, the disclosure also provides a computer-readable storage medium storing a computer program, wherein any step of the above fault alarm method is implemented when the computer program is executed by a processor.
Based on the same concept, the disclosure also provides a network device comprising a memory, a processor, a communication interface, and a communication bus, wherein the memory, the processor, and the communication interface communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory, and any step of the above fault alarm method is implemented when the processor executes the computer program.
As can be seen, with the disclosure, when the BMC on a server detects a first-type fault, where the first-type fault includes at least a fault detected by the server's CPU, the BMC calls the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information, and then raises a fault alarm that includes the screenshot information. Compared with the prior art, the disclosure captures the OS information through the remote console's screenshot function when the BMC detects a first-type fault, so it does not depend on the operating system and is applicable to multiple operating systems. In addition, because the BMC triggers the remote console's screenshot function when the server's CPU detects a fault, the screenshot is taken promptly when the fault occurs, which facilitates fault analysis, allows losses to be stopped in time, and improves the server's troubleshooting efficiency.
Brief description of the drawings
Fig. 1 is a process flowchart of a fault alarm method in an exemplary embodiment of the disclosure;
Fig. 2 is a schematic structural diagram of a server in an exemplary embodiment of the disclosure;
Fig. 3 is a process flowchart of a fault diagnosis method in an exemplary embodiment of the disclosure;
Fig. 4 is a logical structure diagram of a fault alarm device in an exemplary embodiment of the disclosure;
Fig. 5 is a hardware structure diagram of a network device in an exemplary embodiment of the disclosure.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terms used in the disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. The singular forms "a", "said", and "the" used in the disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Existing fault diagnosis methods still have many problems. First, only the Windows system currently supports a screen snapshot on system crash; other mainstream server operating systems such as Linux, Unix, and VMware cannot take fault screenshots. Second, the blue-screen snapshot function is triggered only when the Windows operating system shows a blue screen, by which point the system fault has often reached the fatal level, so a serious error cannot be prevented in advance from turning into a fatal error, causing unnecessary losses. Third, when the Windows operating system shows a blue screen, the system restarts by itself after collecting the blue-screen snapshot, and the customer cannot be notified before the restart, which can seriously affect the customer's services. Fourth, after the Windows operating system generates a blue-screen snapshot, someone must manually check the designated storage location to find out whether an error has occurred; the fault alarm cannot be pushed actively, so the user cannot learn of the system fault in time.
To solve the problems in the prior art, the disclosure provides a fault alarm method and device. When the BMC on a server detects a first-type fault, where the first-type fault includes at least a fault detected by the server's CPU, the BMC calls the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information, and then raises a fault alarm that includes the screenshot information. Compared with the prior art, the disclosure captures the OS information through the remote console's screenshot function when the BMC detects a first-type fault, so it does not depend on the operating system and is applicable to multiple operating systems. In addition, because the BMC triggers the remote console's screenshot function when the server's CPU detects a fault, the screenshot is taken promptly when the fault occurs, which facilitates fault analysis, allows losses to be stopped in time, and improves the server's troubleshooting efficiency.
Referring to Fig. 1, which is a process flowchart of a fault alarm method in an exemplary embodiment of the disclosure, the method is applied to the BMC on a server and comprises:
Step 101: when a first-type fault is detected, the first-type fault including a fault detected by the CPU of the server, call the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information.
In this embodiment, the server further includes a CPLD (Complex Programmable Logic Device) provided with a register. The status bit of the register is changed from a first value to a second value when the first-type fault occurs, where the first value indicates normal and the second value indicates abnormal; for example, the first value "0" indicates normal and the second value "1" indicates abnormal.
In one embodiment, the BMC detects the first-type fault as follows: the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value. For example, if the first value "0" of the status bit indicates normal and the second value "1" indicates abnormal, a first-type fault can be determined when the value of the status bit switches from "0" to "1".
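The transition check described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name and the bit position (STATUS_BIT_MASK) are assumptions, since the description does not say which bit of the register carries the status.

```python
# Hypothetical bit position of the status bit; the description does not specify it.
STATUS_BIT_MASK = 0x01


def is_first_type_fault(previous: int, current: int,
                        mask: int = STATUS_BIT_MASK) -> bool:
    """Return True when the status bit goes from 0 (normal) to 1 (abnormal)."""
    was_normal = (previous & mask) == 0
    is_abnormal = (current & mask) != 0
    return was_normal and is_abnormal
```

Comparing the previous and current register values means a bit that stays at "1" is reported once, at the moment it changes, rather than on every read.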
In one embodiment, the fault detected by the CPU includes at least one of the following: an internal fault of the CPU, or an external connection fault of the CPU. In this embodiment, the first-type fault may be an MCA (Machine Check Architecture) fault. The MCA mechanism performs hardware self-checks on the server and raises an interrupt or exception when it finds a hardware error, such as a system bus error, ECC error, parity error, cache error, or TLB error.
Turning to the server structure shown in Fig. 2, the following describes in detail how the BMC determines that a first-type fault has been detected.
The server shown in Fig. 2 includes at least a BMC, a CPU, and a CPLD. The interaction flow for fault determination includes:
Step 201: the CPU detects a first-type fault.
The CPU in the server of the disclosure has RAS (Reliability, Availability, Serviceability) characteristics, which allow the CPU to detect first-type faults of the server, such as MCA faults, in time.
Step 202: the CPU sends an error signal to the CPLD through its MSMI_N pin, so that the CPLD sets the status bit of its designated register.
The CPU can be connected to the CPLD through its MSMI_N pin. After the CPU detects an MCA fault internally, it can send an error signal (such as a carterr error signal or an MSMI error signal) to the CPLD through the MSMI_N pin. On receiving the error signal, the CPLD can determine that a first-type fault has occurred in the server, and it can therefore set the status bit of its designated register (for example, register 0x28), that is, change the status from the first value to the second value. For example, if the first value of the status bit is 0 and the second value is 1, the CPLD sets the status bit of the register to 1 after receiving the error signal, indicating that a fault has occurred in the server.
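The CPLD behaviour in step 202 can be modelled in software for illustration. The sketch below is a toy model, not CPLD logic: the class name CpldModel, the method names, and the bit position are assumptions, and only the example register address 0x28 comes from the description.

```python
FAULT_REGISTER = 0x28    # example register address given in the description
STATUS_BIT_MASK = 0x01   # hypothetical bit position; not specified in the text


class CpldModel:
    """Toy software model of the CPLD register that latches the CPU error signal."""

    def __init__(self) -> None:
        # The status bit starts at the first value, 0 (normal).
        self.registers = {FAULT_REGISTER: 0x00}

    def on_error_signal(self) -> None:
        # Error signal received from the CPU: set the status bit to 1 (abnormal).
        self.registers[FAULT_REGISTER] |= STATUS_BIT_MASK

    def read(self, address: int) -> int:
        return self.registers[address]
```

Setting the bit with a bitwise OR keeps it latched at 1 if the error signal arrives more than once, which matches the description of the bit staying set until the BMC reads it.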
Step 203: the BMC polls whether the status bit of the designated register of the CPLD is set. If so, the BMC determines that a first-type fault has occurred in the server; if not, the BMC determines that no first-type fault has occurred.
The BMC can create a polling thread in the background to monitor whether the status of the designated register of the CPLD (for example, register 0x28) has been set to the second value, and thereby judge whether a first-type fault has occurred in the server.
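A background polling thread of this kind might look like the sketch below. It is written under stated assumptions: read_register and on_fault are hypothetical callables supplied by the caller (real BMC firmware would read the CPLD over its own bus), and the bit position and polling interval are illustrative.

```python
import threading
from typing import Callable

FAULT_REGISTER = 0x28    # example register address given in the description
STATUS_BIT_MASK = 0x01   # hypothetical bit position; not specified in the text


def poll_cpld(read_register: Callable[[int], int],
              on_fault: Callable[[], None],
              stop: threading.Event,
              interval_s: float = 0.1) -> None:
    """Poll the CPLD status bit until it is set or a stop is requested."""
    while not stop.is_set():
        if read_register(FAULT_REGISTER) & STATUS_BIT_MASK:
            on_fault()  # status bit set: a first-type fault has occurred
            return
        stop.wait(interval_s)  # sleep, but wake early if stop is set


def start_polling(read_register: Callable[[int], int],
                  on_fault: Callable[[], None],
                  stop: threading.Event,
                  interval_s: float = 0.1) -> threading.Thread:
    """Start the polling loop in a background daemon thread."""
    thread = threading.Thread(
        target=poll_cpld,
        args=(read_register, on_fault, stop, interval_s),
        daemon=True,
    )
    thread.start()
    return thread
```

Using an Event for both the sleep and the stop request lets the caller shut the thread down without waiting for a full polling interval.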
When the BMC detects a first-type fault, it can further call the screenshot function of the remote console to take a screenshot of the server OS information interface shown on the display screen and obtain screenshot information. Specifically, the screenshot function of the remote console can use the KVM screenshot function of a KVM (Keyboard, Video, Mouse) system: the server's OS information is acquired and displayed remotely on the client's display screen, and the display screen is then captured to obtain the screenshot information.
Step 102: raise a fault alarm, the fault alarm including the screenshot information.
After obtaining the screenshot information, the BMC can save it to a designated memory, which may be a non-volatile memory in the server or in another device, such as an SD card in the server. The BMC can then raise a fault alarm that includes the screenshot information. The alarm may be raised by actively pushing the screenshot information to the user, or by storing the screenshot information at a designated location for the user to view.
The prior art determines whether a server is faulty from a system blue screen, and a blue screen often means the fault is already serious or has even reached the fatal level. The prior art is therefore relatively insensitive to milder faults and cannot learn of a problem before a serious error occurs (for example, when the system hangs without a blue screen), so it cannot prevent a serious error in advance from turning into a fatal error. In addition, because the prior art can only discover a device fault through detection at the application layer, its detection is not timely enough. In the disclosure, by contrast, the BMC monitors the hardware state of the server to learn in real time whether the server has failed, so detection is more timely: when a hardware error occurs, the BMC learns of it promptly and raises a fault alarm, rather than learning of the server fault only after it has developed to the fatal level (for example, a system blue screen). The disclosure can thus learn of fault conditions in time and is not limited to high-level faults such as a blue screen; it can learn of hardware faults promptly across faults of different severity, which improves fault identification efficiency and increases the types of faults that can be identified.
In one embodiment, when a first-type fault is detected, the BMC may further collect first-type fault information from the CPU, generate a log from the collected first-type fault information, and store the log. For example, the BMC obtains the first-type fault information corresponding to the first-type fault, which may include parameter information of the faulty hardware, and then generates and stores a log from it. To raise the alarm, the BMC can download the log by calling a log download interface, such as an SDS (Secure Diagnosis System) log download interface, and then send the log together with the screenshot information to a designated client. Specifically, the BMC can call an SMTP (Simple Mail Transfer Protocol) interface to send the log and the alarm information in an email to a designated destination address, thereby alerting the designated client corresponding to that address; the SMTP interface is configured with the designated destination address. By sending the log corresponding to the fault information together with the screenshot information to the user, the disclosure gives the user a clearer picture of the current fault condition so that the user can respond to it appropriately.
In the prior art, after the fault screenshot is stored in memory, the user must actively check that memory to learn of the server fault information; the information cannot be pushed actively, so the user cannot learn of the fault condition in time. The disclosure, by contrast, can send the screenshot information and the log directly to the user by email over SMTP, so that the user is notified promptly when a fault occurs and can understand the fault information and take countermeasures in time.
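As an illustration of the email alarm, the sketch below builds a message that carries the log and the screenshot as attachments using Python's standard email and smtplib modules. The function names, subject line, file names, and addresses are all hypothetical; the disclosure only states that an SMTP interface configured with the destination address is called, and does not describe the message layout.

```python
import smtplib
from email.message import EmailMessage


def build_alert_mail(sender: str, recipient: str,
                     log_text: str, screenshot_png: bytes) -> EmailMessage:
    """Build an alarm email with the fault log and the screenshot attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient  # the configured destination address
    msg["Subject"] = "Server fault alarm"  # hypothetical subject line
    msg.set_content("A first-type fault was detected. "
                    "The fault log and a screenshot are attached.")
    # Attach the log as plain text and the screenshot as a PNG image.
    msg.add_attachment(log_text, filename="fault.log")
    msg.add_attachment(screenshot_png, maintype="image", subtype="png",
                       filename="screenshot.png")
    return msg


def send_alert_mail(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Send the alarm email through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as client:
        client.send_message(msg)
```

Keeping message construction separate from sending makes the content easy to inspect without a mail server, for example build_alert_mail("bmc@example.com", "ops@example.com", log_text, image_bytes).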
To make the objects, technical solutions, and advantages of the disclosure clearer, the solution of the disclosure is described in further detail below with reference to Fig. 3.
The process flow of the fault diagnosis method shown in Fig. 3 includes:
Step 301: the BMC polls the status bit of the designated register in the CPLD and judges whether the status bit is set. If so, go to step 302; if not, return to step 301.
Step 302: call the screenshot function of the remote console to take a screenshot of the server's OS information and obtain screenshot information, save the screenshot information to the designated memory, and go to step 303.
Step 303: collect the first-type fault information of the server, generate and store a log, call the SDS log download function in the BMC to download the log, and go to step 304.
Step 304: call the SMTP interface in the BMC to send the log and the alarm information in an email to the designated destination address, thereby alerting the designated client corresponding to that address.
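Steps 301 to 304 can be sketched as one pass of a control loop. This is only an outline under assumptions: every callable passed in (status_bit_set, capture_screenshot, save_screenshot, collect_log, send_alert) is a hypothetical stand-in for the BMC's real register read, remote console screenshot, storage, log collection and download, and SMTP functions, none of which are specified at the code level in the disclosure.

```python
from typing import Callable


def run_alarm_flow(status_bit_set: Callable[[], bool],
                   capture_screenshot: Callable[[], bytes],
                   save_screenshot: Callable[[bytes], None],
                   collect_log: Callable[[], str],
                   send_alert: Callable[[str, bytes], None]) -> bool:
    """Run one pass of steps 301 to 304; return True if an alarm was sent."""
    # Step 301: check the CPLD status bit; the caller polls again if it is not set.
    if not status_bit_set():
        return False
    # Step 302: take the screenshot through the remote console and save it.
    screenshot = capture_screenshot()
    save_screenshot(screenshot)
    # Step 303: collect the first-type fault information as a log.
    log = collect_log()
    # Step 304: send the log and the screenshot to the designated address.
    send_alert(log, screenshot)
    return True
```

Passing the hardware-facing operations in as callables keeps the ordering of the steps testable without any BMC hardware; a caller would invoke run_alarm_flow repeatedly until it returns True.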
Suppose the CPU fails to read data from memory, which triggers a data read error. The CPU then sends an error signal (such as a carterr error signal or an MSMI error signal) to the CPLD through its MSMI_N pin. On receiving the error signal, the CPLD sets the status bit of its designated register (for example, register 0x28) from the first value to the second value, for example from 0 to 1, indicating that an MCA fault has occurred in the server. When the BMC's polling finds that the status bit of the CPLD's designated register is set, it determines that an MCA fault has occurred in the server, and it can call the screenshot function of the remote console to take a screenshot of the current server's operating system interface and obtain screenshot information, which includes the fault prompt displayed for the read error. Meanwhile, the BMC can also collect the first-type fault information of each hardware component from the CPU registers, generate and store a log, and then call the SDS log download function in the BMC to download the log. Finally, the log and the alarm information are sent in an email through the SMTP interface to the designated destination address, alerting the designated client corresponding to that address so that the user learns of the fault alarm promptly and can respond in time.
Corresponding to the foregoing embodiments of the fault alarm method, the disclosure also provides embodiments of a fault alarm device.
Referring to Fig. 4, which is a schematic structural diagram of a fault alarm device of the disclosure in an exemplary embodiment, the device is applied to the BMC on a server, and the device 40 comprises:
a screenshot unit 401, configured to, when a first-type fault is detected, the first-type fault including at least a fault detected by the CPU of the server, call the screenshot function of the remote console to take a screenshot of the server OS information shown on the display screen and obtain screenshot information;
an alarm unit 402, configured to raise a fault alarm, the fault alarm including the screenshot information.
In one embodiment, the server further includes a CPLD provided with a register, wherein the status bit of the register is changed from a first value to a second value when the first-type fault occurs, the first value indicating normal and the second value indicating abnormal;
the screenshot unit 401 is specifically configured to determine that a first-type fault has occurred when the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value.
In one embodiment, the fault detected by the CPU includes at least one of the following:
an internal fault of the CPU;
an external connection fault of the CPU.
In one embodiment, the device further comprises:
a collection unit 403, configured to, when a first-type fault is detected, collect first-type fault information from the CPU, generate a log from the collected first-type fault information, and store the log;
the alarm unit 402 is specifically configured to download the log by calling a log download interface, and to send the log together with the screenshot information to a designated client to raise the alarm.
In one embodiment, the alarm unit 402 is specifically configured to call an SMTP interface to send the log and the alarm information in an email to a designated destination address, thereby alerting the designated client corresponding to that address, the SMTP interface being configured with the designated destination address.
For the implementation of the functions and roles of each unit in the above device, see the implementation of the corresponding steps in the above method; details are not repeated here.
Because the device embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the disclosure. Those of ordinary skill in the art can understand and implement it without creative effort.
Corresponding to the foregoing embodiments of the fault alarm method, the disclosure also provides an embodiment of a network device that implements the fault alarm method.
As shown in Fig. 5, the network device includes a memory 51, a BMC 52, a communication interface 53, and a communication bus 54, wherein the memory 51, the BMC 52, and the communication interface 53 communicate with one another through the communication bus 54;
the memory 51 is configured to store a computer program;
the BMC 52 is configured to execute the computer program stored in the memory 51, and when the BMC 52 (the processor) executes the computer program, any step of the fault alarm method provided by the embodiments of the disclosure is implemented.
The disclosure also provides a computer-readable storage medium storing a computer program, wherein any step of the fault alarm method provided by the embodiments of the disclosure is implemented when the computer program is executed by a processor.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the network device and computer-readable storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; refer to the description of the method embodiments for the relevant details.
In conclusion when the disclosure can make the BMC on server detect first kind failure, the first kind failure packet The failure that the CPU of server is detected is included, then calls the screenshotss function of remote console with cutting using the remote console Screen function carries out screenshotss to the server OS information that display screen is shown and obtains screenshotss information;Then fault warning, the event are carried out It include the screenshotss information in barrier alarm.Compared with the prior art, the disclosure can when BMC detects first kind failure The screenshotss that OS information is realized by the screenshotss function of remote console, need not rely on operating system, so as to so that the disclosure Suitable for several operation systems;And when the disclosure detects failure by the CPU of server, remote console is triggered by BMC Screenshotss function so as in the event of a failure in time carry out screenshotss, be conducive to accident analysis and can stop loss in time, mention The troubleshooting efficiency of server is risen.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims (12)

1. A fault alarm method, characterized in that the method is applied to a baseboard management controller (BMC) on a server, the method comprising:
When a first-type fault is detected, the first-type fault including a fault detected by the CPU of the server, calling the screenshot function of a remote console to capture, using the screenshot function of the remote console, the server operating system (OS) information shown on a display screen and obtain screenshot information;
Raising a fault alarm, the fault alarm including the screenshot information.
2. The method according to claim 1, characterized in that the server further comprises a complex programmable logic device (CPLD), the CPLD being provided with a register, wherein a status bit of the register is changed from a first value to a second value when the first-type fault occurs, the first value indicating normal and the second value indicating abnormal;
The BMC detecting the first-type fault comprises:
When the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value, determining that a first-type fault has occurred.
3. The method according to claim 1 or 2, characterized in that
The fault detected by the CPU includes at least one of the following:
An internal fault of the CPU;
An external connection fault of the CPU.
4. The method according to claim 1, characterized in that, when the first-type fault is detected, the method further comprises:
Collecting first-type fault information from the CPU, generating a log from the collected first-type fault information, and storing the log;
Raising the fault alarm comprises:
Downloading the log by calling a log download interface;
Sending the log together with the screenshot information to a designated client to raise the alarm.
5. The method according to claim 4, characterized in that sending the log together with the alarm information to the designated client to raise the alarm comprises:
Calling a Simple Mail Transfer Protocol (SMTP) interface to carry the log and the alarm information in an email sent to a specified destination address, so as to alert the designated client corresponding to the specified destination address, the SMTP interface being configured with the specified destination address.
6. A fault alarm apparatus, characterized in that the apparatus is applied to a BMC on a server, the apparatus comprising:
A screenshot unit, configured to, when a first-type fault is detected, the first-type fault including a fault detected by the CPU of the server, call the screenshot function of a remote console to capture, using the screenshot function of the remote console, the server OS information shown on a display screen and obtain screenshot information;
An alarm unit, configured to raise a fault alarm, the fault alarm including the screenshot information.
7. The apparatus according to claim 6, characterized in that the server further comprises a CPLD, the CPLD being provided with a register, wherein a status bit of the register is changed from a first value to a second value when the first-type fault occurs, the first value indicating normal and the second value indicating abnormal;
The screenshot unit is specifically configured to determine that a first-type fault has occurred when the BMC detects that the status bit of the register on the CPLD has changed from the first value to the second value.
8. The apparatus according to claim 6 or 7, characterized in that
The fault detected by the CPU includes at least one of the following:
An internal fault of the CPU;
An external connection fault of the CPU.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
A collection unit, configured to, when the first-type fault is detected, collect first-type fault information from the CPU, generate a log from the collected first-type fault information, and store the log;
The alarm unit is specifically configured to download the log by calling a log download interface, and to send the log together with the screenshot information to a designated client to raise the alarm.
10. The apparatus according to claim 9, characterized in that
The alarm unit is specifically configured to call an SMTP interface to carry the log and the alarm information in an email sent to a specified destination address, so as to alert the designated client corresponding to the specified destination address, the SMTP interface being configured with the specified destination address.
11. A network device, characterized in that the network device comprises a memory, a processor, a communication interface, and a communication bus, wherein the memory, the processor, and the communication interface communicate with one another via the communication bus;
The memory is configured to store a computer program;
The processor is configured to execute the computer program stored in the memory, and implements any step of the method of claims 1-5 when executing the computer program.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program implements any step of the method of claims 1-5 when executed by a processor.
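The alarm step of claims 4 and 5 (carrying the log and the screenshot in an email sent over SMTP to a configured destination address) can be illustrated with a short Python sketch built on the standard `email` and `smtplib` modules. It is an illustration only: the function names, the attachment file names, the subject line, the PNG format of the screenshot, and the example addresses are assumptions introduced here and are not specified by the claims.

```python
import smtplib
from email.message import EmailMessage


def build_alarm_mail(log_text: str, screenshot: bytes,
                     sender: str, recipient: str) -> EmailMessage:
    """Build one email carrying both the fault log and the screenshot."""
    msg = EmailMessage()
    msg["Subject"] = "Server fault alarm: first-type fault detected"
    msg["From"] = sender
    msg["To"] = recipient  # the configured destination address
    msg.set_content("A CPU-detected fault was reported. "
                    "The fault log and a console screenshot are attached.")
    # Attach the log as plain text.
    msg.add_attachment(log_text, filename="fault.log")
    # Attach the screenshot; PNG is an assumed format.
    msg.add_attachment(screenshot, maintype="image", subtype="png",
                       filename="screenshot.png")
    return msg


def send_alarm_mail(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Send the alarm through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Separating message construction from sending lets the message be inspected or logged without a reachable mail server; for example, `build_alarm_mail("...", b"...", "bmc@example.com", "ops@example.com")` returns a message whose two attachments can be listed with `iter_attachments()`.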
CN201910695497.6A 2019-07-30 2019-07-30 A kind of fault alarming method and device Pending CN110427303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910695497.6A CN110427303A (en) 2019-07-30 2019-07-30 A kind of fault alarming method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910695497.6A CN110427303A (en) 2019-07-30 2019-07-30 A kind of fault alarming method and device

Publications (1)

Publication Number Publication Date
CN110427303A true CN110427303A (en) 2019-11-08

Family

ID=68411384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910695497.6A Pending CN110427303A (en) 2019-07-30 2019-07-30 A kind of fault alarming method and device

Country Status (1)

Country Link
CN (1) CN110427303A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111048139A (en) * 2019-12-22 2020-04-21 苏州浪潮智能科技有限公司 Storage medium detection method, device, equipment and readable storage medium
CN111049679A (en) * 2019-12-02 2020-04-21 深圳市智微智能软件开发有限公司 Server warning method and system
CN111131007A (en) * 2020-01-10 2020-05-08 山东超越数控电子股份有限公司 BMC mail sending method based on SMTP
CN111488235A (en) * 2020-04-16 2020-08-04 上海茂声智能科技有限公司 Terminal fault processing method and system and cloud platform
CN111581058A (en) * 2020-05-09 2020-08-25 西安易朴通讯技术有限公司 Fault management method, device, equipment and computer readable storage medium
CN111950743A (en) * 2020-07-08 2020-11-17 北京思特奇信息技术股份有限公司 Method and system for solving fault work order of mobile terminal
CN113064799A (en) * 2021-04-30 2021-07-02 网易传媒科技(北京)有限公司 Client monitoring method, device, system, medium and computing equipment
CN113722185A (en) * 2021-09-07 2021-11-30 超越科技股份有限公司 Domestic computer remote management system
WO2022001751A1 (en) * 2020-06-28 2022-01-06 中兴通讯股份有限公司 Virtual cloud desktop monitoring method, client and server, and storage medium
CN116560347A (en) * 2023-06-27 2023-08-08 江铃汽车股份有限公司 New energy automobile fault management method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082781A (en) * 2009-11-27 2011-06-01 宏正自动科技股份有限公司 Server management system and method
CN102609349A (en) * 2012-02-08 2012-07-25 北京百度网讯科技有限公司 Method and system for screen capture in server failure
JP2013196092A (en) * 2012-03-16 2013-09-30 Mitsubishi Electric Corp Monitoring control server device, and real time alarm management system and alarm reception terminal using the same
CN104794033A (en) * 2015-04-29 2015-07-22 浪潮电子信息产业股份有限公司 CPU low-frequency fault positioning method and device based on BMC
CN109039729A (en) * 2018-07-25 2018-12-18 浪潮电子信息产业股份有限公司 Fault detection method and device of cloud platform
CN109240863A (en) * 2018-08-30 2019-01-18 郑州云海信息技术有限公司 A kind of cpu fault localization method, device, equipment and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111049679A (en) * 2019-12-02 2020-04-21 深圳市智微智能软件开发有限公司 Server warning method and system
CN111048139A (en) * 2019-12-22 2020-04-21 苏州浪潮智能科技有限公司 Storage medium detection method, device, equipment and readable storage medium
CN111131007A (en) * 2020-01-10 2020-05-08 山东超越数控电子股份有限公司 BMC mail sending method based on SMTP
CN111488235A (en) * 2020-04-16 2020-08-04 上海茂声智能科技有限公司 Terminal fault processing method and system and cloud platform
CN111488235B (en) * 2020-04-16 2023-06-16 上海茂声智能科技有限公司 Terminal fault processing method, system and cloud platform
CN111581058A (en) * 2020-05-09 2020-08-25 西安易朴通讯技术有限公司 Fault management method, device, equipment and computer readable storage medium
CN111581058B (en) * 2020-05-09 2024-03-19 西安易朴通讯技术有限公司 Fault management method, device, equipment and computer readable storage medium
WO2022001751A1 (en) * 2020-06-28 2022-01-06 中兴通讯股份有限公司 Virtual cloud desktop monitoring method, client and server, and storage medium
CN111950743A (en) * 2020-07-08 2020-11-17 北京思特奇信息技术股份有限公司 Method and system for solving fault work order of mobile terminal
CN113064799A (en) * 2021-04-30 2021-07-02 网易传媒科技(北京)有限公司 Client monitoring method, device, system, medium and computing equipment
CN113722185A (en) * 2021-09-07 2021-11-30 超越科技股份有限公司 Domestic computer remote management system
CN116560347A (en) * 2023-06-27 2023-08-08 江铃汽车股份有限公司 New energy automobile fault management method and system

Similar Documents

Publication Publication Date Title
CN110427303A (en) A kind of fault alarming method and device
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
KR101540129B1 (en) Remote access diagnostic device and methods thereof
US7328376B2 (en) Error reporting to diagnostic engines based on their diagnostic capabilities
US6829729B2 (en) Method and system for fault isolation methodology for I/O unrecoverable, uncorrectable error
CN106649173B (en) The in-orbit self-correction system and method for highly reliable spaceborne computer based on 1553B bus
CN111414268B (en) Fault processing method and device and server
EP0139069A2 (en) Distributed processing system with fault diagnostic
EP1224548B1 (en) System and method improving fault isolation and diagnosis in computers
US20100262863A1 (en) Method and device for the administration of computers
US6845469B2 (en) Method for managing an uncorrectable, unrecoverable data error (UE) as the UE passes through a plurality of devices in a central electronics complex
JPH01293450A (en) Troubled device specifying system
CN111104283B (en) Fault detection method, device, equipment and medium of distributed storage system
CN111048139A (en) Storage medium detection method, device, equipment and readable storage medium
CN109450669B (en) Abnormity alarming method, device and computer storage medium
CN110659159A (en) Service process operation monitoring method, device, equipment and storage medium
CN109120522A (en) A kind of multipath state monitoring method and device
US7278048B2 (en) Method, system and computer program product for improving system reliability
CN105849702A (en) Cluster system, server device, cluster system management method, and computer-readable recording medium
CN103731315A (en) Server failure detecting method
CN114116282B (en) Method and device for reporting and repairing network additional storage faults
US20220284704A1 (en) Anomaly detection method and system for image signal processor
CN115964218A (en) Method and device for identifying fault of high-speed serial computer expansion bus equipment
US11726853B2 (en) Electronic control device
CN115080362A (en) PCIE (peripheral component interface express) equipment speed reduction reporting method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191108
