CN106844145A - Server hardware fault early warning method and device - Google Patents

Server hardware fault early warning method and device

Info

Publication number
CN106844145A
Authority
CN
China
Prior art keywords
server
hardware
early warning
log
hardware fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611247164.XA
Other languages
Chinese (zh)
Inventor
刘臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201611247164.XA priority Critical patent/CN106844145A/en
Publication of CN106844145A publication Critical patent/CN106844145A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/327Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server hardware fault early warning method and device. The method includes: creating a hardware fault early warning list in advance, in which different hardware fault early warning information entries are stored in correspondence with their server log content; obtaining the server system operation log and matching it against the hardware fault early warning list; and, if a matching entry exists, determining that the server is about to experience the hardware fault described by the hardware fault early warning information of the matching entry. This technical solution gives timely warning before server hardware fails, so that the location of the problem can be identified from the early warning information and handled promptly; little time is consumed, and the stability of the whole server hardware system is safeguarded.

Description

Server hardware fault early warning method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a server hardware fault early warning method and device.
Background
As business demand on server clusters grows, the amount of server hardware keeps increasing. Among so many servers, once server hardware fails, for example because it has passed its warranty period (referred to as being out of warranty), the performance of the server hardware degrades and the server may even go down unexpectedly, which affects the operation of the whole server hardware system. In the maintenance of large numbers of servers, the prior art generally discovers a hardware fault only after it has occurred and only then resolves it. Server hardware faults cannot be discovered in time, nor can the location of the problem be identified in time, so the problem-resolution cycle is long, which in turn affects the stability of the whole server hardware system.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a server hardware fault early warning method and device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a server hardware fault early warning method is provided, including:
creating a hardware fault early warning list in advance, in which different hardware fault early warning information entries and their corresponding server log content are stored in correspondence;
obtaining a server system operation log, and matching the obtained server system operation log against the hardware fault early warning list;
if a matching entry exists, determining that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry.
Optionally, obtaining the server system operation log and matching the obtained server system operation log against the hardware fault early warning list includes:
obtaining the hardware-related logs in the server operation log;
matching the obtained hardware-related logs against the hardware fault early warning list.
Optionally, obtaining the hardware-related logs in the server operation log includes:
determining, according to the system configuration of the server, the names of the log files that store hardware-related logs;
obtaining the hardware-related logs from the corresponding log files according to the determined log file names.
Optionally, after determining that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method further includes:
if there are other servers that store the same data as the server and provide the same services, migrating the services on the server to those other servers.
Optionally, after determining that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method further includes:
if there are no other servers that store the same data as the server and provide the same services, migrating both the data and the services on the server to a designated standby server.
Optionally, after determining that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method further includes:
sending, through a designated channel to a designated location, an alarm message containing the identifier of the server and the hardware fault early warning information.
Optionally, the method further includes:
receiving an early warning false-alarm notification concerning the server;
putting the server back into use.
Optionally, the method further includes:
when a server actually experiences a hardware fault, obtaining the hardware-related logs in that server's system operation log within the time range corresponding to the occurrence of the hardware fault;
finding, in the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced;
storing the log content found, in correspondence with the early warning information for the hardware fault that the server actually experienced, in the hardware fault early warning list.
According to another aspect of the present invention, a server hardware fault early warning device is provided, including:
a list maintenance unit, adapted to create a hardware fault early warning list in advance, in which different hardware fault early warning information entries and their corresponding server log content are stored in correspondence;
a log matching unit, adapted to obtain a server system operation log, match the obtained server system operation log against the hardware fault early warning list, and, if a matching entry exists, notify the fault early warning unit;
a fault early warning unit, adapted to determine, after receiving the notification from the log matching unit, that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry.
Optionally, the log matching unit is adapted to obtain the hardware-related logs in the server operation log, and to match the obtained hardware-related logs against the hardware fault early warning list.
Optionally, the log matching unit is adapted to determine, according to the system configuration of the server, the names of the log files that store hardware-related logs, and to obtain the hardware-related logs from the corresponding log files according to the determined log file names.
Optionally, the device further includes:
an early warning processing unit, adapted to judge, when the fault early warning unit determines that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, whether there are other servers that store the same data as the server and provide the same services, and if so, to migrate the services on the server to those other servers.
Optionally, the early warning processing unit is further adapted to migrate both the data and the services on the server to a designated standby server when it judges that there are no other servers that store the same data as the server and provide the same services.
Optionally, the fault early warning unit is further adapted to send, through a designated channel to a designated location, an alarm message containing the identifier of the server and the hardware fault early warning information.
Optionally, the fault early warning unit is further adapted to receive an early warning false-alarm notification concerning the server and to put the server back into use.
Optionally,
the list maintenance unit is further adapted, when a server actually experiences a hardware fault, to obtain the hardware-related logs in that server's system operation log within the time range corresponding to the occurrence of the hardware fault; to find, in the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced; and to store the log content found, in correspondence with the early warning information for the hardware fault that the server actually experienced, in the hardware fault early warning list.
In summary, according to the technical solution of the present invention, a hardware fault early warning list is created in advance that stores different hardware fault early warning information entries in correspondence with their server log content; the server system operation log is obtained in real time and matched against the pre-created hardware fault early warning list. If there is no matching entry, the server is not expected to experience a hardware fault. If there is a matching entry, it is determined that the server hardware is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry. Server hardware maintenance personnel thus obtain the early warning information in time, can promptly discover from it which server hardware is about to fail and where the problem lies, and can handle it promptly. It can be seen that the present invention gives timely warning before server hardware fails, so that the location of the problem can be identified from the early warning information and handled promptly; little time is consumed, and the stability of the whole server hardware system is safeguarded.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the specification, and in order that the above and other objects, features and advantages of the present invention may be more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow diagram of a server hardware fault early warning method according to one embodiment of the present invention;
Fig. 2 shows a structural diagram of a server hardware fault early warning device according to one embodiment of the present invention;
Fig. 3 shows a structural diagram of a server hardware fault early warning device according to another embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flow diagram of a server hardware fault early warning method according to one embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S110: create a hardware fault early warning list in advance, in which different hardware fault early warning information entries and their corresponding server log content are stored in correspondence.
The system operation logs on a server record the server's operating condition, including abnormal information that arises while the server is running. A hardware fault early warning list is therefore created from known fault early warning information and the log information corresponding to it. The fault early warning list may contain different hardware fault early warning information entries and their corresponding server log content. For example, the fault early warning list may contain early warning information for server downtime together with the server log content that corresponds to it.
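As an illustration of step S110, the list can be thought of as a collection of pairs, each tying one piece of early warning information to the log content that signals it. The sketch below is a minimal one under stated assumptions: the names `WarningEntry` and `create_warning_list` and the three sample log fragments are hypothetical and are not taken from the patent, which does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningEntry:
    """One row of the hardware fault early warning list (hypothetical layout)."""
    warning_info: str  # description of the fault that is expected to occur
    log_content: str   # log fragment that signals the fault


def create_warning_list():
    """Build the list in advance from known faults and their log content.

    The fragments below are illustrative placeholders only.
    """
    return [
        WarningEntry("memory fault expected", "EDAC MC0: CE"),
        WarningEntry("disk fault expected", "I/O error, dev sd"),
        WarningEntry("server downtime expected", "Machine check events logged"),
    ]
```

In practice the list would more likely live in a file or database so that it can be updated without redeploying the monitor.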
Step S120: obtain the server system operation log, and match the obtained server system operation log against the hardware fault early warning list.
The fault early warning list contains different hardware fault early warning information entries and their corresponding server log content. Whenever the server system operation log contains server log content that appears in the fault early warning list, the server may experience the corresponding hardware fault. Therefore, to detect whether a server is about to experience a hardware fault, the server system operation log is obtained and then matched against the hardware fault early warning list. If there is no matching entry, the server is not considered to be at risk of a hardware fault.
Step S130: if a matching entry exists, determine that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry.
The system operation log content on each server is monitored for server log content that appears in the fault early warning list; whenever such content is found, the server hardware is considered to be about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry. For example, the fault early warning list contains early warning information for server downtime and the server log content that corresponds to it. When the system log content obtained from server A contains log content that matches the server log content corresponding to server downtime in the fault early warning list, server A is considered likely to experience the downtime hardware fault described in the early warning information.
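Steps S120 and S130 can be sketched as a lookup of each log line against the list. The patent does not say how "matching" is performed; the sketch below assumes plain substring matching, and the function name `match_log` and the sample log lines are hypothetical.

```python
def match_log(log_lines, warning_list):
    """Return (warning_info, line) for every log line that contains listed log content.

    warning_list is a sequence of (warning_info, log_content) pairs.
    Substring matching is an assumption; a regular expression would also fit.
    """
    matches = []
    for line in log_lines:
        for warning_info, log_content in warning_list:
            if log_content in line:
                matches.append((warning_info, line))
    return matches


# Usage: an empty result means no early warning for this server.
warning_list = [("disk fault expected", "I/O error, dev sd")]
server_a_log = [
    "kernel: eth0 link up",
    "kernel: blk_update_request: I/O error, dev sda, sector 2048",
]
for warning_info, line in match_log(server_a_log, warning_list):
    print(f"server A: {warning_info} (matched: {line})")
```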
After a matching entry is found, the corresponding early warning information is output so that maintenance personnel can identify the problem and handle it promptly. Because the early warning information indicates a hardware fault that the corresponding server may experience, maintenance personnel can use it to locate the problem promptly, judge whether the server can continue to be used, and take appropriate action, so that a server failure does not affect the stability of the system. For example, if early warning information for a disk fault on server B appears, the business on the disks of server B can first be migrated away; maintenance personnel then inspect the server, identify the problem, and resolve it promptly. If server B can continue to be used, the business can be migrated back; if server B is no longer usable, a new server is added to replace it.
It can be seen that the present invention gives timely warning before server hardware fails, so that the location of the problem can be identified from the early warning information and handled promptly; little time is consumed, and the stability of the whole server hardware system is safeguarded.
Although the system operation logs on a server record the server's operating condition, including abnormal information that arises while the server is running, the volume of system operation logs on a server is enormous, and for the sake of efficiency it is not feasible to traverse all of them. In one embodiment of the present invention, obtaining the server system operation log and matching it against the hardware fault early warning list in step S120 includes: obtaining the hardware-related logs in the server operation log; and matching the obtained hardware-related logs against the hardware fault early warning list. Because the goal is early warning of hardware faults, only the hardware-related logs in the server operation log need to be obtained, for example logs related to the server's memory, and logs related to hardware such as the server's disks, CPU, mainboard, and power supply.
Because hardware-related logs are constantly updated, they are obtained in real time in order to monitor the server in real time. Alternatively, a time period is preset, for example 1 minute, and the hardware-related logs are obtained once every minute.
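The periodic variant can be sketched as a loop that, once per interval, reads only the part of a log file that was appended since the previous read. This is a minimal sketch under assumptions: the function names `read_new_lines` and `poll` are hypothetical, the log is assumed to be an append-only UTF-8 text file, and log rotation and partially written lines are not handled.

```python
import time


def read_new_lines(path, offset):
    """Read the lines appended to path since byte offset; return (lines, new_offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(data)


def poll(path, handle_lines, interval_seconds=60, rounds=None):
    """Call handle_lines with newly appended lines once per interval.

    rounds=None polls forever; a finite value is convenient for testing.
    """
    offset = 0
    done = 0
    while rounds is None or done < rounds:
        lines, offset = read_new_lines(path, offset)
        if lines:
            handle_lines(lines)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_seconds)
```

Tracking the byte offset keeps each round proportional to the amount of new log data rather than to the size of the whole file, which is in the spirit of the efficiency concern raised above.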
Specifically, obtaining the hardware-related logs in the server operation log includes: determining, according to the system configuration of the server, the names of the log files that store hardware-related logs; and obtaining the hardware-related logs from the corresponding log files according to the determined log file names.
For example, the memory-related information in the server's system configuration is used to determine the name of the memory-related log file on the server, and the memory-related logs are then obtained from the corresponding log file according to the determined log file name.
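The lookup from system configuration to log file names might be sketched as follows. The patent does not describe the configuration format, so the dictionary layout, the `log_files` key, the component names, and the paths are all hypothetical.

```python
def hardware_log_files(system_config,
                       components=("memory", "disk", "cpu", "mainboard", "power")):
    """Return {component: log file name} for hardware components named in the config.

    Assumes a hypothetical config of the form {"log_files": {component: path}}.
    Components without a configured log file are skipped.
    """
    log_files = system_config.get("log_files", {})
    return {c: log_files[c] for c in components if c in log_files}


# Usage with an illustrative configuration.
config = {
    "log_files": {
        "memory": "/var/log/hw/memory.log",
        "disk": "/var/log/hw/disk.log",
        "web": "/var/log/app/access.log",  # not hardware-related, ignored
    }
}
print(hardware_log_files(config))
```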
In one embodiment of the present invention, after step S130 determines that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method shown in Fig. 1 further includes: if there are other servers that store the same data as the server and provide the same services, migrating the services on the server to those other servers.
Once it has been determined that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the services on the server are first migrated to other servers, in order to guard against the server actually experiencing that fault and to ensure the stability of the services the server carries. The other servers referred to here are servers that store the same data as the server and provide the same services, which ensures that the business continues to run normally.
Before the services on the server are migrated to other servers, it is first determined whether there are other servers that store the same data as the server and provide the same services. If there are none, then after step S130 determines that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method shown in Fig. 1 further includes: if there are no other servers that store the same data as the server and provide the same services, migrating both the data and the services on the server to a designated standby server.
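The two migration branches can be sketched as a single decision function: prefer a peer that already holds the same data and provides the same services, and otherwise fall back to the designated standby server. The server representation (dictionaries with `name`, `data`, and `services`) and the function name `choose_migration_target` are hypothetical; the actual migration mechanism is outside the scope of this sketch.

```python
def choose_migration_target(server, peers, standby_name):
    """Decide where to move the workload of a server that raised an early warning.

    Returns (target_name, what_to_migrate). A peer with identical data and
    services only needs the services moved; the standby needs data as well.
    """
    for peer in peers:
        if (peer["name"] != server["name"]
                and peer["data"] == server["data"]
                and peer["services"] == server["services"]):
            return peer["name"], ("services",)
    return standby_name, ("data", "services")


# Usage with illustrative servers.
a = {"name": "A", "data": {"orders"}, "services": {"api"}}
b = {"name": "B", "data": {"orders"}, "services": {"api"}}
print(choose_migration_target(a, [a, b], "standby-1"))  # peer B qualifies
print(choose_migration_target(a, [a], "standby-1"))     # falls back to standby
```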
In one embodiment of the present invention, after step S130 determines that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the method shown in Fig. 1 further includes: sending, through a designated channel to a designated location, an alarm message containing the identifier of the server and the hardware fault early warning information.
After it is determined that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry, the alarm message and the information about the corresponding server need to be output to a designated location so that the relevant personnel receive the early warning information promptly, for example by sending it by email to the mailbox of the maintenance personnel.
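For the email example, the alarm message can be assembled with Python's standard `email.message.EmailMessage`. This is only a sketch: the function name `build_alarm`, the subject wording, and the recipient address are hypothetical, and the message is built but not sent.

```python
from email.message import EmailMessage


def build_alarm(server_id, warning_info, recipient):
    """Build an alarm email carrying the server identifier and the early warning information."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Hardware fault early warning: {server_id}"
    msg.set_content(f"Server: {server_id}\nEarly warning: {warning_info}\n")
    return msg


alarm = build_alarm("server-A", "disk fault expected", "ops@example.com")
print(alarm["Subject"])
```

Delivery could then use `smtplib.SMTP(host).send_message(alarm)` against whatever mail relay the site provides; other designated channels, such as instant messaging or SMS, would replace only this delivery step.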
However, the possibility that an alarm message is wrong, that is, a false alarm, cannot be ruled out. If a false alarm occurs but the data and services on the corresponding server have already been migrated away, or the server has already been taken out of use, the server needs to be put back into use. Specifically, the above method further includes: receiving an early warning false-alarm notification concerning the server; and putting the server back into use, or migrating the data and services that were moved away back to it. For example, after an alarm message is sent warning that a server may lose power because of an unstable supply voltage, the server may be taken out of use; if investigation then shows that the alarm was triggered by normal voltage fluctuation, the server needs to be put back into use, and the relevant personnel send an early warning false-alarm notification for that server. After the false-alarm notification for the server is received, the server is put back into use.
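The false-alarm path amounts to tracking which servers were withdrawn because of a warning and restoring them when a false-alarm notification arrives. The sketch below assumes a simple in-memory state table; the class name `ServerPool` and the state labels are hypothetical.

```python
class ServerPool:
    """Track whether each server is in service or withdrawn after an early warning."""

    def __init__(self, servers):
        self.state = {name: "in_service" for name in servers}

    def on_warning(self, name):
        # An early warning takes the server out of use.
        self.state[name] = "withdrawn"

    def on_false_alarm(self, name):
        # Put the server back into use only if a warning had withdrawn it.
        if self.state.get(name) == "withdrawn":
            self.state[name] = "in_service"
            return True
        return False


pool = ServerPool(["A", "B"])
pool.on_warning("A")           # e.g. unstable supply voltage reported
pool.on_false_alarm("A")       # investigation: normal voltage fluctuation
print(pool.state["A"])
```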
Because the fault early warning information and corresponding server log content in the created hardware fault early warning list cannot cover every situation, the hardware fault early warning list also needs to be updated continually. In one embodiment of the present invention, the method shown in Fig. 1 further includes:
When a server actually experiences a hardware fault, the fact that the fault occurred without warning shows that the hardware fault early warning list does not yet store early warning information and corresponding log content for that fault. It is therefore necessary to obtain the hardware-related logs in the server's system operation log within the time range corresponding to the occurrence of the hardware fault; to find, in the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced; and to store the log content found, in correspondence with the early warning information for that hardware fault, in the hardware fault early warning list, thereby updating the list.
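The update procedure can be sketched as two steps: select the log records that fall within a time window ending at the fault, then store the related ones in the list. The patent does not say how a log is judged "related" to the fault; the sketch assumes a caller-supplied keyword, and the function names, the window length, and the sample records are hypothetical.

```python
from datetime import datetime, timedelta


def logs_in_window(records, fault_time, window):
    """Return log lines whose timestamp lies within `window` before the fault."""
    start = fault_time - window
    return [line for ts, line in records if start <= ts <= fault_time]


def update_warning_list(warning_list, records, fault_time, window, keyword, warning_info):
    """Append (warning_info, log line) pairs for related logs; return what was added.

    A log is treated as related when it contains `keyword` (an assumption).
    """
    added = []
    for line in logs_in_window(records, fault_time, window):
        entry = (warning_info, line)
        if keyword in line and entry not in warning_list:
            warning_list.append(entry)
            added.append(entry)
    return added


fault = datetime(2016, 12, 29, 12, 0)
records = [
    (datetime(2016, 12, 29, 11, 30), "sda: reset"),        # outside the window
    (datetime(2016, 12, 29, 11, 55), "sda: I/O error"),    # related
    (datetime(2016, 12, 29, 11, 58), "eth0: link up"),     # unrelated
]
warning_list = []
update_warning_list(warning_list, records, fault, timedelta(minutes=10), "sda", "disk fault")
print(warning_list)
```

A production version would probably store a generalized pattern rather than the full log line, so that the entry also matches the same fault on other devices or servers.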
Fig. 2 shows a structural diagram of a server hardware fault early warning device according to one embodiment of the present invention. As shown in Fig. 2, the server hardware fault early warning device 200 includes:
A list maintenance unit 210, adapted to create a hardware fault early warning list in advance, in which different hardware fault early warning information entries and their corresponding server log content are stored in correspondence.
The system operation logs on a server record the server's operating condition, including abnormal information that arises while the server is running. A hardware fault early warning list is therefore created from known fault early warning information and the log information corresponding to it. The fault early warning list may contain different hardware fault early warning information entries and their corresponding server log content. For example, the fault early warning list may contain early warning information for server downtime together with the server log content that corresponds to it.
A log matching unit 220, adapted to obtain the server system operation log, match the obtained server system operation log against the hardware fault early warning list, and, if a matching entry exists, notify the fault early warning unit.
The fault early warning list contains different hardware fault early warning information entries and their corresponding server log content. Whenever the server system operation log contains server log content that appears in the fault early warning list, the server may experience the corresponding hardware fault. Therefore, to detect whether a server is about to experience a hardware fault, the server system operation log is obtained and then matched against the hardware fault early warning list. If there is no matching entry, the server is not considered to be at risk of a hardware fault.
A fault early warning unit 230, adapted to determine, after receiving the notification from the log matching unit, that the server is about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry.
The system operation log content on each server is monitored for server log content that appears in the fault early warning list; whenever such content is found, the server hardware is considered to be about to experience the hardware fault described by the hardware fault early warning information corresponding to the matching entry. For example, the fault early warning list contains early warning information for server downtime and the server log content that corresponds to it. When the system log content obtained from server A contains log content that matches the server log content corresponding to server downtime in the fault early warning list, server A is considered likely to experience the downtime hardware fault described in the early warning information.
After a matching entry is found, the corresponding early warning information is output so that maintenance personnel can identify the problem and handle it promptly. Because the early warning information indicates a hardware fault that the corresponding server may experience, maintenance personnel can use it to locate the problem promptly, judge whether the server can continue to be used, and take appropriate action, so that a server failure does not affect the stability of the system. For example, if early warning information for a disk fault on server B appears, the business on the disks of server B can first be migrated away; maintenance personnel then inspect the server, identify the problem, and resolve it promptly. If server B can continue to be used, the business can be migrated back; if server B is no longer usable, a new server is added to replace it.
It can be seen that the present invention gives timely warning before server hardware fails, so that the location of the problem can be identified from the early warning information and handled promptly; little time is consumed, and the stability of the whole server hardware system is safeguarded.
Although the system operation logs on a server record the server's operating condition, including abnormal information that arises while the server is running, the volume of system operation logs on a server is enormous, and for the sake of efficiency it is not feasible to traverse all of them. In one embodiment of the present invention, the log matching unit 220 is adapted to obtain the hardware-related logs in the server operation log and to match the obtained hardware-related logs against the hardware fault early warning list. Because the goal is early warning of hardware faults, only the hardware-related logs in the server operation log need to be obtained, for example logs related to the server's memory, and logs related to hardware such as the server's disks, CPU, mainboard, and power supply.
Because hardware-related logs are constantly updated, they are obtained in real time in order to monitor the server in real time. Alternatively, a time period is preset, for example 1 minute, and the hardware-related logs are obtained once every minute.
Specifically, the log matching unit 220 is adapted to determine, according to the system configuration of the server, the names of the log files that store hardware-related logs, and to obtain the hardware-related logs from the corresponding log files according to the determined log file names.
For example, the memory-related information in the server's system configuration is used to determine the name of the memory-related log file on the server, and the memory-related logs are then obtained from the corresponding log file according to the determined log file name.
Fig. 3 shows a schematic structural diagram of a server hardware fault early warning device according to another embodiment of the invention. As shown in Fig. 3, the server hardware fault early warning device 300 includes a list maintenance unit 310, a log matching unit 320, a fault early warning unit 330 and an early warning processing unit 340. The list maintenance unit 310, log matching unit 320 and fault early warning unit 330 have the same functions as the list maintenance unit 210, log matching unit 220 and fault early warning unit 230 shown in Fig. 2, respectively; the identical parts are not repeated here.
The early warning processing unit 340 is adapted to, when the fault early warning unit determines that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, judge whether there are other servers that store the same data and provide the same services as the server, and, if so, migrate the services on the server to those other servers.
Once it has been determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the services on the server are first migrated to other servers. This guards against the fault actually occurring and preserves the stability of the services the server carries. The other servers referred to here are servers that store the same data and provide the same services as the server, which ensures that the business continues to run normally.
Before the services on the server are migrated to other servers, a search is made for other servers that store the same data and provide the same services as the server. If none exists, then in one embodiment of the invention the early warning processing unit 340 is further adapted to, when it judges that no other server stores the same data and provides the same services as the server, migrate both the data and the services on the server to a designated standby server.
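The migration decision made by the early warning processing unit can be sketched as below. The `Server` record and the `plan_migration` function are hypothetical names introduced for illustration; how a real deployment compares stored data and services between servers is not specified in the patent and is simplified here to set equality.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Server:
    # Hypothetical description of a server for illustration only.
    name: str
    data: frozenset      # identifiers of the data sets stored on the server
    services: frozenset  # names of the services the server provides

def plan_migration(failing, others, standby):
    """Decide where to migrate and what to move for a server expected to fail."""
    for other in others:
        # A peer qualifies only if it stores the same data and provides the same services.
        if other.data == failing.data and other.services == failing.services:
            return {"target": other.name, "move": ["services"]}
    # No equivalent peer exists: move both data and services to the designated standby.
    return {"target": standby.name, "move": ["data", "services"]}
```

When an equivalent peer exists only the services move, because the peer already holds the same data; the standby server needs both.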
In one embodiment of the invention, the fault early warning unit 330 is further adapted to send, through a specified channel to a specified location, an alarm message containing the identifier of the server and the corresponding fault early warning information.
After it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the alarm message and the information about the corresponding server must be output to a specified location so that the responsible personnel receive the early warning information promptly; for example, it is sent by e-mail to the mailbox of the maintenance personnel.
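As a minimal sketch of the e-mail example, the alarm message could be assembled with Python's standard `email.message.EmailMessage`. The function name, the subject wording and the body layout are assumptions made for illustration; the patent only requires that the message carry the server identifier and the corresponding early warning information. Actually delivering the message (for instance over SMTP) is left out.

```python
from email.message import EmailMessage

def build_alarm_message(server_id, warning_info, recipient):
    """Build an alarm e-mail carrying the server identifier and the early warning."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Hardware fault early warning: {server_id}"
    # The body layout is an assumed format, not one defined by the patent.
    msg.set_content(f"Server: {server_id}\nEarly warning: {warning_info}\n")
    return msg
```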
However, the possibility that an alarm message is erroneous, that is, a false alarm, cannot be excluded. If a false alarm occurs after the data and services on the corresponding server have already been migrated away, or after the server has already been taken out of use, the server needs to be put back into use. Specifically, the fault early warning unit 330 is further adapted to receive an early warning false-alarm notification concerning the server, and to put the server back into use or migrate the moved data and services back. For example, after an alarm message is issued stating that the server may power off because the supply voltage is unstable, the server may be taken out of use. If investigation then shows that the alarm was triggered by a normal voltage fluctuation, the server needs to be put back into use, and the responsible personnel send an early warning false-alarm notification for that server. After the false-alarm notification for the server is received, the server is put back into use.
Because the fault early warning information and corresponding server log content contained in the created hardware fault early warning list cannot cover every situation, the list needs to be updated continually. In one embodiment of the invention, the list maintenance unit 310 is further adapted to act when a server actually experiences a hardware fault. Since the server has already experienced the fault, the hardware fault early warning list evidently did not hold early warning information and corresponding log content for that fault. The unit therefore obtains the hardware-related logs in the server system running log within the time range corresponding to the occurrence of the hardware fault; finds, among the obtained hardware-related logs, at least one log related to the hardware fault that actually occurred; and stores the content of the log found, in correspondence with the early warning information for the hardware fault that actually occurred, in the hardware fault early warning list, thereby updating the list.
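The list update can be sketched as below. Several points are assumptions, since the patent leaves them open: log entries are modelled as `(timestamp, text)` pairs, the early warning list as `(log content, early warning information)` pairs, the "corresponding time range" as a fixed window before the fault, and "related to the fault" as a simple keyword test. All function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def related_logs(entries, fault_time, window, keyword):
    """Return log texts inside the window before the fault that mention the keyword."""
    start = fault_time - window
    return [text for ts, text in entries
            if start <= ts <= fault_time and keyword in text]

def update_warning_list(warning_list, entries, fault_time, window, keyword, warning_info):
    """Append (log content, early warning info) pairs that the list does not yet hold."""
    known = {content for content, _ in warning_list}
    for text in related_logs(entries, fault_time, window, keyword):
        if text not in known:
            warning_list.append((text, warning_info))
            known.add(text)
    return warning_list
```

In practice the step of picking out the related logs might be done by an engineer rather than a keyword filter; the sketch only shows where the new pairs end up.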
In summary, according to the technical solution of the invention, a hardware fault early warning list is created in advance that stores different pieces of hardware fault early warning information in correspondence with the respective server log content. The server system running log is obtained in real time and matched against the pre-created hardware fault early warning list. If there is no matching item, the server is not expected to experience a hardware fault. If there is a matching item, it is determined that the server hardware will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item. The server hardware maintenance personnel thus obtain the early warning information in time and, based on it, can promptly identify the server hardware that is about to fail and the location of the problem, and handle it in time. It can be seen that the invention can issue a timely early warning before a server hardware fault occurs, so that the location of the problem can be identified from the early warning information and handled promptly; little time is consumed, and the stability of the whole server hardware system is ensured.
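The matching step summarised above can be sketched as follows. The early warning list is modelled as `(log content, early warning information)` pairs and matching as a substring test; both are assumptions, as the patent does not fix the matching rule. The entries in `WARNING_LIST` are invented examples, not log formats taken from any real system.

```python
# Hypothetical early warning list: (log content to look for, early warning information).
WARNING_LIST = [
    ("ECC error count rising", "memory module likely to fail"),
    ("read retries exceeded", "disk likely to fail"),
]

def match_logs(log_lines, warning_list):
    """Return the early warning information of every list entry found in the logs."""
    warnings = []
    for content, info in warning_list:
        # An entry matches when its log content appears in any obtained log line.
        if any(content in line for line in log_lines) and info not in warnings:
            warnings.append(info)
    return warnings
```

An empty result corresponds to the "no matching item" case; each returned string corresponds to a hardware fault the server is expected to experience.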
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that a variety of programming languages may be used to implement the invention described herein, and that the description of a specific language above is given to disclose the best mode of the invention.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practised without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the server hardware fault early warning device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
The invention discloses A1, a server hardware fault early warning method, comprising:
creating a hardware fault early warning list in advance, the list storing different pieces of hardware fault early warning information in correspondence with the respective server log content;
obtaining the server system running log, and matching the obtained server system running log against the hardware fault early warning list;
if there is a matching item, determining that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item.
A2, the method as described in A1, wherein obtaining the server system running log and matching the obtained server system running log against the hardware fault early warning list comprises:
obtaining the hardware-related logs in the server running log;
matching the obtained hardware-related logs against the hardware fault early warning list.
A3, the method as described in A2, wherein obtaining the hardware-related logs in the server running log comprises:
determining, according to the system configuration of the server, the names of the log files that hold hardware-related logs;
obtaining the hardware-related logs from the corresponding log files according to the determined log file names.
A4, the method as described in A1, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
if there are other servers that store the same data and provide the same services as the server, migrating the services on the server to those other servers.
A5, the method as described in A4, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
if there is no other server that stores the same data and provides the same services as the server, migrating both the data and the services on the server to a designated standby server.
A6, the method as described in A1, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
sending, through a specified channel to a specified location, an alarm message containing the identifier of the server and the corresponding fault early warning information.
A7, the method as described in A6, wherein the method further comprises:
receiving an early warning false-alarm notification concerning the server;
putting the server back into use.
A8, the method as described in any one of A1-A7, wherein the method further comprises:
when a server actually experiences a hardware fault, obtaining the hardware-related logs in the server system running log within the time range corresponding to the occurrence of the hardware fault on that server;
finding, among the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced;
storing the content of the log found, in correspondence with the early warning information for the hardware fault that the server actually experienced, in the hardware fault early warning list.
The invention also discloses B9, a server hardware fault early warning device, comprising:
a list maintenance unit, adapted to create a hardware fault early warning list in advance, the list storing different pieces of hardware fault early warning information in correspondence with the respective server log content;
a log matching unit, adapted to obtain the server system running log, match the obtained server system running log against the hardware fault early warning list, and, if there is a matching item, notify the fault early warning unit;
a fault early warning unit, adapted to determine, after receiving the notification from the log matching unit, that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item.
B10, the device as described in B9, wherein
the log matching unit is adapted to obtain the hardware-related logs in the server running log, and to match the obtained hardware-related logs against the hardware fault early warning list.
B11, the device as described in B10, wherein
the log matching unit is adapted to determine, according to the system configuration of the server, the names of the log files that hold hardware-related logs, and to obtain the hardware-related logs from the corresponding log files according to the determined log file names.
B12, the device as described in B9, wherein the device further comprises:
an early warning processing unit, adapted to, when the fault early warning unit determines that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, judge whether there are other servers that store the same data and provide the same services as the server, and, if so, migrate the services on the server to those other servers.
B13, the device as described in B12, wherein
the early warning processing unit is further adapted to, when it judges that no other server stores the same data and provides the same services as the server, migrate both the data and the services on the server to a designated standby server.
B14, the device as described in B9, wherein
the fault early warning unit is further adapted to send, through a specified channel to a specified location, an alarm message containing the identifier of the server and the corresponding fault early warning information.
B15, the device as described in B14, wherein
the fault early warning unit is further adapted to receive an early warning false-alarm notification concerning the server, and to put the server back into use.
B16, the device as described in any one of B9-B15, wherein
the list maintenance unit is further adapted to, when a server actually experiences a hardware fault, obtain the hardware-related logs in the server system running log within the time range corresponding to the occurrence of the hardware fault on that server; find, among the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced; and store the content of the log found, in correspondence with the early warning information for the hardware fault that the server actually experienced, in the hardware fault early warning list.

Claims (10)

1. A server hardware fault early warning method, comprising:
creating a hardware fault early warning list in advance, the list storing different pieces of hardware fault early warning information in correspondence with the respective server log content;
obtaining the server system running log, and matching the obtained server system running log against the hardware fault early warning list;
if there is a matching item, determining that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item.
2. The method as claimed in claim 1, wherein obtaining the server system running log and matching the obtained server system running log against the hardware fault early warning list comprises:
obtaining the hardware-related logs in the server running log;
matching the obtained hardware-related logs against the hardware fault early warning list.
3. The method as claimed in claim 2, wherein obtaining the hardware-related logs in the server running log comprises:
determining, according to the system configuration of the server, the names of the log files that hold hardware-related logs;
obtaining the hardware-related logs from the corresponding log files according to the determined log file names.
4. The method as claimed in claim 1, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
if there are other servers that store the same data and provide the same services as the server, migrating the services on the server to those other servers.
5. The method as claimed in claim 4, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
if there is no other server that stores the same data and provides the same services as the server, migrating both the data and the services on the server to a designated standby server.
6. The method as claimed in claim 1, wherein, after it is determined that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item, the method further comprises:
sending, through a specified channel to a specified location, an alarm message containing the identifier of the server and the corresponding fault early warning information.
7. The method as claimed in claim 6, wherein the method further comprises:
receiving an early warning false-alarm notification concerning the server;
putting the server back into use.
8. The method as claimed in any one of claims 1-7, wherein the method further comprises:
when a server actually experiences a hardware fault, obtaining the hardware-related logs in the server system running log within the time range corresponding to the occurrence of the hardware fault on that server;
finding, among the obtained hardware-related logs, at least one log related to the hardware fault that the server actually experienced;
storing the content of the log found, in correspondence with the early warning information for the hardware fault that the server actually experienced, in the hardware fault early warning list.
9. A server hardware fault early warning device, comprising:
a list maintenance unit, adapted to create a hardware fault early warning list in advance, the list storing different pieces of hardware fault early warning information in correspondence with the respective server log content;
a log matching unit, adapted to obtain the server system running log, match the obtained server system running log against the hardware fault early warning list, and, if there is a matching item, notify the fault early warning unit;
a fault early warning unit, adapted to determine, after receiving the notification from the log matching unit, that the server will experience the hardware fault described by the hardware fault early warning information corresponding to the matching item.
10. The device as claimed in claim 9, wherein
the log matching unit is adapted to obtain the hardware-related logs in the server running log, and to match the obtained hardware-related logs against the hardware fault early warning list.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611247164.XA CN106844145A (en) 2016-12-29 2016-12-29 A kind of server hardware fault early warning method and device

Publications (1)

Publication Number Publication Date
CN106844145A true CN106844145A (en) 2017-06-13

Family

ID=59113429

Country Status (1)

Country Link
CN (1) CN106844145A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170613
