CN107291584A - A kind of chassis failure detection method and system - Google Patents

Chassis failure detection method and system

Info

Publication number
CN107291584A
Authority
CN
China
Prior art keywords
error
error message
thread
classification
default
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710501633.4A
Other languages
Chinese (zh)
Inventor
王洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710501633.4A
Publication of CN107291584A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273: Test methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766: Error or fault reporting or storing

Abstract

The invention discloses a chassis failure detection method and system. The method includes: timing by means of a time management thread and, when a preset time period is reached, sending a fault detection signal to a fault detection thread; detecting each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtaining the error message of the fault and reporting it to an error handling thread; and, by means of the error handling thread, determining the category of the error message according to a preset classification rule and processing the error message according to its category. Timing, chassis failure detection, and error message processing are thus carried out by three independent modules, so that detection and error reporting proceed asynchronously. This improves the efficiency of chassis failure detection and reporting and, in turn, the efficiency of chassis fault handling. In addition, error messages are processed and cached by category, which makes it easier for users to look them up.

Description

Chassis failure detection method and system
Technical field
The present invention relates to the computer field, and in particular to a chassis failure detection method and system.
Background
The chassis is one of the components of a computer. Its role is to house and secure the various hardware components of the computer and to support and protect them. Because these hardware components are critical to the normal operation of the computer, the chassis hardware needs to be monitored regularly to check whether it is working normally.
In the prior art, the chassis is generally checked by serial detection: when a chassis fault is detected, detection can resume only after the fault has been reported and processed. In other words, a single thread performs chassis failure detection. With this approach, fault detection and fault handling cannot be carried out at the same time, so efficiency is low.
Summary of the invention
In view of this, embodiments of the present invention provide a chassis failure detection method and system. They solve the prior-art problem that, because a single thread is used to detect chassis faults, fault detection and fault handling cannot be carried out at the same time, and they improve the efficiency of fault detection and fault reporting.
A chassis failure detection method provided in an embodiment of the present invention includes:
timing by means of a time management thread and, when a preset time period is reached, sending a fault detection signal to a fault detection thread;
detecting each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtaining the error message of the fault and reporting the error message to an error handling thread;
determining, by means of the error handling thread, the category of the error message according to a preset classification rule, and processing the error message according to its category.
Optionally, reporting the error message to the error handling thread includes:
the fault detection thread writing the error message into a preset first queue;
the error handling thread reading the error message from the first queue.
Optionally, determining the category of the error message according to the preset classification rule includes:
obtaining the error code in the error message;
matching the error code against preset category information;
obtaining the category information that matches the error code.
Optionally, processing the error message according to its category includes:
converting the error message into a preset format to obtain converted target error information;
caching the target error information according to the category of the error message.
Optionally, the method further includes:
after receiving an error message, the error handling thread judging whether feedback needs to be sent to the fault detection thread;
if feedback needs to be sent to the fault detection thread, the error handling module writing the feedback information into a preset second queue;
the fault detection module reading the feedback information from the second queue.
A chassis failure detection system provided in an embodiment of the present invention includes:
a time control module, configured to perform timing by means of a time management thread and, when a preset time period is reached, send a fault detection signal to a fault detection thread;
a fault detection module, configured to detect each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtain the error message of the fault and report the error message to an error handling thread;
an error handling module, configured to determine, by means of the error handling thread, the category of the error message according to a preset classification rule, and to process the error message according to its category.
Optionally, the fault detection module includes:
a writing submodule, by which the fault detection thread writes the error message into a preset first queue;
a reading submodule, by which the error handling thread reads the error message from the first queue.
Optionally, the error handling module includes:
a first obtaining subunit, configured to obtain the error code in the error message;
a matching subunit, configured to match the error code against preset category information;
a second obtaining subunit, configured to obtain the category information that matches the error code.
Optionally, the error handling module includes:
a format conversion submodule, configured to convert the error message into a preset format to obtain converted target error information;
a caching submodule, configured to cache the target error information according to the category of the error message.
Optionally, the system further includes:
a judging module, by which the error handling thread, after receiving an error message, judges whether feedback needs to be sent to the fault detection thread;
a writing module, by which the error handling module writes the feedback information into a preset second queue if feedback needs to be sent to the fault detection thread;
a reading module, by which the fault detection module reads the feedback information from the second queue.
The embodiments of the present invention disclose a chassis failure detection method and system. The method includes: timing by means of a time management thread and, when a preset time period is reached, sending a fault detection signal to a fault detection thread; detecting each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtaining the error message of the fault and reporting the error message to an error handling thread; and determining, by means of the error handling thread, the category of the error message according to a preset classification rule, and processing the error message according to its category. The method thus uses three independent modules for timing, chassis failure detection, and error message processing respectively, so that chassis failure detection and error reporting proceed asynchronously. This improves the efficiency of chassis failure detection and reporting and, in turn, the efficiency of chassis fault handling. In addition, error messages are processed and cached by category, which makes it easier for users to look them up.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a chassis failure detection method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of caching error messages by category, provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a chassis failure detection system provided in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which shows a schematic flowchart of a chassis failure detection method provided in an embodiment of the present invention, the method in this embodiment may include:
S101: A time management thread performs timing and, when a preset time period is reached, sends a fault detection signal to the fault detection thread.
In this embodiment, before S101, the main thread may first create a time management thread and set a preset time period; for example, the period may be 10 s.
In this embodiment, before S101, the main thread also binds a timer interrupt callback. When the time management thread detects that the preset time period has been reached, the timer interrupt callback is invoked, so that the time management thread sends the fault detection signal to the fault detection thread.
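The timing step in S101 can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the patent's implementation: the names `run_timer`, `on_period`, and `max_ticks` are hypothetical, the fault detection signal is modelled as a `threading.Event`, and a 10 ms period stands in for the 10 s example so that the demo finishes quickly.

```python
import threading

def run_timer(period_s, on_period, stop, max_ticks=None):
    """Time management loop: invoke on_period once per elapsed period.

    on_period plays the role of the timer interrupt callback (hypothetical name).
    max_ticks exists only so that the demo terminates.
    """
    ticks = 0
    # Event.wait() returns False on timeout, i.e. when one full period has elapsed,
    # and True as soon as stop is set.
    while not stop.wait(period_s):
        on_period()
        ticks += 1
        if max_ticks is not None and ticks >= max_ticks:
            break
    return ticks

# The fault detection signal is modelled here as an Event that the detection
# thread could wait on (an assumption; the text does not specify the mechanism).
detect_signal = threading.Event()
stop = threading.Event()

timer = threading.Thread(target=run_timer, args=(0.01, detect_signal.set, stop, 3))
timer.start()
timer.join()
```

Using `stop.wait(period_s)` instead of `time.sleep` lets the main thread end the timer promptly by setting `stop`.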
S102: The fault detection thread checks each hardware component in the chassis and, when a fault is detected in a hardware component, obtains the error message of the fault and reports the error message to the error handling thread.
In this embodiment, the error message obtained may include a node ID, an error code, an error event ID, error description information, and so on. This information uniquely identifies the source of the fault (for example, which hardware component has failed) and the details of the fault.
After the error message is obtained, its content may be packed, and the packed error message is sent to the error handling thread.
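As a rough illustration of the fields listed above and of the packing step, the sketch below defines a record type and a pair of pack/unpack helpers. The field names, the `ErrorMessage` class, and the choice of JSON as the wire format are all assumptions made for the example; the text does not specify how the message is packed.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ErrorMessage:
    # Fields follow the list in the text; the Python names and types are hypothetical.
    node_id: int
    error_code: int
    event_id: int
    description: str

def pack(msg: ErrorMessage) -> bytes:
    """Serialize the message before it is handed to the error handling thread."""
    return json.dumps(asdict(msg)).encode("utf-8")

def unpack(data: bytes) -> ErrorMessage:
    """Inverse of pack(): rebuild the message on the receiving side."""
    return ErrorMessage(**json.loads(data.decode("utf-8")))
```

For example, `unpack(pack(ErrorMessage(3, 0x0401, 17, "fan 2 stalled")))` yields an equal `ErrorMessage`, so the receiver sees exactly the fields the detector recorded.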
In this embodiment, before S103, the main thread has created a first queue in advance. After the fault detection thread obtains the error message of a fault, reporting the error message to the error handling thread may specifically include:
the fault detection thread writing the error message into the preset first queue;
the error handling thread reading the error message from the first queue.
Here, the preset first queue is the first queue created in advance by the main thread; before being written into the first queue, the error message may also first be pushed into a queue.
It should be noted that, in this embodiment, the first queue may be a send/recv queue pair: the fault detection thread writes the error message into the send queue, and the error handling thread reads the written error message from the recv queue.
In addition, in this embodiment, the reporting from the fault detection thread to the error handling thread can also be understood as being carried out through an IPC (Inter-Process Communication) mechanism.
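The queue-based hand-off can be sketched as a producer and a consumer sharing one queue. This is a simplified illustration, assuming both roles run as threads in one Python process and using the standard `queue.Queue`; the text's send/recv queue pair and IPC mechanism are collapsed into a single in-process queue here, and `SENTINEL`, `fault_detection`, and `error_handling` are hypothetical names.

```python
import queue
import threading

first_queue = queue.Queue()   # created up front, as the main thread does in the text
SENTINEL = None               # hypothetical end-of-stream marker for the demo
received = []

def fault_detection(errors):
    """Producer: write each detected error into the first queue, then stop."""
    for err in errors:
        first_queue.put(err)
    first_queue.put(SENTINEL)

def error_handling():
    """Consumer: block on the first queue and collect messages until the sentinel."""
    while True:
        err = first_queue.get()
        if err is SENTINEL:
            break
        received.append(err)

detected = [{"error_code": 0x0101}, {"error_code": 0x0301}]
detector = threading.Thread(target=fault_detection, args=(detected,))
handler = threading.Thread(target=error_handling)
handler.start()
detector.start()
detector.join()
handler.join()
```

Because the detector only enqueues and never waits for the handler, it can return to checking hardware immediately, which is the asynchronous behaviour the text describes.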
S103: The error handling thread determines the category of the error message according to a preset classification rule and processes the error message according to its category.
In this embodiment, determining the category of the error message according to the preset classification rule may specifically include:
obtaining the error code in the error message;
matching the error code against preset category information;
obtaining the category information that matches the error code.
In this embodiment, the preset categories may be set in advance by a technician based on experience or on the construction of the chassis. For example, the categories of error messages may include temperature, voltage, power, fan, and so on.
In this embodiment, once the category of an error message has been determined, messages of different categories can be processed differently. Because the main thread has bound error callback functions before S103, after the category of the error message is determined, execution can jump to the error callback corresponding to that category to process the error message.
In this embodiment, after reading the error message from the first queue, the error handling thread may first convert it into a format that the error handling thread can recognize and then determine the category of the error message.
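One way to picture the classification and callback dispatch in S103 is a lookup table keyed on part of the error code. The encoding below (high byte of the code selects the category) is purely an assumption for illustration, as are the names `CATEGORY_BY_PREFIX`, `classify`, `make_callback`, and `dispatch`; the patent does not define how error codes map to categories.

```python
# Hypothetical rule: the high byte of a 16-bit error code selects the category.
CATEGORY_BY_PREFIX = {
    0x01: "temperature",
    0x02: "voltage",
    0x03: "power",
    0x04: "fan",
}

def classify(error_code: int) -> str:
    """Match the error code against the preset category table."""
    return CATEGORY_BY_PREFIX.get(error_code >> 8, "unknown")

handled = []  # records (category, error_code) pairs so the effect is observable

def make_callback(category):
    """Build a per-category error callback; real handlers would differ by category."""
    def callback(msg):
        handled.append((category, msg["error_code"]))
    return callback

# Callbacks are bound up front, mirroring the binding done before S103.
CALLBACKS = {category: make_callback(category) for category in CATEGORY_BY_PREFIX.values()}

def dispatch(msg):
    """Classify a message and jump to the callback bound to its category, if any."""
    category = classify(msg["error_code"])
    callback = CALLBACKS.get(category)
    if callback is not None:
        callback(msg)
    return category
```

With this table, `dispatch({"error_code": 0x0402})` returns `"fan"` and invokes the fan callback, while a code with an unlisted prefix is reported as `"unknown"` and no callback runs.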
In this embodiment, processing the error message according to its category may include:
converting the error message into a preset format to obtain converted target error information;
caching the target error information according to the category of the error message.
In this embodiment, as shown in Fig. 2, error messages of different categories can be cached in different locations. For example, the preset cache space can first be partitioned so that each error category occupies its own cache region and location. Error messages are thereby stored by category, and a user can retrieve cached error messages by error type.
In this embodiment, after certain specific error messages detected by the fault detection thread are sent to the error handling thread, the error handling thread needs to feed back whether they were received. Specifically, the method may further include:
after receiving an error message, the error handling thread judging whether feedback needs to be sent to the fault detection thread;
if feedback needs to be sent to the fault detection thread, the error handling module writing the feedback information into a preset second queue;
the fault detection module reading the feedback information from the second queue.
It should be noted that, in this embodiment, the second queue may be a send/recv queue pair: the error handling thread writes the feedback information into the send queue, and the fault detection thread reads the written feedback information from the recv queue.
The feedback information may indicate whether the error handling thread has received the error message sent by the fault detection thread.
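The feedback path can be sketched with a second queue running in the opposite direction. As before, this is a hedged illustration: `handle_one`, `report_and_wait`, the `critical` flag used to decide whether feedback is needed, and the one-second timeout are all hypothetical choices, and both queues are plain in-process `queue.Queue` objects.

```python
import queue
import threading

first_queue = queue.Queue()    # detection thread -> error handling thread
second_queue = queue.Queue()   # error handling thread -> detection thread

def handle_one(needs_feedback):
    """Error handling side: read one message and acknowledge it if required."""
    msg = first_queue.get()
    if needs_feedback(msg):
        second_queue.put({"event_id": msg["event_id"], "received": True})
    return msg

def report_and_wait(msg, timeout_s=1.0):
    """Detection side: report a message, then wait briefly for feedback.

    Returns the feedback record, or None if none arrives within timeout_s.
    """
    first_queue.put(msg)
    try:
        return second_queue.get(timeout=timeout_s)
    except queue.Empty:
        return None

# Hypothetical rule: only messages flagged as critical require feedback.
handler = threading.Thread(target=handle_one, args=(lambda m: m.get("critical", False),))
handler.start()
ack = report_and_wait({"event_id": 7, "critical": True})
handler.join()
```

In the demo, `ack` confirms that event 7 was received; for a message without the flag, `report_and_wait` would return `None` after the timeout.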
It should be noted that, in this embodiment, some of the operations mentioned above are performed before S101. The operations performed before S101 can be regarded as initialization operations. Specifically, initialization may include:
the main thread creating the time management thread, the fault detection thread, and the error handling thread;
determining the valid hardware contained in the chassis;
setting the time period;
the time management thread binding the timer interrupt callback;
the fault detection thread creating the first queue;
the error handling thread creating the second queue;
the error handling thread requesting cache space.
In this embodiment, three independent threads carry out timing, chassis failure detection, and error message processing respectively, so that chassis failure detection and error reporting proceed asynchronously. This improves the efficiency of chassis failure detection and reporting and, in turn, the efficiency of chassis fault handling. In addition, error messages are processed and cached by category, which makes it easier for users to look them up.
Referring to Fig. 3, which shows a schematic structural diagram of a chassis failure detection system provided in an embodiment of the present invention, the system in this embodiment includes:
a time control module 301, configured to perform timing by means of a time management thread and, when a preset time period is reached, send a fault detection signal to a fault detection thread;
a fault detection module 302, configured to detect each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtain the error message of the fault and report the error message to an error handling thread;
an error handling module 303, configured to determine, by means of the error handling thread, the category of the error message according to a preset classification rule, and to process the error message according to its category.
Optionally, the fault detection module includes:
a writing submodule, by which the fault detection thread writes the error message into a preset first queue;
a reading submodule, by which the error handling thread reads the error message from the first queue.
Optionally, the error handling module includes:
a first obtaining subunit, configured to obtain the error code in the error message;
a matching subunit, configured to match the error code against preset category information;
a second obtaining subunit, configured to obtain the category information that matches the error code.
Optionally, the error handling module includes:
a format conversion submodule, configured to convert the error message into a preset format to obtain converted target error information;
a caching submodule, configured to cache the target error information according to the category of the error message.
Optionally, the system further includes:
a judging module, by which the error handling thread, after receiving an error message, judges whether feedback needs to be sent to the fault detection thread;
a writing module, by which the error handling module writes the feedback information into a preset second queue if feedback needs to be sent to the fault detection thread;
a reading module, by which the fault detection module reads the feedback information from the second queue.
The system of this embodiment uses three independent modules for timing, chassis failure detection, and error message processing respectively, so that chassis failure detection and error reporting proceed asynchronously. This improves the efficiency of chassis failure detection and reporting and, in turn, the efficiency of chassis fault handling. In addition, error messages are processed and cached by category, which makes it easier for users to look them up.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments can be understood by referring to one another.
The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A chassis failure detection method, characterized in that the method includes:
timing by means of a time management thread and, when a preset time period is reached, sending a fault detection signal to a fault detection thread;
detecting each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtaining the error message of the fault and reporting the error message to an error handling thread;
determining, by means of the error handling thread, the category of the error message according to a preset classification rule, and processing the error message according to its category.
2. The method according to claim 1, characterized in that reporting the error message to the error handling thread includes:
the fault detection thread writing the error message into a preset first queue;
the error handling thread reading the error message from the first queue.
3. The method according to claim 1, characterized in that determining the category of the error message according to the preset classification rule includes:
obtaining the error code in the error message;
matching the error code against preset category information;
obtaining the category information that matches the error code.
4. The method according to claim 1, characterized in that processing the error message according to its category includes:
converting the error message into a preset format to obtain converted target error information;
caching the target error information according to the category of the error message.
5. The method according to claim 1, further including:
after receiving an error message, the error handling thread judging whether feedback needs to be sent to the fault detection thread;
if feedback needs to be sent to the fault detection thread, the error handling module writing the feedback information into a preset second queue;
the fault detection module reading the feedback information from the second queue.
6. A chassis failure detection system, characterized in that the system includes:
a time control module, configured to perform timing by means of a time management thread and, when a preset time period is reached, send a fault detection signal to a fault detection thread;
a fault detection module, configured to detect each hardware component in the chassis by means of the fault detection thread and, when a fault is detected in a hardware component, obtain the error message of the fault and report the error message to an error handling thread;
an error handling module, configured to determine, by means of the error handling thread, the category of the error message according to a preset classification rule, and to process the error message according to its category.
7. The system according to claim 6, characterized in that the fault detection module includes:
a writing submodule, by which the fault detection thread writes the error message into a preset first queue;
a reading submodule, by which the error handling thread reads the error message from the first queue.
8. The system according to claim 6, characterized in that the error handling module includes:
a first obtaining subunit, configured to obtain the error code in the error message;
a matching subunit, configured to match the error code against preset category information;
a second obtaining subunit, configured to obtain the category information that matches the error code.
9. The system according to claim 6, characterized in that the error handling module includes:
a format conversion submodule, configured to convert the error message into a preset format to obtain converted target error information;
a caching submodule, configured to cache the target error information according to the category of the error message.
10. The system according to claim 6, further including:
a judging module, by which the error handling thread, after receiving an error message, judges whether feedback needs to be sent to the fault detection thread;
a writing module, by which the error handling module writes the feedback information into a preset second queue if feedback needs to be sent to the fault detection thread;
a reading module, by which the fault detection module reads the feedback information from the second queue.
CN201710501633.4A 2017-06-27 2017-06-27 A kind of chassis failure detection method and system Pending CN107291584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710501633.4A CN107291584A (en) 2017-06-27 2017-06-27 A kind of chassis failure detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710501633.4A CN107291584A (en) 2017-06-27 2017-06-27 A kind of chassis failure detection method and system

Publications (1)

Publication Number Publication Date
CN107291584A true CN107291584A (en) 2017-10-24

Family

ID=60099422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710501633.4A Pending CN107291584A (en) 2017-06-27 2017-06-27 A kind of chassis failure detection method and system

Country Status (1)

Country Link
CN (1) CN107291584A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605592A (en) * 2013-11-29 2014-02-26 中国航空工业集团公司第六三一研究所 Mechanism of detecting malfunctions of distributed computer system
CN105281962A (en) * 2015-12-03 2016-01-27 成都广达新网科技股份有限公司 System for achieving network management performance collection based on parallel pipelines and working method thereof
CN106453420A (en) * 2016-12-08 2017-02-22 郑州云海信息技术有限公司 Request processing device and method and terminal
CN106598790A (en) * 2015-10-16 2017-04-26 中兴通讯股份有限公司 Server hardware failure detection method, apparatus of server, and server

Similar Documents

Publication Publication Date Title
CN104572517B (en) Method, controller and the computer system of requested date are provided
CN101292226B (en) Device, system and method for thread communication and synchronization
CN104537103B (en) Data processing method and data processing equipment
CN108959564A (en) Data warehouse metadata management method, readable storage medium storing program for executing and computer equipment
CN108833131A (en) System, method, equipment and the computer storage medium of distributed data base cloud service
CN105917345B (en) The detection of side channel analysis between virtual machine
CN104598341B (en) For determining the method and system of the location of fault between interconnection/controller
CN106815119A (en) The hardware monitoring device of server
CN101385276A (en) Apparatus, system and method for error assessment over a communication link
CN106936616A (en) Backup communication method and apparatus
CN108199860A (en) A kind of alert processing method and the network equipment
WO2021114877A1 (en) Missing-number detection method, apparatus, electronic device, and storage medium
CN110333989A (en) A kind of server failure detection method, system and electronic equipment and storage medium
Hammond et al. Leveraging global influenza surveillance and response system for the COVID-19 pandemic response and beyond
CN102959526B (en) Address mapping testing fixture, central authorities' process arithmetic unit and address mapping inspection method
CN108092921A (en) Data exchange system and method
JP2014120001A (en) Monitoring device, monitoring method of monitoring object host, monitoring program, and recording medium
CN107291584A (en) A kind of chassis failure detection method and system
CN104781790B (en) Signal software recoverable error
CN107590060A (en) A kind of analysis method and device of terminal interim card
JP2003305010A (en) Method and system for monitoring communicable disease
JP5544929B2 (en) Operation management device, operation management method, operation management program
CN107291596A (en) A kind of computer glitch maintenance system based on internet
CN101594305A (en) A kind of message processing method and device
US7992047B2 (en) Context sensitive detection of failing I/O devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171024