CN102567550A - Method and device for collecting data of emergency event in operating system (OS) - Google Patents


Info

Publication number
CN102567550A
Authority
CN
China
Prior art keywords
capsule
packet
emergency event
data
operating system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011104557862A
Other languages
Chinese (zh)
Inventor
王卫钢
吴建成
沙超群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd
Priority to CN2011104557862A
Publication of CN102567550A
Legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for collecting data of an emergency event (OS panic) in an operating system (OS). The method comprises the following steps: starting the operating system; when an OS emergency event occurs, checking whether a capsule data packet exists; if a capsule data packet is found, checking whether the data packet contains data of the OS emergency event; if such data is found, determining whether the capsule data packet is permanently valid for the OS; if the capsule data packet is found to be permanently valid for the OS, listing the capsule data packet in the system table; and resetting the system.

Description

Method and device for collecting data of an emergency event in an operating system (OS)
Technical field
The present invention relates generally to the computer field and, more specifically, to a method and a device for collecting data of an emergency event in an operating system (OS).
Background technology
In the prior art, in large-scale cluster deployments such as data centers and computing centers, each node usually needs to be configured and maintained. Because the nodes are physically distributed, data center or machine room administrators face maintenance work that involves configuring the BIOS settings of every node. According to data from the Internet Data Center (IDC), the failure rate caused by OS emergency events (OS panics) on servers deployed across industries is as high as 10%.
However, even with such a high failure rate, faults often cannot be accurately located and resolved, because the basic on-site data from the moment the fault occurred is missing. A mechanism for collecting fault-related data is therefore urgently needed.
Summary of the invention
To solve the above problem, the invention provides a method for collecting data of an emergency event in an operating system (OS), comprising the following steps: starting the operating system; when an OS emergency event occurs, checking whether a capsule data packet exists; if a capsule data packet is found, checking whether the data packet contains data of the OS emergency event; if such data is found, determining whether the capsule data packet is permanently valid for the OS; if the capsule data packet is found to be permanently valid for the OS, listing the capsule data packet in the system table; and resetting the system.
Before the step of starting the operating system, the method further comprises: processing the data of the previous emergency event, and displaying analysis information related to that data for fault diagnosis.
When the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
The related service that builds the capsule encapsulates the data into the capsule, and the firmware records the capsule in a storage medium.
The storage medium is a non-volatile storage medium.
In addition, a device for collecting data of an emergency event in an operating system (OS) is provided, comprising: a starting module for starting the operating system; a first checking module for checking, when an OS emergency event occurs, whether a capsule data packet exists; a second checking module for checking, when a capsule data packet is found, whether the data packet contains data of the OS emergency event; a determining module for determining, when data of the OS emergency event is found, whether the capsule data packet is permanently valid for the OS; a collecting module for listing the capsule data packet in the system table when the capsule data packet is found to be permanently valid for the OS; and a reset module for resetting the system.
When the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
Description of drawings
The present invention is best understood from the following detailed description when read with the accompanying drawings. It should be emphasized that, in accordance with standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
Fig. 1 shows a basic flowchart of the OS emergency event data processing function under the OS according to an exemplary embodiment of the present invention;
Fig. 2 shows the collection mechanism for OS panic data according to an exemplary embodiment of the present invention; and
Fig. 3 shows the basic hardware layout according to exemplary embodiment of the present invention.
Embodiment
The following description provides many different embodiments or examples for implementing different features of the invention. Specific examples of components and arrangements are described below to simplify the disclosure. These are, of course, merely examples and are not intended to be limiting. Moreover, in the description below, forming a first feature on a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features so that the first and second features are not in direct contact. For simplicity and clarity, various features may be arbitrarily drawn at different scales.
Under a traditional BIOS, when the system fails, the error and some related information can only be printed to the screen by calling the INT 10h video service. UEFI differs from the traditional boot method in its mechanisms, and also offers secure boot. The UEFI specification defines the interface through which the operating system and the platform firmware communicate. This interface includes platform-related information as well as the boot services and runtime services that can be called from the OS. These interfaces and services provide a mechanism by which the operating system can encapsulate data into a capsule and pass it to the platform firmware.
The general idea of the technical solution of the present invention is as follows:
Working at the system BIOS level on a UEFI architecture, this solution packages the OS panic data into a data packet when the OS of a node panics. The packet is either sent to a remote management console over a network protocol, or sent to the system BIOS through a runtime service under the operating system. This collects the OS panic data of the node and provides the basic data needed to locate the problem.
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 shows a basic flowchart of the OS emergency event data processing function under the OS according to an exemplary embodiment of the present invention.
Each node is in the state in which the OS has started and is running (140). After an OS panic occurs, the corresponding handler process of the system (135) handles it. The handler process (135) checks whether a capsule data packet exists; if no capsule data packet is found, the handler process exits directly. If a capsule data packet is found, the handler process checks whether the packet contains OS panic data, and applies specific processing (for example, marking) to packets that do.
If these data need to be recognizable under the OS at this point and processed later, the capsule data packet is marked as valid for the OS (138) and the capsule is listed in the system table (141). Once the panic data has been marked and processed, the system is reset (132). At this point the collection of the OS panic data is complete; after the system reset, the collected panic data capsule is analyzed and displayed for fault diagnosis and analysis.
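The decision flow of Fig. 1 can be sketched as follows. This is a minimal illustration in Python, not the implementation described by the patent: the `Capsule` class, its field names, and the `handle_panic` function are hypothetical, and the system table is modelled as a plain list. The sketch also assumes that the handler exits when a capsule carries no panic data, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Capsule:
    """Hypothetical stand-in for a capsule data packet."""
    payload: bytes
    has_panic_data: bool = False   # does the packet contain OS panic data?
    valid_for_os: bool = False     # set when the packet is marked valid for the OS (138)


def handle_panic(capsule: Optional[Capsule],
                 system_table: List[Capsule],
                 needed_after_reset: bool = True) -> str:
    """Walk the checks of Fig. 1 and return the action taken."""
    if capsule is None:
        return "exit"                  # no capsule packet: the handler exits directly
    if not capsule.has_panic_data:
        return "exit"                  # assumption: nothing to collect, so exit as well
    if needed_after_reset:
        capsule.valid_for_os = True    # mark the packet as valid for the OS (138)
        system_table.append(capsule)   # list the capsule in the system table (141)
    return "reset"                     # reset the system (132)
```

A caller would pass the capsule found at panic time together with the table that survives the reset; the returned string stands in for the branch taken in the flowchart.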
Further, Fig. 2 shows the collection mechanism for OS panic data according to an exemplary embodiment of the present invention. That is, Fig. 2 highlights the key steps of collecting OS panic data:
While the OS is running normally (208), the system encounters an exception and an OS panic event occurs. The OS then builds a capsule (as defined in the UEFI specification) by calling the capsule-related services and sends the capsule to the firmware; the service that builds the capsule encapsulates the data into it. Once the firmware has been notified, the data is recorded in a non-volatile medium (for example, flash memory) through UpdateCapsule(), so that the panic data is still available after the system reboots.
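To make the encapsulation step concrete, the sketch below packs panic data behind a header laid out like the EFI_CAPSULE_HEADER of the UEFI specification: a 16-byte GUID followed by three 32-bit fields (header size, flags, and total capsule image size). The two flag constants carry the values the UEFI specification assigns to them. Everything else is an assumption made for illustration: the GUID is invented, the function names are hypothetical, and mapping "permanently valid for the OS" onto these two flags is only one plausible reading, since the patent does not name any flags. Real code would hand the capsule to the firmware's UpdateCapsule() runtime service rather than build bytes in Python.

```python
import struct
import uuid

# Flag values defined for EFI_CAPSULE_HEADER.Flags in the UEFI specification.
CAPSULE_FLAGS_PERSIST_ACROSS_RESET = 0x00010000
CAPSULE_FLAGS_POPULATE_SYSTEM_TABLE = 0x00020000

# Hypothetical GUID identifying "OS panic data" capsules (not from the patent).
PANIC_CAPSULE_GUID = uuid.UUID("12345678-1234-5678-1234-567812345678")

# GUID, HeaderSize, Flags, CapsuleImageSize: little-endian, no padding.
HEADER_FORMAT = "<16sIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 28 bytes


def build_panic_capsule(panic_data: bytes) -> bytes:
    """Prefix the panic data with a capsule-style header."""
    flags = CAPSULE_FLAGS_PERSIST_ACROSS_RESET | CAPSULE_FLAGS_POPULATE_SYSTEM_TABLE
    header = struct.pack(
        HEADER_FORMAT,
        PANIC_CAPSULE_GUID.bytes_le,     # EFI_GUID uses this mixed-endian layout
        HEADER_SIZE,
        flags,
        HEADER_SIZE + len(panic_data),   # image size covers header plus payload
    )
    return header + panic_data


def parse_capsule(blob: bytes):
    """Return (guid, flags, payload) recovered from a capsule blob."""
    guid_bytes, header_size, flags, image_size = struct.unpack_from(HEADER_FORMAT, blob)
    return uuid.UUID(bytes_le=guid_bytes), flags, blob[header_size:image_size]
```

Keeping the header size inside the header lets a reader skip to the payload without knowing which header revision produced the capsule.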
After the panic data has been collected, whether the system needs to be reset is decided according to the system design. If a reset is needed, the system jumps to the reset vector (132) and performs the system reset. After the reset, the operating system checks the system table to decide whether the data of the previous panic is available. If it is, the normally running OS reads the data of the previous panic, analyzes it, and displays the analysis information for fault diagnosis.
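The post-reset check can be sketched as a lookup over the system table. In this illustration the table is modelled as a list of (identifier, data) pairs; `PANIC_GUID`, `find_previous_panic`, `diagnose`, and the report format are all hypothetical names and choices, not taken from the patent.

```python
from typing import List, Optional, Tuple

# Hypothetical identifier under which panic capsules are listed in the table.
PANIC_GUID = "12345678-1234-5678-1234-567812345678"


def find_previous_panic(system_table: List[Tuple[str, bytes]]) -> Optional[bytes]:
    """Return the data of the previous panic, or None if the table has no entry."""
    for guid, data in system_table:
        if guid == PANIC_GUID:
            return data
    return None


def diagnose(system_table: List[Tuple[str, bytes]]) -> str:
    """Produce a one-line report for display after the reset."""
    data = find_previous_panic(system_table)
    if data is None:
        return "no previous panic data"
    # Undecodable bytes are replaced rather than raising, since the dump
    # may have been cut short by the panic itself.
    text = data.decode("utf-8", errors="replace")
    return f"previous panic ({len(data)} bytes): {text}"
```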
In addition, Fig. 3 shows the basic hardware layout according to exemplary embodiment of the present invention.
The microcontroller (110) accesses the non-volatile storage medium, a flash memory, over the SPI bus. The OS panic data is stored on the SPI flash in a defined format, so the microcontroller (110) can obtain the OS panic data out of band (OOB). The microcontroller is attached to the south bridge (ICH) through the PCIe bus, and the microcontroller and the system processor communicate over buses such as SMBus.
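The text does not specify the format in which the panic data is laid out on the SPI flash. Purely as an illustration of what such a format might look like, the sketch below frames the payload with a marker, a length, and a CRC-32 checksum so that an out-of-band reader can tell a valid record from erased or corrupted flash. The marker value, the field order, and the function names are all assumptions.

```python
import struct
import zlib
from typing import Optional

MAGIC = b"PNIC"            # hypothetical record marker
RECORD_HEADER = "<4sII"    # marker, payload length, CRC-32 of the payload
HEADER_SIZE = struct.calcsize(RECORD_HEADER)


def encode_record(payload: bytes) -> bytes:
    """Frame the payload so that a reader can validate it later."""
    return struct.pack(RECORD_HEADER, MAGIC, len(payload), zlib.crc32(payload)) + payload


def decode_record(flash: bytes) -> Optional[bytes]:
    """Return the payload at the start of a flash region, or None if invalid."""
    if len(flash) < HEADER_SIZE:
        return None
    magic, length, crc = struct.unpack_from(RECORD_HEADER, flash)
    payload = flash[HEADER_SIZE:HEADER_SIZE + length]
    if magic != MAGIC or len(payload) != length or zlib.crc32(payload) != crc:
        return None        # erased, truncated, or corrupted region
    return payload
```

The checksum matters here because the record is written while the OS is failing, so a partially written record is a realistic case for the reader to reject.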
After the system processor hangs, the microcontroller detects the hang through the heartbeat signal, accesses the flash memory to obtain the capsule data, and sends the OS panic data to the management node through the NIC. On receiving the data, the management node records it in a database for subsequent fault analysis and location.
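The microcontroller's role can be sketched as a heartbeat watchdog. In this minimal illustration the timeout value, the class name `HeartbeatMonitor`, and the two callbacks (`read_flash` for the SPI flash access and `send` for the transfer through the NIC) are hypothetical; time is passed in explicitly so that the logic does not depend on a real clock.

```python
from typing import Callable, Optional


class HeartbeatMonitor:
    """Forward the stored capsule once the host heartbeat stops."""

    def __init__(self, timeout_s: float,
                 read_flash: Callable[[], bytes],
                 send: Callable[[bytes], None]) -> None:
        self.timeout_s = timeout_s
        self.read_flash = read_flash     # stands in for the SPI flash read
        self.send = send                 # stands in for the transfer via the NIC
        self.last_beat: Optional[float] = None
        self.reported = False

    def beat(self, now: float) -> None:
        """Record a heartbeat from the system processor."""
        self.last_beat = now
        self.reported = False            # a live host re-arms the monitor

    def poll(self, now: float) -> bool:
        """Return True if a hang was detected and the data was forwarded."""
        if self.last_beat is None or self.reported:
            return False
        if now - self.last_beat <= self.timeout_s:
            return False
        self.send(self.read_flash())     # forward the panic data to the management node
        self.reported = True             # report each hang only once
        return True
```

For example, with a 5-second timeout, a monitor that saw its last heartbeat at t = 0 does nothing when polled at t = 3 and forwards the capsule once when polled at t = 6.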
In summary, the invention provides a method for collecting data of an emergency event in an operating system (OS), comprising the following steps: starting the operating system; when an OS emergency event occurs, checking whether a capsule data packet exists; if a capsule data packet is found, checking whether the data packet contains data of the OS emergency event; if such data is found, determining whether the capsule data packet is permanently valid for the OS; if the capsule data packet is found to be permanently valid for the OS, listing the capsule data packet in the system table; and resetting the system.
Preferably, before the step of starting the operating system, the method further comprises: processing the data of the previous emergency event, and displaying analysis information related to that data for fault diagnosis.
Preferably, when the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
Preferably, the related service that builds the capsule encapsulates the data into the capsule, and the firmware records the capsule in a storage medium.
Preferably, the storage medium is a non-volatile storage medium.
In addition, the present invention also provides a device for collecting data of an emergency event in an operating system (OS), comprising: a starting module for starting the operating system; a first checking module for checking, when an OS emergency event occurs, whether a capsule data packet exists; a second checking module for checking, when a capsule data packet is found, whether the data packet contains data of the OS emergency event; a determining module for determining, when data of the OS emergency event is found, whether the capsule data packet is permanently valid for the OS; a collecting module for listing the capsule data packet in the system table when the capsule data packet is found to be permanently valid for the OS; and a reset module for resetting the system.
Preferably, when the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
The features of several embodiments have been discussed above so that those of ordinary skill in the art may better understand the various aspects of the present invention. Those skilled in the art should appreciate that they may readily use the present invention as a basis for designing or modifying other processes and structures that carry out the same purposes and/or achieve the same advantages as the embodiments introduced herein. Those of ordinary skill in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present invention, and that various changes, substitutions, and alterations may be made without departing from the spirit and scope of the present invention.

Claims (7)

1. A method for collecting data of an emergency event in an operating system (OS), characterized by comprising the following steps:
starting the operating system;
when an OS emergency event occurs, checking whether a capsule data packet exists;
if a capsule data packet is found, checking whether the data packet contains data of the OS emergency event;
if data of the OS emergency event is found, determining whether the capsule data packet is permanently valid for the OS;
if the capsule data packet is found to be permanently valid for the OS, listing the capsule data packet in the system table; and
resetting the system.
2. The method according to claim 1, characterized in that, before the step of starting the operating system, the method further comprises:
processing the data of the previous emergency event, and displaying analysis information related to that data for fault diagnosis.
3. The method according to claim 1, characterized in that, when the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
4. The method according to claim 3, characterized in that the related service that builds the capsule encapsulates the data into the capsule, and the firmware records the capsule in a storage medium.
5. The method according to claim 4, characterized in that the storage medium is a non-volatile storage medium.
6. A device for collecting data of an emergency event in an operating system (OS), characterized by comprising:
a starting module for starting the operating system;
a first checking module for checking, when an OS emergency event occurs, whether a capsule data packet exists;
a second checking module for checking, when a capsule data packet is found, whether the data packet contains data of the OS emergency event;
a determining module for determining, when data of the OS emergency event is found, whether the capsule data packet is permanently valid for the OS;
a collecting module for listing the capsule data packet in the system table when the capsule data packet is found to be permanently valid for the OS; and
a reset module for resetting the system.
7. The device according to claim 6, characterized in that, when the OS emergency event occurs, the OS builds the capsule by calling the capsule-related services defined in the Unified Extensible Firmware Interface (UEFI) specification, and sends the capsule to the firmware.
CN2011104557862A 2011-12-31 2011-12-31 Method and device for collecting data of emergency event in operating system (OS) Pending CN102567550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104557862A CN102567550A (en) 2011-12-31 2011-12-31 Method and device for collecting data of emergency event in operating system (OS)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011104557862A CN102567550A (en) 2011-12-31 2011-12-31 Method and device for collecting data of emergency event in operating system (OS)

Publications (1)

Publication Number Publication Date
CN102567550A true CN102567550A (en) 2012-07-11

Family

ID=46412948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104557862A Pending CN102567550A (en) 2011-12-31 2011-12-31 Method and device for collecting data of emergency event in operating system (OS)

Country Status (1)

Country Link
CN (1) CN102567550A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207797A (en) * 2013-03-15 2013-07-17 南京工业大学 Capsule type custom-made updating method based on unified extensible firmware interface firmware system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025709A (en) * 2006-02-22 2007-08-29 联想(北京)有限公司 System and method for obtaining fault in-situ information for computer operating system
US20090006827A1 (en) * 2007-06-26 2009-01-01 Rothman Michael A Firmware Processing for Operating System Panic Data
US20090327679A1 (en) * 2008-04-23 2009-12-31 Huang David H Os-mediated launch of os-independent application
US20100082932A1 (en) * 2008-09-30 2010-04-01 Rothman Michael A Hardware and file system agnostic mechanism for achieving capsule support
CN102147763A (en) * 2010-02-05 2011-08-10 中国长城计算机深圳股份有限公司 Method, system and computer for recording weblog

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207797A (en) * 2013-03-15 2013-07-17 南京工业大学 Capsule type custom-made updating method based on unified extensible firmware interface firmware system
CN103207797B (en) * 2013-03-15 2013-11-27 南京工业大学 Capsule type custom-made updating method based on unified extensible firmware interface firmware system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120711