CN107608813A - Method for automatically analyzing faults based on Linux operating system information - Google Patents

Info

Publication number
CN107608813A
Authority
CN
China
Prior art keywords
fault
operating system
information
system information
fault rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710827649.4A
Other languages
Chinese (zh)
Inventor
郭美思
周国浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710827649.4A
Publication of CN107608813A
Legal status: Pending

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates in particular to a method for automatically analyzing faults based on Linux operating system information. The method first obtains Linux operating system information and builds a fault rule base organized by fault category and faulty component. The fault rules in the rule base are then applied automatically to the operating system information; when a corresponding fault rule is matched, the method outputs the problem description and the solution and saves the analysis result. Because the rule base is built from the patterns and handling methods of routine faults, the corresponding solution can be found by consulting the rule base when the Linux operating system fails, which greatly improves the efficiency of troubleshooting.

Description

Method for automatically analyzing faults based on Linux operating system information
Technical field
The present invention relates to the field of computer application technology, and in particular to a method for automatically analyzing faults based on Linux operating system information.
Background art
With the development of the times and the improvement of living standards, people's ways of living and working have changed, and the computer has become an irreplaceable device in daily life.
Users work with and operate computers through application software. Application software can run only with the support of an operating system, which is the interface between the user and the computer as well as the interface between the computer hardware and other software. Data about the operating system can be provided to the user for analyzing and solving problems.
However, because operating system components are complex, faults have many possible causes and the volume of operating system information is enormous. When a computer fails, a technician must manually inspect and analyze the operating system information to find the relevant fault information and resolve the fault, so it is very difficult for the technician to determine the cause of the fault quickly.
Manually analyzing large amounts of operating system information is time-consuming, laborious, costly and inefficient. In view of this, the present invention provides a method for automatically analyzing faults based on Linux operating system information.
Summary of the invention
To remedy the shortcomings of the prior art, the present invention provides a simple and efficient method for automatically analyzing faults based on Linux operating system information.
The present invention is achieved through the following technical solution:
A method for automatically analyzing faults based on Linux operating system information, characterized in that it comprises the following steps:
(1) Obtain Linux operating system information;
(2) Build a fault rule base according to the different fault categories and faulty components;
(3) Apply the fault rules in the fault rule base to analyze the operating system information automatically; when a corresponding fault rule is matched, output the problem description and the solution, and save the analysis result.
In step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information and RAID information.
CPU information collection covers both summary and detailed information, using the lscpu command, the dmidecode -t processor command and the cat /proc/cpuinfo command. Memory information is collected using the free command, the dmidecode -t memory command and the cat /proc/meminfo command. BIOS information is collected using the dmidecode -t bios command. Disk information is collected using the lsblk, lsscsi, df -h, mount, fdisk -l and smartctl commands. Driver information is collected using the lsmod command. Network card information is collected using the ifconfig and lspci commands. BMC information is collected using the ipmitool command. RAID information is collected using the tool specified for each type of RAID.
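The collection step could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `COMMANDS`, `run_command` and `collect` are hypothetical, and smartctl, ipmitool and the RAID tools are omitted because the text does not state the arguments they are called with. Several of the listed commands (for example dmidecode and fdisk -l) normally require root privileges.

```python
import shlex
import subprocess
from typing import Callable, Dict, List

# Commands named in the text, grouped by information category.
# The category keys are illustrative labels, not taken from the patent.
COMMANDS: Dict[str, List[str]] = {
    "cpu": ["lscpu", "dmidecode -t processor", "cat /proc/cpuinfo"],
    "memory": ["free", "dmidecode -t memory", "cat /proc/meminfo"],
    "bios": ["dmidecode -t bios"],
    "disk": ["lsblk", "lsscsi", "df -h", "mount", "fdisk -l"],
    "driver": ["lsmod"],
    "nic": ["ifconfig", "lspci"],
}


def run_command(command: str) -> str:
    """Run one command and return its stdout, or a note if it could not run."""
    try:
        result = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<failed: {exc}>"
    return result.stdout


def collect(
    commands: Dict[str, List[str]] = COMMANDS,
    runner: Callable[[str], str] = run_command,
) -> Dict[str, Dict[str, str]]:
    """Return {category: {command: output}} for every configured command."""
    return {
        category: {command: runner(command) for command in command_list}
        for category, command_list in commands.items()
    }
```

Passing a different `runner` makes it possible to exercise the collection logic on a machine where the tools are not installed.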
In step (2), fault information and solutions are obtained on an ongoing basis, and the fields of the fault rule base are extracted from them. A random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules. The automatically identified faults are evaluated by experts, and valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
The fields of the fault rule base are distilled from the fault information and solutions obtained from customer sites, the R&D department, testing organizations and operation and maintenance personnel. Meanwhile, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
The fields of the fault rule base include machine model, operating system, fault category, faulty component, log level, log details, keyword, log path, problem description and solution.
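One possible in-memory shape for such a rule record, indexed by fault category and faulty component, is sketched below. The class names `FaultRule` and `FaultRuleBase`, the field names and the placeholder values for machine model, operating system and log details are assumptions made for illustration; the text lists the fields but does not prescribe a storage format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FaultRule:
    """One rule with the ten fields listed in the text."""
    machine_model: str
    operating_system: str
    fault_category: str
    faulty_component: str
    log_level: str
    log_details: str
    keywords: Tuple[str, ...]
    log_path: str
    problem_description: str
    solution: str


class FaultRuleBase:
    """Rules grouped by (fault category, faulty component)."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str], List[FaultRule]] = defaultdict(list)

    def add(self, rule: FaultRule) -> None:
        self._rules[(rule.fault_category, rule.faulty_component)].append(rule)

    def lookup(self, fault_category: str, faulty_component: str) -> List[FaultRule]:
        return list(self._rules.get((fault_category, faulty_component), []))


rule_base = FaultRuleBase()
rule_base.add(
    FaultRule(
        machine_model="any",        # placeholder, not given in the text
        operating_system="Linux",   # placeholder, not given in the text
        fault_category="system",
        faulty_component="Memory",
        log_level="critical",
        log_details="",             # placeholder, not given in the text
        keywords=("Memory Controller", "Err"),
        log_path="/var/log/mcelog",
        problem_description="Memory controller fault",
        solution="Replace the memory after confirming the specific memory location",
    )
)
```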
When the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and memory position. To locate PCIe faults, PCIe events are read and matched to the corresponding slot information according to the machine's silk-screen lookup table. To locate the program section that reports a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
Using the random forest algorithm, a forest composed of decision trees is generated, the fault information is merged, and the decision trees vote on the fault symptoms to determine the fault and select the corresponding solution.
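The voting step alone can be illustrated with a small sketch. It is not a full random forest: in a real one each tree is trained on a bootstrap sample with random feature subsets, whereas the three "trees" below are hand-written stand-ins, and the names `forest_vote`, `diagnose` and `SOLUTIONS` are hypothetical. Only the majority vote over the trees' outputs corresponds to what the text describes.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Symptom = Dict[str, str]
Tree = Callable[[Symptom], str]


def tree_keyword(symptom: Symptom) -> str:
    """Stand-in tree: splits on the keywords found in the log details."""
    details = symptom.get("log_details", "")
    if "Memory Controller" in details:
        return "memory" if "Err" in details else "unknown"
    return "unknown"


def tree_path(symptom: Symptom) -> str:
    """Stand-in tree: splits on the log path."""
    if symptom.get("log_path") == "/var/log/mcelog":
        return "memory"
    return "unknown"


def tree_level(symptom: Symptom) -> str:
    """Stand-in tree: splits on log level, then on the faulty component."""
    if symptom.get("log_level") == "critical":
        return "memory" if symptom.get("faulty_component") == "Memory" else "unknown"
    return "unknown"


FOREST: Sequence[Tree] = (tree_keyword, tree_path, tree_level)

# Solution text per fault label; only the memory case appears in the text.
SOLUTIONS: Dict[str, str] = {
    "memory": "Replace the memory after confirming the specific memory location",
}


def forest_vote(trees: Sequence[Tree], symptom: Symptom) -> str:
    """Return the label predicted by most trees (first-seen label wins ties)."""
    votes = Counter(tree(symptom) for tree in trees)
    label, _count = votes.most_common(1)[0]
    return label


def diagnose(symptom: Symptom) -> Tuple[str, Optional[str]]:
    """Vote on the fault label and look up the matching solution, if any."""
    label = forest_vote(FOREST, symptom)
    return label, SOLUTIONS.get(label)
```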
In step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is a memory controller fault; and the solution is: memory fault, replace the memory after confirming the specific memory location.
Beneficial effects of the present invention: the method obtains Linux operating system information and builds a fault rule base from the patterns and handling methods of routine faults. When the Linux operating system fails, the corresponding solution can be found by consulting the fault rule base, which greatly improves the efficiency of troubleshooting.
Brief description of the drawings
Figure 1 is a schematic diagram of the method of the present invention for automatically analyzing faults based on Linux operating system information.
Detailed description of the embodiments
To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be noted that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The method for automatically analyzing faults based on Linux operating system information comprises the following steps:
(1) Obtain Linux operating system information;
(2) Build a fault rule base according to the different fault categories and faulty components;
(3) Apply the fault rules in the fault rule base to analyze the operating system information automatically; when a corresponding fault rule is matched, output the problem description and the solution, and save the analysis result.
In step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information and RAID information.
CPU information collection covers both summary and detailed information, using the lscpu command, the dmidecode -t processor command and the cat /proc/cpuinfo command. Memory information is collected using the free command, the dmidecode -t memory command and the cat /proc/meminfo command. BIOS information is collected using the dmidecode -t bios command. Disk information is collected using the lsblk, lsscsi, df -h, mount, fdisk -l and smartctl commands. Driver information is collected using the lsmod command. Network card information is collected using the ifconfig and lspci commands. BMC information is collected using the ipmitool command. RAID information is collected using the tool specified for each type of RAID.
In step (2), fault information and solutions are obtained on an ongoing basis, and the fields of the fault rule base are extracted from them. A random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules. The automatically identified faults are evaluated by experts, and valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
The fields of the fault rule base are distilled from the fault information and solutions obtained from customer sites, the R&D department, testing organizations and operation and maintenance personnel. Meanwhile, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
The fields of the fault rule base include machine model, operating system, fault category, faulty component, log level, log details, keyword, log path, problem description and solution.
When the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and memory position. To locate PCIe faults, PCIe events are read and matched to the corresponding slot information according to the machine's silk-screen lookup table. To locate the program section that reports a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
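The PCIe slot matching might look something like the sketch below. It assumes that a PCIe event line contains a device address in the usual domain:bus:device.function form and that the silk-screen lookup table maps a bus number to a slot label. The table contents, the sample event text and the names `SILKSCREEN_TABLE` and `slot_for_event` are made up for illustration; the real table is machine-specific and is not given in the text.

```python
import re
from typing import Dict, Optional

# Hypothetical silk-screen lookup table: PCI bus number (hex) -> slot label
# printed on the chassis. Real tables differ per machine model.
SILKSCREEN_TABLE: Dict[str, str] = {
    "3b": "PCIE_SLOT1",
    "af": "PCIE_SLOT4",
}

# Matches addresses such as 0000:3b:00.0 or 3b:00.0 and captures the bus number.
PCI_ADDRESS = re.compile(r"\b(?:[0-9a-f]{4}:)?([0-9a-f]{2}):[0-9a-f]{2}\.[0-7]\b")


def slot_for_event(event_text: str, table: Dict[str, str]) -> Optional[str]:
    """Return the slot label for the first PCI address in the event, if known."""
    match = PCI_ADDRESS.search(event_text.lower())
    if match is None:
        return None
    return table.get(match.group(1))
```

For example, `slot_for_event("error on device 0000:3b:00.0", SILKSCREEN_TABLE)` looks up bus `3b` in the table.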
Using the random forest algorithm, a forest composed of decision trees is generated, the fault information is merged, and the decision trees vote on the fault symptoms to determine the fault and select the corresponding solution.
In step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is a memory controller fault; and the solution is: memory fault, replace the memory after confirming the specific memory location.
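The matching in step (3) can be sketched for this memory example as follows. The rule values come from the text; everything else is an assumption: the requirement that all keywords appear on the same log line, the function name `analyze`, the shape of the result, and the sample lines, which are invented and do not reproduce real mcelog output.

```python
import json
from typing import Dict, Iterable, List, Optional

# Rule values taken from the memory-error example in the text.
MEMORY_RULE: Dict[str, object] = {
    "fault_category": "system",
    "faulty_component": "Memory",
    "log_level": "critical",
    "keywords": ["Memory Controller", "Err"],
    "log_path": "/var/log/mcelog",
    "problem_description": "Memory controller fault",
    "solution": "Memory fault; replace the memory after confirming "
                "the specific memory location",
}


def analyze(rule: Dict[str, object], lines: Iterable[str]) -> Optional[Dict[str, object]]:
    """Return an analysis result if any line contains every keyword of the rule."""
    keywords: List[str] = list(rule["keywords"])  # type: ignore[arg-type]
    hits = [line for line in lines if all(keyword in line for keyword in keywords)]
    if not hits:
        return None
    return {
        "fault_category": rule["fault_category"],
        "faulty_component": rule["faulty_component"],
        "problem_description": rule["problem_description"],
        "solution": rule["solution"],
        "matched_lines": hits,
    }


# Invented sample lines, used only to exercise the matcher.
SAMPLE_LINES = [
    "made-up line 1: Memory Controller Err reported",
    "made-up line 2: nothing unusual",
]

result = analyze(MEMORY_RULE, SAMPLE_LINES)
# Serializing the result is one simple way to save it for later review.
result_json = json.dumps(result, indent=2)
```

In practice the lines would be read from the file named in the rule's log path rather than from a literal list.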

Claims (9)

  1. A method for automatically analyzing faults based on Linux operating system information, characterized in that it comprises the following steps:
    (1) Obtain Linux operating system information;
    (2) Build a fault rule base according to the different fault categories and faulty components;
    (3) Apply the fault rules in the fault rule base to analyze the operating system information automatically; when a corresponding fault rule is matched, output the problem description and the solution, and save the analysis result.
  2. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information and RAID information.
  3. The method for automatically analyzing faults based on Linux operating system information according to claim 2, characterized in that: CPU information collection covers both summary and detailed information, using the lscpu command, the dmidecode -t processor command and the cat /proc/cpuinfo command; memory information is collected using the free command, the dmidecode -t memory command and the cat /proc/meminfo command; BIOS information is collected using the dmidecode -t bios command; disk information is collected using the lsblk, lsscsi, df -h, mount, fdisk -l and smartctl commands; driver information is collected using the lsmod command; network card information is collected using the ifconfig and lspci commands; BMC information is collected using the ipmitool command; RAID information is collected using the tool specified for each type of RAID.
  4. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (2), fault information and solutions are obtained on an ongoing basis, and the fields of the fault rule base are extracted from them; a random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules; the automatically identified faults are evaluated by experts, and valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
  5. The method for automatically analyzing faults based on Linux operating system information according to claim 4, characterized in that: the fields of the fault rule base are distilled from the fault information and solutions obtained from customer sites, the R&D department, testing organizations and operation and maintenance personnel; meanwhile, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
  6. The method for automatically analyzing faults based on Linux operating system information according to claim 4 or 5, characterized in that: the fields of the fault rule base include machine model, operating system, fault category, faulty component, log level, log details, keyword, log path, problem description and solution.
  7. The method for automatically analyzing faults based on Linux operating system information according to claim 5, characterized in that: when the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and memory position; to locate PCIe faults, PCIe events are read and matched to the corresponding slot information according to the machine's silk-screen lookup table; to locate the program section that reports a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
  8. The method for automatically analyzing faults based on Linux operating system information according to claim 4, characterized in that: using the random forest algorithm, a forest composed of decision trees is generated, the fault information is merged, and the decision trees vote on the fault symptoms to determine the fault and select the corresponding solution.
  9. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is a memory controller fault; and the solution is: memory fault, replace the memory after confirming the specific memory location.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710827649.4A CN107608813A (en) 2017-09-14 2017-09-14 Method for automatically analyzing faults based on Linux operating system information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710827649.4A CN107608813A (en) 2017-09-14 2017-09-14 Method for automatically analyzing faults based on Linux operating system information

Publications (1)

Publication Number Publication Date
CN107608813A (en) 2018-01-19

Family

ID=61063749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710827649.4A Pending CN107608813A (en) 2017-09-14 2017-09-14 Method for automatically analyzing faults based on Linux operating system information

Country Status (1)

Country Link
CN (1) CN107608813A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833497A (en) * 2010-03-30 2010-09-15 山东高效能服务器和存储研究院 Computer fault management system based on expert system method
CN102081562A (en) * 2009-11-30 2011-06-01 华为技术有限公司 Equipment diagnosis method and system
CN103198000A (en) * 2013-04-02 2013-07-10 浪潮电子信息产业股份有限公司 Method for positioning faulted memory in linux system
CN103699489A (en) * 2014-01-03 2014-04-02 中国人民解放军装甲兵工程学院 Software remote fault diagnosis and repair method based on knowledge base
CN104155596A (en) * 2014-08-12 2014-11-19 北京航空航天大学 Artificial circuit fault diagnosis system based on random forest
CN106383760A (en) * 2016-09-19 2017-02-08 郑州云海信息技术有限公司 Computer fault management method and apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
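51CTO Blog: "dmidecode lspci lsblk: viewing system information, CPU, memory and hard disk information", 《HTTPS://BLOG.51CTO.COM/TENDERRAIN/1875895》 *
CSDN: "Viewing RAID and hardware information on Linux", 《HTTPS://BLOG.CSDN.NET/LIUYUEHUI110/ARTICLE/DETAILS/43149329》 *
Cnblogs: "Viewing system hardware information on Linux (detailed examples)", 《HTTPS://WWW.CNBLOGS.COM/GGJUCHENG/ARCHIVE/2013/01/14/2859613.HTML》 *
Douban: "Linux training: a complete list of hardware information commands", 《HTTPS://WWW.DOUBAN.COM/NOTE/533880159/》 *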

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189638A (en) * 2018-08-20 2019-01-11 郑州云海信息技术有限公司 A kind of GPU driving detection method, device, terminal and storage medium
CN109947585A (en) * 2019-03-13 2019-06-28 西安易朴通讯技术有限公司 The processing method and processing device of PCIE device failure
CN111694804A (en) * 2019-03-13 2020-09-22 阿里巴巴集团控股有限公司 Troubleshooting method and device
CN111694804B (en) * 2019-03-13 2023-05-26 阿里巴巴集团控股有限公司 Fault checking method and device
TWI726469B (en) * 2019-10-31 2021-05-01 宏碁股份有限公司 Method and device for automatically acquiring status information
CN113031991A (en) * 2021-04-13 2021-06-25 南京大学 Remote self-adaptive upgrading method and device for embedded system
CN113031991B (en) * 2021-04-13 2023-11-17 南京大学 Remote self-adaptive upgrading method and device for embedded system
WO2023193388A1 (en) * 2022-04-08 2023-10-12 苏州浪潮智能科技有限公司 Method and apparatus for fault locating during power supply process of storage system, and medium
CN118643000A (en) * 2024-08-14 2024-09-13 苏州元脑智能科技有限公司 Method for generating configuration information table of PCIe port of server, method and device for sending configuration information table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180119