CN107608813A - Method for automatically analyzing faults based on Linux operating system information - Google Patents
- Publication number
- CN107608813A (application CN201710827649.4A)
- Authority
- CN
- China
- Prior art keywords
- fault
- operating system
- information
- system information
- fault rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Test And Diagnosis Of Digital Computers (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a method for automatically analyzing faults based on Linux operating system information. The method first obtains Linux operating system information and builds a fault rule base organized by fault category and faulty component. The fault rules in the rule base are then applied automatically to the operating system information; when a matching rule is found, the problem description and the fault resolution are output and the analysis result is saved. Because the rule base is built from the patterns and handling methods of routine faults, when a Linux operating system fails the corresponding solution can be found by consulting the rule base, which greatly improves troubleshooting efficiency.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a method for automatically analyzing faults based on Linux operating system information.
Background art
With the development of the times and the improvement of living standards, people's ways of living and working have changed, and the computer has become an irreplaceable device in daily life.
Users work on computers through application software. Application software can run only with the support of an operating system, which is the interface between the user and the computer and also the interface between the computer hardware and other software. Data about the operating system can be provided to users for analysis and problem solving.
However, because operating system components are complex, faults can have many causes, and the volume of operating system information is enormous. When a computer fails, technicians must inspect and analyze the operating system information manually to find the relevant fault information and resolve the fault, so it is very difficult for them to determine the cause quickly.
Manually analyzing large amounts of operating system information is time-consuming, laborious, costly, and inefficient. To address this, the present invention provides a method for automatically analyzing faults based on Linux operating system information.
Summary of the invention
To remedy the defects of the prior art, the present invention provides a simple and efficient method for automatically analyzing faults based on Linux operating system information.
The present invention is achieved through the following technical solution:
A method for automatically analyzing faults based on Linux operating system information, characterized by comprising the following steps:
(1) Obtain Linux operating system information;
(2) Build a fault rule base according to the different fault categories and faulty components;
(3) Automatically analyze the operating system information against the fault rules in the fault rule base; when a matching fault rule is found, output the problem description and the fault resolution, and save the analysis result.
In step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information, and RAID information.
CPU information, both summary and detailed, is collected with the lscpu, dmidecode -t processor, and cat /proc/cpuinfo commands. Memory information is collected with the free, dmidecode -t memory, and cat /proc/meminfo commands. BIOS information is collected with the dmidecode -t bios command. Disk information is collected with the lsblk, lsscsi, df -h, mount, fdisk -l, and smartctl commands. Driver information is collected with the lsmod command. Network card information is collected with the ifconfig and lspci commands. BMC information is collected with the ipmitool command. RAID information is collected with the tool specified for each type of RAID.
In step (2), fault information and solutions are collected on an ongoing basis and the fault rule base fields are extracted from them. A random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules. The automatically identified faults are evaluated by experts, and the valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
The fault rule base fields are extracted from the fault information and solutions obtained from customer sites, the R&D department, the testing organization, and operations and maintenance personnel. In addition, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
The fault rule base fields include machine model, operating system, fault category, faulty component, log level, log details, keywords, log path, problem description, and solution.
When the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and the memory location. To locate PCIE faults, PCIE events are read and the corresponding slot information is matched using the machine's silk-screen lookup table. To locate the code that reported a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
The random forest algorithm generates a forest of decision trees and merges the fault information; the decision trees vote on the fault symptom to determine the fault, and the corresponding solution is then applied.
In step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is memory controller fault; and the solution is: memory fault, replace the memory module after confirming the specific memory location.
Beneficial effects of the present invention: the method obtains Linux operating system information and builds a fault rule base from the patterns and handling methods of routine faults. When a Linux operating system fails, the corresponding solution can be found by consulting the rule base, which greatly improves troubleshooting efficiency.
Brief description of the drawings
Figure 1 is a schematic diagram of the method of the present invention for automatically analyzing faults based on Linux operating system information.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, the invention is described in detail below with reference to the drawing and an embodiment. The specific embodiment described here serves only to explain the invention and is not intended to limit it.
The method for automatically analyzing faults based on Linux operating system information comprises the following steps:
(1) Obtain Linux operating system information;
(2) Build a fault rule base according to the different fault categories and faulty components;
(3) Automatically analyze the operating system information against the fault rules in the fault rule base; when a matching fault rule is found, output the problem description and the fault resolution, and save the analysis result.
In step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information, and RAID information.
CPU information, both summary and detailed, is collected with the lscpu, dmidecode -t processor, and cat /proc/cpuinfo commands. Memory information is collected with the free, dmidecode -t memory, and cat /proc/meminfo commands. BIOS information is collected with the dmidecode -t bios command. Disk information is collected with the lsblk, lsscsi, df -h, mount, fdisk -l, and smartctl commands. Driver information is collected with the lsmod command. Network card information is collected with the ifconfig and lspci commands. BMC information is collected with the ipmitool command. RAID information is collected with the tool specified for each type of RAID.
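The collection step can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names run_command and collect, the category keys, and the result layout are hypothetical and do not come from the patent. smartctl, ipmitool, and the RAID tools are left out because the text does not give their arguments. Commands such as dmidecode and fdisk -l normally require root privileges, so a failed command is recorded rather than raised.

```python
import shutil
import subprocess

# Commands named in the text, grouped by information category.
# smartctl, ipmitool and RAID tools are omitted: their arguments are not given.
COMMANDS = {
    "cpu": [["lscpu"], ["dmidecode", "-t", "processor"], ["cat", "/proc/cpuinfo"]],
    "memory": [["free"], ["dmidecode", "-t", "memory"], ["cat", "/proc/meminfo"]],
    "bios": [["dmidecode", "-t", "bios"]],
    "disk": [["lsblk"], ["lsscsi"], ["df", "-h"], ["mount"], ["fdisk", "-l"]],
    "driver": [["lsmod"]],
    "network": [["ifconfig"], ["lspci"]],
}


def run_command(argv, timeout=30):
    """Run one command and return a record with its output or an error note."""
    name = " ".join(argv)
    if shutil.which(argv[0]) is None:
        return {"command": name, "ok": False, "output": "command not found"}
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"command": name, "ok": False, "output": str(exc)}
    ok = proc.returncode == 0
    return {"command": name, "ok": ok, "output": proc.stdout if ok else proc.stderr}


def collect(commands=COMMANDS):
    """Collect output for every category; failures are recorded, not raised."""
    return {
        category: [run_command(argv) for argv in argv_list]
        for category, argv_list in commands.items()
    }
```

Calling collect() returns a dictionary keyed by category, which a later analysis step could scan or write to disk.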
In step (2), fault information and solutions are collected on an ongoing basis and the fault rule base fields are extracted from them. A random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules. The automatically identified faults are evaluated by experts, and the valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
The fault rule base fields are extracted from the fault information and solutions obtained from customer sites, the R&D department, the testing organization, and operations and maintenance personnel. In addition, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
The fault rule base fields include machine model, operating system, fault category, faulty component, log level, log details, keywords, log path, problem description, and solution.
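One possible way to represent a rule base record with the ten fields listed above is sketched below. The class name FaultRule, the attribute names, and the use of JSON as a storage format are assumptions made for illustration; the patent does not specify a data structure or storage format. The example values are taken from the memory error case in step (3), except for machine model, operating system, and log details, which the text does not give and which are filled with placeholder values here.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FaultRule:
    """One record of the fault rule base (field names are illustrative)."""
    machine_model: str
    operating_system: str
    fault_category: str
    faulty_component: str
    log_level: str
    log_details: str
    keywords: List[str]
    log_path: str
    problem_description: str
    solution: str


def rule_to_json(rule: FaultRule) -> str:
    """Serialize a rule so it can be stored in a file-based rule base."""
    return json.dumps(asdict(rule), ensure_ascii=False, indent=2)


def rule_from_json(text: str) -> FaultRule:
    """Rebuild a rule from its JSON form."""
    return FaultRule(**json.loads(text))


MEMORY_RULE = FaultRule(
    machine_model="any",       # placeholder: not specified in the text
    operating_system="Linux",  # placeholder: not specified in the text
    fault_category="system",
    faulty_component="Memory",
    log_level="critical",
    log_details="",            # placeholder: not specified in the text
    keywords=["Memory Controller", "Err"],
    log_path="/var/log/mcelog",
    problem_description="Memory controller fault",
    solution="Memory fault; replace the memory module after confirming "
             "the specific memory location",
)
```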
When the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and the memory location. To locate PCIE faults, PCIE events are read and the corresponding slot information is matched using the machine's silk-screen lookup table. To locate the code that reported a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
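As an illustration of the call stack extraction described above, the sketch below pulls function names out of a kernel call trace. It assumes the common Linux log layout in which "Call Trace:" is followed by frames of the form function+0xoffset/0xsize, optionally prefixed by an address in brackets and by "?" for frames the kernel marks as unreliable. The function name extract_call_stack and the sample log (including the demo_driver module) are hypothetical and are not taken from the patent.

```python
import re

# One stack frame: optional "[<address>]", optional "?", then func+0xoff/0xsize.
FRAME_RE = re.compile(
    r"(?:\[<[0-9a-fA-F]+>\]\s+)?(?P<unsure>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def extract_call_stack(log_text):
    """Return (function_name, reliable) tuples for frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # Simplification: the first non-frame line ends the trace.
            in_trace = False
            continue
        frames.append((match.group("func"), match.group("unsure") is None))
    return frames


# Made-up sample for illustration only.
SAMPLE_LOG = """\
kernel: BUG: soft lockup - CPU#3 stuck for 22s!
kernel: Call Trace:
kernel:  [<ffffffff8107b6a5>] ? warn_slowpath_common+0x85/0xc0
kernel:  [<ffffffff8107b6fa>] warn_slowpath_null+0x1a/0x20
kernel:  [<ffffffffa01c2f3b>] demo_driver_irq+0x4b/0x90 [demo_driver]
kernel: Code: 48 89 e5
"""
```

The extracted function names could then serve as keywords or features when a rule is written for the fault.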
The random forest algorithm generates a forest of decision trees and merges the fault information; the decision trees vote on the fault symptom to determine the fault, and the corresponding solution is then applied.
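The voting step can be illustrated with the sketch below. It shows only how the votes of several decision trees are combined by majority. The three trees are hand-written stand-ins, whereas a real random forest would train each tree on a bootstrap sample of the training set with a random subset of features. The feature names (mcelog_memory_errors, pcie_events, keywords) and the fault labels are assumptions made for illustration and are not specified in the patent.

```python
from collections import Counter


def tree_memory(features):
    """Stand-in tree that looks only at memory error counts."""
    if features.get("mcelog_memory_errors", 0) > 0:
        return "memory"
    return "none"


def tree_pcie(features):
    """Stand-in tree that checks PCIE events first, then memory errors."""
    if features.get("pcie_events", 0) > 0:
        return "pcie"
    if features.get("mcelog_memory_errors", 0) > 0:
        return "memory"
    return "none"


def tree_keywords(features):
    """Stand-in tree that looks only at keywords found in the logs."""
    keywords = features.get("keywords", ())
    if "Memory Controller" in keywords:
        return "memory"
    if "Call Trace" in keywords:
        return "software"
    return "none"


FOREST = [tree_memory, tree_pcie, tree_keywords]


def forest_vote(features, forest=FOREST):
    """Return the majority label and the vote counts of all trees.

    Ties are broken in favor of the label that was voted for first.
    """
    votes = Counter(tree(features) for tree in forest)
    label, _count = votes.most_common(1)[0]
    return label, dict(votes)
```

For a sample with memory errors and the Memory Controller keyword, all three stand-in trees vote for the memory label, so the forest reports a memory fault; the label can then be used to look up the corresponding solution in the rule base.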
In step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is memory controller fault; and the solution is: memory fault, replace the memory module after confirming the specific memory location.
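The matching in step (3) might look roughly like the sketch below, which uses the memory error rule from the paragraph above. The matching semantics are an assumption: the patent does not say how keywords are matched, so this sketch treats a rule as matched when every keyword appears in the same log line. The function names, the dictionary layout, and the JSON result file are likewise hypothetical.

```python
import json

# Rule values taken from the memory error example in step (3).
MEMORY_ERROR_RULE = {
    "fault_category": "system",
    "faulty_component": "Memory",
    "log_level": "critical",
    "keywords": ["Memory Controller", "Err"],
    "log_path": "/var/log/mcelog",
    "problem_description": "Memory controller fault",
    "solution": "Memory fault; replace the memory module after confirming "
                "the specific memory location",
}


def match_rule(rule, log_lines):
    """Return the log lines that contain every keyword of the rule."""
    return [
        line for line in log_lines
        if all(keyword in line for keyword in rule["keywords"])
    ]


def analyze(rules, logs):
    """Apply each rule to the lines of its log path.

    `logs` maps a log path to a list of lines. One result is returned per
    rule that matched at least one line.
    """
    results = []
    for rule in rules:
        evidence = match_rule(rule, logs.get(rule["log_path"], []))
        if evidence:
            results.append({
                "fault_category": rule["fault_category"],
                "faulty_component": rule["faulty_component"],
                "problem_description": rule["problem_description"],
                "solution": rule["solution"],
                "evidence": evidence,
            })
    return results


def save_results(results, path):
    """Save the analysis result as JSON (the file format is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, ensure_ascii=False, indent=2)
```

In practice the lines would be read from the file named in the rule's log path; here they are passed in as a dictionary so that the matching logic can be tried without access to a real mcelog file.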
Claims (9)
- 1. A method for automatically analyzing faults based on Linux operating system information, characterized by comprising the following steps: (1) obtaining Linux operating system information; (2) building a fault rule base according to the different fault categories and faulty components; (3) automatically analyzing the operating system information against the fault rules in the fault rule base and, when a matching fault rule is found, outputting the problem description and the fault resolution and saving the analysis result.
- 2. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (1), the Linux operating system information includes CPU information, memory information, BIOS information, disk information, driver information, network card information, BMC information, and RAID information.
- 3. The method for automatically analyzing faults based on Linux operating system information according to claim 2, characterized in that: the CPU information, both summary and detailed, is collected with the lscpu, dmidecode -t processor, and cat /proc/cpuinfo commands; the memory information is collected with the free, dmidecode -t memory, and cat /proc/meminfo commands; the BIOS information is collected with the dmidecode -t bios command; the disk information is collected with the lsblk, lsscsi, df -h, mount, fdisk -l, and smartctl commands; the driver information is collected with the lsmod command; the network card information is collected with the ifconfig and lspci commands; the BMC information is collected with the ipmitool command; and the RAID information is collected with the tool specified for each type of RAID.
- 4. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (2), fault information and solutions are collected on an ongoing basis and the fault rule base fields are extracted; a random forest algorithm is then used to identify faults automatically and to mine the relationship between fault symptoms and fault rules; the automatically identified faults are evaluated by experts; and the valid fault symptoms and handling schemes are turned into fault rules and stored in the fault rule base.
- 5. The method for automatically analyzing faults based on Linux operating system information according to claim 4, characterized in that: the fault rule base fields are extracted from the fault information and solutions obtained from customer sites, the R&D department, the testing organization, and operations and maintenance personnel; in addition, the data in the training set is used to pinpoint the specific device and to analyze the cause of the fault in depth.
- 6. The method for automatically analyzing faults based on Linux operating system information according to claim 4 or 5, characterized in that: the fault rule base fields include machine model, operating system, fault category, faulty component, log level, log details, keywords, log path, problem description, and solution.
- 7. The method for automatically analyzing faults based on Linux operating system information according to claim 5, characterized in that: when the data in the training set is used to pinpoint CPU and memory faults, CPU events and memory events are read and mcelog is parsed to locate the faulty CPU and the memory location; to locate PCIE faults, PCIE events are read and the corresponding slot information is matched using the machine's silk-screen lookup table; to locate the code that reported a CallTrace fault, the CallTrace event log is analyzed and the function call stack is extracted so that the cause of the fault can be analyzed in depth.
- 8. The method for automatically analyzing faults based on Linux operating system information according to claim 4, characterized in that: the random forest algorithm generates a forest of decision trees and merges the fault information; the decision trees vote on the fault symptom to determine the fault, and the corresponding solution is applied.
- 9. The method for automatically analyzing faults based on Linux operating system information according to claim 1, characterized in that: in step (3), when a memory error appears in the operating system information, the fault category is system; the faulty component is Memory; the log level is critical; the keywords are Memory Controller and Err; the log path is /var/log/mcelog; the problem description is memory controller fault; and the solution is: memory fault, replace the memory module after confirming the specific memory location.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710827649.4A CN107608813A (en) | 2017-09-14 | 2017-09-14 | Method for automatically analyzing faults based on Linux operating system information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710827649.4A CN107608813A (en) | 2017-09-14 | 2017-09-14 | Method for automatically analyzing faults based on Linux operating system information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107608813A true CN107608813A (en) | 2018-01-19 |
Family
ID=61063749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710827649.4A Pending CN107608813A (en) | Method for automatically analyzing faults based on Linux operating system information | 2017-09-14 | 2017-09-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107608813A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833497A (en) * | 2010-03-30 | 2010-09-15 | 山东高效能服务器和存储研究院 | Computer fault management system based on expert system method |
CN102081562A (en) * | 2009-11-30 | 2011-06-01 | 华为技术有限公司 | Equipment diagnosis method and system |
CN103198000A (en) * | 2013-04-02 | 2013-07-10 | 浪潮电子信息产业股份有限公司 | Method for positioning faulted memory in linux system |
CN103699489A (en) * | 2014-01-03 | 2014-04-02 | 中国人民解放军装甲兵工程学院 | Software remote fault diagnosis and repair method based on knowledge base |
CN104155596A (en) * | 2014-08-12 | 2014-11-19 | 北京航空航天大学 | Artificial circuit fault diagnosis system based on random forest |
CN106383760A (en) * | 2016-09-19 | 2017-02-08 | 郑州云海信息技术有限公司 | Computer fault management method and apparatus |
- 2017-09-14: CN application CN201710827649.4A, published as CN107608813A, status Pending
Non-Patent Citations (4)
Title |
---|
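51CTO Blog: "dmidecode lspci lsblk: viewing system information, CPU, memory and hard disk information", 《HTTPS://BLOG.51CTO.COM/TENDERRAIN/1875895》 * |
CSDN: "Viewing RAID and hardware information on Linux", 《HTTPS://BLOG.CSDN.NET/LIUYUEHUI110/ARTICLE/DETAILS/43149329》 * |
Cnblogs: "Viewing system hardware information on Linux (detailed examples)", 《HTTPS://WWW.CNBLOGS.COM/GGJUCHENG/ARCHIVE/2013/01/14/2859613.HTML》 * |
Douban: "Linux training: complete list of hardware information commands", 《HTTPS://WWW.DOUBAN.COM/NOTE/533880159/》 * |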
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109189638A (en) * | 2018-08-20 | 2019-01-11 | 郑州云海信息技术有限公司 | A kind of GPU driving detection method, device, terminal and storage medium |
CN109947585A (en) * | 2019-03-13 | 2019-06-28 | 西安易朴通讯技术有限公司 | The processing method and processing device of PCIE device failure |
CN111694804A (en) * | 2019-03-13 | 2020-09-22 | 阿里巴巴集团控股有限公司 | Troubleshooting method and device |
CN111694804B (en) * | 2019-03-13 | 2023-05-26 | 阿里巴巴集团控股有限公司 | Fault checking method and device |
TWI726469B (en) * | 2019-10-31 | 2021-05-01 | 宏碁股份有限公司 | Method and device for automatically acquiring status information |
CN113031991A (en) * | 2021-04-13 | 2021-06-25 | 南京大学 | Remote self-adaptive upgrading method and device for embedded system |
CN113031991B (en) * | 2021-04-13 | 2023-11-17 | 南京大学 | Remote self-adaptive upgrading method and device for embedded system |
WO2023193388A1 (en) * | 2022-04-08 | 2023-10-12 | 苏州浪潮智能科技有限公司 | Method and apparatus for fault locating during power supply process of storage system, and medium |
CN118643000A (en) * | 2024-08-14 | 2024-09-13 | 苏州元脑智能科技有限公司 | Method for generating configuration information table of PCIe port of server, method and device for sending configuration information table |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180119 |