CN110888753A - Software fault positioning method and system - Google Patents
- Publication number
- CN110888753A (application number CN201911091979.7A)
- Authority
- CN
- China
- Prior art keywords
- key data
- software
- reserved memory
- memory
- tracked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a software fault positioning method and system, relating to the field of software fault diagnosis. The method comprises the following steps: setting a reserved memory of an operating system; screening key data of the process of the software to be detected according to the operation scenario, and determining the key data to be tracked and its acquisition frequency; extracting the key data to be tracked, storing it in the reserved memory, and locating the fault of the software by using the key data stored in the reserved memory. The invention can record various types of data for analysis during software execution with very low overhead, without requiring the software to call a given debugging library.
Description
Technical Field
The invention relates to the field of software fault diagnosis, in particular to a software fault positioning method and system.
Background
With contributions from the open-source community, the Linux operating system provides several fault location tools, such as perf and ftrace. perf is a Linux performance analysis tool, and ftrace helps developers understand the runtime behavior of the Linux kernel for fault debugging or performance analysis. These tools were developed mainly for x86 systems, whose operating environments (for example, high-performance servers) offer strong processing performance and abundant storage resources. Their development therefore focused on the ability to locate problems, and the impact on performance was not a primary concern. Telecommunication equipment, however, usually uses embedded processors with limited processing capacity and storage resources. Even in a development scenario these tools are difficult to run, so using them for fault location on such equipment is not feasible.
With the development of telecommunication equipment, the complexity of its software has grown geometrically. The introduction of CGL (Carrier Grade Linux) provides a basic guarantee for the reliability of telecommunication equipment, but software and hardware problems that have not been detected and located remain latent while the equipment is actually running. In addition, software exists in a debug version and a release version. Because of the size and cost constraints of telecommunication equipment, equipment running on the current network usually uses the release version, which contains no debugging information, and this further weakens the ability to diagnose software and hardware faults on the current network. Even if the debug version were put into engineering use, its overhead would adversely affect operating efficiency.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a software fault positioning method that can record various types of data for analysis during software execution with very low overhead, without requiring the software to call a given debugging library.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
a method for locating software faults, the method comprising the steps of:
setting a reserved memory of an operating system;
screening key data of the process of the software to be detected according to the operation scene, and determining the key data to be tracked and the acquisition frequency thereof;
extracting the key data to be tracked, storing the key data in the reserved memory, and positioning the fault of the software by using the key data stored in the reserved memory.
On the basis of the technical scheme, the method further comprises the step of compressing the key data stored in the reserved memory by using a compression algorithm.
On the basis of the above technical solution, setting the reserved memory of the operating system specifically includes:
and determining the size of the reserved memory according to the actual memory of the operating system and the scale of the software to be detected.
On the basis of the above technical solution,
the key data comprises a process number, the usage of process memory space, the process scheduling status, the number of sub-threads of the process, the sub-thread scheduling status, signals sent by the process and/or signals received by the process; a switch is provided for tracking each type of key data, and the key data is screened through the switches.
On the basis of the technical scheme, the method for determining the key data to be tracked by screening the key data of the process of the software to be detected according to the operation scene specifically comprises the following steps:
in a research and development scene, turning on all switches for tracking key data;
and under the integrated test, the pilot test and the current network scene, opening a corresponding key data tracking switch according to the difficult and complicated problems to be checked.
On the basis of the above technical solution,
in a research and development scenario, the key data is collected once every 10 ms;
in an integration test scenario, the key data is collected once every 1 s;
in pilot test and current network scenarios, the key data is collected once every 10 s.
Meanwhile, another object of the present invention is to provide a software fault location system, which can record various types of data for analysis during software execution with very little overhead, without requiring software to call a given debug library.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
a software fault locating system comprising:
the reserved memory setting module is used for setting a reserved memory of the operating system;
the process information management module is used for screening the key data of the software to be detected according to the operation scene and determining the key data to be tracked and the acquisition frequency of the key data;
the information extraction module is used for extracting key data needing to be tracked and storing the key data in the reserved memory; and
and the fault positioning module is used for positioning the fault of the software by using the key data stored in the reserved memory.
On the basis of the above technical solution,
the software fault location system further comprises a reserved memory management module, which is used for compressing the key data stored in the reserved memory by using a compression algorithm.
On the basis of the technical scheme, the reserved memory setting module determines the size of the reserved memory according to the actual memory of the operating system and the scale of the software to be detected.
On the basis of the technical scheme, the key data comprise a process number, a process memory space using condition, a process scheduling condition, a process sub-thread number, a sub-thread scheduling condition, a signal sent by the process and/or a signal received by the process, the process information management module is provided with a switch for tracking each type of key data, and the switch is used for screening the key data.
Compared with the prior art, the invention has the advantages that:
The software fault positioning method of the invention tracks and extracts key data while the software runs, that is, it records data as they are generated. Compared with the traditional approach of collecting and analyzing data after a problem has occurred, the method does not require the software to call a given debugging library; no configuration is needed at the software level, and the complexity of the software is not increased.
In addition, based on the process management information and according to the operation scene, the key data of the process to be tracked is determined and tracked, and the negative influence on the performance of the whole machine can be reduced. According to the operation scene, the acquisition frequency is set for the key data to be extracted, so that the resource overhead of data recording can be controlled according to the actual situation.
Drawings
Fig. 1 is a flowchart of a software fault location method in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, the present embodiment provides a software fault locating method, including the following steps:
S1, setting a reserved memory of an operating system;
Reserved memory in Linux refers to a portion of system memory that is set aside and for which the kernel does not establish page tables. In this embodiment, the reserved memory is set through the kernel boot parameter memmap. Its size is determined according to the actual memory size of the operating system and the scale of the specific software; a size of no more than 64 MB meets most requirements, and 16 MB is used in this embodiment.
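As a rough illustration of the boot-parameter step, the sketch below composes a `memmap=` argument in the `memmap=nn[KMG]$ss[KMG]` form, which marks a physical range as reserved on kernels that support it. The function name `build_memmap_arg` and the start address `0x10000000` are hypothetical; a real start address has to come from the memory map of the target board, and the 64 MB cap simply mirrors the limit stated in this embodiment.

```shell
#!/bin/sh
# Sketch only: compose a kernel command-line argument that reserves memory.
# build_memmap_arg SIZE_MB START_ADDR  (both values are illustrative assumptions)
build_memmap_arg() {
    size_mb=$1
    start_addr=$2
    # The embodiment keeps the reserved region at or below 64 MB.
    if [ "$size_mb" -le 0 ] || [ "$size_mb" -gt 64 ]; then
        echo "size must be between 1 and 64 MB" >&2
        return 1
    fi
    # The '$' separates the size from the start address.
    printf 'memmap=%sM$%s\n' "$size_mb" "$start_addr"
}

# Example: the 16 MB region of this embodiment at a hypothetical address.
build_memmap_arg 16 0x10000000
```

When such an argument is placed in a bootloader configuration, the `$` character typically has to be escaped; the exact escaping depends on the bootloader in use.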
S2, screening key data of the process of the software to be detected according to the operation scenario, and determining the key data to be tracked and its acquisition frequency;
Generally, when locating a software fault, a control interface needs to be provided externally through a system service so that key information, such as the name and process number of the software of interest, can be conveyed conveniently. The name of the process corresponding to the software and the maximum number of managed processes are written into the process management information.
In this embodiment, a kernel-mode process information management driver (process_info_manager.ko) is developed, which provides the following process information management method:
To expose the interface to users, the required nodes are generated under /proc (the proc file system), such as a node for managing the maximum number of monitored processes, a node for managing process names, and a node for enabling and disabling the function. The nodes are generated through the kernel interfaces of the Linux operating system, and they are used to convey key information such as the name and process number of the software of interest.
Specifically, for example:
(1) adding a node for managing the maximum monitoring process number:
/proc/track_process_info/max_track_number
The maximum number of tracked processes can be written to this entry, for example:
echo 2 > /proc/track_process_info/max_track_number
(2) adding a node for managing process names:
/proc/track_process_info/track_name_list
The name of a process to be tracked can be written to this entry, for example:
echo -E "process_name_0" > /proc/track_process_info/track_name_list
echo -E "process_name_1" > /proc/track_process_info/track_name_list
echo -E "process_name_2" > /proc/track_process_info/track_name_list
The number of tracked-process entries that can be added is limited by max_track_number. In the example above, at most two processes may be tracked, so the third entry is not recorded. The currently tracked processes can be displayed as follows:
cat /proc/track_process_info/track_name_list
process_name_0process_name_1
(3) adding a node for enabling and disabling the function:
/proc/track_process_info/enable
The function is turned on and off as follows:
echo 1 > /proc/track_process_info/enable
echo 0 > /proc/track_process_info/enable
The current state can be viewed as follows, for example:
cat /proc/track_process_info/enable
1
This indicates that the function is currently enabled. By adding nodes that enable and disable tracking, the key data can be screened through switches, so that the key data of interest can be selected according to actual conditions.
After the required nodes are generated under /proc, the processes corresponding to the software can be managed and controlled, which facilitates the subsequent screening of the key data of those processes.
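The limit behaviour described for max_track_number can be sketched in user space as follows. This is only a mock: plain files in a temporary directory stand in for the /proc/track_process_info nodes, and the helper name `track_add` and the variable `MOCK_DIR` are hypothetical. The real nodes are implemented inside the kernel driver, whose code is not given in this document.

```shell
#!/bin/sh
# User-space mock of the /proc/track_process_info nodes, for illustration only.
MOCK_DIR=$(mktemp -d)
echo 2 > "$MOCK_DIR/max_track_number"
: > "$MOCK_DIR/track_name_list"

# track_add NAME: record NAME unless the list already holds
# max_track_number entries.
track_add() {
    max=$(cat "$MOCK_DIR/max_track_number")
    count=$(( $(wc -l < "$MOCK_DIR/track_name_list") ))
    if [ "$count" -ge "$max" ]; then
        return 1    # entry is dropped, like the third name in the text
    fi
    echo "$1" >> "$MOCK_DIR/track_name_list"
}

track_add process_name_0
track_add process_name_1
track_add process_name_2 || echo "process_name_2 not recorded (limit reached)"
cat "$MOCK_DIR/track_name_list"
```

The mock stores one name per line; how the real node separates names on output is not specified here.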
In addition, the key data of the process of the software to be detected is screened based on the process management information and according to the operation scenario, and the key data to be tracked is determined. The reason is that only part of the functions is actually needed in each scenario, and monitoring all data would have a certain negative effect on the performance of the whole machine.
Specifically, as a better optional mode, the key data includes a process number, a process memory space usage condition, a process scheduling condition, a number of sub-threads of the process, a sub-thread scheduling condition, a signal sent by the process and/or a signal received by the process, a switch is set for tracking each type of key data, and the key data is screened through the switch.
Namely, when a certain type of key data needs to be tracked, the switch is opened, and when the certain type of key data does not need to be tracked, the switch is closed.
If all data were monitored, the performance of the whole machine would be affected to some extent. Therefore, this embodiment determines and tracks the key data of the process according to the specific operation scenario, and controls whether each type of key data is tracked through its switch, so as to reduce the performance pressure on the whole machine.
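The per-type switches can be pictured with the following user-space analogue, which reads a few of the listed key data types from a `/proc/<pid>/status` style file and prints only those whose switch is on. The switch names `TRACK_MEM`, `TRACK_THREADS` and `TRACK_SCHED` and the function `collect_key_data` are hypothetical; the embodiment gathers this data inside a kernel driver, so this is an assumption-laden sketch and not the described implementation.

```shell
#!/bin/sh
# Sketch: per-type tracking switches, 1 = on, 0 = off (names are hypothetical).
TRACK_MEM=1
TRACK_THREADS=1
TRACK_SCHED=0

# collect_key_data STATUS_FILE: print the enabled fields from a file in
# /proc/<pid>/status format. The process number is always reported.
collect_key_data() {
    awk -v mem="$TRACK_MEM" -v thr="$TRACK_THREADS" -v sched="$TRACK_SCHED" '
        $1 == "Pid:"                                    { print "pid=" $2 }
        mem   == 1 && $1 == "VmRSS:"                    { print "rss_kb=" $2 }
        thr   == 1 && $1 == "Threads:"                  { print "threads=" $2 }
        sched == 1 && $1 == "voluntary_ctxt_switches:"  { print "vol_ctxt=" $2 }
    ' "$1"
}

# Example: sample the current shell where /proc is available.
if [ -r "/proc/$$/status" ]; then
    collect_key_data "/proc/$$/status"
fi
```

Turning a switch off removes the corresponding field from the output without touching the tracked program, which mirrors the idea of screening key data through switches.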
Moreover, the collection frequency of the extracted key data can be set according to the operation scene. As a preferred embodiment, according to an operation scenario, screening key data of a process of software to be detected, and determining key data to be tracked specifically includes the following steps:
S21, turning on the switches for tracking all key data in a research and development scenario;
In the research and development scenario, performance is not yet a concern, so saving and tracking can be turned on for all data, and data can be recorded at high frequency, for example once every 10 ms.
S22, in integration test, pilot test and current network scenarios, turning on the corresponding key data tracking switches according to the difficult problems to be investigated.
Such problems are varied. When a problem is investigated, the relevant switch is turned on according to the specific situation: for example, if the memory is suspected of overflowing occasionally, the memory monitoring switch may be turned on; if a certain process is suspected of occasionally using too many resources, the process-related switches may be turned on.
In the integration test scenario, performance starts to matter, so only part of the functions is turned on and the frequency of data saving is reduced, for example recording process memory usage and CPU occupancy once per second;
In pilot test and current network scenarios, changes need to be observed over a long period, so only the necessary diagnostic data is enabled to facilitate later analysis, for example recording memory and CPU occupancy once every 10 seconds and recording specific abnormal signals such as SIGTERM.
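The scenario-dependent collection frequencies can be summarized in a small lookup, sketched below. The scenario labels (`development`, `integration_test`, `pilot`, `live_network`) and the function names are hypothetical; only the 10 ms, 1 s and 10 s intervals come from the text.

```shell
#!/bin/sh
# Sketch: map an operation scenario to a collection interval in milliseconds.
scenario_interval_ms() {
    case "$1" in
        development)        echo 10 ;;      # once every 10 ms
        integration_test)   echo 1000 ;;    # once every 1 s
        pilot|live_network) echo 10000 ;;   # once every 10 s
        *) echo "unknown scenario: $1" >&2; return 1 ;;
    esac
}

# samples_per_hour SCENARIO: samples produced per hour by one tracked item.
samples_per_hour() {
    interval=$(scenario_interval_ms "$1") || return 1
    echo $(( 3600000 / interval ))
}

for s in development integration_test pilot live_network; do
    echo "$s: every $(scenario_interval_ms "$s") ms, $(samples_per_hour "$s") samples/hour"
done
```

The thousandfold drop in sample count between the development and current network settings is what keeps the recording overhead low on deployed equipment.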
And S3, extracting the key data to be tracked, storing the key data in a reserved memory, and positioning the fault of the software by using the key data stored in the reserved memory.
Specifically, an information extraction and saving kernel-mode driver (data_fetch.ko) and a reserved memory management kernel-mode driver (reserved_mem_manager.ko) need to be developed.
Corresponding versions are provided for different CPUs and kernel versions according to the actual compilation environment; the implementation principle is identical in all cases. The drivers manage the data going into and out of the reserved memory, and the data inside the reserved memory is compressed with the general-purpose LZMA (Lempel-Ziv-Markov chain algorithm) compression algorithm to reduce the space occupied, so that more key data can be stored in the reserved memory. In addition, a tool (show_reserved_mem_data) is provided that can capture and read the reserved memory data from within the operating system after the system has been entered.
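To give a feel for why compression matters, the following back-of-the-envelope sketch estimates how many seconds of history fit in the reserved memory. Every input is an illustrative assumption and not a figure from this document: the 64-byte record size, the two tracked processes, and the 25% compressed size are made up for the example, and the function name `history_seconds` is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate how many seconds of history fit in the reserved memory.
#   $1 reserved size in MB        $2 bytes per raw record
#   $3 number of tracked processes $4 collection interval in ms
#   $5 compressed size as a percentage of the raw size (100 = uncompressed)
history_seconds() {
    reserved_bytes=$(( $1 * 1024 * 1024 ))
    # Bytes written per second before and after compression (integer math).
    raw_per_sec=$(( $2 * $3 * 1000 / $4 ))
    stored_per_sec=$(( raw_per_sec * $5 / 100 ))
    if [ "$stored_per_sec" -le 0 ]; then
        echo "inputs too small for an integer estimate" >&2
        return 1
    fi
    echo $(( reserved_bytes / stored_per_sec ))
}

# 16 MB reserved, 64-byte records, 2 processes, 10 ms interval:
echo "uncompressed: $(history_seconds 16 64 2 10 100) s"
echo "at 25% size:  $(history_seconds 16 64 2 10 25) s"
```

Under these assumed numbers the covered time span grows in inverse proportion to the compressed size, which is the effect the embodiment relies on to store more key data in a fixed region.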
In summary, this embodiment tracks and extracts key data while the software runs, that is, it records data as they are generated. Compared with the traditional approach of collecting and analyzing data after a problem has occurred, the software does not need to call a given debugging library; no configuration is needed at the software level, and the complexity of the software is not increased.
In addition, based on the process management information and according to the operation scene, the key data of the process to be tracked is determined and tracked, and the negative influence on the performance of the whole machine can be reduced. According to the operation scene, the acquisition frequency is set for the key data to be extracted, so that the resource overhead of data recording can be controlled according to the actual situation.
An embodiment of the present invention further provides a software fault locating system, comprising a reserved memory setting module, a process information management module, an information extraction module and a fault positioning module.
The reserved memory setting module is used for setting the reserved memory of the operating system. In this embodiment, the reserved memory setting module sets the reserved memory of the system by setting the kernel start parameter memmap, and determines the size of the reserved memory according to the actual memory of the operating system and the scale of the software to be detected.
And the process information management module is used for screening the key data of the software to be detected according to the operation scene and determining the key data to be tracked and the acquisition frequency thereof.
In this embodiment, the key data includes a process number, a process memory space usage condition, a process scheduling condition, a number of sub-threads of the process, a sub-thread scheduling condition, a signal sent by the process, and/or a signal received by the process, and the process information management module sets a switch for tracking each type of key data, and the switch is used for screening the key data.
Namely, when a certain type of key data needs to be tracked, the switch is opened, and when the certain type of key data does not need to be tracked, the switch is closed.
As a preferred embodiment, the process information management module is configured to:
in a research and development scenario, turn on the switches for tracking all key data;
in integration test, pilot test and current network scenarios, turn on the corresponding key data tracking switches according to the difficult problems to be investigated.
According to the operation scene, the key data of the process to be tracked is determined and tracked, and the negative influence on the performance of the whole machine can be reduced. According to the operation scene, the acquisition frequency is set for the key data to be extracted, so that the resource overhead of data recording can be controlled according to the actual situation.
And the information extraction module is used for extracting the key data needing to be tracked and storing the key data in the reserved memory.
Preferably, the information extraction module is further configured to:
in a research and development scenario, set the key data to be collected once every 10 ms;
in an integration test scenario, set the key data to be collected once every 1 s;
in pilot test and current network scenarios, set the key data to be collected once every 10 s.
And the fault positioning module is used for positioning the fault of the software by using the key data stored in the reserved memory.
Furthermore, the software fault location system further comprises a reserved memory management module, which is used for compressing the key data stored in the reserved memory by using a compression algorithm. In this embodiment, the general-purpose LZMA (Lempel-Ziv-Markov chain algorithm) compression algorithm is used to reduce the space occupied by the stored data, so that more key data can be stored in the reserved memory.
The present invention is not limited to the above-described embodiments, and it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements are also considered to be within the scope of the present invention. Those not described in detail in this specification are within the skill of the art.
Claims (10)
1. A software fault locating method is characterized by comprising the following steps:
setting a reserved memory of an operating system;
screening key data of the process of the software to be detected according to the operation scene, and determining the key data to be tracked and the acquisition frequency thereof;
extracting the key data to be tracked, storing the key data in the reserved memory, and positioning the fault of the software by using the key data stored in the reserved memory.
2. The software fault locating method of claim 1, further comprising the step of compressing the key data stored in the reserved memory using a compression algorithm.
3. The software fault locating method according to claim 1, wherein setting a reserved memory of an operating system specifically includes:
and determining the size of the reserved memory according to the actual memory of the operating system and the scale of the software to be detected.
4. The software fault locating method of claim 1,
the key data comprises a process number, a process memory space using condition, a process scheduling condition, a process sub-thread quantity, a sub-thread scheduling condition, a signal sent by the process and/or a signal received by the process, a switch is arranged for tracking each type of key data, and the key data is screened through the switch.
5. The software fault locating method according to claim 4, wherein the key data of the process of the software to be detected is screened according to the operation scenario to determine the key data to be tracked, and the method specifically comprises the following steps:
in a research and development scene, turning on all switches for tracking key data;
and under the integrated test, the pilot test and the current network scene, opening a corresponding key data tracking switch according to the difficult and complicated problems to be checked.
6. The software fault locating method of claim 5, wherein:
in a research and development scene, the collection frequency of key data is collected once every 10 ms;
in an integrated test scene, the collection frequency of key data is collected once every 1 s;
in pilot test and current network scenarios, the collection frequency of the key data is once every 10 s.
7. A software fault locating system, comprising:
the reserved memory setting module is used for setting a reserved memory of the operating system;
the process information management module is used for screening the key data of the software to be detected according to the operation scene and determining the key data to be tracked and the acquisition frequency of the key data;
the information extraction module is used for extracting key data needing to be tracked and storing the key data in the reserved memory; and
and the fault positioning module is used for positioning the fault of the software by using the key data stored in the reserved memory.
8. The software fault locating system of claim 7, further comprising a reserved memory management module configured to compress the key data stored in the reserved memory using a compression algorithm.
9. The software fault locating system of claim 7, wherein:
and the reserved memory setting module determines the size of the reserved memory according to the actual memory of the operating system and the scale of the software to be detected.
10. The software fault locating system of claim 7, wherein: the key data comprises a process number, a process memory space using condition, a process scheduling condition, a process sub-thread quantity, a sub-thread scheduling condition, a signal sent by the process and/or a signal received by the process, the process information management module sets a switch for tracking each type of key data, and the switch is used for screening the key data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911091979.7A CN110888753A (en) | 2019-11-08 | 2019-11-08 | Software fault positioning method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911091979.7A CN110888753A (en) | 2019-11-08 | 2019-11-08 | Software fault positioning method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110888753A (en) | 2020-03-17 |
Family
ID=69747233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911091979.7A Pending CN110888753A (en) | 2019-11-08 | 2019-11-08 | Software fault positioning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110888753A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101197621A (en) * | 2007-12-07 | 2008-06-11 | 中兴通讯股份有限公司 | Method and system for remote diagnosing and locating failure of network management system |
JP2010182237A (en) * | 2009-02-09 | 2010-08-19 | Nec Corp | System, method and program for sampling stack trace |
CN106354661A (en) * | 2016-09-13 | 2017-01-25 | 郑州云海信息技术有限公司 | Internal storage distribution method and device for storage software |
-
2019
- 2019-11-08 CN CN201911091979.7A patent/CN110888753A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109284269B (en) | Abnormal log analysis method and device, storage medium and server | |
US9355003B2 (en) | Capturing trace information using annotated trace output | |
CN103544095B (en) | The method for supervising of server program and system thereof | |
CN111597089B (en) | Linux system call event acquisition and caching device and method | |
EP1579296A2 (en) | Privileged-based qualification of branch trace store data | |
CN112445686A (en) | Memory leak detection method, device and computer-readable storage medium | |
CN109542444B (en) | JAVA application monitoring method, device, server and storage medium | |
CN112445692A (en) | Case testing method and terminal | |
CN110764962B (en) | Log processing method and device | |
CN116414722B (en) | Fuzzy test processing method and device, fuzzy test system and storage medium | |
CN112612697A (en) | Software defect testing and positioning method and system based on byte code technology | |
CN110888753A (en) | Software fault positioning method and system | |
CN115114117B (en) | Data recording method and data recording device | |
CN116820932A (en) | BMC fault diagnosis method, device, equipment and medium | |
CN116610575A (en) | Software testing method and device and electronic equipment | |
GB2551574A (en) | An apparatus and method for generating and processing a trace stream indicative of instruction execution by processing circuitry | |
CN115828262A (en) | Open source component vulnerability scanning method, device, equipment and storage medium | |
CN115422008A (en) | Non-invasive process monitoring method, device, equipment and storage medium | |
CN115576816A (en) | Linux operating system-based android application function automatic testing method and device | |
CN113806195A (en) | Data processing method, device, equipment, system and storage medium | |
KR100428712B1 (en) | A Tracepoint Setting Method for Non-Stop Debugging of Multi-task Programs | |
CN118550755B (en) | Heap memory boundary crossing positioning method, device, equipment and medium | |
CN118113553A (en) | Memory file system monitoring method and computing device | |
CN116820875A (en) | Method and system for dynamically acquiring use condition of program function stack memory | |
CN115048236A (en) | Signal processing method and processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200317 |