CN106649039B - A fault-tolerance method for C-language monitoring software on an embedded Linux system - Google Patents
- Publication number: CN106649039B
- Application number: CN201611147014.1A
- Authority: CN (China)
- Prior art keywords: monitoring, module, signal, error, mistake
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/865—Monitoring of software
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a fault-tolerance method for C-language monitoring software in an embedded Linux environment. By implementing in C an exception-handling structure similar to try/catch in C++, the monitoring software captures errors as soon as they occur and handles them, so the program does not crash and the software becomes more fault-tolerant. When the monitoring software encounters a segmentation fault (core dump), a floating-point arithmetic error, or an abnormal exit (abort), the loop of the monitoring program keeps running, and the error information and the location of the error can be recorded during error handling to support later analysis. A system that implements these operations is also provided. Based on the fault-tolerance strategy and method for C-language monitoring software, the system handles faulting processes during monitoring with little performance loss, effectively improves the operating reliability of the system, and avoids the complexity of hardware customization.
Description
Technical field
The invention belongs to the technical field of server monitoring and management, and specifically relates to a fault-tolerance method for C-language monitoring software on an embedded Linux system.
Background art
Embedded Linux systems are now widely used in server monitoring. The monitoring software on these systems continuously monitors and manages the running state of servers, so it must run stably for long periods and, if it does crash, start running again within a short time.
Currently, most monitoring software is developed in C. C, however, provides no exception-handling mechanism like try/catch in C++: once the program hits a problem such as a segmentation fault (core dump) or a floating-point arithmetic error, it crashes and exits. To keep the program running, the current solution is to check the monitoring software periodically and to rerun it as soon as an abnormal termination is detected. On the one hand, this approach is cumbersome to operate, because the check has to be composed again each time; on the other hand, it only guards against failures by checking and cannot resolve an abnormal condition at the moment it occurs.
Therefore, it is very necessary to provide a fault-tolerance method for C-language monitoring software on an embedded Linux system.
Summary of the invention
The object of the invention is to solve the above problems in the prior art, namely the lack of a try/catch-style exception-handling mechanism and the resulting inability to handle errors inside the monitoring software, by providing a fault-tolerance method for C-language monitoring software on an embedded Linux system.
The invention is achieved through the following technical solution:
A fault-tolerance method for C-language monitoring software on an embedded Linux system, characterized by comprising the following steps:
(1) The monitoring software starts running.
(2) Check whether the code meets the monitoring condition; if it does, go to step (3); if not, exit monitoring.
(3) Register signal handling. Once the monitoring software hits an error, the error is taken over by the signal handling flow instead of being handled by the operating system, which would usually terminate the program directly.
(4) Detect signals, that is, check whether an error signal is present. If no error signal is detected, run the normal monitoring flow and go to step (5); if an error signal is detected, perform error handling and jump to step (2). The signal detection flow is placed before the monitoring body: with no signal the normal monitoring body runs, and with a signal the error handling runs.
(5) Check whether the monitoring body generates an error signal. If it does, perform signal processing and jump to step (4); if it does not, jump to step (2) to restart the monitoring flow. A jump is added to the signal processing so that the program returns to the loop of the monitoring program.
Preferably, a logging operation is also performed during the signal processing in step (5).
Preferably, the signal type is also recorded during the signal processing in step (5).
Preferably, the time at which the error occurred is also recorded during the error handling in step (4).
Preferably, global variables are also recorded during the error handling in step (4).
A system implementing the above fault-tolerance method, the system being concentrated in the operating core layer of the embedded Linux system, characterized by comprising a monitoring management module and, connected to it, a process management module, a signal processing module, and an error handling module, wherein the process management module, the signal processing module, and the error handling module are connected in sequence, and wherein:
(1) Process management module: implements life-cycle management of the monitoring process, including its creation, scheduling, and communication, so that the original process executes its original logic in order while also meeting the needs of C-language fault-tolerant monitoring.
(2) Signal processing module: implements signal registration, signal detection, and signal processing. After a registered signal detects an error, the program first handles the error itself, and the error passes to the Linux operating system only when the program cannot resolve it. The signal processing module makes the interrupted program flow jump back into the monitoring flow.
(3) Error handling module: diagnoses the error type and applies the corresponding pre-configured error handling method to complete the error repair.
(4) Monitoring management module: comprises a master control end and an internal control end. The master control end provides the user with a visual operation interface, and the internal control end interacts with the master control end so that the user can, at the master control end, view the running state of the monitoring software and the fault-tolerance log and pre-configure the system parameters.
Compared with the prior art, the beneficial effects of the invention are as follows:
The fault-tolerance method for C-language monitoring software in an embedded Linux environment provided by the invention implements in C an exception-handling structure similar to try/catch in C++. The monitoring software captures errors as soon as they occur and handles them, so the program does not crash and the software becomes more fault-tolerant. When the monitoring software encounters a segmentation fault (core dump), a floating-point arithmetic error, or an abnormal exit (abort), the loop of the monitoring program keeps running, and the error information and the location of the error can be recorded during error handling to support later analysis. The scheme also provides a system design for carrying out this flow. Based on the fault-tolerance strategy and method for C-language monitoring software, the system handles faulting processes during monitoring with little performance loss, effectively improves the operating reliability of the system, and avoids the complexity of hardware customization.
In addition, the method of the invention is reliable in principle and simple in its steps, and has very broad application prospects.
It can be seen that, compared with the prior art, the invention has outstanding substantive features and represents notable progress, and the beneficial effects of its implementation are also obvious.
Description of the drawings
Fig. 1 is a flowchart of the fault-tolerance method for C-language monitoring software in an embedded Linux environment provided by the invention.
Fig. 2 is a schematic structural diagram of the fault-tolerance system for C-language monitoring software in an embedded Linux environment provided by the invention.
Reference numerals: 1, process management module; 2, signal processing module; 3, error handling module; 4, monitoring management module; 41, master control end; 42, internal control end.
Specific embodiment
The invention is described in further detail below with reference to the accompanying drawings:
Before the fault-tolerance method for C-language monitoring software on an embedded Linux system is described, the processing flow of typical monitoring software is explained. The monitoring software (main) is a program that runs in an infinite loop according to a timing rule. To keep the monitoring software running persistently, the monitoring core code (monitor) must be sufficiently stable and free of errors. As the code grows more complex, however, the probability of errors increases, especially sporadic errors with a low recurrence rate. Such an error, for example a floating-point operation that divides by 0, causes the monitoring program to fail and exit. The code in the following table is the code framework of this embodiment, and the fault-tolerance method of the invention is described with reference to it.
As shown in Fig. 1, a fault-tolerance method for C-language monitoring software on an embedded Linux system comprises the following steps:
(1) The monitoring software starts running.
(2) Check whether the code meets the monitoring condition; if it does, go to step (3); if not, exit monitoring.
(3) Register signal handling.
(4) Detect signals, that is, check whether an error signal is present. If no error signal is detected, run the normal monitoring flow and go to step (5); if an error signal is detected, perform error handling and jump to step (2).
(5) Check whether the monitoring body generates an error signal. If it does, perform signal processing and jump to step (4); if it does not, jump to step (2) to restart the monitoring flow.
Step (3) registers signal handling and corresponds to the signal function in the sample code. Without registered signal handling, the program exits as soon as an error occurs, and Linux prints a message. With signal handling registered, the program handles these errors itself first when they occur, and only errors that the program does not handle are handled by the Linux operating system. To ensure that the program does not exit, handlers should be registered for at least SIGSEGV, SIGFPE, and SIGABRT. signal_hdl is the registered signal handling function.
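The registration described here can be sketched as follows. This is a minimal sketch, not the patent's own code: it assumes the standard C `signal` function named in the text, and `register_signal_handlers` and `last_signal` are hypothetical names introduced for illustration. The handler only records the signal number; jumping back to the monitoring loop is a separate concern.

```c
#include <signal.h>
#include <stddef.h>

/* Last signal seen by the handler (hypothetical bookkeeping variable). */
static volatile sig_atomic_t last_signal = 0;

/* Signal handling function, named signal_hdl as in the text. */
static void signal_hdl(int signo)
{
    last_signal = signo;
}

/* Register signal_hdl for the three signals the text names as the minimum
 * set. Returns 0 on success, -1 if any registration fails. */
int register_signal_handlers(void)
{
    static const int signals[] = { SIGSEGV, SIGFPE, SIGABRT };

    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++) {
        if (signal(signals[i], signal_hdl) == SIG_ERR)
            return -1;
    }
    return 0;
}
```

Calling `register_signal_handlers()` once at startup, before the monitoring loop, would cover step (3). Where it matters that the handler stays installed after it has run, `sigaction` is the more portable choice, because the reset behavior of `signal` varies between systems.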
Signal detection in step (4) corresponds to the sigsetjmp function in the sample code. It is a selection structure: if no signal is detected, the normal monitoring (monitor) flow runs; if a signal is detected, error handling runs. During error handling, the time at which the error occurred can be recorded, and some global variables can also be recorded to locate roughly where the error occurred.
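The selection structure and its error-handling branch might look like the sketch below. It assumes that a handler `signal_hdl`, which jumps back with `siglongjmp`, has been registered for the signal in question. `run_monitor_once`, `monitor_stage`, `last_error_time`, `monitor_ok`, and `monitor_faulty` are hypothetical names, and the fault is simulated with `raise(SIGFPE)` because whether a division by zero actually traps depends on the platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

static sigjmp_buf monitor_env;                   /* recovery point */
static volatile sig_atomic_t caught_signal = 0;  /* set by the handler */
static volatile sig_atomic_t monitor_stage = 0;  /* global progress marker */
static time_t last_error_time = 0;               /* time of the last error */

/* Handler: remember the signal and jump back to the recovery point. */
static void signal_hdl(int signo)
{
    caught_signal = signo;
    siglongjmp(monitor_env, 1);
}

/* Run one pass of the monitoring body. Returns 0 after a clean pass, or
 * the number of the signal that interrupted the pass. */
int run_monitor_once(void (*monitor)(void))
{
    if (sigsetjmp(monitor_env, 1) == 0) {
        /* No signal detected: normal monitoring flow. */
        monitor_stage = 0;
        monitor();
        return 0;
    }

    /* Reached through siglongjmp: error handling. Record the time and the
     * global progress marker to locate the error roughly. */
    last_error_time = time(NULL);
    fprintf(stderr, "error: signal %d at stage %d, time %ld\n",
            (int)caught_signal, (int)monitor_stage, (long)last_error_time);
    return (int)caught_signal;
}

/* Hypothetical monitoring bodies used to exercise the sketch. */
void monitor_ok(void)
{
    monitor_stage = 1;
}

void monitor_faulty(void)
{
    monitor_stage = 1;
    monitor_stage = 2;
    raise(SIGFPE);      /* simulated arithmetic fault */
    monitor_stage = 3;  /* not reached when the handler jumps away */
}
```

With `signal(SIGFPE, signal_hdl)` in place, `run_monitor_once(monitor_faulty)` returns `SIGFPE` and leaves `monitor_stage` at 2, which shows roughly how far the monitoring body got before the fault.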
The signal processing in step (5) corresponds to the signal_hdl function in the sample code. Its main purpose is to make the interrupted program flow jump (siglongjmp) back into the monitoring flow, instead of exiting the monitoring program as soon as a signal is received. In addition, logging can be added to the signal handling function to record the signal type and similar information as part of the log used for bug analysis.
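A handler along these lines could log the signal type and then jump back. This is a sketch under assumptions: `monitor_env`, `caught_signal`, `signal_label`, and `log_text` are hypothetical names, the log goes to standard error rather than to a log file, and `write` is used because `printf` is not safe to call from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf monitor_env;                   /* set with sigsetjmp */
static volatile sig_atomic_t caught_signal = 0;  /* last signal handled */

/* Map a signal number to a short description for the log. */
static const char *signal_label(int signo)
{
    switch (signo) {
    case SIGSEGV: return "SIGSEGV (segmentation fault)";
    case SIGFPE:  return "SIGFPE (arithmetic error)";
    case SIGABRT: return "SIGABRT (abort)";
    default:      return "unexpected signal";
    }
}

/* Write a string to standard error with write(); errors are ignored. */
static void log_text(const char *text)
{
    ssize_t written = write(STDERR_FILENO, text, strlen(text));
    (void)written;
}

/* Log the signal type, then return control to the monitoring flow. */
static void signal_hdl(int signo)
{
    log_text("signal_hdl: caught ");
    log_text(signal_label(signo));
    log_text("\n");

    caught_signal = signo;
    siglongjmp(monitor_env, 1);  /* does not return */
}
```

The jump target has to be set with `sigsetjmp(monitor_env, 1)` before any signal can arrive. Passing 1 saves the signal mask, so the signal that was blocked while the handler ran is unblocked again after the jump.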
As shown in Fig. 2, the invention also provides a system implementing the above fault-tolerance method. The system is concentrated in the operating core layer of the embedded Linux system and comprises a monitoring management module 4 and, connected to it, a process management module 1, a signal processing module 2, and an error handling module 3. The process management module 1, the signal processing module 2, and the error handling module 3 are connected in sequence, wherein:
(1) Process management module 1: implements life-cycle management of the monitoring process, including its creation, scheduling, and communication, so that the original process executes its original logic in order while also meeting the needs of C-language fault-tolerant monitoring.
(2) Signal processing module 2: implements signal registration, signal detection, and signal processing. After a registered signal detects an error, the program first handles the error itself, and the error passes to the Linux operating system only when the program cannot resolve it. The signal processing module makes the interrupted program flow jump back into the monitoring flow.
(3) Error handling module 3: diagnoses the error type and applies the corresponding pre-configured error handling method to complete the error repair.
(4) Monitoring management module 4: comprises a master control end 41 and an internal control end 42. The master control end 41 provides the user with a visual operation interface, and the internal control end 42 interacts with the master control end 41 so that the user can, at the master control end 41, view the running state of the monitoring software and the fault-tolerance log and pre-configure the system parameters.
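One way to express the pre-configured error handling of module 3 is a lookup table from signal to action. Everything in this sketch is hypothetical: the patent does not specify a configuration format, and the action names and the table contents are illustrative assumptions only.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical error handling actions. */
enum error_action {
    ACTION_RESTART_LOOP,     /* jump back and start the next pass */
    ACTION_LOG_AND_RESTART,  /* write a log entry, then restart */
    ACTION_DEFER_TO_OS       /* give up and let Linux handle the signal */
};

/* One pre-configured rule: which action to take for which signal. */
struct error_policy {
    int signo;
    enum error_action action;
};

/* Illustrative configuration; a real system would load it from the
 * parameters set at the master control end. */
static const struct error_policy policy_table[] = {
    { SIGFPE,  ACTION_RESTART_LOOP },
    { SIGSEGV, ACTION_LOG_AND_RESTART },
    { SIGABRT, ACTION_LOG_AND_RESTART },
};

/* Diagnose the error type by signal number and return the configured
 * action; signals without a rule are left to the operating system. */
enum error_action lookup_action(int signo)
{
    for (size_t i = 0; i < sizeof policy_table / sizeof policy_table[0]; i++) {
        if (policy_table[i].signo == signo)
            return policy_table[i].action;
    }
    return ACTION_DEFER_TO_OS;
}
```

Keeping the rules in a table separates the diagnosis (which signal arrived) from the response, so the response can be changed through configuration without touching the handler code.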
The fault-tolerance method for C-language monitoring software in an embedded Linux environment provided by the invention implements in C a structure similar to try/catch in C++. The monitoring software captures errors as soon as they occur and handles them, so the program does not crash and the software becomes more fault-tolerant. When the monitoring software encounters a segmentation fault (core dump), a floating-point arithmetic error, or an abnormal exit (abort), the loop of the monitoring program keeps running, and the error information and the location of the error can be recorded during error handling to support later analysis. The scheme also provides a system design for implementing this method. Based on the fault-tolerance strategy and method for C-language monitoring software, the system handles faulting processes during monitoring with little performance loss, effectively improves the operating reliability of the system, and avoids the complexity of hardware customization.
The above technical solution is one embodiment of the invention. On the basis of the application methods and principles disclosed by the invention, those skilled in the art can easily make various improvements or variations that are not limited to the method described in the specific embodiment above. The embodiment described is therefore only a preferred one and has no restrictive meaning.
Claims (1)
1. A fault-tolerance system for C-language monitoring software on an embedded Linux system, the system being concentrated in the operating core layer of the embedded Linux system, characterized by comprising a monitoring management module and, connected to it, a process management module, a signal processing module, and an error handling module, wherein the process management module, the signal processing module, and the error handling module are connected in sequence, and wherein:
(1) the process management module implements life-cycle management of the monitoring process, including creation, scheduling, and communication of the monitoring process, so that the original process executes its original logic in order while also meeting the needs of C-language fault-tolerant monitoring;
(2) the signal processing module implements signal registration, signal detection, and signal processing; after a registered signal detects an error, the program first handles the error itself, and the error passes to the Linux operating system only when the program cannot resolve it; the signal processing module makes the interrupted program flow jump back into the monitoring flow;
(3) the error handling module diagnoses the error type and applies the corresponding pre-configured error handling method to complete the error repair;
(4) the monitoring management module comprises a master control end and an internal control end; the master control end provides the user with a visual operation interface, and the internal control end interacts with the master control end so that the user can, at the master control end, view the running state of the monitoring software and the fault-tolerance log and pre-configure the system parameters;
the process management module is specifically configured to:
(a) start the monitoring software;
(b) check whether the code meets the monitoring condition, enter the signal processing module if it does, and exit monitoring if it does not;
the signal processing module is specifically configured to:
(a) register signal handling;
(b) detect signals, that is, check whether an error signal is present; if no error signal is detected, run the normal monitoring flow and check whether the monitoring body generates an error signal; if it does, perform signal processing and detect signals again; if it does not, jump to the process management module to restart the monitoring flow; if an error signal is detected, enter the error handling module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611147014.1A CN106649039B (en) | 2016-12-13 | 2016-12-13 | A fault-tolerance method for C-language monitoring software on an embedded Linux system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649039A CN106649039A (en) | 2017-05-10 |
CN106649039B true CN106649039B (en) | 2019-09-27 |
Family
ID=58825255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611147014.1A Active CN106649039B (en) | A fault-tolerance method for C-language monitoring software on an embedded Linux system | 2016-12-13 | 2016-12-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649039B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706752A (en) * | 2009-11-20 | 2010-05-12 | 中兴通讯股份有限公司 | Method and device for in-situ software error positioning |
CN104794031A (en) * | 2015-04-16 | 2015-07-22 | 上海交通大学 | Cloud system fault detection method combining self-adjustment strategy with virtualization technology |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001005692A (en) * | 1999-06-25 | 2001-01-12 | Toshiba Corp | Computer system, its maintenance and management system, and method for informing of fault |
Also Published As
Publication number | Publication date |
---|---|
CN106649039A (en) | 2017-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||