CN110442470A - System stability monitoring and recovery method for communication equipment
- Publication number
- CN110442470A (application CN201910682271.2A)
- Authority
- CN
- China
- Prior art keywords
- thread
- monitoring
- communication equipment
- restoration methods
- monitors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Environmental & Geological Engineering (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention belongs to the technical field of communication equipment and relates in particular to a system stability monitoring and recovery method for communication equipment. The method comprises three steps: Linux command execution status monitoring and recovery; FLASH storage space read/write status monitoring and recovery; and important-thread running status monitoring and recovery. The scheme simultaneously monitors several function points that affect stable operation of the equipment, namely the Linux command execution status, the read/write status of the FLASH storage space, and the running status of multiple threads. It executes a different exception-handling scheme according to the monitored result to restore the equipment to a normal state, thereby ensuring stable operation of the equipment.
Description
Technical field
The invention belongs to the technical field of communication equipment and relates in particular to a system stability monitoring and recovery method for communication equipment.
Background art
The prior art monitors system stability by means of a software watchdog or a hardware watchdog: a dedicated thread feeds the watchdog at regular intervals, and if that thread becomes abnormal and fails to feed the watchdog in time, the system restarts. Many factors affect the stability of communication equipment, including the following three: 1. the Linux command execution status; 2. the read/write status of the FLASH storage space; 3. the running status of important threads.
The prior art monitors only the watchdog-feeding thread; the other function points that affect stable operation of the equipment are not monitored. An abnormality elsewhere can also make the equipment run unstably, so the existing scheme cannot solve the problem of stable equipment operation.
Summary of the invention
To overcome the technical deficiencies of the prior art, the invention proposes a system stability monitoring and recovery method for communication equipment.
The invention is realized by the following technical scheme:
A system stability monitoring and recovery method for communication equipment, comprising the steps of:
(1) Linux command execution status monitoring and recovery, specifically comprising the steps of:
1.1. Start monitoring the Linux command system and let i = 0;
1.2. Wait for a first preset time, then check the value of i;
1.3. If i = 0, execute the system command ls and evaluate the execution result;
1.4. If i = 1, execute the system command ps and evaluate the execution result;
1.5. If i = 2, execute the system command free and evaluate the execution result;
1.6. Determine whether the execution in steps 1.3 to 1.5 failed; if so, go to step 1.8; if not, go to step 1.7;
1.7. if ((++i) >= 3) { i = 0; }, then return to step 1.2;
1.8. Stop feeding the hardware watchdog so that the system restarts, and end the flow.
(2) FLASH storage space read/write status monitoring and recovery, specifically comprising the steps of:
2.1. Start monitoring the FLASH storage space;
2.2. Wait for a second preset time;
2.3. Read and write a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step 2.2; if not, go to step 2.4;
2.4. Set the FLASH abnormal flag, request handling, and end the flow.
(3) Important-thread running status monitoring and recovery.
Further, in step (1), the Linux system commands ls, ps and free are executed periodically, the command execution result is determined, and whether to call the exception-handling mechanism is decided according to the returned result.
Preferably, the first preset time is 30 minutes.
Further, in step (2), with the second preset time as the interval, a file is periodically read and written in the FLASH storage space, and whether to call the exception-handling mechanism is decided according to the returned result.
Preferably, the second preset time is 12 hours.
Further, step (3) comprises the steps of:
3.1 Set the monitoring request flag bit at the start of each thread.
3.2 In the loop body of the thread, add 1 to the thread's counter each time the loop body is entered; the loop body of a monitored thread must complete a cycle faster than the loop body of the monitoring thread.
3.3 The monitoring thread judges whether each monitored thread is running normally according to that thread's monitoring request flag and counter, and thereby decides whether to call the exception-handling mechanism.
3.4 Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with its previous value; if the counter equals the value seen at the previous check, the monitoring thread determines that the thread is abnormal, and if the number of such detected errors reaches a set number, the exception-handling mechanism is called.
Further, in step (3), the important-thread running status monitoring and recovery comprises:
A. Start monitoring the running status of each thread;
B. Wait 60 seconds;
C. Let n = 0, so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, let n = n + 1 and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, which step 3.4 treats as abnormal, go to step G;
G. Add 1 to the system restart flag;
H. Determine whether the system restart flag is greater than the set value; if so, restart the system; if not, let n = n + 1 and return to step D.
Further, in step (3), each thread that needs to be monitored executes the steps of:
A. Start and enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag bit;
D. Enter the thread loop body, adding 1 to the thread counter on each entry;
E. Process the other work in the loop body, then return to step D.
The invention also includes a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the monitoring and recovery method.
The invention also includes a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the monitoring and recovery method when executing the program.
Compared with the prior art, the present invention has at least the following beneficial effect or advantage: it simultaneously monitors several function points that affect stable operation of the equipment, namely the Linux command execution status, the read/write status of the FLASH storage space, and the running status of multiple threads, and it executes a different exception-handling scheme according to the monitored result to restore the equipment to a normal state, thereby ensuring stable operation of the equipment.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings:
Fig. 1 is a flowchart of the Linux command execution status monitoring and recovery method of the invention;
Fig. 2 is a flowchart of the FLASH storage space read/write status monitoring and recovery method of the invention;
Fig. 3 is a flowchart of a single monitored thread of the invention;
Fig. 4 is a flowchart of the thread running status monitoring and recovery method of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides a communication device running the Linux operating system, together with a system stability monitoring and recovery method for the communication device. The detailed procedure is as follows:
Linux command execution status monitoring:
1. Monitoring method: the Linux system commands ls, ps and free are executed periodically, the command execution result is determined, and whether to call the exception-handling method is decided according to the returned result.
2. The monitoring and recovery method is shown schematically in Fig. 1 and comprises the steps of:
A. Start monitoring the Linux command system and let i = 0;
B. Wait 30 minutes, then check the value of i;
C. If i = 0, execute the system command ls and evaluate the execution result;
D. If i = 1, execute the system command ps and evaluate the execution result;
E. If i = 2, execute the system command free and evaluate the execution result;
F. Determine whether the execution in steps C to E failed; if so, go to step H; if not, go to step G;
G. if ((++i) >= 3) { i = 0; }, then return to step B;
H. Stop feeding the hardware watchdog so that the system restarts, and end the flow.
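Steps A to H can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `command_ok` and `monitor_commands`, the `on_failure` callback (standing in for "stop feeding the hardware watchdog"), the 10-second command timeout, and the `sleep` and `max_checks` parameters are all hypothetical and were added so the loop can be exercised without waiting 30 minutes.

```python
import subprocess
import time

COMMANDS = ("ls", "ps", "free")  # the three commands named in the text


def command_ok(name, timeout=10):
    """Return True if the command starts, finishes in time and exits with 0."""
    try:
        result = subprocess.run(
            [name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # assumed value; the text gives no timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def monitor_commands(interval_s=30 * 60, run=command_ok, on_failure=None,
                     sleep=time.sleep, max_checks=None):
    """Rotate through ls, ps and free, one command per interval.

    Returns False as soon as a command fails (after calling on_failure),
    or True if max_checks checks complete without failure.
    """
    i = 0                                   # step A
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_s)                   # step B
        checks += 1
        name = COMMANDS[i]                  # steps C to E
        if not run(name):                   # step F
            if on_failure is not None:
                on_failure(name)            # step H (hypothetical hook)
            return False
        i += 1                              # step G
        if i >= 3:
            i = 0
    return True
```

In this sketch a command counts as failed when it cannot be started, times out, or exits with a non-zero status; the text does not define failure more precisely.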
FLASH storage space read/write status monitoring:
1. Monitoring method: at 12-hour intervals, a file is periodically read and written in the FLASH storage space, and whether to call the exception-handling method is decided according to the returned result.
2. The monitoring and recovery method is shown schematically in Fig. 2 and comprises the steps of:
A. Start monitoring the FLASH storage space;
B. Wait 12 hours;
C. Read and write a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step B; if not, go to step D;
D. Set the FLASH abnormal flag, request handling, and end the flow.
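The read/write probe in steps A to D could look like the sketch below. It is an assumption-laden illustration: the probe file name `.flash_probe.txt`, the payload, the `on_abnormal` callback (standing in for "set the FLASH abnormal flag and request handling"), and the `sleep` and `max_checks` parameters are hypothetical and do not come from the text.

```python
import os
import time


def flash_rw_ok(directory, payload=b"flash-probe\n"):
    """Write a small text file in `directory`, read it back and compare."""
    path = os.path.join(directory, ".flash_probe.txt")  # hypothetical name
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())      # push the data to the storage device
        with open(path, "rb") as f:
            return f.read() == payload
    except OSError:
        return False


def monitor_flash(directory, interval_s=12 * 60 * 60, probe=flash_rw_ok,
                  on_abnormal=None, sleep=time.sleep, max_checks=None):
    """Probe the FLASH directory once per interval until a probe fails."""
    checks = 0                               # step A
    while max_checks is None or checks < max_checks:
        sleep(interval_s)                    # step B
        checks += 1
        if not probe(directory):             # step C
            if on_abnormal is not None:
                on_abnormal(directory)       # step D (hypothetical hook)
            return False
    return True
```

Note that reading the file back immediately may be served from the page cache rather than from the FLASH chip itself, so a stricter probe may be needed on real hardware.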
Thread running status monitoring:
1. Monitoring method:
1.1 Set the monitoring request flag bit at the start of each thread.
1.2 In the loop body of the thread, add 1 to the thread's counter each time the loop body is entered; the loop body of a monitored thread must complete a cycle faster than the loop body of the monitoring thread.
1.3 The monitoring thread judges whether each monitored thread is running normally according to that thread's monitoring request flag and counter, and thereby decides whether to call the exception-handling mechanism.
1.4 Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with its previous value; if the counter equals the value seen at the previous check, the monitoring thread determines that the thread is abnormal, and if the number of such detected errors reaches a set number, the exception-handling mechanism is called.
2. Monitoring and recovery method schematics:
2.1 Each thread that needs to be monitored is shown schematically in Fig. 3 and comprises the steps of:
A. Start and enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag bit;
D. Enter the thread loop body, adding 1 to the thread counter on each entry;
E. Process the other work in the loop body, then return to step D.
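The monitored-thread side (steps A to E) might be sketched as follows. The `Heartbeat` record, the `worker` function, the `do_work` callable and the `stop` event are hypothetical names introduced for illustration; the text only specifies a name, a monitoring flag and a counter per thread.

```python
import threading
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Per-thread state that the monitoring thread inspects."""
    name: str
    monitored: bool = False   # monitoring flag bit (step C)
    counter: int = 0          # incremented on every loop entry (step D)


def worker(hb, do_work, stop):
    """Body of a monitored thread; `stop` is an assumed shutdown signal."""
    threading.current_thread().name = hb.name   # step B: name the thread
    hb.monitored = True                         # step C: set the flag
    while not stop.is_set():
        hb.counter += 1                         # step D: bump the counter
        do_work()                               # step E: other work
```

Because only the worker writes `counter` and the monitor only reads it, the sketch uses no lock; an implementation in another language would likely want an atomic or volatile counter.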
2.2 The monitoring thread is shown schematically in Fig. 4 and comprises the steps of:
A. Start monitoring the running status of each thread;
B. Wait 60 seconds;
C. Let n = 0, so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, let n = n + 1 (monitor the next thread) and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, which item 1.4 treats as abnormal, go to step G;
G. Add 1 to the system restart flag;
H. Determine whether the system restart flag is greater than the set value; if so, restart the system; if not, let n = n + 1 (monitor the next thread) and return to step D.
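One scan of the monitoring thread (steps C to H) can be sketched as below. Several points are assumptions rather than statements from the text: an unchanged counter is taken as the abnormal case that leads to step G (following item 1.4), a thread whose counter did change is simply skipped, the first scan of a thread only records its counter, and the restart flag is never reset. The names `scan_once`, `monitor_threads`, `limit` and `reboot` are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Heartbeat:
    name: str
    monitored: bool = False
    counter: int = 0


def scan_once(threads, last_seen, restart_flag, limit):
    """Check every thread once; return (restart_flag, reboot_needed)."""
    for hb in threads:                          # steps C and D: n = 0 .. m-1
        if not hb.monitored:                    # step E: flag not set
            continue
        previous = last_seen.get(hb.name)
        last_seen[hb.name] = hb.counter
        if previous is None or hb.counter != previous:
            continue                            # step F: counter moved (assumed OK)
        restart_flag += 1                       # step G
        if restart_flag > limit:                # step H
            return restart_flag, True
    return restart_flag, False


def monitor_threads(threads, limit, reboot, interval_s=60,
                    sleep=time.sleep, max_scans=None):
    """Scan all threads every interval; call reboot() once the limit is exceeded."""
    last_seen = {}
    restart_flag = 0
    scans = 0
    while max_scans is None or scans < max_scans:
        sleep(interval_s)                       # step B
        scans += 1
        restart_flag, must_reboot = scan_once(threads, last_seen,
                                              restart_flag, limit)
        if must_reboot:
            reboot()                            # hypothetical restart hook
            return True
    return False
```

The 60-second scan interval only works if every monitored loop body completes at least one cycle within that time, which is the requirement stated in item 1.2.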
The present invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the monitoring and recovery method.
The present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the monitoring and recovery method when executing the program.
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit its protection scope. Any modification, equivalent substitution, improvement and the like made without departing from the spirit and scope of the invention also falls within the protection scope of the invention.
Claims (10)
1. A system stability monitoring and recovery method for communication equipment, characterized by comprising the steps of:
(1) Linux command execution status monitoring and recovery, specifically comprising the steps of:
1.1. Start monitoring the Linux command system and let i = 0;
1.2. Wait for a first preset time, then check the value of i;
1.3. If i = 0, execute the system command ls and evaluate the execution result;
1.4. If i = 1, execute the system command ps and evaluate the execution result;
1.5. If i = 2, execute the system command free and evaluate the execution result;
1.6. Determine whether the execution in steps 1.3 to 1.5 failed; if so, go to step 1.8; if not, go to step 1.7;
1.7. if ((++i) >= 3) { i = 0; }, then return to step 1.2;
1.8. Stop feeding the hardware watchdog so that the system restarts, and end the flow.
(2) FLASH storage space read/write status monitoring and recovery, specifically comprising the steps of:
2.1. Start monitoring the FLASH storage space;
2.2. Wait for a second preset time;
2.3. Read and write a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step 2.2; if not, go to step 2.4;
2.4. Set the FLASH abnormal flag, request handling, and end the flow.
(3) Important-thread running status monitoring and recovery.
2. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (1), the Linux system commands ls, ps and free are executed periodically, the command execution result is determined, and whether to call the exception-handling mechanism is decided according to the returned result.
3. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that the first preset time is 30 minutes.
4. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (2), with the second preset time as the interval, a file is periodically read and written in the FLASH storage space, and whether to call the exception-handling mechanism is decided according to the returned result.
5. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that the second preset time is 12 hours.
6. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that step (3) further comprises the steps of:
3.1 Set the monitoring request flag bit at the start of each thread.
3.2 In the loop body of the thread, add 1 to the thread's counter each time the loop body is entered; the loop body of a monitored thread must complete a cycle faster than the loop body of the monitoring thread.
3.3 The monitoring thread judges whether each monitored thread is running normally according to that thread's monitoring request flag and counter, and thereby decides whether to call the exception-handling mechanism.
3.4 Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with its previous value; if the counter equals the value seen at the previous check, the monitoring thread determines that the thread is abnormal, and if the number of such detected errors reaches a set number, the exception-handling mechanism is called.
7. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (3), the important-thread running status monitoring and recovery further comprises:
A. Start monitoring the running status of each thread;
B. Wait 60 seconds;
C. Let n = 0, so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, let n = n + 1 and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, go to step G;
G. Add 1 to the system restart flag;
H. Determine whether the system restart flag is greater than the set value; if so, restart the system; if not, let n = n + 1 and return to step D.
8. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (3), each thread that needs to be monitored executes the steps of:
A. Start and enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag bit;
D. Enter the thread loop body, adding 1 to the thread counter on each entry;
E. Process the other work in the loop body, then return to step D.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910682271.2A CN110442470B (en) | 2019-07-26 | 2019-07-26 | System stability monitoring and recovering method of communication equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910682271.2A CN110442470B (en) | 2019-07-26 | 2019-07-26 | System stability monitoring and recovering method of communication equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442470A (en) | 2019-11-12 |
CN110442470B (en) | 2023-08-29 |
Family ID: 68431757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910682271.2A Active CN110442470B (en) | 2019-07-26 | 2019-07-26 | System stability monitoring and recovering method of communication equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442470B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1595368A (en) * | 2003-09-13 | 2005-03-16 | 华为技术有限公司 | Abnormal monitoring equipment and method for multi-task system |
CN101996106A (en) * | 2010-12-17 | 2011-03-30 | 南京中兴力维软件有限公司 | Method for monitoring software running state |
CN107133130A (en) * | 2017-05-19 | 2017-09-05 | 上海斐讯数据通信技术有限公司 | Computer operational monitoring method and apparatus |
CN107402844A (en) * | 2017-07-14 | 2017-11-28 | 深圳市沃特沃德股份有限公司 | Operating system method for restarting, device and accessory system |
Also Published As
Publication number | Publication date |
---|---|
CN110442470B (en) | 2023-08-29 |
Similar Documents
Publication | Title |
---|---|
EP2624140A1 (en) | Method and system for detecting anomaly of network processor |
US20210109800A1 (en) | Method and apparatus for monitoring device failure |
EP3025233B1 (en) | Robust hardware/software error recovery system |
US9459949B2 (en) | Methods and apparatus to provide failure detection |
CN104426696B (en) | A kind of method of troubleshooting, server and system |
CN107783829B (en) | Task processing method and device, storage medium and computer equipment |
CN110083494A (en) | The method and apparatus of hardware error are managed in multi-core environment |
CN103268277A (en) | Method and system for outputting log information |
CN100395722C (en) | Method for preserving abnormal state information of control system |
CN114116280A (en) | Interactive BMC self-recovery method, system, terminal and storage medium |
CN109324959B (en) | Method for automatically transferring data, server and computer readable storage medium |
CN108958965A (en) | A kind of BMC monitoring can restore the method, device and equipment of ECC error |
CN105224426A (en) | Physical host fault detection method, device and empty machine management method, system |
CN110442470A (en) | A kind of the system stability monitoring and restoration methods of communication equipment |
US10599530B2 (en) | Method and apparatus for recovering in-memory data processing system |
CN111221683A (en) | Double-flash hot backup method, system, terminal and storage medium for data center switch |
CN103514086A (en) | Extraction method and device for software error report |
CN107273291B (en) | Processor debugging method and system |
CN106843022A (en) | A kind of method for improving embedded control system output reliability |
CN111143127B (en) | Method, device, storage medium and equipment for supervising network equipment |
CN114116330A (en) | Server performance test method, system, terminal and storage medium |
CN108984378B (en) | Asynchronous processing method and device for log data |
CN111444032A (en) | Computer system fault repairing method, system and equipment |
CN106844634B (en) | Database transaction optimization method and system |
JP6572722B2 (en) | Event occurrence notification program, event occurrence notification method, and event occurrence notification device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |