CN110442470A - System stability monitoring and recovery method for communication equipment - Google Patents

System stability monitoring and recovery method for communication equipment

Info

Publication number
CN110442470A
Authority
CN
China
Prior art keywords
thread
monitoring
communication equipment
restoration methods
monitors
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910682271.2A
Other languages
Chinese (zh)
Other versions
CN110442470B (en)
Inventor
黄振江
王清波
黄仝宇
汪刚
宋一兵
侯玉清
刘双广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gosuncn Technology Group Co Ltd
Original Assignee
Gosuncn Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-26
Filing date
2019-07-26
Publication date
2019-11-12
Application filed by Gosuncn Technology Group Co Ltd
Priority to CN201910682271.2A (patent CN110442470B)
Publication of CN110442470A
Application granted
Publication of CN110442470B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of communication equipment and specifically relates to a system stability monitoring and recovery method for communication equipment. The method comprises three parts: monitoring and recovery of the Linux command execution status; monitoring and recovery of the read/write status of the FLASH storage space; and monitoring and recovery of the running status of important threads. The scheme simultaneously monitors several functional points that affect stable operation of the equipment, namely the Linux command execution status, the read/write status of the FLASH storage space, and the running status of multiple threads, and applies a different exception-handling measure depending on the monitoring result so that the equipment returns to a normal state, thereby ensuring stable operation.

Description

System stability monitoring and recovery method for communication equipment
Technical field
The invention belongs to the technical field of communication equipment and specifically relates to a system stability monitoring and recovery method for communication equipment.
Background art
The prior art monitors system stability with a software watchdog or a hardware watchdog: a dedicated thread feeds the watchdog periodically, and if that thread becomes abnormal and the watchdog is not fed in time, the system reboots. Many factors affect the stability of communication equipment, including the following three: 1. the execution status of Linux commands; 2. the read/write status of the FLASH storage space; 3. the running status of important threads.
The prior art monitors only the watchdog-feeding thread; the other functional points that affect stable operation of the equipment are not monitored. An abnormality elsewhere can also make the equipment unstable, so the existing scheme does not solve the problem of stable equipment operation.
Summary of the invention
To overcome the technical defects of the prior art, the invention proposes a system stability monitoring and recovery method for communication equipment.
The invention is realized by the following technical solution:
A system stability monitoring and recovery method for communication equipment, comprising the steps of:
(1) Monitoring and recovery of the Linux command execution status, specifically comprising the steps of:
1.1. Start monitoring Linux command execution; set i = 0;
1.2. Wait for a first preset time, then check the value of i;
1.3. If i = 0, execute the system command ls and check the execution result;
1.4. If i = 1, execute the system command ps and check the execution result;
1.5. If i = 2, execute the system command free and check the execution result;
1.6. Determine whether the command executed in steps 1.3 to 1.5 failed; if so, go to step 1.8; if not, go to step 1.7;
1.7. Execute if ((++i) >= 3) { i = 0; } and return to step 1.2;
1.8. Stop feeding the hardware watchdog so that the system restarts; end the procedure.
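Steps 1.3 to 1.7 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the names COMMANDS, run_ok and check_once, the 10-second timeout, and the rule that a nonzero exit status counts as a failure are all hypothetical choices that the patent does not specify.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# The three commands named in steps 1.3 to 1.5, in rotation order.
COMMANDS: Sequence[Sequence[str]] = (("ls",), ("ps",), ("free",))


def run_ok(argv: Sequence[str]) -> bool:
    """Return True if the command starts and exits with status 0.

    Assumption: failure to start, a timeout, or a nonzero exit status
    all count as "execution failed".
    """
    try:
        result = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # hypothetical limit, not taken from the patent
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_once(
    i: int, runner: Callable[[Sequence[str]], bool] = run_ok
) -> Tuple[bool, int]:
    """Run command number i and return (ok, next_i).

    next_i mirrors step 1.7: if ((++i) >= 3) { i = 0; }
    """
    ok = runner(COMMANDS[i])
    next_i = i + 1
    if next_i >= len(COMMANDS):
        next_i = 0
    return ok, next_i
```

A caller would wait for the first preset time, call check_once(i), and stop feeding the watchdog as soon as ok is False. Passing the runner as a parameter keeps the rotation logic testable without executing real commands.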
(2) Monitoring and recovery of the read/write status of the FLASH storage space, specifically comprising the steps of:
2.1. Start monitoring the FLASH storage space;
2.2. Wait for a second preset time;
2.3. Write and read a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step 2.2; if not, go to step 2.4;
2.4. Set the FLASH abnormal flag, request handling of the abnormality, and end the procedure.
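The read/write probe of step 2.3 might look like the Python sketch below. The function name flash_rw_ok, the probe file name and the payload are assumptions made for illustration; the patent only says that a text file is written and read in the FLASH storage space.

```python
import os


def flash_rw_ok(directory: str, payload: bytes = b"flash-check\n") -> bool:
    """Write a small file under `directory`, read it back, and compare.

    `directory` is assumed to be a mount point backed by the FLASH
    storage space. Any OSError (read-only file system, no space left,
    missing directory) is reported as a failed check.
    """
    path = os.path.join(directory, ".flash_rw_check")  # hypothetical name
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        with open(path, "rb") as f:
            return f.read() == payload
    except OSError:
        return False
```

Note that the read-back may be served from the page cache rather than from the device, so this sketch mainly detects a file system that has become unwritable; a stricter check would need a device-specific approach.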
(3) Monitoring and recovery of the running status of important threads.
Further, in step (1), the Linux system commands ls, ps and free are executed periodically, the execution result of each command is checked, and the returned result determines whether the exception-handling mechanism is invoked.
Preferably, the first preset time is 30 minutes.
Further, in step (2), a file is periodically written and read in the FLASH storage space at intervals of the second preset time, and the returned result determines whether the exception-handling mechanism is invoked.
Preferably, the second preset time is 12 hours.
Further, step (3) comprises the steps of:
3.1. At the start of each thread, set that thread's monitoring request flag.
3.2. In the loop body of each thread, increment the thread's counter by 1 on every pass through the loop body; the loop body of a monitored thread must complete a pass in less time than the loop body of the monitoring thread.
3.3. The monitoring thread uses the monitoring request flag and the counter of each monitored thread to determine whether that thread is running normally, and thereby decides whether to invoke the exception-handling mechanism.
3.4. Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with the value recorded at the previous check; if the two are equal, the monitoring thread determines that the thread is abnormal, and if the number of such detections reaches a set number, the exception-handling mechanism is invoked.
Further, in step (3), the monitoring and recovery of the running status of important threads further comprises:
A. Start; monitor the running status of each thread;
B. Wait 60 seconds;
C. Set the thread index n = 0 so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, set n = n + 1 and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, go to step G;
G. Increment the system reboot flag by 1;
H. Determine whether the system reboot flag is greater than the set value; if so, reboot the system; if not, set n = n + 1 and return to step D.
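One scan of the monitoring thread (steps C to H) can be sketched as below. The class names ThreadSlot and Monitor, the last_seen field, and the default limit of 3 are hypothetical; the sketch also assumes that a thread whose counter did change is simply skipped, which the flow implies but does not state. As in the flow, the reboot flag is a single counter shared by all threads and is never reset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadSlot:
    name: str
    monitored: bool = False  # monitoring flag, set by the thread itself
    counter: int = 0         # incremented by the thread on every loop pass
    last_seen: int = -1      # counter value recorded at the previous scan


@dataclass
class Monitor:
    slots: List[ThreadSlot]
    reboot_limit: int = 3    # the "set value"; 3 is an assumed default
    reboot_count: int = 0    # the "system reboot flag" of steps G and H

    def scan(self) -> bool:
        """Run one pass over all slots; return True if a reboot is due."""
        for slot in self.slots:                    # steps C and D
            if not slot.monitored:                 # step E
                continue
            if slot.counter == slot.last_seen:     # step F: no progress
                self.reboot_count += 1             # step G
                if self.reboot_count > self.reboot_limit:  # step H
                    return True
            slot.last_seen = slot.counter
        return False
```

The monitoring thread would call scan() once per 60-second interval and trigger the reboot when it returns True.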
Further, in step (3), each thread that needs to be monitored executes the steps of:
A. Start; enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag;
D. Enter the thread's loop body; increment the thread counter by 1 on each pass through the loop body;
E. Process the other work in the loop body, then return to step D.
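A monitored thread following steps A to E might be written as in this Python sketch. The Heartbeat class, the do_work callback and the stop event are assumptions added so the example can terminate; the flow in the patent loops indefinitely.

```python
import threading
import time


class Heartbeat:
    """Per-thread state that the monitoring thread reads."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.monitored = False  # monitoring flag
        self.counter = 0        # loop-pass counter


def worker(hb: Heartbeat, stop: threading.Event, do_work) -> None:
    threading.current_thread().name = hb.name  # step B: name the thread
    hb.monitored = True                        # step C: set the flag
    while not stop.is_set():                   # step D: loop body
        hb.counter += 1                        # one increment per pass
        do_work()                              # step E: other work
```

Only the worker writes its own counter and the monitor only reads it, so this sketch uses no lock. If do_work() blocks forever, the counter stops advancing, which is exactly the condition the monitoring thread looks for.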
The invention also includes a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the monitoring and recovery method.
The invention also includes a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the monitoring and recovery method when executing the program.
Compared with the prior art, the invention has at least the following beneficial effect or advantage: it simultaneously monitors several functional points that affect stable operation of the equipment, namely the Linux command execution status, the read/write status of the FLASH storage space, and the running status of multiple threads, and applies a different exception-handling measure depending on the monitoring result so that the equipment returns to a normal state, thereby ensuring stable operation.
Brief description of the drawings
The invention is described in further detail below with reference to the accompanying drawings:
Fig. 1 is a flow chart of the Linux command execution status monitoring and recovery method of the invention;
Fig. 2 is a flow chart of the FLASH storage space read/write status monitoring and recovery method of the invention;
Fig. 3 is a flow chart of a single monitored thread of the invention;
Fig. 4 is a flow chart of the thread running status monitoring and recovery method of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The invention provides communication equipment running the Linux operating system, together with a system stability monitoring and recovery method for that equipment. The detailed procedure is as follows:
Linux command execution status monitoring:
1. Monitoring method: the Linux system commands ls, ps and free are executed periodically, the execution result of each command is checked, and the returned result determines whether the exception-handling method is invoked.
2. The monitoring and recovery flow is shown in Fig. 1 and comprises the steps of:
A. Start monitoring Linux command execution; set i = 0;
B. Wait 30 minutes, then check the value of i;
C. If i = 0, execute the system command ls and check the execution result;
D. If i = 1, execute the system command ps and check the execution result;
E. If i = 2, execute the system command free and check the execution result;
F. Determine whether the command executed in steps C to E failed; if so, go to step H; if not, go to step G;
G. Execute if ((++i) >= 3) { i = 0; } and return to step B;
H. Stop feeding the hardware watchdog so that the system restarts; end the procedure.
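Step H relies on the hardware watchdog resetting the system once it is no longer fed. The sketch below shows one way a feeder could be gated on the health checks. It assumes the watchdog is exposed through the Linux watchdog character device (commonly /dev/watchdog), where each write resets the timer; the patent does not say which interface the equipment uses, and the WatchdogFeeder class is hypothetical.

```python
class WatchdogFeeder:
    """Feed a watchdog device only while no check has reported a failure."""

    def __init__(self, device) -> None:
        # `device` is any writable binary file-like object; on a real
        # system it is assumed to be the opened watchdog device.
        self.device = device
        self.healthy = True

    def report_failure(self) -> None:
        """Called by a monitor when a check fails (for example step F)."""
        self.healthy = False

    def feed(self) -> bool:
        """Write one byte to reset the timer; return False once feeding has stopped."""
        if not self.healthy:
            return False  # stop feeding; the hardware then restarts the system
        self.device.write(b"\0")
        self.device.flush()
        return True
```

A periodic task would call feed() more often than the watchdog timeout, for example with WatchdogFeeder(open("/dev/watchdog", "wb", buffering=0)). Accepting a file-like object lets the logic be exercised with an in-memory buffer instead of real hardware.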
FLASH storage space read/write status monitoring:
1. Monitoring method: at intervals of 12 hours, a file is written and read in the FLASH storage space, and the returned result determines whether the exception-handling method is invoked.
2. The monitoring and recovery flow is shown in Fig. 2 and comprises the steps of:
A. Start monitoring the FLASH storage space;
B. Wait 12 hours;
C. Write and read a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step B; if not, go to step D;
D. Set the FLASH abnormal flag, request handling of the abnormality, and end the procedure.
Thread running status monitoring:
1. Monitoring method:
1.1. At the start of each thread, set that thread's monitoring request flag.
1.2. In the loop body of each thread, increment the thread's counter by 1 on every pass through the loop body; the loop body of a monitored thread must complete a pass in less time than the loop body of the monitoring thread.
1.3. The monitoring thread uses the monitoring request flag and the counter of each monitored thread to determine whether that thread is running normally, and thereby decides whether to invoke the exception-handling mechanism.
1.4. Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with the value recorded at the previous check; if the two are equal, the monitoring thread determines that the thread is abnormal, and if the number of such detections reaches a set number, the exception-handling mechanism is invoked.
2. Monitoring and recovery flow:
2.1. The flow of each thread that needs to be monitored is shown in Fig. 3 and comprises the steps of:
A. Start; enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag;
D. Enter the thread's loop body; increment the thread counter by 1 on each pass through the loop body;
E. Process the other work in the loop body, then return to step D.
2.2. The flow of the monitoring thread is shown in Fig. 4 and comprises the steps of:
A. Start; monitor the running status of each thread;
B. Wait 60 seconds;
C. Set the thread index n = 0 so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, set n = n + 1 (monitor the next thread) and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, go to step G;
G. Increment the system reboot flag by 1;
H. Determine whether the system reboot flag is greater than the set value; if so, reboot the system; if not, set n = n + 1 (monitor the next thread) and return to step D.
The invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the monitoring and recovery method.
The invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the monitoring and recovery method when executing the program.
The specific embodiments described above explain the purpose, technical solution and beneficial effects of the invention in further detail. It should be understood that the above is only a specific embodiment of the invention and does not limit its protection scope. Any modification, equivalent substitution or improvement made without departing from the spirit and scope of the invention also falls within its protection scope.

Claims (10)

1. A system stability monitoring and recovery method for communication equipment, characterized by comprising the steps of:
(1) Monitoring and recovery of the Linux command execution status, specifically comprising the steps of:
1.1. Start monitoring Linux command execution; set i = 0;
1.2. Wait for a first preset time, then check the value of i;
1.3. If i = 0, execute the system command ls and check the execution result;
1.4. If i = 1, execute the system command ps and check the execution result;
1.5. If i = 2, execute the system command free and check the execution result;
1.6. Determine whether the command executed in steps 1.3 to 1.5 failed; if so, go to step 1.8; if not, go to step 1.7;
1.7. Execute if ((++i) >= 3) { i = 0; } and return to step 1.2;
1.8. Stop feeding the hardware watchdog so that the system restarts; end the procedure.
(2) Monitoring and recovery of the read/write status of the FLASH storage space, specifically comprising the steps of:
2.1. Start monitoring the FLASH storage space;
2.2. Wait for a second preset time;
2.3. Write and read a text file in the FLASH storage space and determine whether the read/write succeeded; if so, return to step 2.2; if not, go to step 2.4;
2.4. Set the FLASH abnormal flag, request handling of the abnormality, and end the procedure.
(3) Monitoring and recovery of the running status of important threads.
2. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (1), the Linux system commands ls, ps and free are executed periodically, the execution result of each command is checked, and the returned result determines whether the exception-handling mechanism is invoked.
3. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that the first preset time is 30 minutes.
4. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (2), a file is periodically written and read in the FLASH storage space at intervals of the second preset time, and the returned result determines whether the exception-handling mechanism is invoked.
5. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that the second preset time is 12 hours.
6. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that step (3) further comprises the steps of:
3.1. At the start of each thread, set that thread's monitoring request flag.
3.2. In the loop body of each thread, increment the thread's counter by 1 on every pass through the loop body; the loop body of a monitored thread must complete a pass in less time than the loop body of the monitoring thread.
3.3. The monitoring thread uses the monitoring request flag and the counter of each monitored thread to determine whether that thread is running normally, and thereby decides whether to invoke the exception-handling mechanism.
3.4. Monitoring method of the monitoring thread: the monitoring thread compares the counter of each monitored thread with the value recorded at the previous check; if the two are equal, the monitoring thread determines that the thread is abnormal, and if the number of such detections reaches a set number, the exception-handling mechanism is invoked.
7. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (3), the monitoring and recovery of the running status of important threads further comprises:
A. Start; monitor the running status of each thread;
B. Wait 60 seconds;
C. Set the thread index n = 0 so that monitoring starts from the first thread;
D. Determine whether n < m, where m is the total number of threads that can be monitored; if so, go to step E; if not, return to step B;
E. Determine whether the monitoring flag of thread n is set; if so, go to step F; if not, set n = n + 1 and return to step D;
F. Determine whether the counter of thread n has changed since the previous check; if it has not changed, go to step G;
G. Increment the system reboot flag by 1;
H. Determine whether the system reboot flag is greater than the set value; if so, reboot the system; if not, set n = n + 1 and return to step D.
8. The system stability monitoring and recovery method for communication equipment according to claim 1, characterized in that, in step (3), each thread that needs to be monitored executes the steps of:
A. Start; enter the thread;
B. Name the thread;
C. Set the thread's monitoring flag;
D. Enter the thread's loop body; increment the thread counter by 1 on each pass through the loop body;
E. Process the other work in the loop body, then return to step D.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when executing the program.
CN201910682271.2A 2019-07-26 2019-07-26 System stability monitoring and recovering method of communication equipment Active CN110442470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910682271.2A CN110442470B (en) 2019-07-26 2019-07-26 System stability monitoring and recovering method of communication equipment

Publications (2)

Publication Number Publication Date
CN110442470A (en) 2019-11-12
CN110442470B CN110442470B (en) 2023-08-29

Family

ID=68431757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910682271.2A Active CN110442470B (en) 2019-07-26 2019-07-26 System stability monitoring and recovering method of communication equipment

Country Status (1)

Country Link
CN (1) CN110442470B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1595368A (en) * 2003-09-13 2005-03-16 华为技术有限公司 Abnormal monitoring equipment and method for multi-task system
CN101996106A (en) * 2010-12-17 2011-03-30 南京中兴力维软件有限公司 Method for monitoring software running state
CN107133130A (en) * 2017-05-19 2017-09-05 上海斐讯数据通信技术有限公司 Computer operational monitoring method and apparatus
CN107402844A (en) * 2017-07-14 2017-11-28 深圳市沃特沃德股份有限公司 Operating system method for restarting, device and accessory system

Also Published As

Publication number Publication date
CN110442470B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant