CN113064762A - Service self-recovery method based on multiple detection - Google Patents

Info

Publication number
CN113064762A
Authority
CN
China
Prior art keywords
service
information
monitoring
monitoring script
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110384903.4A
Other languages
Chinese (zh)
Other versions
CN113064762B (en)
Inventor
程永新
宋辉
苏树昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai New Torch Network Information Technology Ltd By Share Ltd
Original Assignee
Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai New Torch Network Information Technology Ltd By Share Ltd
Priority to CN202110384903.4A
Publication of CN113064762A
Application granted
Publication of CN113064762B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a service self-recovery method based on multiple kinds of detection, which comprises the following steps: S1: a monitoring script is configured in advance on a monitored server; S2: a scheduled task built into the server automatically runs the monitoring script at regular intervals; S3: the monitoring script judges the availability of the service from the states of the monitoring items, and when it detects that any monitoring item is abnormal, it acquires and stores the fault environment information and then restarts the service. The method combines CURL page-response detection and process online-state detection with extraction of the service's fault-time runtime environment information by the Jstack and Jmap commands. While the states of the service process and the test page are being monitored, once a configured monitoring item is found to be offline or its response is abnormal, the network connection state and the Java virtual machine information related to the Java service are stored through Jstack, Jmap and related commands, and the service is then restarted for self-recovery. The fault state of the program is thus preserved while the service is recovered, providing relevant information for subsequent fault analysis.

Description

Service self-recovery method based on multiple detection
Technical Field
The present invention relates to a service self-recovery method, and more particularly, to a service self-recovery method based on multiple probes.
Background
Service monitoring and automatic recovery are now basic requirements for the high availability of a business system, and inspecting service states is one of the routine tasks of operation and maintenance personnel. As business requirements grow more complex and concurrency increases explosively, the availability demanded of services rises; having to restore the business as quickly as possible while also extracting analysis information for troubleshooting puts great pressure on operation and maintenance personnel. Conventional means cannot meet these requirements, and maintenance efficiency is low. Comprehensive self-detection, automated execution, and automated information collection are the direction in which service self-recovery is developing.
Conventional self-monitoring for operation and maintenance usually only monitors the service port, the service process, and the state of a test page. Once the port or the service is found to be offline, or the test page returns an abnormal result, the service is recovered by restarting it directly. As a result, fault environment information such as the network connection state and the JVM information related to the Java service is not retained, which makes troubleshooting very difficult. The prior art therefore still needs improvement.
Disclosure of Invention
The invention provides a service self-recovery method based on multiple kinds of detection. It monitors the availability of the service by combining the service process state with the response status code of a test page. When either monitoring item is abnormal, the current network connection state of the server and the Java virtual machine information related to the Java service are stored, and the service is then restarted for self-recovery, so that the relevant fault environment information is preserved while the service is recovered.
The technical solution adopted by the invention to solve the above technical problem is a service self-recovery method based on multiple kinds of detection, comprising the following steps: S1: a monitoring script is configured in advance on a monitored server; S2: a scheduled task built into the server automatically runs the monitoring script at regular intervals; S3: the monitoring script judges the availability of the service from the states of the monitoring items, and when it detects that any monitoring item is abnormal, it acquires and stores the fault environment information and then restarts the service.
Further, the monitoring items in step S3 include service process state monitoring and test page response state monitoring. Whether the service process is online is judged by detecting whether the service process ID exists, and whether the test page responds normally is judged by checking the return value of the test page.
Further, the monitoring script queries the service process ID by the service name. If a service process ID is returned, the service process is online; if none is returned, the service process is in an abnormal offline state.
Further, the monitoring script accesses the test page with a CURL command and obtains its return value. If the return value is normal, the test page responds normally and the service is in a normal state; if there is no return value, the test page response is abnormal and the service is hung, that is, still present but not responding.
Further, when the monitoring script is configured in step S1, a service variable and a test page address are defined in the monitoring script.
Further, the fault environment information includes network connection state information and Java virtual machine information related to the service. The monitoring script uses the netstat console command to collect the network connection state information and the current number of concurrent connections; the network connection state information includes the routing table, the actual network connections, and the state information of each network interface device.
Further, the Java virtual machine information includes Java stack information and Java heap memory information. The monitoring script calls the Jstack stack tracing tool to obtain the Java stack information for the service process ID; this yields a snapshot of the current threads of the Java virtual machine, which includes the stack information of each thread. The monitoring script calls the Jmap heap memory tracing tool to obtain the memory map or heap memory information of the Java process, reflecting the memory image of the Java heap in use; the memory image includes system information, virtual machine attributes, a complete thread dump, and the state of all classes and objects.
Compared with the prior art, the invention has the following beneficial effects: the service self-recovery method based on multiple kinds of detection combines CURL page-response detection and process online-state detection with extraction of the service's fault-time runtime environment information by the Jstack and Jmap commands. While the service process and the test page state are being monitored, once a configured monitoring item is found to be offline or its response is abnormal, the network connection state and the Java virtual machine information related to the Java service are first stored through Jstack, Jmap and related commands, and the service is then restarted for self-recovery. The fault state of the program is thus preserved while the service is recovered, providing relevant information for subsequent fault analysis and post-incident review.
Drawings
FIG. 1 is a flow chart of a method for self-recovery of services based on multiple probing according to an embodiment of the present invention;
FIG. 2 is a flow chart of a monitoring script according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 is a flow chart of a method for self-recovery of services based on multiple probing according to an embodiment of the present invention; FIG. 2 is a flow chart of a monitoring script according to an embodiment of the present invention.
Referring to fig. 1, the service self-recovery method based on multiple probes according to the embodiment of the present invention includes the following steps:
S1: a monitoring script is configured in advance on the monitored server; when the monitoring script is configured, a service variable and a test page address are defined in it;
S2: a scheduled task built into the server automatically runs the monitoring script at regular intervals;
S3: the monitoring script judges the availability of the service from the states of the monitoring items, and when it detects that any monitoring item is abnormal, it acquires and stores the fault environment information and then restarts the service.
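The control flow of steps S1 to S3 can be sketched as follows. This is a minimal illustration in Python rather than the patent's own script: the names `MonitorConfig` and `run_once`, and the idea of passing the checks, the collector, and the restart action in as callables, are assumptions made so that the ordering (collect first, restart second) is easy to see and test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MonitorConfig:
    # S1: values defined when the script is configured (hypothetical fields).
    service_name: str
    test_page_url: str


Check = Callable[[MonitorConfig], bool]


def run_once(
    config: MonitorConfig,
    checks: Dict[str, Check],
    collect: Callable[[MonitorConfig, List[str]], None],
    restart: Callable[[MonitorConfig], None],
) -> List[str]:
    """One scheduled run (S3). Returns the names of the abnormal items."""
    # Evaluate every monitoring item; a check returns True when healthy.
    failed = [name for name, check in checks.items() if not check(config)]
    if failed:
        # Order matters: save the fault environment before the restart
        # destroys it.
        collect(config, failed)
        restart(config)
    return failed
```

For S2, a scheduler on the server (for example a cron entry on Linux) would invoke a script that calls `run_once` at a fixed interval; the interval itself is a deployment choice that the source does not specify.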
Referring to fig. 2, in the service self-recovery method based on multiple kinds of detection according to the embodiment of the present invention, the monitoring items in step S3 include service process state monitoring and test page response state monitoring. Whether the service process is online is judged by detecting whether the service process ID exists, and whether the test page responds normally is judged by checking the return value of the test page.
The monitoring script queries the service process ID by the service name. If a service process ID is returned, the service process is online; if none is returned, the service process is in an abnormal offline state.
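The process check can be sketched like this. The source does not say which command performs the lookup; using `pgrep -f` on a Linux host is an assumption, and the function names are hypothetical. The command runner is injectable so the parsing logic can be exercised without a real process.

```python
import subprocess
from typing import List


def parse_pids(pgrep_output: str) -> List[int]:
    """Turn pgrep's one-PID-per-line output into a list of integers."""
    return [int(tok) for tok in pgrep_output.split() if tok.isdigit()]


def find_service_pids(service_name: str, run=subprocess.run) -> List[int]:
    """Look up process IDs whose command line matches the service name."""
    # Assumption: `pgrep -f` is available; it exits non-zero when nothing
    # matches, so check=False keeps that case from raising.
    result = run(
        ["pgrep", "-f", service_name],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_pids(result.stdout)


def is_process_online(service_name: str, run=subprocess.run) -> bool:
    """A returned process ID means online; none means offline (abnormal)."""
    return bool(find_service_pids(service_name, run))
```

Because `pgrep -f` matches against the full command line, the service name should be specific enough not to match unrelated processes.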
The monitoring script accesses the test page with a CURL command and obtains its return value. If the return value is normal, the test page responds normally and the service is in a normal state; if there is no return value, the test page response is abnormal and the service is hung, that is, still present but not responding.
CURL is a powerful web tool that accesses a test URL from the command line: it issues a web request, then retrieves and extracts the data and displays it on standard output. Checking the return value of a web page with CURL in the script makes it easy to monitor the running state of the web service on a schedule and to detect the state in which the service is hung and unresponsive. A CURL command can construct an HTTP request message and parse the HTTP response returned by the server; it also supports cookies, can perform the basic functions of a web browser, and supports protocols such as HTTPS, FTP, FTPS, TELNET, and LDAP. It can download files over HTTP, FTP, and similar protocols, and can also upload them.
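A page probe along these lines might look as follows. The source only says the return value must be "normal"; treating HTTP 200 as the only normal value, the 5-second timeout, and the function names are all assumptions. The sketch shells out to `curl` with its `-w "%{http_code}"` write-out option and assumes a Unix-like host where `/dev/null` exists.

```python
import subprocess
from typing import Sequence


def is_page_healthy(http_code: str, ok_codes: Sequence[str] = ("200",)) -> bool:
    """Classify curl's reported status code; empty or '000' means no response."""
    return http_code.strip() in ok_codes


def probe_test_page(url: str, timeout_s: int = 5, run=subprocess.run) -> bool:
    """Request the test page and report whether it answered normally."""
    result = run(
        [
            "curl",
            "-s",                    # silent: no progress meter
            "-o", "/dev/null",       # discard the response body
            "-w", "%{http_code}",    # print only the HTTP status code
            "--max-time", str(timeout_s),  # give up on a hung service
            url,
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    return is_page_healthy(result.stdout)
```

The timeout is what lets the probe distinguish a hung service from a slow one, so in practice it would be tuned to the service's expected response time.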
Specifically, the fault environment information in step S3 includes network connection state information and Java virtual machine information related to the service. The monitoring script uses the netstat console command to collect the network connection state information and the current number of concurrent connections; the network connection state information includes the routing table, the actual network connections, and the state information of each network interface device.
netstat is a console command and a very useful tool for monitoring TCP/IP networks. It can display the routing table, the actual network connections, and the state information of each network interface device. It also displays statistics for the IP, TCP, UDP, and ICMP protocols and is generally used to check the network connections on each port of the machine. In the script, netstat prints and counts information such as the network connection state and the current number of concurrent connections, providing a record of network conditions for subsequent troubleshooting.
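The counting step can be illustrated with a small parser. This is a sketch under stated assumptions: the input is the text printed by `netstat -ant` on Linux, where the connection state is the sixth column of each `tcp` line, and counting `ESTABLISHED` connections as the "current connection concurrency" is an interpretation rather than something the source defines. The function names are hypothetical.

```python
import subprocess
from collections import Counter


def capture_netstat(run=subprocess.run) -> str:
    """Return the raw `netstat -ant` text so it can be saved for later review."""
    result = run(["netstat", "-ant"], capture_output=True, text=True, check=False)
    return result.stdout


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state (LISTEN, ESTABLISHED, TIME_WAIT, ...)."""
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data lines look like:
        #   tcp  0  0  10.0.0.5:8080  10.0.0.9:51234  ESTABLISHED
        # Header lines and non-TCP lines are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

Saving both the raw text and the per-state counts keeps the full picture available while giving a quick summary of how many connections were open at the time of the fault.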
The Java virtual machine information includes Java stack information and Java heap memory information. The monitoring script calls the Jstack stack tracing tool to obtain the Java stack information for the service process ID; this yields a snapshot of the current threads of the Java virtual machine, which includes the stack information of each thread. The monitoring script calls the Jmap heap memory tracing tool to obtain the memory map or heap memory information of the Java process, reflecting the memory image of the Java heap in use; the memory image includes system information, virtual machine attributes, a complete thread dump, and the state of all classes and objects.
Jstack is the stack tracing tool of the Java virtual machine. It prints the Java stack information for a given Java process ID, core file, or remote debugging service, producing a snapshot of the virtual machine's current threads that includes the stack information of each thread. The command is usually used to locate the cause of a stalled thread: when a thread stalls, the stack of each thread can be inspected with Jstack to analyze the cause. If a Java program crashes and produces a core file, the Jstack tool can extract the Java stack and native stack information from the core file, making it easy to see how the program crashed and where it went wrong. In addition, the Jstack tool can attach to a running Java program and show its Java stack and native stack information, which is very useful when the running program is hung.
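Capturing the thread snapshot before the restart could be sketched as below. The function name, the output directory, and the timestamped file name are hypothetical; the only external call is `jstack <pid>`, which prints the thread dump to standard output.

```python
import subprocess
import time
from pathlib import Path


def save_thread_dump(pid: int, out_dir: str, run=subprocess.run) -> Path:
    """Run `jstack <pid>` and store its output in a timestamped file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = Path(out_dir) / f"jstack-{pid}-{stamp}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # jstack writes the thread snapshot to stdout; capture and persist it.
    result = run(["jstack", str(pid)], capture_output=True, text=True, check=False)
    out_path.write_text(result.stdout)
    return out_path
```

The process ID passed in would be the one returned by the service process check, so the dump is only attempted when the process still exists (the hung case), not when it is already offline.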
Jmap is the heap memory tracing tool shipped with the Java virtual machine. It is mainly used to print the memory map of a Java process or the details of its heap memory (such as which objects have been created and how many of each), that is, a heap dump file. It is mainly used to investigate memory leaks and large objects that seriously affect performance, to find which objects in a system are created most often, and to analyze the memory occupied by each kind of object. A dump file is a memory copy of a process. A heap dump is a memory image reflecting the usage of the Java heap; it mainly includes system information, virtual machine attributes, a complete thread dump, and the state of all classes and objects. Memory leaks are generally suspected in cases such as insufficient memory or abnormal GC behavior, and a heap dump can then be taken to examine the specific conditions and analyze the cause.
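The Jmap invocations the script would issue can be sketched as argument lists. The helper names and file paths are hypothetical, and the source does not say which Jmap options are used; `-dump:format=b,file=<path>` (binary heap dump) and `-histo` (object histogram) are the standard options assumed here.

```python
from typing import List


def heap_dump_command(pid: int, dump_file: str, live_only: bool = False) -> List[str]:
    """Build the jmap command that writes a binary heap dump to dump_file."""
    # `live` restricts the dump to reachable objects; it is off by default
    # here so that the dump reflects the heap as it was at the time of the fault.
    options = "live,format=b,file=" if live_only else "format=b,file="
    return ["jmap", f"-dump:{options}{dump_file}", str(pid)]


def histogram_command(pid: int) -> List[str]:
    """Build the jmap command that prints per-class object counts and sizes."""
    return ["jmap", "-histo", str(pid)]
```

A monitoring script could pass these lists to `subprocess.run` before restarting the service. A heap dump can be large, so the target directory needs enough free space for a file roughly the size of the used heap.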
In summary, the service self-recovery method based on multiple kinds of detection according to the embodiment of the present invention combines CURL page-response detection and process online-state detection with extraction of the service's fault-time runtime environment information by the Jstack and Jmap commands. While the service process and the test page state are being monitored, once a configured monitoring item is found to be offline or its response is abnormal, the network connection state and the Java virtual machine information related to the Java service are stored through Jstack, Jmap and related commands, and the service is then restarted for self-recovery. The fault state of the program is thus preserved while the service is recovered, providing relevant information for subsequent fault analysis and post-incident review.
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A service self-recovery method based on multiple kinds of detection, characterized by comprising the following steps:
S1: a monitoring script is configured in advance on a monitored server;
S2: a scheduled task built into the server automatically runs the monitoring script at regular intervals;
S3: the monitoring script judges the availability of the service from the states of the monitoring items, and when it detects that any monitoring item is abnormal, it acquires and stores the fault environment information and then restarts the service.
2. The service self-recovery method based on multiple kinds of detection according to claim 1, wherein the monitoring items in step S3 include service process state monitoring and test page response state monitoring; whether the service process is online is judged by detecting whether the service process ID exists; and whether the test page responds normally is judged by checking the return value of the test page.
3. The service self-recovery method based on multiple kinds of detection according to claim 2, wherein the monitoring script queries the service process ID by the service name; if a service process ID is returned, the service process is online, and if none is returned, the service process is in an abnormal offline state.
4. The service self-recovery method based on multiple kinds of detection according to claim 2, wherein the monitoring script accesses the test page with a CURL command and obtains its return value; if the return value is normal, the test page responds normally and the service is in a normal state; if there is no return value, the test page response is abnormal and the service is hung, that is, still present but not responding.
5. The service self-recovery method based on multiple kinds of detection according to claim 1, wherein, when the monitoring script is configured in step S1, a service variable and a test page address are defined in the monitoring script.
6. The service self-recovery method based on multiple kinds of detection according to claim 1, wherein the fault environment information includes network connection state information and Java virtual machine information related to the service; the monitoring script uses the netstat console command to collect the network connection state information and the current number of concurrent connections; and the network connection state information includes the routing table, the actual network connections, and the state information of each network interface device.
7. The service self-recovery method based on multiple kinds of detection according to claim 6, wherein the Java virtual machine information includes Java stack information and Java heap memory information; the monitoring script calls the Jstack stack tracing tool to obtain the Java stack information for the service process ID, yielding a snapshot of the current threads of the Java virtual machine that includes the stack information of each thread; and the monitoring script calls the Jmap heap memory tracing tool to obtain the memory map or heap memory information of the Java process, reflecting the memory image of the Java heap in use, the memory image including system information, virtual machine attributes, a complete thread dump, and the state of all classes and objects.
CN202110384903.4A 2021-04-09 2021-04-09 Service self-recovery method based on various detection Active CN113064762B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110384903.4A CN113064762B (en) 2021-04-09 2021-04-09 Service self-recovery method based on various detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110384903.4A CN113064762B (en) 2021-04-09 2021-04-09 Service self-recovery method based on various detection

Publications (2)

Publication Number Publication Date
CN113064762A (en) 2021-07-02
CN113064762B (en) 2024-02-23

Family

ID=76566247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110384903.4A Active CN113064762B (en) 2021-04-09 2021-04-09 Service self-recovery method based on various detection

Country Status (1)

Country Link
CN (1) CN113064762B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113747171A (en) * 2021-08-06 2021-12-03 天津津航计算技术研究所 Self-recovery video decoding method
CN113821415A (en) * 2021-11-24 2021-12-21 飞狐信息技术(天津)有限公司 Processing method of program fault and related device
CN114338419A (en) * 2021-12-15 2022-04-12 中电信数智科技有限公司 IPv6 global networking edge node monitoring and early warning method and system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004001555A2 (en) * 2002-06-25 2003-12-31 International Business Machines Corporation Method and system for monitoring performance of application in a distributed environment
CN102394774A (en) * 2011-10-31 2012-03-28 广东电子工业研究院有限公司 Service state monitoring and failure recovery method for controllers of cloud computing operating system
US20140059392A1 (en) * 2012-08-24 2014-02-27 Vmware, Inc. Protecting virtual machines against storage connectivity failures
CN105320585A (en) * 2014-07-08 2016-02-10 北京启明星辰信息安全技术有限公司 Method and device for achieving application fault diagnosis
CN105490870A (en) * 2015-11-20 2016-04-13 浪潮电子信息产业股份有限公司 Method for monitoring operation state of Linux server in batch
WO2017049997A1 (en) * 2015-09-25 2017-03-30 华为技术有限公司 Virtual machine monitoring method, apparatus and system based on cloud computing service
CN107423198A (en) * 2017-07-10 2017-12-01 中核核电运行管理有限公司 A kind of EAM platform monitorings management method and system
CN107797901A (en) * 2017-10-25 2018-03-13 四川长虹电器股份有限公司 A kind of storehouse analysis and the implementation method of mail Realtime Alerts
CN108566314A (en) * 2018-03-06 2018-09-21 平安科技(深圳)有限公司 The acquisition methods and storage medium of status information under electronic device, cluster environment
CN110798375A (en) * 2019-09-29 2020-02-14 烽火通信科技股份有限公司 Monitoring method, system and terminal equipment for enhancing high availability of container cluster
CN111400125A (en) * 2020-02-13 2020-07-10 中国平安人寿保险股份有限公司 Memory overflow monitoring method, device, equipment and storage medium of JAVA process
CN111796954A (en) * 2020-05-27 2020-10-20 深圳壹账通智能科技有限公司 Watchdog control method, device, equipment and storage medium based on JVM

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004001555A2 (en) * 2002-06-25 2003-12-31 International Business Machines Corporation Method and system for monitoring performance of application in a distributed environment
CN102394774A (en) * 2011-10-31 2012-03-28 广东电子工业研究院有限公司 Service state monitoring and failure recovery method for controllers of cloud computing operating system
US20140059392A1 (en) * 2012-08-24 2014-02-27 Vmware, Inc. Protecting virtual machines against storage connectivity failures
CN105320585A (en) * 2014-07-08 2016-02-10 北京启明星辰信息安全技术有限公司 Method and device for achieving application fault diagnosis
WO2017049997A1 (en) * 2015-09-25 2017-03-30 华为技术有限公司 Virtual machine monitoring method, apparatus and system based on cloud computing service
CN105490870A (en) * 2015-11-20 2016-04-13 浪潮电子信息产业股份有限公司 Method for monitoring operation state of Linux server in batch
CN107423198A (en) * 2017-07-10 2017-12-01 中核核电运行管理有限公司 A kind of EAM platform monitorings management method and system
CN107797901A (en) * 2017-10-25 2018-03-13 四川长虹电器股份有限公司 A kind of storehouse analysis and the implementation method of mail Realtime Alerts
CN108566314A (en) * 2018-03-06 2018-09-21 平安科技(深圳)有限公司 The acquisition methods and storage medium of status information under electronic device, cluster environment
WO2019169765A1 (en) * 2018-03-06 2019-09-12 平安科技(深圳)有限公司 Electronic device, method for acquiring state information in cluster environment, system, and storage medium
CN110798375A (en) * 2019-09-29 2020-02-14 烽火通信科技股份有限公司 Monitoring method, system and terminal equipment for enhancing high availability of container cluster
CN111400125A (en) * 2020-02-13 2020-07-10 中国平安人寿保险股份有限公司 Memory overflow monitoring method, device, equipment and storage medium of JAVA process
CN111796954A (en) * 2020-05-27 2020-10-20 深圳壹账通智能科技有限公司 Watchdog control method, device, equipment and storage medium based on JVM

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEIYU CHEN等: "《Dynamic monitor based service recovery for composite service in MANETs》", 《2008 11TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY》, 12 November 2008 (2008-11-12), pages 557 - 560 *
刘嘉裕: "基于分布式微服务全链路实时监控系统设计与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 1, 15 January 2019 (2019-01-15), pages 140 - 2514 *
张桐: "基于RMI的日志管理系统在GF生产线系统中的应用", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 3, 15 March 2015 (2015-03-15), pages 138 - 585 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113747171A (en) * 2021-08-06 2021-12-03 天津津航计算技术研究所 Self-recovery video decoding method
CN113747171B (en) * 2021-08-06 2024-04-19 天津津航计算技术研究所 Self-recovery video decoding method
CN113821415A (en) * 2021-11-24 2021-12-21 飞狐信息技术(天津)有限公司 Processing method of program fault and related device
CN114338419A (en) * 2021-12-15 2022-04-12 中电信数智科技有限公司 IPv6 global networking edge node monitoring and early warning method and system
CN114338419B (en) * 2021-12-15 2024-04-16 中电信数智科技有限公司 IPv6 global networking edge node monitoring and early warning method and system

Also Published As

Publication number Publication date
CN113064762B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN113064762B (en) Service self-recovery method based on various detection
US7941707B2 (en) Gathering information for use in diagnostic data dumping upon failure occurrence
US6167358A (en) System and method for remotely monitoring a plurality of computer-based systems
CN106844136B (en) Method and system for collecting program crash information
US20110107307A1 (en) Collecting Program Runtime Information
CN112506915B (en) Application data management system, processing method and device and server
CN111046011A (en) Log collection method, system, node, electronic device and readable storage medium
CN112416705A (en) Abnormal information processing method and device
CN111679955B (en) Monitoring diagnosis and snapshot analysis system for application server
CN113868021A (en) Method for detecting service state and automatically restarting
CN108984363A (en) A kind of method and system of concurrent testing
CN110806966A (en) Log management method and device, electronic equipment and computer storage medium
CN110149421A (en) Method for monitoring abnormality, system, device and the computer equipment of domain name system
CN114328243A (en) Abnormal operation data processing method, device, equipment and storage medium
CN113110965A (en) Abnormal information monitoring method and device, computer storage medium and terminal
CN103731315A (en) Server failure detecting method
US6530041B1 (en) Troubleshooting apparatus troubleshooting method and recording medium recorded with troubleshooting program in network computing environment
US9354962B1 (en) Memory dump file collection and analysis using analysis server and cloud knowledge base
CN109684220A (en) A kind of browser compatibility analysis method based on event replay
CN114398272A (en) Pressure measurement method and pressure measurement device for combination performance bottleneck positioning
CN115098378A (en) Method and device for classifying and aggregating log fragments based on abnormal breakpoints
CN116737514B (en) Automatic operation and maintenance method based on log and probe analysis
CN112527594A (en) Hard disk inspection method, device and system
JP2007141193A (en) Method for detecting memory leak applicable to real time property for wireless device
CN109376030B (en) System for capturing embedded operating system exceptions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant