CN111752741A - System performance detection method and device - Google Patents

Info

Publication number
CN111752741A
Authority
CN
China
Prior art keywords
data
detected
recovered
abnormal
target information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010613893.2A
Other languages
Chinese (zh)
Inventor
朱嘉伟
杨军
周杰
卢道和
陈刚
程志峰
罗海湾
李勋棋
汪晓雪
周琪
郭英亚
李兴龙
胡仲臣
周佳振
文玉茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202010613893.2A
Publication of CN111752741A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/0793 Remedial or corrective actions
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for detecting system performance. The method comprises: obtaining a detection instruction, wherein the detection instruction comprises parameters of a system to be detected and a server IP of the system to be detected; obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed; diagnosing the data to be diagnosed to obtain result data; and analyzing the result data to obtain normal data and abnormal data of the system to be detected. Abnormal data can thus be detected in real time, which reduces the time and labor consumed by abnormal data detection when system performance is abnormal, enables automatic recovery of the abnormal data, and improves system performance detection efficiency.

Description

System performance detection method and device
Technical Field
The invention relates to the field of financial technology (Fintech), in particular to a method and a device for detecting system performance.
Background
With the development of computer technology, more and more technologies (such as distributed computing, cloud computing and big data) are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Big data technology is no exception, but the security and real-time requirements of the financial and payment industries also place higher demands on it.
There are generally two methods of system performance monitoring. The first installs an agent to monitor performance indexes such as server CPU, memory, IO and business transactions, then configures the relevant alarm policy and parameter thresholds of each system through a terminal interface, and displays the alarm result on a terminal. The second captures the processing status of upstream and downstream systems according to a serial number, correlates the data streams of the upstream and downstream systems or products by analyzing transaction channel information, performs multidimensional monitoring, and displays the monitoring data on a terminal.
However, both methods have problems. The former performs timed detection at minute-level intervals and therefore lacks timeliness. It is limited to monitoring server performance and application processes, lacks analysis of the real-time running performance of the system, and lacks analysis of virtual machine bytecode. As a result, when the system fails, the terminal displays only abnormal information about basic server resources and business transactions, and the abnormal threads, classes and methods of the system cannot be captured and displayed in real time. The latter also lacks timeliness; it mainly focuses on abnormalities among the correlated data streams of each system and cannot locate online performance abnormalities of the system in real time. As for locating an abnormality, operation and maintenance personnel have to troubleshoot the problem at a fixed location, and then check the log by adjusting the log mode or analyze the log by producing a file. The whole process consumes a large amount of time and manpower, so abnormality locating is inefficient and slow, the fault lasts longer, and risk and loss increase.
Disclosure of Invention
The embodiment of the invention provides a system performance detection method and device, which are used for improving the efficiency of system abnormality detection and realizing automatic recovery of abnormal data.
In a first aspect, an embodiment of the present invention provides a method for detecting system performance, including:
acquiring a detection instruction; the detection instruction comprises parameters of the system to be detected and a server IP of the system to be detected;
obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed;
and diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
In this technical scheme, the target information corresponding to the system to be detected is obtained through the detection instruction, and the data to be diagnosed is then obtained according to the target information, so that a diagnostic tool can examine the relevant parameters of the system to be detected and produce result data. Analyzing the result data for abnormal data allows the abnormal data to be detected in real time, which reduces the time and labor consumed by abnormal data detection when system performance is abnormal and improves system performance detection efficiency.
Optionally, the obtaining target information according to the parameter of the system to be detected and the server IP of the system to be detected includes:
converting the parameters of the system to be detected and the server IP of the system to be detected into data in a preset format;
and sending the data in the preset format to a configuration management database to obtain the corresponding target information of the system to be detected.
In this technical scheme, the data of the system to be detected and of its corresponding server are obtained according to the parameters of the system to be detected and the server IP of the system to be detected. This unifies the data formats of different systems to be detected and their servers, so that the performance of different systems can be detected by the same method, which widens the scope of system performance detection.
Optionally, the preset script program is provided with an introduced parameter;
the executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed includes:
combining the detection instruction and the target information with the introduced parameter to obtain input parameters of the preset script program, and sending the input parameters to the preset script program to obtain data to be diagnosed in a corresponding format, for diagnosis by a diagnostic tool.
In this technical scheme, combining the detection instruction and the target information with the introduced parameter yields data to be diagnosed that a diagnostic tool can process, so that the diagnostic tool diagnoses the relevant data of the system to be detected. Operation and maintenance personnel no longer need to locate the abnormality by methods such as log inspection, which reduces the time and labor consumed by abnormal data detection when system performance is abnormal and improves system performance detection efficiency.
Optionally, the analyzing the result data to obtain normal data and abnormal data of the system to be detected includes:
performing line-by-line disassembly and analysis on the result data using a first loop statement to obtain a status bit identifier of each line of data;
if the status bit identifier is a first identifier, determining each line of data whose status bit identifier is the first identifier as the normal data;
and if the status bit identifier is a second identifier, determining each line of data whose status bit identifier is the second identifier as the abnormal data.
In this technical scheme, abnormal data in the result data is detected according to the preset first identifier and second identifier, which reduces the time and labor consumed by abnormal data detection, thereby shortening the fault duration and improving system performance detection efficiency.
Optionally, after obtaining the normal data and abnormal data of the system to be detected, the method further includes:
performing data reorganization on the abnormal data to obtain data to be recovered;
sending the data to be recovered to an alarm platform, so that the alarm platform raises a data alarm, and obtaining alarm data;
splicing the alarm data and the target information into data in a preset format, and matching a standard operating procedure for the data to be recovered, wherein the standard operating procedure is preset by a user;
and executing the standard operating procedure for the data to be recovered, to perform abnormal data recovery on the data to be recovered.
In this technical scheme, a preset standard operating procedure is matched according to the alarm data corresponding to the abnormal data, and the data to be recovered is then restored to normal data by executing that standard operating procedure. This realizes automatic recovery of abnormal data, improves operation and maintenance efficiency, shortens the time from system abnormality to normal recovery, and frees up manual labor.
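The matching step described above can be pictured as a lookup from alarm data to a user-registered procedure. The following is a minimal sketch under stated assumptions, not the patented implementation: the registry, the `alarm_type` key, the field names and the `restart_worker_pool` procedure are all hypothetical, and the procedure only returns a description of the action instead of touching a real server.

```python
from typing import Callable, Dict

# Hypothetical registry of user-defined standard operating procedures (SOPs),
# keyed by an assumed "alarm_type" field of the alarm data.
SOP_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def register_sop(alarm_type: str):
    """Decorator that registers a procedure for one alarm type."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        SOP_REGISTRY[alarm_type] = func
        return func
    return decorator


@register_sop("thread_blocked")
def restart_worker_pool(context: dict) -> str:
    # A real SOP would act on the server; this sketch only describes the action.
    return f"restart worker pool on {context['server_ip']}"


def recover(alarm_data: dict, target_info: dict) -> str:
    """Splice alarm data with target information, match an SOP, and run it."""
    context = {**alarm_data, **target_info}
    sop = SOP_REGISTRY.get(context.get("alarm_type"))
    if sop is None:
        raise LookupError(f"no SOP registered for {context.get('alarm_type')!r}")
    return sop(context)
```

Keeping the procedures in a registry mirrors the statement that they are preset by a user: adding a new recovery action means registering one more function, without changing the matching logic.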
Optionally, the performing data reorganization on the abnormal data to obtain data to be recovered includes:
separating the elements of each line of the abnormal data with a separator according to a second loop statement to obtain the data to be recovered.
In this technical scheme, each line of abnormal data is examined according to the second loop statement, and the elements in each line are separated and analyzed to obtain the status bit identifier of each element. The normal elements and the data to be recovered within the abnormal data are determined from these identifiers, and the specific position of the data to be recovered can be obtained by reference to the normal data. This improves the precision of abnormal data detection and the efficiency of system performance detection.
Optionally, after performing the abnormal data recovery on the data to be recovered, the method further includes:
and sending alarm stop information to the alarm platform so that the alarm platform stops alarming according to the alarm stop information.
In the technical scheme, after the abnormal data is recovered, the corresponding alarm data is stopped, so that the time from the system abnormality to the system recovery is shortened.
In a second aspect, an embodiment of the present invention provides an apparatus for detecting system performance, including:
the acquisition module is used for acquiring a detection instruction; the detection instruction comprises parameters of the system to be detected and a server IP of the system to be detected;
the processing module is used for obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed;
and diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
Optionally, the processing module is specifically configured to:
converting the parameters of the system to be detected and the server IP of the system to be detected into data in a preset format;
and sending the data in the preset format to a configuration management database to obtain the corresponding target information of the system to be detected.
Optionally, the preset script program is provided with an introduced parameter;
the processing module is specifically configured to:
combine the detection instruction and the target information with the introduced parameter to obtain input parameters of the preset script program, and send the input parameters to the preset script program to obtain data to be diagnosed in a corresponding format, for diagnosis by a diagnostic tool.
Optionally, the processing module is specifically configured to:
perform line-by-line disassembly and analysis on the result data using a first loop statement to obtain a status bit identifier of each line of data;
if the status bit identifier is a first identifier, determine each line of data whose status bit identifier is the first identifier as the normal data;
and if the status bit identifier is a second identifier, determine each line of data whose status bit identifier is the second identifier as the abnormal data.
Optionally, the processing module is specifically configured to:
after the normal data and abnormal data of the system to be detected are obtained, perform data reorganization on the abnormal data to obtain data to be recovered;
send the data to be recovered to an alarm platform, so that the alarm platform raises a data alarm, and obtain alarm data;
splice the alarm data and the target information into data in a preset format, and match a standard operating procedure for the data to be recovered, wherein the standard operating procedure is preset by a user;
and execute the standard operating procedure for the data to be recovered, to perform abnormal data recovery on the data to be recovered.
Optionally, the processing module is specifically configured to:
separate the elements of each line of the abnormal data with a separator according to a second loop statement to obtain the data to be recovered.
Optionally, the processing module is further configured to:
after abnormal data recovery is performed on the data to be recovered, send alarm stop information to the alarm platform, so that the alarm platform stops alarming according to the alarm stop information.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the system performance detection method according to the obtained program.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to enable a computer to execute the method for detecting the system performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a system architecture diagram according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for detecting system performance according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a system performance detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 exemplarily shows a system architecture to which an embodiment of the present invention is applicable, and the system architecture includes a terminal 100, a performance detection system 200, a container management system 300, a virtual machine 400, and a functional system 500.
The terminal 100 is used for managing the sending and receiving of detection instructions, including establishing a communication group (e.g., a WeChat group for system performance analysis) and sending and receiving detection instructions. For example, a logic code switch (e.g., "robot flag") is triggered by a fixed instruction (e.g., "install robot") to activate the communication group for performance detection of the system to be detected; after an event submitted by a user clicking a button is received, the parameters of the system to be detected and the server IP are spliced into JSON format and pushed to the performance detection system 200.
The performance detection system 200 includes: a management module 210, a transparent transmission module 220, a marshalling module 230, and a recovery module 240.
The management module 210 is configured to transmit and convert data according to a transmission protocol (e.g., the POST method of the HTTP protocol), and is built on a backend framework (e.g., Spring Boot).
The transparent transmission module 220 is configured to execute a preset script program with the target information to obtain data to be diagnosed, and to send the data to be diagnosed to the container management system 300 or the virtual machine 400, so that a diagnostic tool (e.g., Arthas) in the container management system 300 or the virtual machine 400 performs diagnosis to obtain result data.
The marshalling module 230 is configured to use a first loop statement (e.g., a while read -r loop) to disassemble and analyze each line of data in the result data, distinguish normal data from abnormal data according to the status bit identifier of each line of data, and execute a second loop statement (e.g., a for loop) on the abnormal data to obtain data to be recovered.
The recovery module 240 is configured to send the data to be recovered to the functional system 500, obtain the alarm data, and match the corresponding standard operating procedure according to the alarm data, so that the data to be recovered is restored to normal data.
The container management system 300 is configured to establish an instruction task according to the data to be diagnosed sent by the transparent transmission module 220, diagnose the data to be diagnosed using the diagnostic tool deployed for the system to be detected, obtain result data, and send the result data to the performance detection system 200. It is also used for invoking a standard operating procedure to recover abnormal data.
The virtual machine 400 is configured to establish an instruction task according to the data to be diagnosed sent by the transparent transmission module 220, diagnose the data to be diagnosed using the diagnostic tool deployed for the system to be detected, obtain result data, and send the result data to the performance detection system 200. It is also used for invoking a standard operating procedure to recover abnormal data.
The functional system 500 includes: a configuration management database 510, an alarm platform 520, and program modules 530.
The configuration management database 510 is used to hold basic configuration data such as the deployment area, server IPs and databases of each system.
The alarm platform 520 is used for raising an alarm according to the abnormal data and for clearing the alarm once the abnormal situation has been recovered.
The program module 530 is used to provide standard operating procedures for abnormal data recovery for the various systems, servers, databases, and threads.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
Based on the above description, fig. 2 exemplarily shows a flow of a method for system performance detection provided by an embodiment of the present invention, and the flow can be performed by an apparatus for system performance detection.
As shown in fig. 2, the process specifically includes:
Step 201, acquiring a detection instruction.
In the embodiment of the present invention, the detection instruction includes a parameter of the system to be detected and a server IP of the system to be detected. For example, the detection instructions include @robot sys_name IP dashboard (an instruction to detect the real-time data panel of the current system to be detected), @robot sys_name IP thread (an instruction to detect the thread stack information of the current virtual machine), and @robot sys_name IP jvm (an instruction to detect information about the current virtual machine). The real-time data panel includes the thread ID, thread name, memory, running time and the like, and the thread stack information includes the queried thread, thread name, CPU consumption, blocked threads, interrupted threads and the like.
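To make the instruction format concrete, the following is a minimal sketch of how a message of the form "@robot sys_name IP target" could be split into its fields. The class and function names are hypothetical, and the sketch assumes the fields are separated by whitespace; the text does not specify how the terminal actually parses the message.

```python
from dataclasses import dataclass

# Detection targets named in the text; anything else is rejected in this sketch.
KNOWN_TARGETS = {"dashboard", "thread", "jvm"}


@dataclass(frozen=True)
class DetectionInstruction:
    sys_name: str   # parameter of the system to be detected
    server_ip: str  # server IP of the system to be detected
    target: str     # dashboard, thread or jvm


def parse_instruction(message: str) -> DetectionInstruction:
    """Split '@robot <sys_name> <IP> <target>' into its fields."""
    parts = message.split()
    if len(parts) != 4 or parts[0] != "@robot":
        raise ValueError(f"not a detection instruction: {message!r}")
    _, sys_name, server_ip, target = parts
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown detection target: {target!r}")
    return DetectionInstruction(sys_name, server_ip, target)
```

For instance, `parse_instruction("@robot pay-core 10.0.0.8 thread")` yields a record whose `target` field is `"thread"`; the system name and address here are made-up sample values.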
Step 202, obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; and executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed.
According to the embodiment of the invention, after the detection instruction is received, the detection instruction is sent to the configuration management database to obtain the target information, and the target information is used as a parameter to execute the preset script program to obtain the data to be diagnosed.
Further, converting the parameters of the system to be detected and the server IP of the system to be detected into data in a preset format; and sending the data in the preset format to a configuration management database to obtain the corresponding target information of the system to be detected.
In the embodiment of the invention, after the detection instruction is obtained, it is first converted into data in a preset format. For example, the parameter of the system to be detected and the server IP of the system to be detected in the detection instruction are spliced into JSON format (the preset format). The converted data is then sent to the configuration management database to obtain the corresponding target information. For example, the POST method of the HTTP protocol is used to call an application programming interface of the configuration management database, and the target information of the system to be detected (deployment area, server number and the like) is determined from the configuration management database.
Furthermore, an introduced parameter is set in the preset script program; the detection instruction and the target information are combined with the introduced parameter to obtain input parameters of the preset script program, and the input parameters are sent to the preset script program to obtain data to be diagnosed in a corresponding format, for diagnosis by the diagnostic tool.
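The conversion and query described above might look like the following minimal sketch. The endpoint URL and the JSON field names `sys_name` and `server_ip` are assumptions made for illustration; the text names neither the API of the configuration management database nor the keys of the preset format. The function only builds the request, so nothing is sent over the network.

```python
import json
import urllib.request

# Hypothetical endpoint; the text does not name the CMDB API.
CMDB_URL = "http://cmdb.example.internal/api/target-info"


def build_cmdb_request(sys_name: str, server_ip: str,
                       url: str = CMDB_URL) -> urllib.request.Request:
    """Splice the two instruction fields into JSON and wrap them in a POST."""
    payload = {"sys_name": sys_name, "server_ip": server_ip}  # assumed keys
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would be a call to `urllib.request.urlopen(req)` followed by decoding the response; the shape of the returned target information (deployment area, server number and the like) would have to follow whatever the actual database returns.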
In the embodiment of the invention, before the preset script program is executed, its input parameters are obtained from the detection instruction and the target information. For example, when a user clicks a button to trigger a submission event, the detection instruction is received; the detection instructions are @robot sys_name IP dashboard, @robot sys_name IP thread and @robot sys_name IP jvm. The detection instruction and the target information are then combined with the introduced parameter "-m" of the shell script program to obtain the input parameters of the preset script program, namely sh we_arthas_main.sh -m dashboard sys_name IP, sh we_arthas_main.sh -m thread sys_name IP and sh we_arthas_main.sh -m jvm sys_name IP. The input parameters are sent to the preset script program, which, once executed, yields the data to be diagnosed corresponding to the input parameters. For example, if the preset script program is a shell script program, the input parameters are sent to the shell script program, and executing it yields the corresponding data to be diagnosed, namely dashboard (real-time panel data), thread (thread stack information) and jvm (thread information of the current virtual machine, such as active threads, peak active threads and deadlocks).
Step 203, diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
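The combination of the instruction fields with the "-m" parameter can be sketched as building an argument list. This is an illustration only: the script name `we_arthas_main.sh` is a reading of the name given in the text and should be treated as an assumption, as should the order of the arguments after "-m".

```python
import shlex
from typing import List


def build_script_args(target: str, sys_name: str, server_ip: str,
                      script: str = "we_arthas_main.sh") -> List[str]:
    """Return the argv that would invoke the preset shell script.

    The script name and argument order are assumptions for illustration.
    """
    return ["sh", script, "-m", target, sys_name, server_ip]


def as_command_line(args: List[str]) -> str:
    """Render the argv as one shell-quoted line, e.g. for logging."""
    return shlex.join(args)
```

Building a list instead of concatenating a string keeps each field a separate argument, so a system name containing spaces or shell metacharacters cannot change the command that is run.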
In the embodiment of the invention, the data to be diagnosed is diagnosed by a diagnostic tool deployed in a container management system (such as Kubernetes, the K8S container orchestration engine) or in a virtual machine, to obtain result data. For example, a diagnostic tool deployed in advance in the virtual machine or container management system (e.g., an agent deployed in advance, but not started, when the server is initialized) is invoked through the SSH (Secure Shell) protocol. When the diagnostic tool receives the data to be diagnosed produced by the shell script program, an instruction task corresponding to the data to be diagnosed is established, and the data to be diagnosed is captured through the shell script program and then analyzed and detected. For example, when the data to be diagnosed is dashboard (real-time panel data), thread (thread stack information) and jvm (thread information of the current virtual machine, such as active threads, peak active threads and deadlocks), the corresponding instruction tasks are established as: sh we_arthas.sh dashboard (obtains real-time panel data for a specified number of refreshes) and sh we_arthas.sh jvm (obtains thread information of the current virtual machine, such as active threads, peak active threads and deadlocks). The data to be diagnosed is then captured through the shell script program (we_arthas.sh), analyzed and detected to obtain the corresponding results; the results are integrated and summarized by the shell script program to obtain the result data, which is stored in a local file.
The diagnostic tool is obtained by modifying the open source diagnostic tool Arthas using the Java language. Specifically, the interactive telnet mode of Arthas is disabled and the tool is encapsulated as a diagnostic agent, a protocol transmission interface is opened to receive parameter instructions for data capture, and the diagnostic agent is deployed in the virtual machine or the container management system.
Further, a first loop statement is used to disassemble and analyze the result data line by line to obtain a status bit identifier of each line of data;
if the status bit identifier is a first identifier, the line of data whose status bit identifier is the first identifier is determined as the normal data;
and if the status bit identifier is a second identifier, the line of data whose status bit identifier is the second identifier is determined as the abnormal data.
According to the embodiment of the invention, each line of data in the result data is analyzed by the first loop statement, and the normal data and the abnormal data in the result data are distinguished according to the preset first identifier and second identifier and the status bit identifier of each line of data. For example, the result data file is disassembled and analyzed line by line using a while -r loop statement, normal data and abnormal data are distinguished by comparing the status bit identifier (ststus_flag) of each line of data with the first identifier and the second identifier, and the distinguished data is then recorded in a normal data file or an abnormal data file.
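The line-by-line classification can be illustrated with a short sketch. The layout of a result line is not given in the text, so the example assumes that fields are separated by "||", that the status bit identifier is the last field, and that "0" and "1" stand for the first and second identifiers; all of these are hypothetical choices.

```python
def split_by_status(lines, sep="||", flag_index=-1,
                    first_identifier="0", second_identifier="1"):
    """Classify result lines as normal or abnormal by their status flag.

    The separator, flag position and identifier values are assumptions;
    lines carrying neither identifier are ignored.
    """
    normal, abnormal = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines in the result file
        flag = line.split(sep)[flag_index].strip()
        if flag == first_identifier:
            normal.append(line)
        elif flag == second_identifier:
            abnormal.append(line)
    return normal, abnormal
```

Returning two lists mirrors the recording of a normal data file and an abnormal data file; writing them to disk is left out of the sketch.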
Then, after the abnormal data is obtained, data reorganization is performed on the abnormal data to obtain data to be recovered; the data to be recovered is sent to an alarm platform so that the alarm platform raises a data alarm, and alarm data is obtained; the alarm data and the target information are spliced into data in a preset format and matched against a standard operation program for the data to be recovered, wherein the standard operation program is preset by a user; and the standard operation program for the data to be recovered is executed to perform abnormal data recovery on the data to be recovered.
In the embodiment of the present invention, after the abnormal data is obtained by analysis, data reorganization is performed on the abnormal data to obtain the data to be recovered. Specifically, each element of each line of the abnormal data is separated by using a separator according to a second loop statement to obtain the data to be recovered.
Illustratively, after the normal data and the abnormal data are obtained, the abnormal data is processed using a second loop statement: the elements in each line of abnormal data are separated so that the diagnostic tool can diagnose each element and determine the abnormal elements, and the positions of the abnormal elements in the abnormal data are then located with reference to the normal data file to obtain the data to be recovered. For example, a for loop statement is used to iterate over the abnormal data; using "||" as the separator, a text processing tool (such as Awk) extracts the status bit identifier of each element in each line of the abnormal data, and the status bit identifier of each element is analyzed to obtain the abnormal elements in the abnormal data. An insertion statement (such as insert) is then used to process the abnormal elements: elements with format errors (such as comma errors) are corrected and filtered, the abnormal elements remaining in the filtered abnormal data are determined, and the filtered abnormal elements are integrated and archived to obtain the data to be recovered (such as report_artas), which is stored locally.
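The separation and clean-up of elements can be sketched as below. The sketch covers only the splitting on the "||" separator and the removal of stray commas and empty elements; the element-level status analysis performed by the diagnostic tool is not reproduced, and the function name and sample rows are hypothetical.

```python
def reorganize(abnormal_rows, sep="||"):
    """Split each abnormal row into cleaned elements.

    Stray leading or trailing commas and surrounding whitespace are removed,
    and empty elements and empty rows are dropped. This is an illustrative
    stand-in for the correction and filtering step, not the patented logic.
    """
    records = []
    for row in abnormal_rows:
        elements = [e.strip().strip(",") for e in row.split(sep)]
        elements = [e for e in elements if e]
        if elements:
            records.append(elements)
    return records
```

For instance, reorganize(["deadlock,|| 2 ||1"]) yields a single record whose elements are "deadlock", "2" and "1".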
According to the embodiment of the invention, corresponding alarm data is obtained from the data to be recovered, and the standard operation program corresponding to the data to be recovered is then matched according to the alarm data, so that the standard operation program can recover the abnormal data.
It should be noted that the diagnostic tool in the above technical solution may further include the diagnostic tool BTrace, and the alarm platform may further include the open-source operation and maintenance monitoring system Open-Falcon.
Illustratively, the standard operation programs preset by the user are stored in a program module. After the data to be recovered is obtained, it is sent to the alarm platform so that the alarm platform raises an alarm for the abnormal area in the system to be detected according to the data to be recovered, and the alarm data generated by the alarm platform is obtained. The alarm data and the target information corresponding to the parameters of the system to be detected and the server IP of the system to be detected are then spliced into data in a preset format and matched against the standard operation programs; the successfully matched standard operation program is executed, and abnormal data recovery is performed on the data to be recovered. For example, the data to be recovered is converted into JSON format, and the POST method of the HTTP protocol is used to request an application program interface (ims_alarm_collector_alarm.do) provided by the alarm platform to obtain the alarm data corresponding to the data to be recovered. The data to be recovered and the target information are spliced into JSON format, and the POST method of the HTTP protocol is used to request an application program interface (get_sop_collector.do) provided by the program module to obtain the matched standard operation program for the data to be recovered. The standard operation program is then parsed using fastjson; for example, for {"sysnet": "ab", "ip": "ip_data", "dcn": "dcn_data"}, the data inside the braces is extracted, the data being organized as "key: value" pairs separated from one another by commas. Finally, the parsed standard operation program is executed to restore the data to be recovered to normal data.
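The splicing into JSON and the extraction of key: value pairs can be illustrated as follows. The text names fastjson, a Java library; the sketch substitutes Python's standard json module, and the function names and the "alarm" key are assumptions introduced for the example.

```python
import json


def splice_payload(alarm_data, target_info):
    """Merge target information and alarm data into one JSON string.

    The "alarm" key is a hypothetical choice; the preset format is not
    specified in the description.
    """
    merged = dict(target_info)
    merged["alarm"] = alarm_data
    return json.dumps(merged, sort_keys=True)


def parse_pairs(text):
    """Parse a JSON object and return its key/value pairs in document order."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return list(obj.items())
```

Applied to the example object from the description, parse_pairs returns the pairs for sysnet, ip and dcn in the order in which they appear.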
And after abnormal data recovery is carried out on the data to be recovered, sending alarm stop information to the alarm platform so that the alarm platform stops alarming according to the alarm stop information.
In the embodiment of the invention, after the abnormal data is restored to normal data, the alarm platform is instructed through a transmission protocol to stop alarming. For example, after the abnormal data is restored to normal data, the POST method of the HTTP protocol is used to request an application program interface provided externally by the alarm platform and send the alarm stop information, so that the alarm platform stops alarming.
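A request carrying the alarm stop information might be prepared as sketched below. The interface address and the body fields (alarm_id, action) are hypothetical, since the description does not specify them; the sketch only constructs the HTTP POST request and does not send it.

```python
import json
import urllib.request


def build_stop_alarm_request(url, alarm_id):
    """Build (but do not send) an HTTP POST request asking the platform to stop an alarm.

    The URL and the JSON fields are assumptions made for illustration.
    """
    body = json.dumps({"alarm_id": alarm_id, "action": "stop"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would issue the request; error handling and retries would be needed in practice.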
According to the embodiment of the invention, the detection instruction issued by the user is obtained through the terminal and system performance problems are located in real time: the program automatically executes the detection instruction, completes the analysis and feeds back the result, so that performance detection is performed automatically on each index of the system. Abnormal data is automatically recovered by combining the alarm platform with the standard operation program, forming an effective closed loop. This reduces manual intervention, shortens the time during which an abnormal system fault has an impact, and improves the efficiency of system performance detection and troubleshooting. In addition, because the preset diagnostic tool can be applied to both virtual machines and container management systems, the range of use is expanded.
Based on the same technical concept, Fig. 3 schematically illustrates the structure of a system performance detection apparatus provided by an embodiment of the present invention; the apparatus can execute the flow of the system performance detection method.
As shown in fig. 3, the apparatus specifically includes:
an obtaining module 301, configured to obtain a detection instruction; the detection instruction comprises parameters of the system to be detected and a server IP of the system to be detected;
the processing module 302 is configured to obtain target information according to the parameter of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed;
and diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
Optionally, the processing module 302 is specifically configured to:
converting the parameters of the system to be detected and the server IP of the system to be detected into data in a preset format;
and sending the data in the preset format to a configuration management database to obtain the corresponding target information of the system to be detected.
Optionally, the preset script program is provided with an import parameter;
the processing module 302 is specifically configured to:
and combining the detection instruction and the target information with the import parameter to obtain input parameters of the preset script program, and sending the input parameters to the preset script program to obtain data to be diagnosed in a corresponding format for diagnosis by a diagnostic tool.
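One way to picture the combination of the detection instruction and the target information with the import parameter is the sketch below. It assumes that the import parameter is an ordered list of parameter names declared by the script and that the detection instruction and target information are key/value mappings; these representations and all names are hypothetical.

```python
def build_input_parameters(import_params, detection_instruction, target_info):
    """Order the available values according to the script's declared parameters.

    Values from the detection instruction take precedence over target
    information when both define the same name (an assumed rule).
    """
    available = {**target_info, **detection_instruction}
    missing = [name for name in import_params if name not in available]
    if missing:
        raise KeyError(f"missing values for: {missing}")
    return [str(available[name]) for name in import_params]
```

The resulting list could be appended to the script invocation as positional arguments.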
Optionally, the processing module 302 is specifically configured to:
using a first loop statement to disassemble and analyze the result data line by line to obtain a status bit identifier of each line of data;
if the status bit identifier is a first identifier, determining the line of data whose status bit identifier is the first identifier as the normal data;
and if the status bit identifier is a second identifier, determining the line of data whose status bit identifier is the second identifier as the abnormal data.
Optionally, the processing module 302 is specifically configured to:
after the abnormal data is obtained, performing data reorganization on the abnormal data to obtain data to be recovered;
sending the data to be recovered to an alarm platform to enable the alarm platform to carry out data alarm and obtain alarm data;
splicing the alarm data and the target information into data with a preset format, and matching a standard operation program of the data to be recovered; the standard operation program is preset by a user;
and executing a standard operation program of the data to be recovered, and performing abnormal data recovery on the data to be recovered.
Optionally, the processing module 302 is specifically configured to:
and separating each element of each line of the abnormal data by using a separator according to a second loop statement to obtain the data to be recovered.
Optionally, the processing module 302 is further configured to:
and after abnormal data recovery is performed on the data to be recovered, sending alarm stop information to the alarm platform so that the alarm platform stops alarming according to the alarm stop information.
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the system performance detection method according to the obtained program.
Based on the same technical concept, the embodiment of the invention also provides a computer-readable storage medium, which stores computer-executable instructions for causing a computer to execute the method for detecting the system performance.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for system performance detection, comprising:
acquiring a detection instruction; the detection instruction comprises parameters of the system to be detected and a server IP of the system to be detected;
obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed;
and diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
2. The method according to claim 1, wherein the obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected comprises:
converting the parameters of the system to be detected and the server IP of the system to be detected into data in a preset format;
and sending the data in the preset format to a configuration management database to obtain the corresponding target information of the system to be detected.
3. The method of claim 1, wherein the preset script program is provided with an import parameter;
the executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed comprises:
combining the detection instruction and the target information with the import parameter to obtain input parameters of the preset script program, and sending the input parameters to the preset script program to obtain data to be diagnosed in a corresponding format for diagnosis by a diagnostic tool.
4. The method of claim 1, wherein analyzing the result data to obtain normal data and abnormal data of the system to be detected comprises:
using a first loop statement to disassemble and analyze the result data line by line to obtain a status bit identifier of each line of data;
if the status bit identifier is a first identifier, determining the line of data whose status bit identifier is the first identifier as the normal data;
and if the status bit identifier is a second identifier, determining the line of data whose status bit identifier is the second identifier as the abnormal data.
5. The method of claim 1, wherein after obtaining the abnormal data, the method further comprises:
performing data reorganization on the abnormal data to obtain data to be recovered;
sending the data to be recovered to an alarm platform to enable the alarm platform to carry out data alarm and obtain alarm data;
splicing the alarm data and the target information into data with a preset format, and matching a standard operation program of the data to be recovered; the standard operation program is preset by a user;
and executing a standard operation program of the data to be recovered, and performing abnormal data recovery on the data to be recovered.
6. The method of claim 5, wherein the performing data reorganization on the abnormal data to obtain data to be recovered comprises:
separating each element of each line of the abnormal data by using a separator according to a second loop statement to obtain the data to be recovered.
7. The method according to any one of claims 1 to 6, wherein after performing abnormal data recovery on the data to be recovered, further comprising:
and sending alarm stop information to the alarm platform so that the alarm platform stops alarming according to the alarm stop information.
8. An apparatus for system performance detection, comprising:
the acquisition module is used for acquiring a detection instruction; the detection instruction comprises parameters of the system to be detected and a server IP of the system to be detected;
the processing module is used for obtaining target information according to the parameters of the system to be detected and the server IP of the system to be detected; executing a preset script program according to the detection instruction and the target information to obtain data to be diagnosed;
and diagnosing the data to be diagnosed to obtain result data, and analyzing the result data to obtain normal data and abnormal data of the system to be detected.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to perform the method of any of claims 1 to 7 in accordance with the obtained program.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202010613893.2A 2020-06-30 2020-06-30 System performance detection method and device Pending CN111752741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010613893.2A CN111752741A (en) 2020-06-30 2020-06-30 System performance detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010613893.2A CN111752741A (en) 2020-06-30 2020-06-30 System performance detection method and device

Publications (1)

Publication Number Publication Date
CN111752741A (en) 2020-10-09

Family

ID=72676872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010613893.2A Pending CN111752741A (en) 2020-06-30 2020-06-30 System performance detection method and device

Country Status (1)

Country Link
CN (1) CN111752741A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948217A (en) * 2021-03-29 2021-06-11 腾讯科技(深圳)有限公司 Server repair checking method and device, storage medium and electronic equipment
CN113590369A (en) * 2021-07-23 2021-11-02 上海淇玥信息技术有限公司 Method and device for virtual machine diagnosis and electronic equipment
CN113590369B (en) * 2021-07-23 2024-05-28 上海淇玥信息技术有限公司 Method and device for virtual machine diagnosis and electronic equipment

Similar Documents

Publication Publication Date Title
EP3036633B1 (en) Cloud deployment infrastructure validation engine
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
Lou et al. Software analytics for incident management of online services: An experience report
US8041996B2 (en) Method and apparatus for time-based event correlation
CN107807877B (en) Code performance testing method and device
US20130311977A1 (en) Arrangement and method for model-based testing
WO2016188100A1 (en) Information system fault scenario information collection method and system
EP3591485B1 (en) Method and device for monitoring for equipment failure
CN111752741A (en) System performance detection method and device
CN109240851A (en) A kind of autonomous type realization self-healing method and system of batch BMC
US11740999B2 (en) Capturing transition stacks for evaluating server-side applications
CN110489317A (en) Cloud system task run method for diagnosing faults and system based on workflow
CN115664939A (en) Comprehensive operation and maintenance method and device based on automation technology and storage medium
CN109634175B (en) Method and system for controlling dynamic verification of configuration program
CN115114064A (en) Micro-service fault analysis method, system, equipment and storage medium
CN116107794B (en) Ship software fault automatic diagnosis method, system and storage medium
CN117220917A (en) Network real-time monitoring method based on cloud computing
CN111813872B (en) Method, device and equipment for generating fault troubleshooting model
CN115529227A (en) Link tracking and abnormity diagnosis method based on Web request
CN112181759A (en) Method for monitoring micro-service performance and diagnosing abnormity
CN116414609A (en) Fault analysis method, device, electronic equipment and storage medium
CN113138872A (en) Abnormal processing device and method for database system
CN113037550B (en) Service fault monitoring method, system and computer readable storage medium
CN106339285A (en) Method for analyzing unexpected restart of LINUX system
CN106649039B (en) A kind of fault-tolerant method of C language monitoring software under embedded Linux system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination