CN113568804A - Web application-oriented performance bottleneck accurate positioning system - Google Patents

Publication number
CN113568804A
Authority
CN
China
Prior art keywords
data
bottleneck
module
performance
positioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110633754.0A
Other languages
Chinese (zh)
Inventor
李晖
丁玺润
闵圣天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Youlian Borui Technology Co ltd
Original Assignee
Guizhou Youlian Borui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Youlian Borui Technology Co ltd
Priority to CN202110633754.0A
Publication of CN113568804A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/4451 User profiles; Roaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482 Procedural
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488 Object-oriented
    • G06F9/449 Object-oriented method invocation or resolution

Abstract

A performance bottleneck accurate positioning system for Web applications comprises a data display module and a data query module at the front end, and a system monitoring module, a data storage module, an anomaly detection module and a bottleneck positioning module at the back end. The layered positioning takes into account the mutual influence among the performance indexes of each level; layered anomaly detection shows clearly at which level the performance index data is abnormal, which improves the accuracy of bottleneck positioning. After the bottleneck has been positioned at a specific level, the corresponding code is located by matching the call link data recorded when the bottleneck appeared, thereby achieving code-level accurate positioning of the performance bottleneck.

Description

Web application-oriented performance bottleneck accurate positioning system
Technical Field
The invention relates to the field of computers, and particularly provides a performance bottleneck accurate positioning system for Web application.
Background
The microservice architecture is the mainstream architecture for today's complex Web applications. It splits the overall business into a number of services, and front-end and back-end business flows are invoked and passed among these microservices. As a result, the internal call process of a Web system becomes extremely complex, and when the system behaves abnormally the fault is difficult to find and locate. As the prevailing form of application, Web applications have large user bases; degraded service quality leads to poor user experience and can cause heavy economic losses for enterprises, which places higher demands on the service performance of Web applications. Degraded service quality is usually caused by a system performance bottleneck, so resolving such bottlenecks quickly is the key to improving application performance and providing satisfactory service to users.
Performance index data reflects the running state of the system and is the main basis for assessing its performance problems. A performance bottleneck is usually accompanied by abnormal performance indexes, so performance bottleneck analysis consists mainly of detecting anomalies in performance index data. Existing analysis methods focus on the hardware resource indexes (CPU, IO, memory and the like). A Web application usually has an N-layer structure, and the components of each layer and their performance indexes affect one another. Analyzing only the hardware resource indexes, while ignoring the influence of other performance indexes (such as middleware indexes and database indexes) on them, often makes it hard to find the real cause of a bottleneck and reduces the efficiency of performance optimization.
Most performance bottlenecks are related to defects in application code, and an application with low-quality code easily becomes the trigger of a hardware resource bottleneck. If, when analyzing a performance bottleneck, the exact position of the bottleneck can be determined and the corresponding code located as well, this greatly helps in analyzing the real cause of the bottleneck and has high research and application value.
Disclosure of Invention
In order to overcome the technical defects in the prior art, the invention provides a performance bottleneck accurate positioning system for Web application, which can effectively solve the problems in the background technology.
The invention is realized by the following technical scheme:
A performance bottleneck accurate positioning system for Web applications comprises a data display module and a data query module at the front end, and a system monitoring module, a data storage module, an anomaly detection module and a bottleneck positioning module at the back end. The data display module visually displays the collected data and the anomaly detection results. The data query module queries data within a specific time range. The system monitoring module monitors the system under study with a performance management tool, collects data, and feeds the data back to the front-end data display module for display. The anomaly detection module detects anomalies in the performance index data and feeds the detection results back to the bottleneck positioning module for analysis. The bottleneck positioning module determines the position of the bottleneck from the information fed back by the anomaly detection module in combination with a bottleneck positioning strategy, then matches the timestamp of the bottleneck point against the call link to locate the corresponding code accurately, and feeds the positioning result back to the data display module for display. The performance index data and the call link data are stored in a MySQL database.
Advantageously, the positioning system further comprises a database module, wherein the database module comprises a call link information table (segment), a JVM CPU information table (jvm_cpu), a heap memory information table (jvm_memory_heap), a GC information table (gc_count), and an SQL information table (top_n_sql), and the database module is used for storing call link data and performance index data.
Advantageously, the call link information table (segment) is used to store the call path information of each request to the system.
Advantageously, the JVM CPU information table (jvm_cpu) is used to store information related to the JVM CPU utilization of the system.
Advantageously, the heap memory information table (jvm_memory_heap) is used to store information such as the heap memory utilization of the system.
Advantageously, the GC information table (gc_count) is used to store information such as the number and duration of GC operations performed by the system.
Advantageously, the SQL information table (top_n_sql) is used to record information such as the execution time and link ID of the SQL statements with the longest execution times.
Furthermore, the performance bottleneck accurate positioning system is developed on Windows 10 with Java as the development language and IDEA and WebStorm as the integrated development tools. The positioning system adopts a development mode in which the front end and back end are separated: the back-end framework is Spring + Spring MVC + Spring Boot + MyBatis, and the front end is developed with Vue + Element-ui + ECharts.
Furthermore, the positioning system also comprises a system interface module, which provides service capability to the outside and interacts with the front-end page. User operations on the front-end page send corresponding requests to the interface to operate on the data tables, and these requests implement the interaction between the front-end page and the database by calling services. In actual use, for a query operation, the result obtained by querying the database directly is returned to the front end; for an anomaly detection operation, the back-end data detection positioning module is called, and the analysis and positioning result is then fed back to the front end.
Furthermore, the positioning system further comprises a detection positioning module, which performs anomaly detection on the performance index data, confirms the level at which the bottleneck arises according to a performance bottleneck positioning strategy, and locates the specific code in combination with the call link. In actual use, first, the corresponding data is queried according to the time range passed in from the front end; second, anomaly detection is performed on the performance index data in the selected range by a combined detector; then the results are analyzed according to the bottleneck positioning strategy and a bottleneck positioning result is given; finally, the timestamps of the call link data are matched against the timestamp at which the bottleneck occurred, the corresponding code is located accurately, and the positioning result is returned to the front end for display.
Further, the system monitoring module monitors the monitored system, Uniplore, with the performance management tool SkyWalking and updates the performance index data and call link data to the database in real time. SkyWalking is mainly divided into two parts: the Agent probe and the Collection Server. The Agent probe is deployed on Server A in a directory at the same level as the monitored system, and the corresponding configuration file of the Tomcat server is modified. The Agent sends data at second-level intervals to the data collection service (Collection Server) on Server B, and the Collection Server stores the data in the MySQL database in real time. MySQL and the Collection Server are installed on the same server node.
Advantageously, the operation flow of the positioning system comprises the following steps:
S1, collecting performance index time series data of different types from the Web application system through a system monitoring tool;
S2, preprocessing the data;
S3, manually labeling the data used to train the anomaly detection model with a data labeling tool;
S4, performing anomaly detection on the performance index data (time series) to obtain the anomalous data points;
S5, analyzing the anomaly detection results to determine the position of the bottleneck;
S6, achieving code-level accurate positioning of the bottleneck by matching the call link.
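Step S6 can be illustrated with a minimal sketch that matches a bottleneck timestamp against call link records. The `Segment` fields, the millisecond units and the `slack_ms` tolerance are hypothetical and are not taken from the segment table described in this document; the sketch only assumes that each call link record carries a start time and an end time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical subset of a call link record; field names are illustrative.
    trace_id: str
    endpoint: str
    start_ms: int
    end_ms: int


def match_segments(segments: List[Segment], bottleneck_ms: int,
                   slack_ms: int = 0) -> List[Segment]:
    """Return the segments whose time span covers the bottleneck timestamp."""
    hits = [s for s in segments
            if s.start_ms - slack_ms <= bottleneck_ms <= s.end_ms + slack_ms]
    # Longest-running requests first (an assumed ranking, not from the source).
    return sorted(hits, key=lambda s: s.end_ms - s.start_ms, reverse=True)


segments = [
    Segment("t1", "/report/export", 1000, 4000),
    Segment("t2", "/login", 1200, 1300),
    Segment("t3", "/dataset/list", 5000, 5200),
]
suspects = match_segments(segments, bottleneck_ms=1250)
print([s.endpoint for s in suspects])
```

The matched segments would then point to the endpoints, and through them the code, that were executing when the bottleneck appeared.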
More beneficially, in the process of collecting performance index time series data of different types from the Web application system through the system monitoring tool, the one-stop big data analysis platform Uniplore, which is in actual use, is monitored with an APM tool to collect performance index data and call link data. The collected data include the hardware resource index CPU utilization, the middleware index GC frequency, the database index SQL execution time, and so on.
More beneficially, preprocessing mainly integrates the collected performance index data and extracts the data that is needed; it includes data format conversion, alignment of the timestamps of the performance index data of each level, removal of useless fields, and so on.
More beneficially, in the process of manually labeling the data used to train the anomaly detection model, the data labeling tool TRAINSET is used. The data file to be labeled (CSV format) is imported into the labeling page; the file must contain a timestamp (timestamp), a value (value) and a label attribute. The timestamp must be in ISO 8601 format, such as 2019-03-13T21:11:29+00:00 or 2019-03-13T21:11:29Z, so an original timestamp (such as 20190313211129) must be converted into this format before labeling. The label defaults to 0 for every point (non-anomalous points are labeled 0). The labeling page is mainly divided into two parts: a display of the whole data set and an enlarged display of the data in the selected range.
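The timestamp conversion described above can be sketched as follows. The function name `to_iso8601` is illustrative, and treating the raw timestamp as UTC is an assumption; the document does not state the time zone of the original data.

```python
from datetime import datetime, timezone


def to_iso8601(raw: str) -> str:
    """Convert a compact 'YYYYMMDDhhmmss' timestamp to ISO 8601.

    Assumption: the raw timestamp is in UTC.
    """
    parsed = datetime.strptime(raw, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_iso8601("20190313211129"))  # 2019-03-13T21:11:29+00:00
```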
More advantageously, in the process of performing anomaly detection on the performance index data (time series) to obtain anomalous data points, different single detectors are designed according to the anomaly types found in the collected data, and these single detectors are combined into a combined detector capable of detecting multiple anomaly types. The combined detector performs anomaly detection on the system performance index data, which improves the accuracy of anomaly detection.
Further, the operators of a single detector are of two types: transform operators (converters) and detection operators (detectors). The detectors include a threshold detector for detecting outliers, a quantile detector for detecting outliers, and a cluster detector for detecting clusters of outliers.
Further, the threshold calculation formula is $Min - c < G_{used} < Max + c$, where Min and Max are the configured minimum and maximum thresholds and c is the factor by which the (Min, Max) range is widened or narrowed; a data value $G_{used}$ that falls outside this range is regarded as an abnormal value.
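A minimal sketch of such a threshold check is given below. The function name, the sample CPU values and the chosen thresholds are illustrative assumptions; only the inequality Min - c < value < Max + c comes from the text.

```python
from typing import List, Sequence


def threshold_anomalies(values: Sequence[float], min_t: float, max_t: float,
                        c: float = 0.0) -> List[int]:
    """Return the indices of values outside the open range (min_t - c, max_t + c)."""
    return [i for i, v in enumerate(values)
            if not (min_t - c < v < max_t + c)]


# Hypothetical CPU utilization samples in percent.
cpu = [35.0, 40.0, 97.5, 38.0, 2.0]
flags = threshold_anomalies(cpu, min_t=10.0, max_t=90.0, c=5.0)
print(flags)  # [2, 4]
```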
Furthermore, the quantile detector uses the interval $[MinQ - c \times R,\ MaxQ + c \times R]$, with $MinQ = M \times q_{min}$ and $MaxQ = M \times q_{max}$, where M is the maximum value of the data, $q_{min}$ is the low quantile percentage, $q_{max}$ is the high quantile percentage, R is the range between MinQ and MaxQ, MinQ and MaxQ are respectively the minimum and maximum of the quantile interval, and c is the factor by which the interval is narrowed or widened, used to tune the interval precisely for different data. A value outside the configured quantile interval is regarded as an abnormal value.
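The quantile interval can be sketched as follows, following the formulas as stated in the text (MinQ and MaxQ as fractions of the maximum value M). Taking R as the interval width MaxQ - MinQ is an assumption, since the text only says that R is formed from MinQ and MaxQ; the function name and sample data are likewise illustrative.

```python
from typing import List, Sequence


def quantile_anomalies(values: Sequence[float], q_min: float, q_max: float,
                       c: float = 0.0) -> List[int]:
    """Flag values outside [MinQ - c*R, MaxQ + c*R]."""
    m = max(values)            # M: maximum value of the data
    min_q = m * q_min          # MinQ = M x q_min
    max_q = m * q_max          # MaxQ = M x q_max
    r = max_q - min_q          # R: assumed to be the width of the interval
    lo, hi = min_q - c * r, max_q + c * r
    return [i for i, v in enumerate(values) if v < lo or v > hi]


# Hypothetical heap usage samples; the resulting interval is roughly [20, 80].
heap_used = [50, 52, 55, 58, 60, 100, 5]
flags = quantile_anomalies(heap_used, q_min=0.25, q_max=0.75, c=0.1)
print(flags)  # [5, 6]
```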
Still further, the operation of the cluster detector comprises the following steps: (1) randomly choose k centers, $C = \{c_1, c_2, c_3, \ldots, c_k\}$; (2) compute the Euclidean distance from each data point x to the centers and find the nearest center, $\arg\min_{c_i \in C} dis(c_i, x)^2$; (3) update the centers according to the formula
Figure BDA0003104616600000051
(4) repeat steps 2 and 3 until the centers no longer change. The cluster detector can thus detect several outliers belonging to one cluster at once.
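The clustering steps can be sketched for one-dimensional data as below. The center update formula is only available as an image in this text, so the sketch assumes the standard k-means update (each center moves to the mean of its assigned points). Treating the smallest cluster as the outlier cluster is also a hypothetical rule added for illustration.

```python
import random
from typing import List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], k: int, seed: int = 0,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)        # step 1: random initial centers
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center (squared distance).
        labels = [min(range(k), key=lambda i: (centers[i] - x) ** 2)
                  for x in points]
        # Step 3: assumed standard update, the mean of each cluster's members.
        new_centers = []
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:               # step 4: stop once centers are stable
            break
        centers = new_centers
    return centers, labels


def smallest_cluster(labels: Sequence[int], k: int) -> List[int]:
    """Hypothetical rule: the least populated cluster holds the outliers."""
    sizes = [sum(1 for lab in labels if lab == i) for i in range(k)]
    target = sizes.index(min(sizes))
    return [i for i, lab in enumerate(labels) if lab == target]


data = [10, 11, 12, 10, 11, 12, 11, 10, 95, 96, 97]
centers, labels = kmeans_1d(data, k=2)
outliers = smallest_cluster(labels, k=2)
print(outliers)
```

With this sample, the three high values form their own cluster and are reported together, which matches the idea of detecting several outliers of one cluster at once.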
Advantageously, the single detector further comprises a converter. The converter transforms the time series data points by sliding time windows over them; the characteristics of the transformed data points are unchanged, while the trend of the overall curve is changed. The converter is implemented as follows: two immediately adjacent sliding windows slide over the time series data; for a center point $x_c$ of the time series, where $x_c \in \{x_1, x_2, x_3, \ldots, x_t\}$, compute agg (agg ∈ [median, mean]) over the two sliding windows adjacent to the center point, where median and mean are functions that compute the median and the mean within a time window, i.e. $a_i = agg(w_i)$, $a_{i+1} = agg(w_{i+1})$; then compute the transformed value of the center point, $x_c = abs(a_{i+1} - a_i)$.
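The converter can be sketched as a two-window rolling aggregate. The function name and the choice to treat the center point as the first element of the right-hand window are assumptions; positions without two full windows are left as None.

```python
from statistics import mean, median
from typing import Callable, List, Optional, Sequence


def double_window_transform(series: Sequence[float], window: int,
                            agg: Callable = mean) -> List[Optional[float]]:
    """Replace each center point by abs(agg(right window) - agg(left window))."""
    out: List[Optional[float]] = [None] * len(series)
    for c in range(window, len(series) - window + 1):
        left = series[c - window:c]      # w_i: the window just before position c
        right = series[c:c + window]     # w_{i+1}: the window starting at position c
        out[c] = abs(agg(right) - agg(left))
    return out


# A level shift from 1 to 9 becomes a single peak after the transform.
levels = [1, 1, 1, 1, 9, 9, 9, 9]
smoothed = double_window_transform(levels, window=2, agg=mean)
print(smoothed)  # [None, None, 0, 4, 8, 4, 0, None]
```

Passing `agg=median` instead of `mean` gives the median-based variant mentioned in the text.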
Beneficially, the combined detector is composed of a converter and different single detectors connected in series or in parallel; the data under test passes through the converter and the three detectors to yield the final anomaly detection result. First, the converter decomposes and transforms a complex time series curve into a simple curve that is easy to process; then the threshold detector and the quantile detector examine the transformed curve; finally, the detection result is passed to the cluster detector to catch aggregated anomalies that have not yet been detected.
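The series and parallel composition can be sketched with plain functions. Everything here is an illustrative assumption: the stand-in converter (absolute first difference), the two simplified detectors, and the reading of "parallel" as the union of flagged indices. The final clustering stage is omitted to keep the sketch short.

```python
from typing import Callable, List, Set

Converter = Callable[[List[float]], List[float]]
Detector = Callable[[List[float]], Set[int]]


def in_parallel(*detectors: Detector) -> Detector:
    """Run detectors side by side and take the union of flagged indices."""
    def run(values: List[float]) -> Set[int]:
        flagged: Set[int] = set()
        for detect in detectors:
            flagged |= detect(values)
        return flagged
    return run


def in_series(converter: Converter, detector: Detector) -> Detector:
    """Feed the converter's output into a detector."""
    def run(values: List[float]) -> Set[int]:
        return detector(converter(values))
    return run


def abs_diff(values: List[float]) -> List[float]:
    # Stand-in converter: absolute change from the previous point.
    return [0.0] + [abs(b - a) for a, b in zip(values, values[1:])]


def threshold(lo: float, hi: float) -> Detector:
    return lambda values: {i for i, v in enumerate(values) if not (lo < v < hi)}


def upper_fraction(q: float) -> Detector:
    # Stand-in quantile-style detector: flag values above q times the maximum.
    return lambda values: {i for i, v in enumerate(values) if v > q * max(values)}


combined = in_series(abs_diff, in_parallel(threshold(-1, 20), upper_fraction(0.9)))
result = sorted(combined([10, 10, 11, 10, 60, 61, 60, 10]))
print(result)  # [4, 7]
```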
Further, when the anomaly detection results are analyzed to determine the bottleneck position, problems are prioritized, from highest to lowest, as server problems, network problems and client problems. On the hardware side the analysis focuses on hardware resource indexes; on the software side it mainly involves middleware indexes, database indexes and application code.
Further, according to their distribution, performance bottlenecks are mainly divided into three categories: hardware layer bottlenecks, middleware layer bottlenecks and application bottlenecks. A hardware layer bottleneck is a system performance problem caused by excessive utilization of the server's hardware resources, including problems with the CPU, memory and disk IO. A middleware layer bottleneck is a bottleneck in application software such as Web servers and application servers or in database systems, for example one caused by unreasonable parameter settings of a JDBC connection pool configured on the middleware platform. An application is software written by developers for a particular purpose of the user; application layer bottlenecks mainly include performance bottlenecks caused by unreasonable program design (serial processing, too few threads for handling requests, no buffering, no caching and the like), database design defects, unreasonable JVM parameters, slow SQL, unreasonable program architecture planning, and so on.
Beneficially, the specific flow for locating the performance bottleneck is as follows: first, anomaly detection is performed on the hardware layer resource indexes; then anomaly detection is performed on the performance indexes of the middleware layer or database layer within the anomalous time period, giving the anomaly status of the index data of different levels over the same period; finally, the level at which the bottleneck really arises is determined according to decision criteria formulated from the correlation among the performance indexes of different levels. Positioning the bottleneck to a specific level through layered positioning also makes the subsequent analysis of the call link more efficient.
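The layered flow can be sketched as below. The document does not give the actual decision criteria, so the rule used here (attribute the bottleneck to the first other level that is anomalous in the same period, otherwise to hardware) is purely hypothetical, as are the level names and the window representation.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (start, end) of an anomalous period, e.g. in seconds


def overlaps(a: Window, b: Window) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def locate_level(hardware: List[Window],
                 other_levels: Dict[str, List[Window]]) -> List[Tuple[Window, str]]:
    """For each hardware anomaly period, decide which level to investigate."""
    results = []
    for hw in hardware:
        # Levels whose indexes were also anomalous during the same period.
        hit = [level for level, windows in other_levels.items()
               if any(overlaps(hw, w) for w in windows)]
        # Hypothetical criterion: a co-occurring anomaly at another level wins.
        results.append((hw, hit[0] if hit else "hardware"))
    return results


hardware_anomalies = [(100, 200), (500, 600)]
others = {"middleware": [(150, 180)], "database": [(900, 950)]}
located = locate_level(hardware_anomalies, others)
print(located)  # [((100, 200), 'middleware'), ((500, 600), 'hardware')]
```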
Advantageously, in the process of achieving code-level accurate positioning of the bottleneck by matching the call link, it should be noted that performance bottlenecks in different layers affect one another: when a performance index of a lower layer hits a bottleneck, the performance indexes of the upper layer are likely to hit one as well. Normally a bottleneck in an upper layer component does not affect the performance of a lower layer component, since a normally operating lower layer component stays within the constraints of the upper layer component; but if a bottleneck occurs in a lower layer component, the upper layer component is necessarily affected within a certain time range. For example, if the performance bottleneck is caused by the application, the problem may first show up on the Web server; as errors on the Web server increase, more hardware resources are consumed, hardware resource utilization rises, and a hardware layer bottleneck results.
The invention has the following beneficial effects. It provides a performance bottleneck accurate positioning system for Web applications in which layered positioning takes into account the mutual influence among the performance indexes of each level; layered anomaly detection shows clearly at which level the performance index data is abnormal, which improves the accuracy of bottleneck positioning. After the bottleneck has been positioned at a specific level, the corresponding code is located by matching the call link data recorded when the bottleneck appeared, so that code-level accurate positioning of the performance bottleneck is achieved. Analyzing the application in depth in light of the level at which the bottleneck lies makes it possible to determine the real cause of the bottleneck and optimize the application, so that system performance is optimized more efficiently. Based on the performance index data and the layered architecture of Web applications, a layered performance bottleneck positioning strategy is proposed according to the correlation among the performance index data of different levels, so that the bottleneck can be positioned more accurately. Based on the anomaly types found in the performance index time series data collected by SkyWalking, a combined detector capable of detecting multiple anomaly types is designed, so that anomalies of different types are detected more comprehensively and the accuracy of anomaly detection is improved. Using a system monitoring tool, a performance bottleneck accurate positioning system for Web applications is designed and implemented on the basis of performance index time series data and request call links, and the accurate positioning of the performance bottleneck makes the analysis of its cause more reliable.
Drawings
FIG. 1 is a framework diagram of a performance bottleneck accurate positioning system for Web applications according to the present invention;
FIG. 2 is a diagram of a database relational schema in accordance with the present invention;
FIG. 3 is a diagram of a call link table structure according to the present invention;
FIG. 4 is a diagram of a JVM-CPU table structure according to the present invention;
FIG. 5 is a diagram of a heap memory table of the present invention;
FIG. 6 is a view showing a structure of a GC information table in the present invention;
FIG. 7 is a block diagram of an SQL message table according to the invention;
FIG. 8 is a front-end and back-end interaction diagram of the present invention;
FIG. 9 is a back-end data processing flow of the present invention;
FIG. 10 is a schematic diagram of the system monitoring deployment of the present invention;
FIG. 11 is a block diagram of a performance bottleneck positioning concept of the present invention;
FIG. 12 is a schematic diagram of the operation of a single detector of the present invention;
FIG. 13 is a schematic diagram of the operation of the combined detector of the present invention;
FIG. 14 is a flow chart of bottleneck analysis in the present invention;
FIG. 15 is a flow chart of performance bottleneck positioning in the present invention;
FIG. 16 is a diagram illustrating the performance bottleneck stratification in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, a performance bottleneck accurate positioning system for Web applications mainly comprises a data display module and a data query module at the front end, and a system monitoring module, a data storage module, an anomaly detection module and a bottleneck positioning module at the back end. The data display module visually displays the collected data and the anomaly detection results. The data query module queries data within a specific time range. The system monitoring module monitors the system under study with a performance management tool, collects data, and feeds the data back to the front-end data display module for display. The anomaly detection module detects anomalies in the performance index data and feeds the detection results back to the bottleneck positioning module for analysis. The bottleneck positioning module determines the position of the bottleneck from the information fed back by the anomaly detection module in combination with a bottleneck positioning strategy, then matches the timestamp of the bottleneck point against the call link to locate the corresponding code accurately, and feeds the positioning result back to the data display module for display. The performance index data and the call link data are stored in a MySQL database.
Specifically, as shown in FIG. 2, the positioning system further includes a database module, where the database includes a call link information table (segment), a JVM CPU information table (jvm_cpu), a heap memory information table (jvm_memory_heap), a GC information table (gc_count), and an SQL information table (top_n_sql), and the database is used to store the call link data and the performance index data.
In a preferred embodiment, as shown in FIG. 3, the call link information table (segment) is used to store the call path information of each request to the system.
In a preferred embodiment, as shown in FIG. 4, the JVM CPU information table (jvm_cpu) is used to store information related to the JVM CPU utilization of the system.
In a preferred embodiment, as shown in FIG. 5, the heap memory information table (jvm_memory_heap) is used to store information such as the heap memory utilization of the system.
In a preferred embodiment, as shown in FIG. 6, the GC information table (gc_count) is used to store information such as the number and duration of GC operations performed by the system.
In a preferred embodiment, as shown in FIG. 7, the SQL information table (top_n_sql) is used to record information such as the execution time and link ID of the SQL statements with the longest execution times.
Preferably, the performance bottleneck accurate positioning system is developed on Windows 10 with Java as the development language and IDEA and WebStorm as the integrated development tools. The positioning system adopts a development mode in which the front end and back end are separated: the back-end framework is Spring + Spring MVC + Spring Boot + MyBatis, and the front end is developed with Vue + Element-ui + ECharts.
Specifically, as shown in fig. 8, the positioning system further includes a system interface module, which exposes service capability externally and interacts with the front-end page. When a user operates the front-end page, corresponding requests are sent to the interface to operate on the data tables, and these requests implement the interaction between the front-end page and the database by calling services. In actual use, if the operation is a query, the result obtained by querying the database directly is returned to the front end; if the operation is an anomaly detection operation, the back-end detection positioning module is called, and the analysis and positioning result is then fed back to the front end.
Specifically, as shown in fig. 9, the positioning system further includes a detection positioning module, which performs anomaly detection on the performance index data, determines the level at which the bottleneck occurs according to the performance bottleneck positioning strategy, and locates the specific code in combination with the call link. In actual use, the corresponding data is first queried according to the time range passed from the front end; second, anomaly detection is performed on the performance index data in the selected range by the combined detector; the result is then analyzed according to the bottleneck positioning strategy to give a bottleneck positioning result; finally, the timestamps of the call link data are matched against the timestamp at which the bottleneck occurred, the corresponding code is accurately located, and the positioning result is returned to the front end for display.
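The final matching step can be illustrated with a minimal sketch. It assumes that each call link record carries a trace ID, an endpoint name, and start and end times in epoch milliseconds; these field names, the `tolerance_ms` parameter, and the ordering by duration are hypothetical and are not taken from the segment table of fig. 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical fields; the real segment table layout is the one shown in fig. 3.
    trace_id: str
    endpoint: str
    start_ms: int
    end_ms: int


def match_segments(segments: List[Segment], bottleneck_ms: int,
                   tolerance_ms: int = 0) -> List[Segment]:
    """Return the call link segments that were executing when the bottleneck occurred."""
    hits = [
        s for s in segments
        if s.start_ms - tolerance_ms <= bottleneck_ms <= s.end_ms + tolerance_ms
    ]
    # Longest-running segments first (an assumed heuristic for ranking suspects).
    return sorted(hits, key=lambda s: s.end_ms - s.start_ms, reverse=True)


segments = [
    Segment("t1", "/report/export", 1000, 9000),
    Segment("t2", "/login", 2000, 2100),
    Segment("t3", "/dataset/list", 5000, 5400),
]
suspects = match_segments(segments, bottleneck_ms=5200)
print([s.trace_id for s in suspects])
```

Only the requests that overlap the bottleneck timestamp are kept, which narrows the code that has to be inspected.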
Specifically, as shown in fig. 10, the system monitoring module monitors the monitored system, Uniplore, with the performance management tool SkyWalking and updates the performance index data and call link data in the database in real time. SkyWalking consists mainly of two parts: the Agent probe and the Collection Server. The Agent probe and the monitored system are deployed in sibling directories on Server A, and the configuration file of the Tomcat server is modified accordingly. The Agent sends data at intervals of a few seconds to the data collection service, the Collection Server, on Server B, and the Collection Server stores the data in the MySQL database in real time. MySQL and the Collection Server are installed on the same server node.
Specifically, as shown in fig. 11, the operation flow of the positioning system includes the following steps:
S1, collecting performance index time series data of different types of Web application systems through a system monitoring tool;
S2, preprocessing the data;
S3, manually labeling the data used for training the anomaly detection model with a data calibration tool;
S4, performing anomaly detection on the performance index data (time series) to obtain the anomalous data points;
S5, analyzing the anomaly detection result to determine the position of the bottleneck;
S6, achieving accurate code-level positioning of the bottleneck by matching the call link.
Specifically, in the process of collecting performance index time series data of different types of Web application systems through a system monitoring tool, an APM tool is used to monitor Uniplore, a one-stop big data analysis platform in practical use, and to collect its performance index data and call link data. The collected data includes the hardware resource index CPU utilization, the middleware index GC count, the database index SQL execution time, and the like.
Specifically, preprocessing mainly integrates the collected performance index data and extracts the required data; this includes converting data formats, aligning the timestamps of the performance index data from the different levels, and removing unused fields.
Specifically, in the process of manually labeling the data used for training the anomaly detection model, the data calibration tool train is used. The data file to be labeled (CSV format) is imported into the data calibration page and must contain a timestamp attribute (timestamp), a value attribute (value) and a label attribute. The timestamp must be in ISO 8601 format, such as 2019-03-13T21:11:29+00:00 or 2019-03-13T21:11:29Z, so an original timestamp (such as 20190313211129) must be converted into this format before labeling. The label defaults to 0 for every point (a non-anomalous point is labeled 0). The labeling page is mainly divided into two parts: a display of the whole data set and an enlarged display of the data in the selected range.
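The timestamp conversion described above can be sketched as follows. This is a minimal illustration that assumes the raw timestamps are in UTC; the function name `to_iso8601` is hypothetical.

```python
from datetime import datetime, timezone


def to_iso8601(raw: str) -> str:
    """Convert a compact timestamp such as 20190313211129 to ISO 8601 (UTC assumed)."""
    parsed = datetime.strptime(raw, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return parsed.isoformat()


print(to_iso8601("20190313211129"))  # 2019-03-13T21:11:29+00:00
```

If the monitored system records local time instead, the time zone attached in `replace` would have to be changed accordingly.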
Specifically, in the process of obtaining data anomaly points by performing anomaly detection on performance index data (time series), different single detectors are designed based on the anomaly types of collected data, and the different single detectors are combined into a combined detector capable of detecting multiple anomaly types to perform anomaly detection on the system performance index data, so that the accuracy of anomaly detection is improved.
Preferably, as shown in fig. 12, the operators of the single detector are divided into two types: transform operators (converters) and detection operators (detectors). The detectors include a threshold detector for detecting outliers, a quantile detector for detecting outliers, and a cluster detector for detecting clusters of outliers.
Specifically, the threshold detector uses the formula $Min - c < G_{used} < Max + c$, where $G_{used}$ is the data value being checked, $Min$ and $Max$ are the set minimum and maximum thresholds, and $c$ is a factor that widens or narrows the range $(Min, Max)$. When a data value falls outside the set range, it is regarded as an abnormal value.
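As a minimal sketch of this rule, the function below flags every value that is not strictly inside the range from Min - c to Max + c. The function name and the sample CPU utilization values are illustrative assumptions.

```python
from typing import List


def threshold_anomalies(values: List[float], min_t: float, max_t: float,
                        c: float = 0.0) -> List[bool]:
    """Flag values outside the open interval (min_t - c, max_t + c)."""
    low, high = min_t - c, max_t + c
    return [not (low < v < high) for v in values]


cpu_usage = [35.0, 42.0, 97.5, 40.0, 3.0]
flags = threshold_anomalies(cpu_usage, min_t=10.0, max_t=90.0, c=5.0)
print(flags)  # [False, False, True, False, True]
```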
Specifically, the quantile detector uses the interval bounded by $MinQ - c \times R$ and $MaxQ + c \times R$, with $MinQ = M \times q_{min}$ and $MaxQ = M \times q_{max}$, where $M$ is the maximum value of the data, $q_{min}$ is the low quantile percentage, $q_{max}$ is the high quantile percentage, and $c$ is an interval scaling factor used to fine-tune the interval range for different data. When a value falls outside the set quantile interval, it is regarded as an abnormal value.
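The interval rule can be sketched as follows. The text does not define R, so this illustration assumes R = MaxQ - MinQ; that choice, the function name, and the sample values are assumptions rather than part of the described system.

```python
from typing import List


def quantile_anomalies(values: List[float], q_min: float, q_max: float,
                       c: float = 0.0) -> List[bool]:
    """Flag values outside [MinQ - c*R, MaxQ + c*R], following the formula in the text."""
    m = max(values)            # M: maximum value of the data
    min_q = m * q_min          # MinQ = M x q_min
    max_q = m * q_max          # MaxQ = M x q_max
    r = max_q - min_q          # ASSUMPTION: R is taken as the width MaxQ - MinQ
    low, high = min_q - c * r, max_q + c * r
    return [v < low or v > high for v in values]


heap_used = [50.0, 52.0, 48.0, 100.0, 51.0, 5.0]
flags = quantile_anomalies(heap_used, q_min=0.2, q_max=0.8, c=0.1)
print(flags)  # [False, False, False, True, False, True]
```

With these inputs the interval is roughly 14 to 86, so the values 100.0 and 5.0 are flagged.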
Specifically, the operation of the cluster detector comprises the following steps: (1) randomly choose $k$ centers, $C = \{c_1, c_2, c_3, \ldots, c_k\}$; (2) calculate the Euclidean distance from each data point $x$ to the centers and assign the point to the nearest one, $\arg\min_{c_i \in C} dis(c_i, x)^2$; (3) update the centers according to the formula (Figure BDA0003104616600000112, Figure BDA0003104616600000111); (4) repeat steps 2 and 3 until the centers no longer change. In this way, the cluster detector can detect multiple outliers that form a cluster at once.
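The clustering steps can be sketched on one-dimensional data as follows. The update formula is only available as an image in the source, so this sketch assumes the standard k-means update (each center moves to the mean of its members). Treating the smallest cluster as the anomalous one is likewise an assumption, and all function names are hypothetical.

```python
import random
from typing import List


def kmeans_1d(values: List[float], k: int, seed: int = 0, max_iter: int = 100) -> List[int]:
    """Return a cluster index for every value using plain k-means."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # step 1: pick k initial centers at random
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda i: (centers[i] - x) ** 2) for x in values]
        # Step 3: move each center to the mean of its members (ASSUMED update rule).
        new_centers = []
        for i in range(k):
            members = [x for x, lab in zip(values, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:  # stop once the centers no longer change
            break
        centers = new_centers
    return labels


def smallest_cluster_anomalies(values: List[float], k: int = 2, seed: int = 0) -> List[bool]:
    """Flag every point that belongs to the smallest cluster (ASSUMED anomaly rule)."""
    labels = kmeans_1d(values, k, seed)
    sizes = [labels.count(i) for i in range(k)]
    smallest = sizes.index(min(sizes))
    return [lab == smallest for lab in labels]


gc_counts = [10.0, 11.0, 12.0, 10.5, 11.5, 95.0, 96.0, 97.0]
print(smallest_cluster_anomalies(gc_counts))
```

On this sample the three large values form the smaller cluster and are flagged together, which is the "several outliers at once" behavior the text describes.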
Specifically, the single detector further comprises a converter. The converter transforms the time series data points based on sliding time windows; the characteristics of the transformed data points are unchanged, while the trend of the overall curve changes. The converter is implemented as follows: two immediately adjacent sliding windows $w_i$ and $w_{i+1}$ slide over the time series data around a center point $x_c$, where $x_c \in \{x_1, x_2, x_3, \ldots, x_t\}$; the aggregate of each window adjacent to the center point is calculated, $a_i = agg(w_i)$ and $a_{i+1} = agg(w_{i+1})$, where $agg \in \{median, mean\}$ and median and mean are the functions that return the median and the mean within a time window; the transformed value of the center point is then $x_c = |a_{i+1} - a_i|$.
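A minimal sketch of such a converter is shown below. The exact window placement is not stated in the text; this illustration assumes that the left window ends just before the center point and the right window starts at it, and it leaves points without two full windows as `None`. The function name is hypothetical.

```python
from statistics import mean, median
from typing import Callable, List, Optional, Sequence


def double_window_transform(series: Sequence[float], window: int,
                            agg: Callable[[Sequence[float]], float] = mean
                            ) -> List[Optional[float]]:
    """For each center point, return |agg(right window) - agg(left window)|."""
    out: List[Optional[float]] = [None] * len(series)
    for c in range(window, len(series) - window + 1):
        left = series[c - window:c]     # w_i: the window just before the center
        right = series[c:c + window]    # w_{i+1}: the window starting at the center
        out[c] = abs(agg(right) - agg(left))
    return out


level_shift = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
print(double_window_transform(level_shift, window=2))
print(double_window_transform(level_shift, window=2, agg=median))
```

A step change in the raw series becomes a single spike in the transformed series, which is easier for a threshold or quantile detector to pick up.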
Specifically, as shown in fig. 13, the combined detector is composed of a converter and different single detectors connected in series or in parallel. The data under test passes through one converter and three detectors to produce the final anomaly detection result: first, the converter decomposes and transforms the complex time series curve into a simple curve that is easy to process; then the threshold detector and the quantile detector examine the curve produced by the converter; finally, the detection result is passed to the cluster detector, which detects clustered anomalies that were not yet detected.
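The series and parallel composition can be sketched as follows. All components here are simplified stand-ins chosen to keep the example short: the converter is an absolute first difference, the two parallel detectors are a fixed limit and a fraction of the maximum, and the cluster stage flags points that lie closer to the mean of the flagged points than to the mean of the unflagged ones. None of these stand-ins, names, or limits come from the source; only the overall wiring does.

```python
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[bool]]


def abs_diff(series: Sequence[float]) -> List[float]:
    """Stand-in converter: absolute change from the previous point."""
    return [0.0] + [abs(b - a) for a, b in zip(series, series[1:])]


def above(limit: float) -> Detector:
    """Stand-in threshold detector: flag values greater than a fixed limit."""
    return lambda series: [v > limit for v in series]


def above_fraction_of_max(q: float) -> Detector:
    """Stand-in quantile detector: flag values greater than max(series) * q."""
    def run(series: Sequence[float]) -> List[bool]:
        limit = max(series) * q
        return [v > limit for v in series]
    return run


def in_parallel(*detectors: Detector) -> Detector:
    """Combine detectors so that a point is flagged when any of them flags it."""
    def run(series: Sequence[float]) -> List[bool]:
        return [any(flags) for flags in zip(*(d(series) for d in detectors))]
    return run


def cluster_stage(series: Sequence[float], flags: List[bool]) -> List[bool]:
    """Stand-in cluster detector: also flag points nearer to the flagged group."""
    hit = [v for v, f in zip(series, flags) if f]
    miss = [v for v, f in zip(series, flags) if not f]
    if not hit or not miss:
        return list(flags)
    hit_mean, miss_mean = sum(hit) / len(hit), sum(miss) / len(miss)
    return [f or abs(v - hit_mean) < abs(v - miss_mean) for v, f in zip(series, flags)]


def combined_detector(series: Sequence[float]) -> List[bool]:
    transformed = abs_diff(series)                                        # converter first
    flags = in_parallel(above(41.0), above_fraction_of_max(0.99))(transformed)
    return cluster_stage(transformed, flags)                              # cluster stage last


result = combined_detector([10, 10, 11, 10, 50, 52, 10, 10])
print(result)
```

In this sample the parallel detectors flag only the largest jump, and the cluster stage then picks up the second, slightly smaller jump that they missed.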
Specifically, as shown in fig. 14, when the anomaly detection result is analyzed to determine the bottleneck position, problems are prioritized from highest to lowest as server-side problems, network problems, and client-side problems. On the hardware side, the analysis mainly covers hardware resource indexes; on the software side, it mainly covers middleware indexes, database indexes, and application code.
In a preferred embodiment, according to their distribution characteristics, performance bottlenecks are divided into three main categories: hardware layer bottlenecks, middleware layer bottlenecks, and application program bottlenecks. A hardware layer bottleneck is a system performance problem caused by excessive utilization of server hardware resources, including problems with the CPU, memory, and disk IO. A middleware layer bottleneck is a bottleneck in application software such as Web servers and application servers, or in database systems, for example a bottleneck caused by unreasonable parameter settings of the JDBC connection pool configured on the middleware platform. An application program is application software written by developers for a particular purpose of the user; application layer bottlenecks mainly include performance bottlenecks caused by unreasonable program design (serial processing, too few request-processing threads, no buffering, no caching, and the like), database design defects, unreasonable JVM parameters, slow SQL, unreasonable program architecture planning, and the like.
Specifically, as shown in fig. 15, the performance bottleneck is located as follows: first, anomaly detection is performed on the hardware layer resource indexes; then, anomaly detection is performed on the middleware layer or database layer performance indexes within the abnormal time period, giving the anomaly status of the index data of the different levels over the same period; finally, the level at which the bottleneck actually occurs is determined according to decision criteria formulated from the correlation among the performance indexes of the different levels. Locating the bottleneck at a specific level through layered positioning improves the efficiency of the subsequent call link analysis.
Specifically, as shown in fig. 16, when accurate code-level positioning of the bottleneck is realized by matching the call link, it must be taken into account that performance bottlenecks in different layers affect one another: when a performance index of a lower layer hits a bottleneck, a performance index of an upper layer is likely to hit one as well. Normally, a bottleneck in an upper-layer component does not affect the performance of the lower-layer components, since a healthy lower-layer component operates within the constraints of the upper-layer component; but if a bottleneck occurs in a lower-layer component, the upper-layer components are necessarily affected within a certain time range. For example, if the performance bottleneck is caused by the application program, the problem may first show up on the Web server; as errors on the Web server increase, more hardware resources are consumed, hardware utilization rises, and a hardware layer bottleneck results.
In a preferred embodiment, if only jvm_cpu is abnormal at a given time point, it can be determined that the bottleneck is in the hardware layer. The cause of the bottleneck must still be determined by using the call link to locate the corresponding code and analyzing it. For example, when a bottleneck occurs in the CPU and inspection of the code shows that the system is executing an IO-intensive task while the CPU waits for long periods, the real bottleneck of the system is IO and further analysis is required. If there is no problem with the code or the application software, the hardware resources need to be reconfigured.
In a preferred embodiment, if both jvm_cpu and gc_count are abnormal, it can be determined that the bottleneck is in the middleware layer. Frequent GC is likely causing excessive consumption of CPU resources, and the code that was executing when the bottleneck occurred should be analyzed for memory leaks or database connections that are not released in time.
In another preferred embodiment, if the jvm_cpu anomaly coincides with a record in top_n_sql, it can be determined that the bottleneck is in the application layer: an improperly written SQL statement takes too long to execute and consumes excessive CPU resources.
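The three decision rules above can be summarized in a short sketch. The text does not say which rule wins when both the GC and slow SQL conditions hold, nor what happens when jvm_cpu is normal; the precedence used here (slow SQL first) and the "none" result are assumptions made only to keep the function total.

```python
def locate_layer(jvm_cpu_abnormal: bool, gc_count_abnormal: bool,
                 slow_sql_matched: bool) -> str:
    """Map anomaly flags observed at the same time point to a bottleneck layer."""
    if not jvm_cpu_abnormal:
        return "none"            # ASSUMPTION: no CPU anomaly means nothing to locate
    if slow_sql_matched:
        return "application"     # jvm_cpu anomaly coincides with a top_n_sql record
    if gc_count_abnormal:
        return "middleware"      # jvm_cpu and gc_count are both abnormal
    return "hardware"            # only jvm_cpu is abnormal


print(locate_layer(True, False, False))  # hardware
print(locate_layer(True, True, False))   # middleware
print(locate_layer(True, False, True))   # application
```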
The invention has the following beneficial effects. A performance bottleneck accurate positioning system for Web applications is provided. Layered positioning takes into account the mutual influence between the performance indexes of each level; through layered anomaly detection it is clear which level's performance index data is abnormal, which increases the accuracy of bottleneck positioning. After the bottleneck has been located at a specific level, the corresponding code is located by matching the call link data recorded when the bottleneck appeared, achieving code-level positioning of the performance bottleneck. The application program can then be analyzed in depth with knowledge of the level at which the bottleneck lies, so that the real cause of the bottleneck is determined, the application is optimized, and performance optimization of the system is achieved more efficiently. Based on the performance index data and the layered architecture of Web applications, a layered performance bottleneck positioning strategy is proposed according to the correlation among the performance index data of different levels, so that bottlenecks can be located more accurately. Based on the anomaly types of the performance index time series data collected by SkyWalking, a combined detector capable of detecting multiple anomaly types is designed, so that different types of anomalies are detected more comprehensively and the accuracy of anomaly detection is improved. Using a system monitoring tool, a performance bottleneck accurate positioning system for Web applications is designed and implemented on the basis of performance index time series data and request call links, so that accurate positioning of the performance bottleneck provides a more reliable basis for analyzing the cause of the bottleneck.

Claims (10)

1. A performance bottleneck accurate positioning system for Web applications, characterized in that: the system comprises a data display module and a data query module at the front end, and a system monitoring module, a data storage module, an anomaly detection module and a bottleneck positioning module at the back end, wherein the data display module is used to visually display the collected data and the anomaly detection results; the data query module is used to query data within a specific time range; the system monitoring module monitors the system under study with a performance management tool to collect data, and feeds the data back to the front-end data display module for display; the anomaly detection module detects anomalies in the performance index data and feeds the detection result back to the bottleneck positioning module for analysis; the bottleneck positioning module determines the position of the bottleneck from the information fed back by the anomaly detection module in combination with a bottleneck positioning strategy, then matches the timestamp of the bottleneck point against the call link, accurately locates the corresponding code, and feeds the positioning result back to the data display module for display; the performance index data and the call link data are stored in a MySQL database.
2. The performance bottleneck accurate positioning system for Web applications according to claim 1, characterized in that: the system further comprises a database module, wherein the database comprises a call link information table segment, a JVM_CPU information table jvm_cpu, a heap memory information table jvm_memory_heap, a GC information table gc_count and an SQL information table top_n_sql, and the database is used to store the call link data and the performance index data.
3. The performance bottleneck accurate positioning system for Web applications according to claim 2, characterized in that: the system further comprises a system interface module, wherein the system interface module is used to provide service capability externally and to interact with the front-end page; when a user operates the front-end page, corresponding requests are sent to the interface to operate on the data tables, and these requests implement the interaction between the front-end page and the database by calling services.
4. The performance bottleneck accurate positioning system for Web applications according to claim 3, characterized in that: the system further comprises a detection positioning module, wherein the detection positioning module is used to perform anomaly detection on the performance index data, determine the level at which the bottleneck occurs according to a performance bottleneck positioning strategy, and locate the specific code in combination with the call link.
5. The performance bottleneck accurate positioning system for Web applications according to claim 4, characterized in that: the operation flow of the positioning system comprises the following steps:
S1, collecting performance index time series data of different types of Web application systems through a system monitoring tool;
S2, preprocessing the data;
S3, manually labeling the data used for training the anomaly detection model with a data calibration tool;
S4, performing anomaly detection on the performance index data to obtain the anomalous data points;
S5, analyzing the anomaly detection result to determine the position of the bottleneck;
S6, achieving accurate code-level positioning of the bottleneck by matching the call link.
6. The performance bottleneck accurate positioning system for Web applications according to claim 5, characterized in that: in the process of performing anomaly detection on the performance index data to obtain the anomalous data points, different single detectors are designed based on the anomaly types of the collected data, and the different single detectors are combined into a combined detector capable of detecting multiple anomaly types, which performs anomaly detection on the system performance index data.
7. The performance bottleneck accurate positioning system for Web applications according to claim 6, characterized in that: the threshold detector uses the formula $Min - c < G_{used} < Max + c$, where $Min$ and $Max$ are the set minimum and maximum thresholds and $c$ is a factor that widens or narrows the range $(Min, Max)$; when a data value falls outside the set range, it is regarded as an abnormal value.
8. The performance bottleneck accurate positioning system for Web applications according to claim 6, characterized in that: the quantile detector uses the interval bounded by $MinQ - c \times R$ and $MaxQ + c \times R$, with $MinQ = M \times q_{min}$ and $MaxQ = M \times q_{max}$, where $M$ is the maximum value of the data, $q_{min}$ is the low quantile percentage, $q_{max}$ is the high quantile percentage, and $c$ is an interval scaling factor used to fine-tune the interval range for different data; when a value falls outside the set quantile interval, it is regarded as an abnormal value.
9. The performance bottleneck accurate positioning system for Web applications according to claim 6, characterized in that: the operation of the cluster detector comprises the following steps: randomly choosing $k$ centers, $C = \{c_1, c_2, c_3, \ldots, c_k\}$; calculating the Euclidean distance from each data point $x$ to the centers and assigning the point to the nearest one, $\arg\min_{c_i \in C} dis(c_i, x)^2$; updating the centers according to the formula (Figure FDA0003104616590000021), until the centers no longer change.
10. The performance bottleneck accurate positioning system for Web applications according to claim 6, characterized in that: the single detector further comprises a converter, which transforms the time series data points based on sliding time windows; the characteristics of the transformed data points are unchanged, while the trend of the overall curve changes; the converter is implemented as follows: two immediately adjacent sliding windows $w_i$ and $w_{i+1}$ slide over the time series data around a center point $x_c$, where $x_c \in \{x_1, x_2, x_3, \ldots, x_t\}$; the aggregate of each window adjacent to the center point is calculated, $a_i = agg(w_i)$ and $a_{i+1} = agg(w_{i+1})$, where $agg \in \{median, mean\}$ and median and mean are the functions that return the median and the mean within a time window; the transformed value of the center point is $x_c = |a_{i+1} - a_i|$.
CN202110633754.0A 2021-06-07 2021-06-07 Web application-oriented performance bottleneck accurate positioning system Withdrawn CN113568804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110633754.0A CN113568804A (en) 2021-06-07 2021-06-07 Web application-oriented performance bottleneck accurate positioning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110633754.0A CN113568804A (en) 2021-06-07 2021-06-07 Web application-oriented performance bottleneck accurate positioning system

Publications (1)

Publication Number Publication Date
CN113568804A true CN113568804A (en) 2021-10-29

Family

ID=78161821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110633754.0A Withdrawn CN113568804A (en) 2021-06-07 2021-06-07 Web application-oriented performance bottleneck accurate positioning system

Country Status (1)

Country Link
CN (1) CN113568804A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312104A (en) * 2023-11-30 2023-12-29 青岛民航凯亚系统集成有限公司 Visual link tracking method and system based on airport production operation system
CN117312104B (en) * 2023-11-30 2024-03-12 青岛民航凯亚系统集成有限公司 Visual link tracking method and system based on airport production operation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211029