CN115426278A - Web application real-time anomaly analysis method and system - Google Patents

Web application real-time anomaly analysis method and system

Publication number
CN115426278A
Authority
CN
China
Prior art keywords
time
alarm
abnormal
information
abnormity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211053134.0A
Other languages
Chinese (zh)
Inventor
刘鹏威
孙誉航
孙晓龙
李绍俊
庞景秋
齐井春
陈兴钰
崔放
李忆平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Jiacheng Information Technology Co ltd
Original Assignee
Changchun Jiacheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Jiacheng Information Technology Co ltd filed Critical Changchun Jiacheng Information Technology Co ltd
Priority to CN202211053134.0A priority Critical patent/CN115426278A/en
Publication of CN115426278A publication Critical patent/CN115426278A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a system for real-time anomaly analysis of Web applications. The method comprises the following steps: S1, anomaly capture; S2, anomaly information analysis; S3, anomaly alarm; and S4, anomaly handling. The system comprises a client and an analysis system: the client captures anomalies and synchronizes the captured anomaly data to the analysis system, and the analysis system analyzes, alarms on and processes the anomaly data in real time and stores the data in real time. The invention monitors the online Web application in real time, synchronizes the captured page anomalies and interface anomalies to the analysis system in real time, analyzes and locates them, raises alarms for serious problems, and pushes the analysis results and key metrics to developers for handling, thereby achieving rapid response, reducing labor and time costs, and ensuring stable operation of the system.

Description

Web application real-time anomaly analysis method and system
Technical Field
The invention relates to an anomaly analysis method and system, in particular to a real-time anomaly analysis method and system for Web applications, applicable to the field of Web application development.
Background
Today, as Web technology continues to develop and customer requirements grow increasingly complex, the primary requirement for a Web application is to provide users with stable, high-quality service. However, as Web applications become more complex, deep-seated defects or unpredictable external factors may cause errors or anomalies while the application is in online use. Providing stable service to users is a problem that no online Web application can avoid. In most cases, the prior art discovers an anomaly through server logs or user feedback, after which operation and maintenance personnel and developers reproduce and handle it.
As for discovering anomalies, the prior art relies mainly on server logs or on users reporting anomalies they encounter during use, which system maintainers and developers then resolve. The problem is that an anomaly is discovered and resolved only after it has already had some impact.
As for handling anomalies, the usual flow is that operation and maintenance personnel, after finding a Web application anomaly, report it to developers for modification. In a project with separated front end and back end, the technical lead must first analyze and locate the problem, which is then reproduced and fixed by the corresponding front-end or back-end developer. After the fix, the change enters the test flow, is built and released once it meets the release standard, and finally the user is informed that the problem has been solved. Throughout this process the anomaly persists in the online Web application, and developers find the problem hard to reproduce. For applications with a large user base that users depend on heavily, this approach is costly in time and slow to respond, and may cause real losses to users because the anomaly cannot be fixed promptly.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method and a system for real-time anomaly analysis of Web applications, which perform real-time anomaly monitoring, anomaly analysis and alarming for an online target Web application, thereby effectively avoiding or reducing the impact of Web application anomalies on users. The invention monitors and analyzes the Web application in real time, finds problems promptly and assists in solving them, thereby ensuring stable operation of the Web application.
To solve the above technical problems, the invention adopts the following technical solution: a Web application real-time anomaly analysis method, in which anomalies are monitored and analyzed through the following process:
S1, anomaly capture;
S2, anomaly information analysis;
S3, anomaly alarm;
and S4, anomaly handling.
Preferably, the method specifically comprises the following steps:
S1, anomaly capture, which specifically comprises:
S11, capturing page-level JavaScript errors;
S12, capturing page performance anomalies;
S13, capturing api interface success rate and performance anomalies;
S14, storing anomaly information;
S2, anomaly information analysis, which specifically comprises:
S21, analyzing JavaScript errors;
S22, analyzing page performance anomalies;
S23, analyzing api interface success rate and performance anomalies;
S24, visualizing anomaly data;
S3, anomaly alarm, which specifically comprises:
S31, alarming on JavaScript errors;
S32, alarming on page performance anomalies;
S33, alarming on api interface success rate and performance anomalies;
and S4, anomaly handling.
Preferably, in step S11, capturing page-level JavaScript errors comprises the following steps:
in the front-end project, global exception capture is performed first, special exception capture is then used as a supplement, and finally the exceptions are classified;
after the exception is captured, the external environment information of the anomaly is collected and uploaded to the analysis system, where the external information to be collected comprises: operating system and version, browser and version, user information, url, network environment, and time;
in step S12, capturing page performance anomalies comprises the following steps:
for page performance, the captured metrics comprise DNS query time, TCP connection time, DOM tree parsing time, white screen time, domready time and onload time, and these page performance metrics are obtained in window.
For page crashes, the load and beforeunload events of the window object are used for monitoring.
Preferably, in step S13, capturing api interface success rate and performance anomalies comprises: monitoring through window.XMLHttpRequest, recording a timestamp when the monitored request starts, and, when the monitored request ends, recording the request status, URL (uniform resource locator), request time and anomaly information;
in step S14, the anomaly information is stored as follows:
the anomaly information needs to be synchronized to the analysis system. Log information is first cached in the local browser, where a temporary storage area and a synchronization area are established in IndexedDB. When an anomaly first occurs during a visit, the client establishes a websocket connection with the analysis system and synchronizes the log data in the temporary storage area of IndexedDB to the cloud analysis system in real time; after successful synchronization, the log data moves to the synchronization area.
Preferably, in step S21, JavaScript errors are analyzed as follows: based on the captured JavaScript error logs, the analysis system displays an exception list that can be queried by multiple dimensions such as occurrence time, alarm level and responsible person; the exception details show the captured exception information, the location in the code repository of the code that raised the exception, and the exception type; the analysis system sends an exception message to the responsible person's message list and manages the exception state to form a closed loop;
the analysis system aggregates captured JavaScript errors of the same type and displays the number of occurrences of each type in real time in chart form, where the specific statistical process is as follows:
the number of affected users is counted from the external environment information of the anomalies, and two cases are distinguished:
1. if the number of occurrences is directly proportional to the number of affected users, with the calculation rule: number of occurrences / number of affected users >= 1, the problem is serious, and the alarm level of the anomaly needs to be raised and the anomaly handled with priority;
2. if the number of occurrences is inversely proportional to the number of affected users, the anomaly occurs only for a small number of users and devices, and the alarm level is lowered or no alarm is raised;
operating systems are counted from the external environment information of the anomalies: if the same type of exception only occurs in the same operating system, the problem is not a system compatibility problem; otherwise the exception is marked as a system compatibility problem;
browser versions are counted from the external environment information of the anomalies: if the same type of exception occurs in the same browser or version, the problem is not a browser compatibility problem; otherwise the exception is marked as a browser compatibility problem;
the code location of the exception is determined from the error stack information, as follows:
when the front-end project is bundled, a source map is generated by the bundling tool and uploaded to the analysis system, and the source map file is deleted at deployment to prevent code leakage; the analysis system maps the error stack information back to the specific location in the front-end source code by using the source map file of the front-end project, and displays this information on the exception detail page.
Preferably, in step S22, page performance anomalies are analyzed as follows: based on the captured page performance logs, the analysis system displays a page performance list showing the average loading time and crash count of every page currently in the analysis system, which can be queried by dimensions such as time, page and alarm level; the detail page shows the longest loading time, the shortest loading time, external information and the specific time of each metric, and an alarm is raised for pages exceeding the alarm line;
in step S23, api interface success rate and performance anomalies are analyzed as follows: based on the captured api logs, the analysis system displays an api interface success rate and performance list showing the success rate, call count and average response time of every interface currently in the analysis system; the details show the longest loading time, the shortest loading time, external information and the specific time of each metric, and an alarm is raised for pages exceeding the alarm line;
where api interfaces are measured in two dimensions, success rate and performance:
for success rate, the success rate and call count of each interface are counted in real time, with the number of affected users as the criterion: if the number of interface errors is directly proportional to the number of affected users, the problem is serious, is unrelated to user operation, and needs to be handled promptly; if the number of interface errors is inversely proportional to the number of affected users, the error occurs only for particular users, and the external information and request parameters of those users are displayed;
for performance, the average response time of the interface is computed in real time from the timestamps, and failed interface calls are excluded from the performance statistics;
in step S24, anomaly data is visualized as follows: the visualization module displays the key metrics of each kind of anomaly in real time as statistical charts, which can be drilled down to the detail information; the chart data is refreshed periodically, and when an anomaly requiring an alarm is detected, it is highlighted on the real-time page.
Preferably, in step S3, the anomaly alarm proceeds as follows: after processing the log information, the analysis system raises real-time alarms for the parts of the analysis result that exceed the baseline, finds the responsible person to be alerted through the responsible-person association module, and sends a short message (SMS) or enterprise WeChat notification depending on the alarm level; the alarm received by the responsible person contains the main anomaly information and the address of the anomaly detail page in the analysis system, through which more information can be viewed quickly; the alarm baselines are user-configurable;
anomaly alarms are divided into at least three levels:
red level: the most serious level; alarms of this level notify the responsible person by short message, the response time is no more than 0.5 h, and the recovery time is no more than 3 h;
orange level: alarms of this level notify the responsible person through enterprise WeChat or similar channels, the response time is no more than 3 h, and the recovery time is no more than 6 h;
yellow level: the general-attention level; alarms of this level notify the responsible person through enterprise WeChat or similar channels, the response time is no more than 4 h, and the recovery time is no more than 8 h.
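The three levels above can be written down as a small lookup table. The following is only a sketch: the type names, field names and the `metDeadlines` helper are illustrative assumptions, and only the channels and hour limits come from the text.

```typescript
// Alarm levels with the notification channel and deadlines stated in the text.
// All identifiers here are hypothetical names chosen for illustration.
type AlarmLevel = "red" | "orange" | "yellow";

interface AlarmPolicy {
  channel: "sms" | "enterprise-wechat";
  maxResponseHours: number;
  maxRecoveryHours: number;
}

const ALARM_POLICIES: Record<AlarmLevel, AlarmPolicy> = {
  red: { channel: "sms", maxResponseHours: 0.5, maxRecoveryHours: 3 },
  orange: { channel: "enterprise-wechat", maxResponseHours: 3, maxRecoveryHours: 6 },
  yellow: { channel: "enterprise-wechat", maxResponseHours: 4, maxRecoveryHours: 8 },
};

// Returns true when both the response and the recovery met the level's deadlines.
function metDeadlines(
  level: AlarmLevel,
  responseHours: number,
  recoveryHours: number
): boolean {
  const policy = ALARM_POLICIES[level];
  return (
    responseHours <= policy.maxResponseHours &&
    recoveryHours <= policy.maxRecoveryHours
  );
}
```

Keeping the limits in one table makes it easy to add further levels, since the text only requires "at least three".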
Preferably, in step S31, the JavaScript error alarm proceeds as follows:
the JavaScript error alarm baseline is: JavaScript error/PV > 20% is red, > 10% is orange, and > 5% is yellow; for the file path and line number of the faulty code obtained in the analysis system, the analysis system obtains the most recent committer by calling git blame and raises the alarm using the developer information on record; short message alarms are sent by calling a short message service, and enterprise WeChat alarms are sent to the relevant developer through a mini program; the developer logs in to the analysis system to view the notification and the anomaly details and handle the anomaly;
in step S32, the page performance anomaly alarm proceeds as follows:
the page loading time alarm baseline is: a loading time above 1 s is red, 600 ms to 1 s is orange, and 400 ms to 600 ms is yellow; every page is associated with a developer, and the analysis system alerts the relevant developer according to this association;
in step S33, the api interface success rate and performance anomaly alarm proceeds as follows:
api interface success rate baseline: red when the number of interface errors is directly proportional to the number of affected users, and orange when it is inversely proportional;
api interface performance baseline: for general interfaces, a response time > 5 s is red, > 3 s is orange, and > 1 s is yellow; interfaces that return large amounts of data or run more complex queries can have their baselines configured individually;
every interface is associated with a developer, and the analysis system alerts the relevant developer according to this association.
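The numeric baselines in steps S31 to S33 share one shape: three thresholds, checked from most to least severe. A minimal sketch under stated assumptions follows. All identifiers are hypothetical, the override URL and its values are made up, and treating each threshold as a strict "greater than" is an assumption for the page-load ranges, whose boundary values the text leaves open.

```typescript
type AlarmLevel = "red" | "orange" | "yellow";

// A value strictly above a threshold triggers that level.
interface Baseline {
  red: number;
  orange: number;
  yellow: number;
}

// Baselines stated in the text.
const JS_ERROR_RATE: Baseline = { red: 0.2, orange: 0.1, yellow: 0.05 }; // errors / PV
const PAGE_LOAD_MS: Baseline = { red: 1000, orange: 600, yellow: 400 };
const API_RESPONSE_MS: Baseline = { red: 5000, orange: 3000, yellow: 1000 };

// Hypothetical per-interface overrides for heavy interfaces (made-up URL and values).
const API_OVERRIDES: Record<string, Baseline> = {
  "/api/report/export": { red: 20000, orange: 10000, yellow: 5000 },
};

function classify(value: number, baseline: Baseline): AlarmLevel | null {
  if (value > baseline.red) return "red";
  if (value > baseline.orange) return "orange";
  if (value > baseline.yellow) return "yellow";
  return null; // below every threshold: no alarm
}

// Uses the interface's own baseline when one is configured, else the general one.
function classifyApiResponse(url: string, elapsedMs: number): AlarmLevel | null {
  return classify(elapsedMs, API_OVERRIDES[url] || API_RESPONSE_MS);
}
```

For example, a 6 s response is red for a general interface but only yellow for the overridden export interface, which is the effect of configuring an individual baseline.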
Preferably, in step S4, anomaly handling proceeds as follows: using the anomaly details and the original log information in the analysis system, the developer is assisted in reproducing and locating the problem; the anomaly is fixed through the test and build flow; and after the system has recovered, the alarm state of the anomaly is cleared through the task management module, forming a closed loop.
A system for the Web application real-time anomaly analysis method comprises a client and an analysis system, where the client captures anomalies and synchronizes the captured anomaly data to the analysis system, and the analysis system analyzes, alarms on and processes the anomaly data in real time and stores the data in real time;
the client comprises an application layer, which comprises the Web application and a storage module; the anomaly capture component is installed into the Web application as an npm package, or, for Web applications that do not use npm, included as a JavaScript plug-in; the storage module comprises a log cache area and a log synchronization area;
the analysis system comprises a business layer and a database; the business layer comprises an anomaly analysis module, an anomaly alarm module and an anomaly handling module; the anomaly analysis module analyzes the anomaly data and passes the analysis results to the anomaly alarm module, the anomaly alarm module raises the corresponding alarms for the analysis results it receives, and the anomaly handling module handles the anomaly alarms;
the database is a LevelDB database used to store the log information in the analysis system in real time.
Compared with the prior art, the invention has the following beneficial effects:
1. Through real-time monitoring and log synchronization, anomaly logs of the online Web application are collected promptly for analysis and alarming, which solves the problems of slow response to online anomalies, difficult reproduction and excessive time cost, and effectively reduces or avoids the impact on users.
2. Through anomaly analysis, the problem type and the responsible person are located quickly, which effectively reduces communication cost.
3. Customizable alarm rules and baselines make the method suitable for more complex project environments.
4. Monitoring page performance and interface performance reveals the real performance of the Web application in the online environment and provides an accurate data basis for performance optimization.
5. Visual display of the anomaly metrics shows the running state of the online Web application in real time and provides data support for application stability.
By capturing and synchronizing anomaly logs in real time, analyzing them in real time and alarming in real time, the method effectively reduces or avoids the impact of Web application anomalies on users and enables rapid response, rapid alarming, rapid analysis and rapid recovery, which effectively ensures the stability of the online Web application and saves considerable time and labor cost.
Drawings
FIG. 1 is a flow chart of anomaly analysis according to the present invention.
FIG. 2 is a diagram of an analysis system architecture of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
The Web application real-time anomaly analysis method shown in FIG. 1 comprises the following steps:
S1, anomaly capture
To work with the analysis system and keep intrusion into the Web application low, the anomaly capture component is installed into the Web application as an npm package; Web applications that do not use npm include it as a JavaScript plug-in. The Web application configures the following parameters: analysis system information (address, project information, license) and logged-in user information (optional). The anomaly capture component provides three functions: real-time anomaly capture, anomaly log storage, and synchronization.
Anomaly capture covers the following 3 classes:
1. page-level JavaScript error capture;
2. page performance anomaly capture, such as page load time and crashes;
3. api interface success rate and performance anomaly capture.
S11, page-level JavaScript error capture comprises the following steps:
in the front-end project, global exception capture is carried out through window.
In addition to capturing the exception, the external environment information of the anomaly is collected and uploaded to the analysis system. The external information to be collected is as follows:
Operating system (mac, windows) and version: the operating system is determined by the isWin, isLinux, isMac and isUnix checks on the navigator object, and the current system version is obtained by string comparison against the userAgent attribute.
Browser (IE, Chrome, etc.) and version: the type and version of the current browser are obtained by string comparison against the userAgent attribute of the navigator object.
User information: the analysis system takes the external network IP from the log synchronization request as the unique user identifier; if the application requires login and the information is not sensitive, the account information of the logged-in user can also be synchronized.
Url: obtained by window.
Network environment: whether the network is currently disconnected is obtained through the onLine attribute of the navigator object.
Time: now() obtains the timestamp at which the anomaly occurred.
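The operating-system and browser detection described above comes down to string matching on the user agent. The sketch below is one hedged way to do it, not the patent's implementation: `parseUserAgent`, the pattern tables and the result shape are assumptions, and the patterns cover only a few common desktop user agents.

```typescript
interface EnvInfo {
  os: string;
  osVersion: string;
  browser: string;
  browserVersion: string;
}

type Pattern = [string, RegExp];

// Order matters: Edge user agents also contain "Chrome", and Chrome's contain "Safari".
const BROWSERS: Pattern[] = [
  ["Edge", /Edg\/([\d.]+)/],
  ["Chrome", /Chrome\/([\d.]+)/],
  ["Firefox", /Firefox\/([\d.]+)/],
  ["Safari", /Version\/([\d.]+).*Safari/],
  ["IE", /MSIE ([\d.]+)/],
  ["IE", /Trident\/.*rv:([\d.]+)/],
];

const SYSTEMS: Pattern[] = [
  ["Windows", /Windows NT ([\d.]+)/],
  ["macOS", /Mac OS X ([\d_.]+)/],
  ["Linux", /Linux/],
];

// Returns the first pattern that matches, with underscores in the version normalized to dots.
function firstMatch(ua: string, patterns: Pattern[]): { name: string; version: string } {
  for (const [name, re] of patterns) {
    const m = re.exec(ua);
    if (m) {
      return { name, version: (m[1] || "").replace(/_/g, ".") };
    }
  }
  return { name: "unknown", version: "" };
}

function parseUserAgent(ua: string): EnvInfo {
  const system = firstMatch(ua, SYSTEMS);
  const browser = firstMatch(ua, BROWSERS);
  return {
    os: system.name,
    osVersion: system.version,
    browser: browser.name,
    browserVersion: browser.version,
  };
}
```

In a browser the input would be `navigator.userAgent`, and the result could be attached to each log entry together with `navigator.onLine` and a timestamp.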
S12, page performance anomaly capture comprises the following steps:
for page performance, the captured metrics comprise DNS query time, TCP connection time, DOM tree parsing time, white screen time, domready time and onload time. In onload, these performance metrics of the page are obtained through the timing attribute of performance.
For page crashes, the load and beforeunload events of the window object are used for monitoring.
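The six metrics named above can be derived by subtracting pairs of timestamps from the `timing` attribute of `performance`. The text does not say which fields it pairs, so the formulas below are commonly used conventions taken as assumptions; `TimingLike`, `PageMetrics` and `computePageMetrics` are hypothetical names, while the field names follow the Navigation Timing `performance.timing` object.

```typescript
// The subset of performance.timing fields used by this sketch.
interface TimingLike {
  navigationStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  responseStart: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  domComplete: number;
  loadEventEnd: number;
}

interface PageMetrics {
  dnsMs: number;
  tcpMs: number;
  domParseMs: number;
  whiteScreenMs: number;
  domReadyMs: number;
  onloadMs: number;
}

// Each metric is a difference of two timestamps (milliseconds since the epoch).
function computePageMetrics(t: TimingLike): PageMetrics {
  return {
    dnsMs: t.domainLookupEnd - t.domainLookupStart,
    tcpMs: t.connectEnd - t.connectStart,
    domParseMs: t.domComplete - t.domInteractive, // assumed definition
    whiteScreenMs: t.responseStart - t.navigationStart, // assumed definition
    domReadyMs: t.domContentLoadedEventEnd - t.navigationStart,
    onloadMs: t.loadEventEnd - t.navigationStart,
  };
}
```

In a browser this could be called as `computePageMetrics(performance.timing)` from the load handler, deferred with `setTimeout` so that `loadEventEnd` has been filled in.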
S13, api interface success rate and performance anomaly capture
XMLHttpRequest is used for monitoring: a timestamp is recorded when the monitored request starts, and when the monitored request ends, metrics such as the request status, URL (uniform resource locator), request time and anomaly information are recorded.
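One way to monitor XMLHttpRequest as described is to wrap `open` and `send` on its prototype and report when the `loadend` event fires, which happens after success, error, abort and timeout alike. The sketch below illustrates that technique under stated assumptions: `instrumentXhr`, `ApiLog` and `XhrLike` are hypothetical names, and counting statuses from 200 to 399 as success is an assumption the text does not make.

```typescript
interface ApiLog {
  method: string;
  url: string;
  status: number; // 0 typically indicates a network failure or an aborted request
  durationMs: number;
  success: boolean;
}

// The subset of XMLHttpRequest this sketch touches, so a stub can stand in for it.
interface XhrLike {
  status: number;
  open(method: string, url: string, ...rest: unknown[]): void;
  send(body?: unknown): void;
  addEventListener(type: string, listener: () => void): void;
}

function instrumentXhr(
  proto: XhrLike,
  report: (log: ApiLog) => void,
  now: () => number = Date.now
): void {
  const originalOpen = proto.open;
  const originalSend = proto.send;
  const requests = new WeakMap<object, { method: string; url: string }>();

  // Remember the method and URL, then delegate to the original open.
  proto.open = function (this: XhrLike, method: string, url: string, ...rest: unknown[]): void {
    requests.set(this, { method, url });
    originalOpen.call(this, method, url, ...rest);
  };

  // Record the start timestamp, report once the request has ended, then delegate.
  proto.send = function (this: XhrLike, body?: unknown): void {
    const startedAt = now();
    const info = requests.get(this) || { method: "", url: "" };
    this.addEventListener("loadend", () => {
      const status = this.status;
      report({
        method: info.method,
        url: info.url,
        status,
        durationMs: now() - startedAt,
        success: status >= 200 && status < 400,
      });
    });
    originalSend.call(this, body);
  };
}
```

In a browser the first argument would be `XMLHttpRequest.prototype` (a type cast may be needed), and `report` would write the entry into the local log cache.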
S14, anomaly information storage
The anomaly information needs to be synchronized to the analysis system. Log information is first cached in the local browser; because IndexedDB works asynchronously, it does not interfere with user operations, which effectively avoids slowing down the whole application through reading and writing large amounts of data.
A temporary storage area and a synchronization area are established in IndexedDB. When an anomaly first occurs during a visit, the client establishes a websocket connection with the analysis system and synchronizes the log data in the temporary storage area of IndexedDB to the cloud analysis system in real time; after successful synchronization, the log data moves to the synchronization area.
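The two-area flow can be modeled in memory to show the state transitions. This is a simplified sketch, not the IndexedDB or websocket code itself: `LogStore` and `LogEntry` are hypothetical names, the maps stand in for the two IndexedDB areas, and the synchronous `send` callback stands in for an acknowledged websocket transfer, which in a real client would be asynchronous.

```typescript
interface LogEntry {
  id: number;
  payload: string;
}

class LogStore {
  private cacheArea = new Map<number, LogEntry>(); // temporary storage area
  private syncArea = new Map<number, LogEntry>(); // synchronization area

  // New logs always land in the temporary storage area first.
  add(entry: LogEntry): void {
    this.cacheArea.set(entry.id, entry);
  }

  // Tries to send every cached entry; an entry moves to the synchronization
  // area only when send reports success, otherwise it stays cached for a retry.
  flush(send: (entry: LogEntry) => boolean): void {
    for (const entry of Array.from(this.cacheArea.values())) {
      if (send(entry)) {
        this.cacheArea.delete(entry.id);
        this.syncArea.set(entry.id, entry);
      }
    }
  }

  pendingCount(): number {
    return this.cacheArea.size;
  }

  syncedCount(): number {
    return this.syncArea.size;
  }
}
```

Moving an entry only after a confirmed send means a dropped connection leaves the log in the temporary storage area, so the next flush can pick it up again.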
Log information in the analysis system is stored in LevelDB, a very efficient key-value database that can support data volumes on the order of hundreds of millions of records with very good performance at that scale, and can effectively support the analysis system's frequent reads of log information.
S2, anomaly information analysis
Through the anomaly capture and anomaly information storage described above, the anomaly logs of the Web application have been collected in the cloud analysis system, which analyzes and processes them according to the three major classes of anomalies.
S21, JavaScript error analysis
Based on the captured JavaScript error logs, the analysis system displays an exception list that can be queried by multiple dimensions such as occurrence time, alarm level and responsible person. The exception details show the captured exception information, the location in the code repository of the code that raised the exception, and the exception type. The analysis system sends an exception message to the responsible person's message list and manages the exception state to form a closed loop.
The specific implementation is as follows:
the analysis system aggregates captured JavaScript errors of the same type and displays the number of occurrences of each type in real time in chart form, where the specific statistical process is as follows:
the number of affected users is counted from the external environment information of the anomalies, and two cases are distinguished:
1. If the number of occurrences is directly proportional to the number of affected users (calculation rule: number of occurrences / number of affected users >= 1), the problem is serious, and the alarm level of the anomaly needs to be raised and the anomaly handled with priority.
2. If the number of occurrences is inversely proportional to the number of affected users, the anomaly occurs only for a small number of users and devices, and the alarm level is lowered or no alarm is raised.
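The two counts that feed the cases above, occurrences per error type and distinct affected users per error type, can be gathered in one pass over the logs. The sketch below shows only that aggregation. The names `ErrorLog`, `TypeStats` and `summarizeByType` are hypothetical, and `userId` stands for whatever unique user identifier the logs carry (the text uses the external network IP).

```typescript
interface ErrorLog {
  type: string; // error type, e.g. "TypeError"
  userId: string; // unique user identifier carried by the log
}

interface TypeStats {
  occurrences: number;
  affectedUsers: number;
}

// Counts total occurrences and distinct users for each error type.
function summarizeByType(logs: ErrorLog[]): Map<string, TypeStats> {
  const counts = new Map<string, number>();
  const users = new Map<string, Set<string>>();
  for (const log of logs) {
    counts.set(log.type, (counts.get(log.type) || 0) + 1);
    const seen = users.get(log.type) || new Set<string>();
    seen.add(log.userId);
    users.set(log.type, seen);
  }
  const result = new Map<string, TypeStats>();
  counts.forEach((occurrences, type) => {
    const seen = users.get(type);
    result.set(type, { occurrences, affectedUsers: seen ? seen.size : 0 });
  });
  return result;
}
```

The resulting pairs are the inputs to the chart of occurrences per type and to the alarm-level adjustment described in the two cases.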
Operating systems are counted from the external environment information of the anomalies: if the same type of exception only occurs in the same operating system, the problem is not a system compatibility problem; otherwise the exception is flagged as a system compatibility problem.
Browser versions are counted from the external environment information of the anomalies: if the same type of exception occurs in the same browser or version, the problem is not a browser compatibility problem; otherwise the exception is flagged as a browser compatibility problem.
The code location of the exception is determined from the error stack information, as follows:
when the front-end project is bundled, a source map is generated by a bundling tool such as webpack/vite and uploaded to the analysis system, and the source map file is deleted at deployment to prevent code leakage. The analysis system maps the error stack information back to the specific location in the front-end source code by using the source map file of the front-end project, and displays this information on the exception detail page.
S22, page performance anomaly analysis
Based on the captured page performance logs, the analysis system displays a page performance list showing the average loading time and crash count of every page currently in the analysis system, which can be queried by dimensions such as time, page and alarm level. The detail page shows information such as the longest loading time, the shortest loading time, external information and the specific time of each metric. An alarm is raised for pages exceeding the alarm line.
The specific implementation is as follows:
page performance covers the page connection and rendering parts; interface time is counted in the api part. It mainly comprises DNS query time, TCP connection time, DOM tree parsing time, white screen time, domready time and onload time. The above metrics together make up the page rendering time.
S23, api interface success rate and performance anomaly analysis
Based on the captured api logs, the analysis system displays an api interface success rate and performance list showing the success rate, number of calls and average response time of all interfaces currently in the analysis system. The detail view shows the longest loading time, the shortest loading time, the external environment information and the specific time of each metric. An alarm is raised for pages that exceed the alarm line.
The specific implementation method is as follows:
The api interface is measured along two dimensions, success rate and performance.
For success rate, the success rate and number of calls of the current interface are counted in real time, with the number of affected users as the criterion. If the number of interface errors is directly proportional to the number of affected users, the problem is serious, is unrelated to user operation and must be handled promptly; if the number of interface errors is inversely proportional to the number of affected users, the error occurs only for particular users, and the external environment information and request parameters of those users are displayed.
For performance, the average interface communication time is computed in real time from timestamps; failed interface calls are excluded from the performance statistics.
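The success rate and performance statistics can be sketched as follows. The record shape (`ok`, `startTs`, `endTs`) and the function name `summarizeInterface` are assumptions for illustration; the one rule taken directly from the text is that failed calls are left out of the average duration.

```javascript
// Hypothetical call record: { ok: boolean, startTs: number, endTs: number },
// with timestamps in milliseconds.
function summarizeInterface(calls) {
  const callCount = calls.length;
  const succeeded = calls.filter((c) => c.ok);

  const successRate = callCount === 0 ? null : succeeded.length / callCount;

  // Failed calls are excluded from the performance statistics.
  const averageDurationMs =
    succeeded.length === 0
      ? null
      : succeeded.reduce((sum, c) => sum + (c.endTs - c.startTs), 0) /
        succeeded.length;

  return { callCount, successRate, averageDurationMs };
}
```

Returning `null` when there is nothing to average is a design choice of this sketch, so that an interface with no successful calls is not reported as having a response time of zero.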
S24, anomaly data visualization
The visualization module displays the key metrics of each kind of anomaly in real time as statistical charts, and each chart can be drilled down to the detailed information. Chart data is refreshed periodically (every 1 minute by default, configurable), and when an anomaly requiring an alarm is detected, it is highlighted as an alarm on the real-time page.
The specific implementation process is as follows: the statistical charts are implemented with echarts, and the chart data is refreshed periodically by a JavaScript timer.
S3, anomaly alarm
After processing the log information, the analysis system raises real-time alarms for the metrics in the analysis result that exceed the baseline. The responsible person to be alerted is found through the responsible person association module, and an SMS reminder or Enterprise WeChat reminder is sent according to the alarm level. The alarm message received by the responsible person contains the main anomaly information and the address of the anomaly detail page in the analysis system, through which more information can be viewed quickly. The alarm baseline is user-configurable.
The specific implementation method is as follows:
The anomaly alarm levels are divided into at least three levels:
Red level: the most serious level. The responsible person is notified by SMS; the problem response time does not exceed 0.5h and the recovery time does not exceed 3h.
Orange level: the level requiring attention. The responsible person is notified by Enterprise WeChat or similar means; the problem response time does not exceed 3h and the recovery time does not exceed 6h.
Yellow level: the general attention level. The responsible person is notified by Enterprise WeChat or similar means; the problem response time does not exceed 4h and the recovery time does not exceed 8h.
The anomaly alarm levels, alarm methods, problem response times and recovery times are all user-configurable.
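The three default levels and their configurability can be captured in a small lookup table. The sketch below is illustrative only: the object layout, the channel labels `"sms"` and `"enterprise-wechat"`, and the function name `resolveAlarmPolicy` are assumptions, while the hour values are the defaults given in the text.

```javascript
// Default policy per alarm level, using the response and recovery limits from the text.
const DEFAULT_ALARM_LEVELS = {
  red: { channel: "sms", responseHours: 0.5, recoveryHours: 3 },
  orange: { channel: "enterprise-wechat", responseHours: 3, recoveryHours: 6 },
  yellow: { channel: "enterprise-wechat", responseHours: 4, recoveryHours: 8 },
};

// Merge user-defined overrides (hypothetical shape: { level: { ...fields } })
// over the defaults, since every field is described as configurable.
function resolveAlarmPolicy(level, overrides = {}) {
  const base = DEFAULT_ALARM_LEVELS[level];
  if (!base) {
    throw new Error(`Unknown alarm level: ${level}`);
  }
  return { ...base, ...(overrides[level] ?? {}) };
}
```

A deployment that wants orange alarms sent by SMS could call `resolveAlarmPolicy("orange", { orange: { channel: "sms" } })` and keep the default time limits.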
S31, JavaScript error anomaly alarm
The JavaScript error alarm baseline is: JavaScript errors/PV >20% is red, >10% is orange, >5% is yellow. For the file path and line number of the faulty code obtained in the analysis system, the analysis system obtains the most recent committer by calling git blame and raises the alarm using the developer information on record. SMS alarms are sent by calling an SMS service, and Enterprise WeChat alarms are sent to the relevant developer through a mini program. The developer can log in to the analysis system to view the notification and the detailed anomaly information and handle the problem.
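The baseline above reduces to a ratio check against three thresholds. A minimal sketch, assuming the error count and page view (PV) count are measured over the same time window; the function name `jsErrorAlarmLevel` and the `null` return for "no alarm" are choices made for this example.

```javascript
// Classify the JavaScript error rate (errors / page views) against the
// default baseline: >20% red, >10% orange, >5% yellow.
function jsErrorAlarmLevel(errorCount, pageViews) {
  if (pageViews <= 0) return null; // no traffic, nothing to classify
  const ratio = errorCount / pageViews;
  if (ratio > 0.2) return "red";
  if (ratio > 0.1) return "orange";
  if (ratio > 0.05) return "yellow";
  return null; // below the alarm baseline
}
```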
S32, page performance anomaly alarm
The page load time alarm baseline is: load time >1s is red, 600ms-1s is orange, 400ms-600ms is yellow. Every page is associated with a developer, and the analysis system alerts the relevant developer according to that association.
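The load time bands can be sketched as below. The text does not say which band the boundary values 400ms, 600ms and 1s belong to, so treating each lower bound as inclusive is an assumption of this sketch, as is the function name `pageLoadAlarmLevel`.

```javascript
// Classify a page load time in milliseconds against the default baseline:
// >1s red, 600ms-1s orange, 400ms-600ms yellow.
// Assumption: lower bounds are inclusive (600 is orange, 400 is yellow).
function pageLoadAlarmLevel(loadTimeMs) {
  if (loadTimeMs > 1000) return "red";
  if (loadTimeMs >= 600) return "orange";
  if (loadTimeMs >= 400) return "yellow";
  return null; // below the alarm baseline
}
```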
S33, api interface success rate and performance anomaly alarm
api interface success rate baseline: red when the number of interface errors is directly proportional to the number of affected users, orange when it is inversely proportional.
api interface performance baseline: for general interfaces, response time >5s is red, >3s is orange, >1s is yellow. The baseline can be configured separately for interfaces that return large amounts of data or run more complex queries.
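The performance baseline with per-interface configuration can be sketched as a default threshold table plus an override map keyed by interface. The names `DEFAULT_API_BASELINE_MS`, `apiPerformanceAlarmLevel` and the override shape are assumptions for illustration; only the default thresholds come from the text.

```javascript
// Default thresholds in milliseconds: >5s red, >3s orange, >1s yellow.
const DEFAULT_API_BASELINE_MS = { red: 5000, orange: 3000, yellow: 1000 };

// perInterface is a hypothetical map from interface URL to its own thresholds,
// for interfaces that return large payloads or run complex queries.
function apiPerformanceAlarmLevel(url, responseMs, perInterface = {}) {
  const baseline = perInterface[url] ?? DEFAULT_API_BASELINE_MS;
  if (responseMs > baseline.red) return "red";
  if (responseMs > baseline.orange) return "orange";
  if (responseMs > baseline.yellow) return "yellow";
  return null; // within the baseline
}
```

With an override such as `{ "/api/report": { red: 20000, orange: 10000, yellow: 5000 } }`, a 6s response from `/api/report` is yellow, while the same response time from any other interface is red.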
Every interface is associated with a developer, and the analysis system alerts the relevant developer according to that association.
S4, exception handling
Using the detailed anomaly information and the original logs in the analysis system, developers are assisted in reproducing and locating the problem, and the anomaly is fixed through the test and build process. After the system has recovered, the alarm state of the anomaly is cleared through the task management module, forming a closed loop.
A system implementing the Web application real-time anomaly analysis method comprises a client and an analysis system, wherein the client synchronizes the anomaly data obtained through anomaly capture to the analysis system, and the analysis system analyzes, alarms on and handles the anomaly data in real time and stores the data in real time;
the client comprises an application layer, the application layer comprises a Web application and a storage module, the anomaly capture component is installed into the Web application as an npm package or, for Web applications that do not use npm, included as a JavaScript plug-in, and the storage module comprises a log cache area and a log synchronization area;
the analysis system comprises a business layer and a database, wherein the business layer comprises an anomaly analysis module, an anomaly alarm module and an exception handling module; the anomaly analysis module analyzes the anomaly data and passes the analysis results to the anomaly alarm module, the anomaly alarm module raises the corresponding alarms for the analysis results it receives, and the exception handling module handles the anomaly alarms;
the database is a LevelDB database and is used for storing the log information in the analysis system in real time.
The Web application real-time anomaly analysis method and system for which protection is sought combine real-time monitoring to discover and resolve anomalies proactively, minimizing the loss that anomalies cause to users. The online Web application is monitored in real time, the captured page anomalies and interface anomalies are synchronized to the analysis system in real time, and the analysis system analyzes and locates them according to defined rules, raises alarms for serious problems and pushes the analysis results and key metrics to developers for handling. This achieves fast response, lower labor and time costs, and stable operation of the system.
The above embodiments do not limit the present invention, and the present invention is not restricted to the above examples; those skilled in the art may make variations, modifications, additions or substitutions within the technical scope of the present invention.

Claims (10)

1. A Web application real-time anomaly analysis method, characterized in that anomalies are monitored and analyzed through the following steps:
S1, anomaly capture;
S2, anomaly information analysis;
S3, anomaly alarm;
S4, exception handling.
2. The Web application real-time anomaly analysis method according to claim 1, characterized in that the method specifically comprises the following steps:
S1, the anomaly capture specifically comprises the following processes:
S11, page-level JavaScript error capture;
S12, page performance anomaly capture;
S13, api interface success rate and performance anomaly capture;
S14, anomaly information storage;
S2, the anomaly information analysis specifically comprises the following processes:
S21, JavaScript error anomaly analysis;
S22, page performance anomaly analysis;
S23, api interface success rate and performance anomaly analysis;
S24, anomaly data visualization;
S3, the anomaly alarm specifically comprises the following processes:
S31, JavaScript error anomaly alarm;
S32, page performance anomaly alarm;
S33, api interface success rate and performance anomaly alarm;
S4, exception handling.
3. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S11, the page-level JavaScript error capture comprises the following steps:
in the front-end project, global anomaly capture is performed first, supplemented by capture of special anomalies, and finally the anomalies are classified;
after the anomaly capture is completed, the external environment information of the anomaly is acquired and uploaded to the analysis system, wherein the external information to be acquired comprises: operating system and version, browser and version, user information, url, network environment and time;
in step S12, the page performance anomaly capture comprises the following steps:
for page performance, the captured metrics comprise DNS query time, TCP connection time, DOM tree parsing time, white screen time, domready time and onload time, and the page performance metrics are obtained in window.
Page crashes are monitored using the load and beforeunload events of the window object.
4. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S13, the capture of api interface success rate and performance anomalies is as follows: monitoring is performed through window.XMLHttpRequest; when the start of a request is observed a timestamp is recorded, and when the end of the request is observed the request status, URL, request time and anomaly information are recorded;
in step S14, the specific process of storing the anomaly information is:
the anomaly information needs to be synchronized to the analysis system; the log information is first cached in the local browser, where a cache area and a synchronization area are established in IndexedDB; when an anomaly is first encountered, a websocket connection is established between the client and the analysis system, the log data in the cache area of IndexedDB is synchronized in real time to the analysis system in the cloud, and after successful synchronization the log data is moved into the synchronization area.
5. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S21, the JavaScript error anomaly analysis process is: based on the captured JavaScript error logs, the analysis system displays an anomaly list that can be queried by multiple dimensions, namely occurrence time, alarm level and responsible person; the anomaly details show the captured anomaly information, the position in the code repository of the code that produced the anomaly, and the type of the anomaly; the analysis system sends an anomaly message to the message list of the responsible person and manages the anomaly state to form a closed loop;
the analysis system compiles statistics on captured JavaScript errors of the same type and displays the number of occurrences of each type in real time as a chart, the specific statistical process being as follows:
the number of affected users is counted from the external environment information of the anomaly, and two cases are distinguished:
(1) if the number of occurrences of the anomaly is directly proportional to the number of affected users, the calculation rule being: number of occurrences/number of affected users >= 1, the problem is serious, and the alarm level of the anomaly needs to be raised and the anomaly handled with priority;
(2) if the number of occurrences of the anomaly is inversely proportional to the number of affected users, the anomaly occurs only for a small number of users and devices, and the alarm level is lowered or no alarm is needed;
operating system statistics are compiled from the external environment information of the anomaly: if the same type of anomaly occurs only on the same operating system, the problem is not a system compatibility problem; otherwise the anomaly is flagged as a system compatibility problem;
browser version statistics are compiled from the external environment information of the anomaly: if the same type of anomaly occurs only in the same browser or version, the problem is not a browser compatibility problem; otherwise the anomaly is flagged as a browser compatibility problem;
the code position of the anomaly is located from the error stack information, the specific process being as follows:
when the front-end project is packaged, a source map is generated by a packaging tool and uploaded to the analysis system, and the source map file is deleted when the front-end project is deployed, to prevent code leakage; the analysis system uses the error stack information together with the source map file of the front-end project to trace back to the specific position of the anomaly in the front-end code, and displays this information on the anomaly detail page.
6. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S22, the page performance anomaly analysis process is: based on the captured page performance logs, the analysis system displays a page performance list showing the average loading time and crash count of all pages currently in the analysis system, which can be queried by multiple dimensions, namely time, page and alarm level; the detail page shows the longest loading time, the shortest loading time, the external information and the specific time of each metric, and an alarm is raised for pages exceeding the alarm line;
in step S23, the api interface success rate and performance anomaly analysis process is: based on the captured api logs, the analysis system displays an api interface success rate and performance list showing the success rate, number of calls and average response time of all interfaces currently in the analysis system; the detail view shows the longest loading time, the shortest loading time, the external information and the specific time of each metric, and an alarm is raised for pages exceeding the alarm line;
wherein the api interface is measured along two dimensions, success rate and performance:
for success rate, the success rate and number of calls of the current interface are counted in real time, with the number of affected users as the criterion; if the number of interface errors is directly proportional to the number of affected users, the problem is serious, is unrelated to user operation and must be handled promptly; if the number of interface errors is inversely proportional to the number of affected users, the error occurs only for particular users, and the external information and request parameters of those users are displayed;
for performance, the average interface communication time is computed in real time from timestamps, and failed interface calls are excluded from the performance statistics;
in step S24, the anomaly data visualization is: the visualization module displays the key metrics of each kind of anomaly in real time as statistical charts, and each chart can be drilled down to the detailed information; the chart data is refreshed periodically, and when an anomaly requiring an alarm is detected, it is highlighted as an alarm on the real-time page.
7. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S3, the specific process of the anomaly alarm is: after processing the log information, the analysis system raises real-time alarms for the metrics in the analysis result that exceed the baseline, finds the responsible person to be alerted through the responsible person association module, and sends an SMS reminder or Enterprise WeChat reminder according to the alarm level, wherein the alarm message received by the responsible person contains the main anomaly information and the address of the anomaly detail page in the analysis system, through which more information can be viewed quickly, and the alarm baseline is user-configurable;
the anomaly alarm levels are divided into at least three levels:
red level: the most serious level; the responsible person is notified by SMS, the problem response time does not exceed 0.5h and the recovery time does not exceed 3h;
orange level: the level requiring attention; the responsible person is notified by Enterprise WeChat or similar means, the problem response time does not exceed 3h and the recovery time does not exceed 6h;
yellow level: the general attention level; the responsible person is notified by Enterprise WeChat or similar means, the problem response time does not exceed 4h and the recovery time does not exceed 8h.
8. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S31, the specific process of the JavaScript error anomaly alarm is:
the JavaScript error alarm baseline is: JavaScript errors/PV >20% is red, >10% is orange, >5% is yellow; for the file path and line number of the faulty code obtained in the analysis system, the analysis system obtains the most recent committer by calling git blame and raises the alarm using the developer information on record; SMS alarms are sent by calling an SMS service, and Enterprise WeChat alarms are sent to the relevant developer through a mini program; the developer logs in to the analysis system to view and handle the notification and the detailed anomaly information;
in step S32, the specific process of the page performance anomaly alarm is:
the page load time alarm baseline is: load time >1s is red, 600ms-1s is orange, 400ms-600ms is yellow; every page is associated with a developer, and the analysis system alerts the relevant developer according to that association;
in step S33, the specific process of the api interface success rate and performance anomaly alarm is:
api interface success rate baseline: red when the number of interface errors is directly proportional to the number of affected users, orange when it is inversely proportional;
api interface performance baseline: for general interfaces, response time >5s is red, >3s is orange, >1s is yellow; the baseline is configured separately for interfaces that return large amounts of data or run more complex queries;
every interface is associated with a developer, and the analysis system alerts the relevant developer according to that association.
9. The Web application real-time anomaly analysis method according to claim 2, characterized in that: in step S4, the specific process of exception handling is: using the detailed anomaly information and the original log information in the analysis system, developers are assisted in reproducing and locating the problem, the anomaly is fixed through the test and build process, and after the system has recovered the alarm state of the anomaly is cleared through the task management module, forming a closed loop.
10. A system for the Web application real-time anomaly analysis method according to any one of claims 1-9, characterized in that: the system comprises a client and an analysis system, wherein the client synchronizes the anomaly data obtained through anomaly capture to the analysis system, and the analysis system analyzes, alarms on and handles the anomaly data in real time and stores the data in real time;
the client comprises an application layer, the application layer comprises a Web application and a storage module, the anomaly capture component is installed into the Web application as an npm package or, for Web applications that do not use npm, included as a JavaScript plug-in, and the storage module comprises a log cache area and a log synchronization area;
the analysis system comprises a business layer and a database, wherein the business layer comprises an anomaly analysis module, an anomaly alarm module and an exception handling module; the anomaly analysis module analyzes the anomaly data and passes the analysis results to the anomaly alarm module, the anomaly alarm module raises the corresponding alarms for the analysis results it receives, and the exception handling module handles the anomaly alarms;
the database is a LevelDB database and is used for storing the log information in the analysis system in real time.
CN202211053134.0A 2022-08-31 2022-08-31 Web application real-time anomaly analysis method and system Pending CN115426278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211053134.0A CN115426278A (en) 2022-08-31 2022-08-31 Web application real-time anomaly analysis method and system

Publications (1)

Publication Number Publication Date
CN115426278A true CN115426278A (en) 2022-12-02

Family

ID=84200376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211053134.0A Pending CN115426278A (en) 2022-08-31 2022-08-31 Web application real-time anomaly analysis method and system

Country Status (1)

Country Link
CN (1) CN115426278A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090089629A1 (en) * 2007-09-27 2009-04-02 Microsoft Corporation Capturing diagnostics in web browser applications
CN107423194A (en) * 2017-06-30 2017-12-01 阿里巴巴集团控股有限公司 Front end abnormality alarming processing method, apparatus and system
CN112749059A (en) * 2021-01-13 2021-05-04 叮当快药科技集团有限公司 Front-end abnormity monitoring method, device and system
CN113495820A (en) * 2020-04-03 2021-10-12 北京沃东天骏信息技术有限公司 Method and device for collecting and processing abnormal information and abnormal monitoring system
CN113553272A (en) * 2021-09-18 2021-10-26 深圳市信润富联数字科技有限公司 Interface abnormity monitoring method, device, medium and computer program product
CN114116377A (en) * 2021-11-10 2022-03-01 浪潮云信息技术股份公司 Method, system and medium for monitoring performance of cloud platform web front end
US20220245013A1 (en) * 2021-02-02 2022-08-04 Quantum Metric, Inc. Detecting, diagnosing, and alerting anomalies in network applications
CN114968959A (en) * 2022-05-11 2022-08-30 中国平安人寿保险股份有限公司 Log processing method, log processing device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张语涵;刘淑华;周永鑫;: "Java Web应用中错误和异常处理方法研究", 现代计算机(专业版), no. 23, pages 61 - 65 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination