CN114201201A - Method, device and equipment for detecting abnormity of business system - Google Patents


Info

Publication number
CN114201201A
Authority
CN
China
Prior art keywords
log
abnormal
service
behavior
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111536537.6A
Other languages
Chinese (zh)
Inventor
赵西
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd filed Critical CCB Finetech Co Ltd
Priority to CN202111536537.6A priority Critical patent/CN114201201A/en
Publication of CN114201201A publication Critical patent/CN114201201A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/65Updates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/71Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method, a device and equipment for detecting anomalies of a business system, relating to the field of data verification. The method comprises: when a version update of the business system is detected, acquiring log information generated when the new business system processes services; summarizing the log information in a current first summary time period, and determining a first log corresponding to each abnormal business behavior; acquiring a historical summary result of the log information that the pre-update business system recorded in a second summary time period, and determining a second log corresponding to each abnormal business behavior; and obtaining corresponding statistical indexes from the first log and the second log of the same abnormal business behavior, and marking the abnormal business behavior with an alarm identifier when the change in the statistical indexes is determined to meet an alarm condition. With this scheme, service anomalies that arise when a new version of the business system goes online can be found in time, the time needed to locate problems is shortened, and the quality of service is improved.

Description

Method, device and equipment for detecting abnormity of business system
Technical Field
The invention relates to the field of data verification, in particular to a method, a device and equipment for detecting the abnormity of a service system.
Background
Large banking systems generally follow standardized system specifications: every transaction in every system must write a monitoring log, and if a transaction fails, an error code and error information must be output to the monitoring platform. The monitoring platform routinely monitors the success rate of each individual transaction. The monitoring threshold for business-type errors is set relatively low and the monitoring threshold for technical-type errors is set relatively high, and the resulting alarms prompt operation and maintenance personnel to intervene in time.
The above method is conventional monitoring and is effective for a version in a stable state. A new version that has just gone online, however, is in an unstable state, and problems are difficult to find in time with this method alone, for the following reasons:
1. In the new version, it may be that only some function within a transaction has a problem, rather than the entire transaction.
2. Immediately after the version change it is still early morning, and the traffic volume rises only gradually over time. Monitoring the transaction success rate reveals a problem only once the transaction peak arrives, by which time there are already many customer complaints and the subsequent emergency response is very passive.
3. Existing monitoring has no way to identify whether an alarm is related to the new version going online. A problem that appears after go-live is not necessarily caused by the current version and may be a historical problem.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for detecting the abnormity of a service system, and aims to solve the problem that the abnormity of the service cannot be found in time when a new version of the service system is online.
In a first aspect, an embodiment of the present application provides a method for detecting an anomaly of a service system, where the method includes:
when detecting that the version of the service system is updated, acquiring log information generated when the new service system processes the service, wherein the log information comprises service behaviors, whether the service is abnormal or not and abnormal types;
in response to a data statistics instruction, collecting log information in a current first collection time period, and determining first logs corresponding to various abnormal business behaviors;
acquiring a historical summary result of the log information that the pre-update business system recorded in a second summary time period, and determining a second log corresponding to each abnormal business behavior, wherein the second summary time period and the first summary time period are the same time slot in different time cycles;
and obtaining corresponding statistical indexes according to the first log and the second log of the same abnormal business behavior, and carrying out alarm identification on the abnormal business behavior when the alarm condition is determined to be met according to the change of the statistical indexes.
In a possible implementation manner, obtaining corresponding statistical indexes according to a first log and a second log of the same abnormal service behavior, and determining that an alarm condition is met according to a change of the statistical indexes includes:
determining that the first log contains a new abnormal business behavior that does not appear in the second log, and determining that the alarm condition is met.
In a possible implementation manner, obtaining corresponding statistical indexes according to a first log and a second log of the same abnormal service behavior, and determining that an alarm condition is met according to a change of the statistical indexes includes:
when the number of times the same abnormal business behavior appears in the first log is determined to be greater than the number of times it appears in the second log, and the growth rate exceeds a preset first threshold, determining that the alarm condition is met.
In a possible implementation manner, obtaining corresponding statistical indexes according to a first log and a second log of the same abnormal service behavior, and determining that an alarm condition is met according to a change of the statistical indexes includes:
determining that the alarm condition is met when, in more than a set number of summary time periods within the time cycle, the growth rate of the number of occurrences of the same abnormal business behavior in the first log relative to the second log does not exceed the preset first threshold but exceeds a preset second threshold.
In one possible implementation, determining the first log/second log corresponding to each abnormal business behavior includes:
checking the information item in the log information that indicates whether the service is abnormal, and, when the information item indicates that the service is abnormal, obtaining the corresponding abnormal business behavior according to the abnormality type.
In a possible implementation manner, obtaining corresponding statistical indexes according to a first log and a second log of the same abnormal service behavior, and determining that an alarm condition is met according to a change of the statistical indexes includes:
obtaining a first average processing time of the abnormal business behavior according to the number of times the same abnormal business behavior appears in the first log and the processing time of each occurrence;
obtaining a second average processing time of the abnormal business behavior according to the number of times the same abnormal business behavior appears in the second log and the processing time of each occurrence;
and when the first average processing time is determined to be greater than the second average processing time and the difference exceeds a preset threshold, determining that the alarm condition is met.
In a possible implementation manner, after the abnormal business behavior is marked with the alarm identifier, the method further includes:
determining the program module corresponding to the abnormal business behavior carrying the alarm identifier;
according to a comparison table of the program modules changed in the updated version relative to the previous version, identifying the program module as a new-version program module when it is found in the comparison table, and otherwise identifying it as an old-version program module;
and sending the abnormal business behavior carrying the alarm identifier, together with the corresponding identification result, to the corresponding debugging client.
In a possible implementation manner, determining a program module corresponding to an abnormal business behavior with an alarm identifier includes:
establishing a mapping relation table between each abnormal business behavior and its corresponding program module according to the first log corresponding to each abnormal business behavior;
and inquiring the program module corresponding to the abnormal business behavior with the alarm identification from the mapping relation table.
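The module lookup and new/old classification described above can be sketched as follows. This is only an illustration under stated assumptions: the record field names (`transaction_code`, `error_code`, `error_info`, `module`), the sample codes, and the idea that each first-log record names its program module are hypothetical and not taken from the application.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple

# (transaction code, error code, error information) identifies one abnormal behavior.
BehaviorKey = Tuple[str, str, str]


def build_behavior_module_map(first_log: Iterable[Mapping[str, str]]) -> Dict[BehaviorKey, str]:
    """Build the mapping table: abnormal behavior -> program module.

    Assumption: every first-log record carries a hypothetical 'module' field.
    """
    table: Dict[BehaviorKey, str] = {}
    for rec in first_log:
        key = (rec["transaction_code"], rec["error_code"], rec["error_info"])
        table.setdefault(key, rec["module"])  # keep the first module seen for a behavior
    return table


def classify_module(module: str, changed_modules: Set[str]) -> str:
    """Return 'new' if the module is listed in the version comparison table, else 'old'."""
    return "new" if module in changed_modules else "old"


# Hypothetical sample data.
first_log = [
    {"transaction_code": "T001", "error_code": "E100",
     "error_info": "input currency is incorrect", "module": "currency_check"},
    {"transaction_code": "T002", "error_code": "E200",
     "error_info": "timeout", "module": "ledger_client"},
]
table = build_behavior_module_map(first_log)
changed = {"currency_check"}  # modules changed in the updated version
labels = {key: classify_module(mod, changed) for key, mod in table.items()}
```

A behavior labelled "old" hints that the alarm may stem from a historical problem rather than from the newly released code.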
In one possible embodiment, the service includes transaction information, and the log information includes a transaction code identifying the behavior of the service, an error code identifying whether the service is abnormal, and error information describing the error code, a transaction start time, and a transaction end time.
In a possible implementation manner, when detecting that the version of the business system is updated, acquiring log information generated when the new business system processes the business, includes:
when the version updating is determined to be completed, a timing analysis switch is turned on;
when the timing analysis switch is determined to be on, acquiring, through the SDK of the service clients deployed by the service system, the log information generated when each service client processes services;
and when the stable operation index is reached after the version is determined to be updated, closing the timing analysis switch.
In one possible embodiment, the aggregating the log information in the current first aggregation time period in response to the data statistics instruction includes:
generating a data statistics instruction at a fixed time interval T, and determining a time period with the time length T before the current time as a current first summary time period;
and summarizing the log information in the current first summarizing time period.
In a second aspect, an embodiment of the present application provides an apparatus for detecting an anomaly of a service system, where the apparatus includes:
a log data acquisition module, configured to acquire, when a version update of the service system is detected, log information generated when the new service system processes services, wherein the log information includes the service behavior, whether the service is abnormal, and the abnormality type;
the data statistics module is used for responding to a data statistics instruction, summarizing the log information in the current first summarizing time period and determining first logs corresponding to various abnormal business behaviors;
the summary log module is used for acquiring a historical summary result of the log information that the pre-update business system recorded in a second summary time period, and determining a second log corresponding to each abnormal business behavior, wherein the second summary time period and the first summary time period are the same time slot in different time cycles;
and the alarm identification module is used for obtaining corresponding statistical indexes according to the first log and the second log of the same abnormal business behavior, and carrying out alarm identification on the abnormal business behavior when the alarm condition is met according to the change of the statistical indexes.
In a third aspect, an embodiment of the present application provides a device for detecting an anomaly in a service system, where the device includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of detecting a business system anomaly.
In a fourth aspect, an embodiment of the present application provides a computer program product, where the computer program product is configured to enable a computer to execute any one of the business system anomaly detection methods.
In a fifth aspect, an embodiment of the present application provides a computer storage medium, where a computer program is stored, and the computer program is used to enable a computer to execute any one of the business system abnormality detection methods.
The embodiment of the application provides a method, a device and equipment for detecting anomalies of a service system. The statistical indexes of the log information of the new-version service system are compared with those of the pre-update service system for the same time slot in different time cycles, and an abnormal service behavior is marked with an alarm identifier when the change in the statistical indexes is determined to meet the alarm condition. With the method provided by the embodiment of the application, service anomalies that arise when a new version of the service system goes online can be found in time, the time needed to locate problems is shortened, and the quality of service is improved.
Drawings
Fig. 1 is a schematic diagram illustrating an architecture of a business system anomaly detection system according to an exemplary embodiment of the present invention;
Fig. 2 is a schematic flowchart illustrating an exemplary method for detecting an anomaly in a service system according to an exemplary embodiment of the present invention;
Fig. 3 is a schematic flowchart illustrating an exemplary method for detecting an anomaly in a service system according to an exemplary embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating an apparatus for detecting anomalies in a business system, according to an illustrative embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating an apparatus for detecting an anomaly in a service system according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail and clearly with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In the embodiment of the application, the acquisition, storage, use, processing and the like of the data all conform to relevant regulations of national laws and regulations.
The method for detecting anomalies of the service system is applied to a scenario in which a monitoring platform server monitors the service system. Fig. 1 is a diagram illustrating an architecture of a service system anomaly detection system according to an embodiment of the present application, where the system includes a monitoring platform server 101, a database 102, and at least one service system server (a service system server 103_1, a service system server 103_2, and a service system server 103_3 in the example in the figure). The monitoring platform server 101 is used for collecting service log data of the service system; the service system servers may belong to a large banking system, including a personal online banking system, a financial product sales system, an acquiring system (POS terminals and an electronic clearing system) and the like, and provide various business services to users; the database 102 is used for storing the programs and data required by the monitoring platform server 101 to implement the corresponding functions.
The embodiment of the application is applied to the scenario shown in Fig. 1. Taking a large banking system as an example, such a system generally follows standardized system specifications, and every transaction in every system must record log information. After a version update, and before the service system reaches the stable operation index, conventional monitoring cannot meet the monitoring requirement for abnormal services. The embodiment of the application therefore provides a method for detecting service system anomalies which, as shown in Fig. 2, includes:
S201: when a version update of the service system is detected, acquiring log information generated when the new service system processes services, wherein the log information includes the service behavior, whether the service is abnormal, and the abnormality type.
In one possible embodiment, when the version update is determined to be completed, a timing analysis switch is turned on;
when the timing analysis switch is determined to be on, the log information generated when each service client processes services is acquired through the SDK of the service clients deployed by the service system;
and when the stable operation index is determined to be reached after the version update, the timing analysis switch is turned off.
Because the service system anomaly detection method provided by this embodiment performs high-precision detection and alarming, it costs more than conventional detection. Conventional detection is sufficient once the updated version has reached the stable operation index. The timing analysis switch is therefore turned on when the version is updated, so that the method provided by this embodiment is used to detect service system anomalies, and the switch is generally turned off after 3 days.
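The switch behaviour described above can be sketched as a small state holder. This is a minimal sketch, assuming hypothetical names (`TimingAnalysisSwitch`, `collect_logs`) and a caller-supplied `fetch` function standing in for the SDK call; the application does not specify any of these.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TimingAnalysisSwitch:
    """Tracks whether post-update log analysis is active."""
    on: bool = False

    def version_update_completed(self) -> None:
        # The switch is turned on once the version update has completed.
        self.on = True

    def stable_index_reached(self) -> None:
        # The switch is turned off once the stable operation index is reached.
        self.on = False


def collect_logs(switch: TimingAnalysisSwitch, fetch: Callable[[], List[dict]]) -> List[dict]:
    """Fetch log records only while the switch is on; 'fetch' stands in for the SDK."""
    return fetch() if switch.on else []
```

Keeping the switch separate from the collection logic means the costlier analysis runs only during the unstable period after a release.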
For a large banking system, the business system includes various types of systems implementing various kinds of business. A version update of a bank's business system generally means a version update of a large business system. Updates are frequent and, so as not to affect normal service to users, are generally performed early on Saturday morning. Because transaction volume is low on weekends, and especially in the early morning, some abnormal business behaviors may not be found in time, which leaves the subsequent emergency response very passive. To address this, the method provided by the embodiment of the present application obtains the log information generated when the new business system processes services and analyzes it.
In one possible embodiment, the service includes transaction information, and the log information includes a transaction code identifying the service behavior, an error code identifying whether the service is abnormal, error information describing the error code, a transaction start time and a transaction end time.
S202: and responding to the data statistics instruction, summarizing the log information in the current first summarizing time period, and determining a first log corresponding to each abnormal business behavior.
In one possible embodiment, the aggregating the log information in the current first aggregation time period in response to the data statistics instruction includes:
generating a data statistics instruction at a fixed time interval T, and determining a time period with the time length T before the current time as a current first summary time period;
and summarizing the log information in the current first summarizing time period.
The summary time period is counted from the time the service system version update completes. Summaries may be produced on the hour: for example, if the version update completes at 2:10, then at 3:00 the log information for 2:10-3:00 is summarized, and at 4:00 the log information for 3:00-4:00 is summarized. Alternatively, summaries may be produced every full hour after the update completes: for example, if the version update completes at 2:10, then at 3:10 the log information for 2:10-3:10 is summarized.
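The two ways of choosing the first summary time period can be sketched as below. The function names and the calendar date are hypothetical; only the clock times 2:10, 3:00, 3:10 and 4:00 come from the example in the text.

```python
from datetime import datetime, timedelta
from typing import Tuple

Period = Tuple[datetime, datetime]


def trailing_period(now: datetime, interval: timedelta) -> Period:
    """Period of length T ending at the moment the statistics instruction fires."""
    return now - interval, now


def on_the_hour_period(now: datetime, update_done: datetime) -> Period:
    """Hourly period ending at 'now', clipped so it never starts before the update completed."""
    start = max(now - timedelta(hours=1), update_done)
    return start, now


# Hypothetical date; the update completes at 2:10.
update_done = datetime(2021, 12, 18, 2, 10)
first_window = on_the_hour_period(datetime(2021, 12, 18, 3, 0), update_done)   # 2:10-3:00
second_window = on_the_hour_period(datetime(2021, 12, 18, 4, 0), update_done)  # 3:00-4:00
full_hour = trailing_period(datetime(2021, 12, 18, 3, 10), timedelta(hours=1)) # 2:10-3:10
```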
A first log corresponding to the abnormal business behaviors is obtained from the log information; normal business behaviors, that is, successful transactions, are not counted in this process. The first log includes the transaction code of each abnormal service behavior after the system version update, the error code identifying the service abnormality, the error information describing the error code, the transaction start time and the transaction end time.
S203: acquiring a historical summary result of the log information that the pre-update service system recorded in a second summary time period, and determining a second log corresponding to each abnormal service behavior, wherein the second summary time period and the first summary time period are the same time slot in different time cycles.
In addition to summarizing the log information of the new service system in S202, the log information of the pre-update service system is also summarized. This summarizing may be done by the monitoring platform server when it records the log information before the new version goes online, or at the same time as the log information in the first summary time period is summarized after the new version goes online.
A second log corresponding to the abnormal business behaviors is obtained from the log information of the pre-update service system; likewise, normal business behaviors are not counted. The second log includes the transaction code of each abnormal service behavior of the pre-update service system, the error code identifying the service abnormality, the error information describing the error code, the transaction start time and the transaction end time.
The second summary time period and the first summary time period are the same time slot in different time cycles. The time cycle can be set according to user requirements; for example, the second summary time period may fall on another set day of the week. For example, if the first summary time periods are 2:10-3:00 and 3:00-4:00, the second summary time periods are 2:10-3:00 and 3:00-4:00 on the earlier day.
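Deriving the second summary time period from the first can be sketched as a shift by one time cycle. The function name and the calendar dates are hypothetical, and the 7-day default is an assumption chosen to match a weekly cycle; the text says only that the cycle is configurable.

```python
from datetime import datetime, timedelta
from typing import Tuple


def second_summary_period(first_start: datetime, first_end: datetime,
                          cycle: timedelta = timedelta(days=7)) -> Tuple[datetime, datetime]:
    """Same time slot, one cycle earlier (7 days by default, as an assumption)."""
    return first_start - cycle, first_end - cycle


# Hypothetical dates: the 2:10-3:00 slot on the release day maps to 2:10-3:00 a week earlier.
baseline = second_summary_period(datetime(2021, 12, 18, 2, 10), datetime(2021, 12, 18, 3, 0))
```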
S204: obtaining corresponding statistical indexes from the first log and the second log of the same abnormal business behavior, and marking the abnormal business behavior with an alarm identifier when the change in the statistical indexes is determined to meet the alarm condition.
The first log of the new service system is compared with the second log of the pre-update service system. What must be compared are the statistical indexes of the same abnormal service behavior, that is, of the same combination of transaction code, error code and error information.
In one possible implementation, determining the first log/second log corresponding to each abnormal business behavior includes:
checking the information item in the log information that indicates whether the service is abnormal, and, only when the information item indicates that the service is abnormal, obtaining the corresponding abnormal service behavior according to the abnormality type. An abnormal business behavior is the combination of the transaction code of the business behavior, the error code identifying the business abnormality, and the error information describing the abnormality.
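Grouping abnormal records by the combination of transaction code, error code and error information can be sketched with a counter. The field names (`is_abnormal`, `transaction_code`, `error_code`, `error_info`) and the sample codes are hypothetical placeholders for whatever the log schema actually uses.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

BehaviorKey = Tuple[str, str, str]


def count_abnormal_behaviors(records: Iterable[Mapping]) -> Counter:
    """Count occurrences of each abnormal behavior; successful transactions are skipped."""
    counts: Counter = Counter()
    for rec in records:
        if not rec["is_abnormal"]:
            continue  # normal business behavior is not counted
        key: BehaviorKey = (rec["transaction_code"], rec["error_code"], rec["error_info"])
        counts[key] += 1
    return counts


# Hypothetical sample records.
sample = [
    {"is_abnormal": True, "transaction_code": "T001", "error_code": "E100",
     "error_info": "input currency is incorrect"},
    {"is_abnormal": False, "transaction_code": "T001", "error_code": "", "error_info": ""},
    {"is_abnormal": True, "transaction_code": "T001", "error_code": "E100",
     "error_info": "input currency is incorrect"},
]
sample_counts = count_abnormal_behaviors(sample)
```

Applying the same function to the new-version records and to the pre-update records yields the per-behavior counts of the first log and the second log.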
The embodiment of the application provides a method for detecting anomalies of a service system. The statistical indexes of the log information of the new-version service system are compared with those of the pre-update service system for the same time slot in different time cycles, and an abnormal service behavior is marked with an alarm identifier when the change in the statistical indexes is determined to meet the alarm condition. With the method provided by the embodiment of the application, service anomalies that arise when a new version of the service system goes online can be discovered promptly and proactively, the time needed to locate problems is shortened, and the business impact of the problem is minimized.
As shown in Fig. 3, the method for detecting an anomaly of a service system provided in the embodiment of the present application mainly includes two parts: comparative analysis of the abnormal business behaviors of the new business system and the pre-update business system, and analysis of the program modules.
1. Comparative analysis of the abnormal business behaviors of the new business system and the pre-update business system.
In a possible implementation manner, obtaining corresponding statistical indexes from the first log and the second log of the same abnormal service behavior, and determining that the alarm condition is met according to the change in the statistical indexes, includes at least one of the following:
1) Determining that the first log contains a new abnormal business behavior that does not appear in the second log, and determining that the alarm condition is met.
When an abnormal business behavior occurs for the first time, for example when the combination of transaction code, error code and error information "A0183A200YBLA01823135 input currency is incorrect" appears in the first log of the new business system but not in the second log of the pre-update business system, the alarm condition is determined to be met and the alarm identifier Y is set for that abnormal business behavior.
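Detecting behaviors that appear only after the update amounts to a set difference over the behavior keys. The sketch below assumes the per-behavior counts are held in dictionaries keyed by (transaction code, error code, error information); the codes shown are hypothetical.

```python
from typing import Mapping, Set, Tuple

BehaviorKey = Tuple[str, str, str]


def new_abnormal_behaviors(first_counts: Mapping[BehaviorKey, int],
                           second_counts: Mapping[BehaviorKey, int]) -> Set[BehaviorKey]:
    """Behaviors present in the first (new-version) log but absent from the second log."""
    return {key for key in first_counts if key not in second_counts}


# Hypothetical counts.
first_counts = {("T001", "E100", "input currency is incorrect"): 12,
                ("T002", "E200", "timeout"): 3}
second_counts = {("T002", "E200", "timeout"): 4}
alarms = new_abnormal_behaviors(first_counts, second_counts)
```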
2) When the number of times the same abnormal business behavior appears in the first log is determined to be greater than the number of times it appears in the second log, and the growth rate exceeds a preset first threshold, determining that the alarm condition is met.
Table 1 is a statistical table of abnormal service behaviors of the new service system. It lists only the different abnormal service behaviors under the same service behavior and the transaction count corresponding to each.
TABLE 1
[Table 1 appears as an image in the original publication: Figure BDA0003412774290000101]
Taking a first threshold of 30% as an example: the abnormal service behavior "A0183A200YBLA01823135 input currency is incorrect" appears 8439 times in the first log of the new service system and 5430 times in the second log of the pre-update service system. For this abnormal service behavior, the growth rate of the new service system relative to the pre-update service system is (8439 - 5430)/5430 ≈ 55.4% > 30%, so the alarm condition is determined to be met and the alarm identifier Y is set for the abnormal service behavior.
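The growth-rate check can be written out as a short calculation using the counts 8439 and 5430 and the 30% threshold from the example. The function names are hypothetical, and treating a zero baseline as an error is an assumption (a behavior with no baseline would fall under the new-behavior condition instead).

```python
def growth_rate(first_count: int, second_count: int) -> float:
    """Relative growth of the first-log count over the second-log count."""
    if second_count <= 0:
        raise ValueError("no baseline occurrences; treat as a new abnormal behavior instead")
    return (first_count - second_count) / second_count


def exceeds_first_threshold(first_count: int, second_count: int,
                            first_threshold: float = 0.30) -> bool:
    """Alarm when the count grew and the growth rate exceeds the first threshold."""
    return first_count > second_count and growth_rate(first_count, second_count) > first_threshold


rate = growth_rate(8439, 5430)               # (8439 - 5430) / 5430
alarm = exceeds_first_threshold(8439, 5430)  # compared against the 30% threshold
```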
3) And determining that the alarm condition is met when the increase rate does not exceed a preset first threshold and exceeds a second preset threshold in comparison with the occurrence frequency of the same abnormal business behavior in the first log in the summary time periods exceeding the set number in the time period.
Taking a first threshold of 30%, a second threshold of 10%, and a set number of 2 as an example, the new business system summarizes the periods 2:10-3:00, 3:00-4:00, and 4:00-5:00, and the pre-update summaries from one week earlier cover the same periods 2:10-3:00, 3:00-4:00, and 4:00-5:00.
In the 2:10-3:00 period, the abnormal business behavior combination "A0183A200YBLA01823135 incorrect input currency" occurs 8439 times in the first log of the new business system, and the same abnormal business behavior occurs 7430 times in the second log of the pre-update business system. For this abnormal business behavior, the growth rate of the new business system relative to the pre-update business system is (8439 - 7430)/7430 ≈ 13.6%, which lies between 10% and 30%, so the accumulation flag is set to 1 (its initial value is 0).
In the 3:00-4:00 period, the abnormal business behavior combination "A0183A200YBLA01823135 incorrect input currency" occurs 4392 times in the first log of the new business system, and the same abnormal business behavior occurs 3902 times in the second log of the pre-update business system. For this abnormal business behavior, the growth rate of the new business system relative to the pre-update business system is (4392 - 3902)/3902 ≈ 12.6%, which lies between 10% and 30%, so the accumulation flag is set to 2.
In the 4:10-5:00 period, the abnormal business behavior combination "A0183A200YBLA01823135 incorrect input currency" occurs 7025 times in the first log of the new business system, and the same abnormal business behavior occurs 5430 times in the second log of the pre-update business system. For this abnormal business behavior, the growth rate of the new business system relative to the pre-update business system is (7025 - 5430)/5430 ≈ 29.4%, which lies between 10% and 30%, so the accumulation flag is set to 3.
Within the time cycle, the growth rate of the number of occurrences of the same abnormal business behavior in the first log relative to the second log lies between 10% and 30% in more than 2 summary time periods, that is, the accumulation flag is 3. The alarm condition is therefore determined to be met, and alarm identifier Y is set for the abnormal business behavior.
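The two-threshold check in the example above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `PeriodCount`, `growth_rate`, and `check_alarm` are hypothetical, the 3:00-4:00 count uses 4392 (the value implied by the stated 12.6% rate), and treating a zero baseline as an immediate alarm is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PeriodCount:
    """Occurrences of one abnormal business behavior in one summary period."""
    period: str
    first_log_count: int   # new business system
    second_log_count: int  # pre-update business system, same period one cycle earlier


def growth_rate(first_count: int, second_count: int) -> float:
    """Relative growth of the new system's count over the pre-update count."""
    if second_count == 0:
        # Assumption: no baseline occurrences means unbounded growth (or none at all).
        return float("inf") if first_count > 0 else 0.0
    return (first_count - second_count) / second_count


def check_alarm(periods, first_threshold=0.30, second_threshold=0.10, set_number=2):
    """Return (alarm, accumulation_flag) for one abnormal business behavior."""
    accumulation = 0  # accumulation flag, initially 0
    for p in periods:
        rate = growth_rate(p.first_log_count, p.second_log_count)
        if rate > first_threshold:
            # Growth above the first threshold meets the alarm condition at once.
            return True, accumulation
        if rate > second_threshold:
            # Growth between the second and first thresholds is only accumulated.
            accumulation += 1
    return accumulation > set_number, accumulation


periods = [
    PeriodCount("2:10-3:00", 8439, 7430),
    PeriodCount("3:00-4:00", 4392, 3902),
    PeriodCount("4:10-5:00", 7025, 5430),
]
alarm, flag = check_alarm(periods)
print(alarm, flag)
```

With the figures from the example, all three periods fall in the 10%-30% band, so the flag reaches 3 and exceeds the set number of 2.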
4) Obtaining a first average processing time of the abnormal business behavior according to the number of times the same abnormal business behavior appears in the first log and the processing time of each occurrence; obtaining a second average processing time of the abnormal business behavior according to the number of times the same abnormal business behavior appears in the second log and the processing time of each occurrence; and when the first average processing time is determined to be greater than the second average processing time and the difference exceeds a preset threshold, determining that the alarm condition is met.
While the log information is summarized, the start time and end time of each business are recorded. When the first log or second log corresponding to each abnormal business behavior is determined, the number of times the same abnormal business behavior appears in that log is counted, and the processing time of each occurrence is calculated as its end time minus its start time. The processing times of all occurrences of the same abnormal business behavior are then summed and divided by the number of occurrences to obtain the average processing time of that abnormal business behavior.
Table 2 lists the average processing time of each abnormal transaction behavior in the first log.
TABLE 2
[Table 2 appears as an image (Figure BDA0003412774290000121) in the original publication.]
Taking the abnormal transaction behavior "A0183A200YBLA01823135 incorrect input currency" and a preset threshold of 500 ms as an example, the average processing time of this behavior is 534 ms in the new business system and 22 ms in the pre-update business system. Since 534 > 22 and 534 - 22 = 512 ms, which exceeds 500 ms, the alarm condition is determined to be met, and alarm identifier Y is set for the abnormal transaction behavior.
2. Analyzing the program modules.
The abnormal business behaviors with alarm identifiers are obtained for all program modules in the business system, and a table mapping error codes and error information to program module names is established, as shown in Table 3.
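The average-processing-time comparison can be sketched as below. The function names and the representation of each occurrence as a (start, end) pair are illustrative assumptions; only the 500 ms threshold and the 534 ms and 22 ms averages come from the example.

```python
from datetime import datetime, timedelta


def average_processing_ms(occurrences):
    """Average of (end - start) in milliseconds over all occurrences of one behavior.

    `occurrences` is a list of (start_time, end_time) datetime pairs.
    Returns None when the behavior never occurred.
    """
    if not occurrences:
        return None
    one_ms = timedelta(milliseconds=1)
    total = sum((end - start) / one_ms for start, end in occurrences)
    return total / len(occurrences)


def processing_time_alarm(first_avg_ms, second_avg_ms, threshold_ms=500):
    """Alarm when the new system is slower and the gap exceeds the threshold."""
    return first_avg_ms > second_avg_ms and (first_avg_ms - second_avg_ms) > threshold_ms


# Two hypothetical occurrences in the first log, lasting 500 ms and 568 ms.
t0 = datetime(2021, 12, 15, 2, 10, 0)
first_occurrences = [
    (t0, t0 + timedelta(milliseconds=500)),
    (t0, t0 + timedelta(milliseconds=568)),
]
first_avg = average_processing_ms(first_occurrences)
print(first_avg, processing_time_alarm(first_avg, 22))
```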
TABLE 3
Error code | Error information | Program module name
YBLA01823135 | Incorrect input currency | KBSADRW0
YBLA01826734 | Incorrect account number | KBCRTWE1
XDBX00304590 | The transaction is not allowed during this period | KBCRTWE1
In a possible implementation manner, determining the program module corresponding to an abnormal business behavior with an alarm identifier includes:
establishing a mapping table between each abnormal business behavior and its corresponding program module according to the first log corresponding to each abnormal business behavior;
and querying the mapping table for the program module corresponding to the abnormal business behavior with the alarm identifier.
Only the correspondence among error codes, error information, and program modules is recorded; the correspondence between transaction codes and program modules is not considered, because one business behavior may correspond to multiple program modules.
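A minimal sketch of the mapping table and its lookup follows, using the rows of Table 3. The log-entry field names (`error_code`, `error_info`, `module`) and both function names are hypothetical, and the sketch assumes each error code maps to a single program module.

```python
def build_module_table(first_log_entries):
    """Build error code -> {error information, program module} from first-log entries."""
    table = {}
    for entry in first_log_entries:
        table[entry["error_code"]] = {
            "error_info": entry["error_info"],
            "module": entry["module"],
        }
    return table


def lookup_module(table, error_code):
    """Return the program module for an alarmed error code, or None if unknown."""
    row = table.get(error_code)
    return row["module"] if row else None


# Entries mirroring Table 3; the dictionary layout itself is an assumption.
first_log_entries = [
    {"error_code": "YBLA01823135", "error_info": "Incorrect input currency", "module": "KBSADRW0"},
    {"error_code": "YBLA01826734", "error_info": "Incorrect account number", "module": "KBCRTWE1"},
    {"error_code": "XDBX00304590", "error_info": "The transaction is not allowed during this period", "module": "KBCRTWE1"},
]
module_table = build_module_table(first_log_entries)
print(lookup_module(module_table, "YBLA01823135"))
```

Keying the table by error code rather than transaction code follows the reasoning in the text: one business behavior may touch several program modules.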
In a possible implementation manner, after the alarm identification is performed on the abnormal business behavior, the method further includes:
determining the program module corresponding to the abnormal business behavior with the alarm identifier;
according to a comparison table of the program modules updated in this version relative to the previous version, identifying the program module as a new version program module when it is found in the comparison table, and otherwise identifying it as an old version program module;
and sending the abnormal business behavior with the alarm identifier and the corresponding identification result to the corresponding debugging client.
The program module corresponding to an abnormal business behavior with an alarm identifier does not necessarily belong to the new version. The problem may also lie in an old version program module, where it went undiscovered in the pre-update business system and was triggered only after the version update. Program modules of both the new version and the old version therefore need to be debugged by operation and maintenance personnel.
The method provided by the embodiment of the application further comprises a function of sending the alarm information. After the above S301 and S302 are completed, the abnormal business behavior with the alarm identifier Y and the new version program module or the old version program module corresponding to the abnormal business behavior are determined, and the above contents are combined into a short message and sent to the operation and maintenance personnel.
Through the implementation mode, the new version program module and the old version program module can be distinguished, so that the reason of the problem can be quickly acquired, the problem solving direction is determined, and accurate operation and maintenance are realized.
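The new/old classification and the alarm message can be sketched as follows. The contents of the comparison table, the message format, and the function names are all hypothetical; the source only states that the results are combined into a short message for operation and maintenance personnel.

```python
def identify_module_version(module_name, updated_modules):
    """Label a module by whether it appears in the table of modules changed by the update."""
    return "new version" if module_name in updated_modules else "old version"


def build_alarm_message(behavior, module_name, updated_modules):
    """Combine the alarmed behavior and the identification result into one message."""
    version = identify_module_version(module_name, updated_modules)
    return f"[ALARM Y] {behavior} | module {module_name} ({version})"


# Hypothetical comparison table: suppose only KBSADRW0 changed in this update.
updated_modules = {"KBSADRW0"}

message = build_alarm_message(
    "A0183A200YBLA01823135 incorrect input currency", "KBSADRW0", updated_modules
)
print(message)
```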
Based on the same inventive concept, an embodiment of the present application provides a device 400 for detecting an abnormality of a business system, as shown in fig. 4, where the device includes:
the log data obtaining module 401, configured to obtain log information generated when the new business system processes business when a version update of the business system is detected, where the log information includes the business behavior, whether the business is abnormal, and the abnormality type;
the data statistics module 402, configured to, in response to a data statistics instruction, summarize the log information in a current first summary time period and determine a first log corresponding to each abnormal business behavior;
the summary log module 403, configured to obtain the summary result of the log information in a second summary time period, as previously counted by the pre-update business system, and determine a second log corresponding to each abnormal business behavior, where the second summary time period and the first summary time period are the same time period in different time cycles (for example, the same hours one week earlier);
the alarm identification module 404, configured to obtain a corresponding statistical index according to the first log and the second log of the same abnormal business behavior, and perform alarm identification on the abnormal business behavior when it is determined, according to the change of the statistical index, that an alarm condition is met.
In a possible implementation manner, the alarm identification module is configured to obtain a corresponding statistical indicator according to a first log and a second log of the same abnormal service behavior, and determine that an alarm condition is satisfied according to a change of the statistical indicator, where the method includes:
and determining that the alarm condition is met when it is determined that the first log contains a new abnormal business behavior that does not appear in the second log.
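A short sketch of this check follows. It reads "new" abnormal business behaviors as those present in the new system's first log but absent from the pre-update second log, which is an interpretive assumption; the function name and the dictionary layout (behavior key to occurrence count) are hypothetical.

```python
def new_abnormal_behaviors(first_log_counts, second_log_counts):
    """Behaviors seen in the new system's log but never in the pre-update log."""
    return sorted(set(first_log_counts) - set(second_log_counts))


# Hypothetical occurrence counts keyed by error code.
first_log_counts = {"YBLA01823135": 8439, "XDBX00304590": 12}
second_log_counts = {"YBLA01823135": 7430}

added = new_abnormal_behaviors(first_log_counts, second_log_counts)
print(added, bool(added))  # a non-empty result would meet the alarm condition
```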
In a possible implementation manner, the alarm identification module is configured to obtain a corresponding statistical indicator according to a first log and a second log of the same abnormal service behavior, and determine that an alarm condition is satisfied according to a change of the statistical indicator, where the method includes:
and when it is determined that the same abnormal business behavior appears more times in the first log than in the second log, and the growth rate exceeds a preset first threshold, determining that the preset alarm condition is met.
In a possible implementation manner, the alarm identification module is configured to obtain a corresponding statistical indicator according to a first log and a second log of the same abnormal service behavior, and determine that an alarm condition is satisfied according to a change of the statistical indicator, where the method includes:
and determining that the alarm condition is met when, in more than a set number of summary time periods within the time cycle, the growth rate of the number of occurrences of the same abnormal business behavior in the first log relative to the second log does not exceed the preset first threshold but exceeds a preset second threshold.
In one possible implementation, the summary log module is configured to determine a first log/a second log corresponding to each abnormal business behavior:
and according to the information item of whether the service in the log information is abnormal, when the information item indicates that the service is abnormal, obtaining the corresponding abnormal service behavior according to the abnormal type.
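Determining the log for each abnormal business behavior can be sketched as a filter-and-group step. The field names are hypothetical, and treating a non-empty error code as the "business is abnormal" indicator is an assumption based on the log fields described elsewhere in this document.

```python
from collections import defaultdict


def group_abnormal_logs(log_entries):
    """Keep only abnormal entries and group them by abnormal business behavior."""
    grouped = defaultdict(list)
    for entry in log_entries:
        if not entry.get("error_code"):
            continue  # assumption: an empty error code marks a normal business
        behavior = (entry["transaction_code"], entry["error_code"], entry["error_info"])
        grouped[behavior].append(entry)
    return dict(grouped)


# Hypothetical log entries; only the codes are taken from the document.
log_entries = [
    {"transaction_code": "A0183A200", "error_code": "YBLA01823135", "error_info": "incorrect input currency"},
    {"transaction_code": "A0183A200", "error_code": "", "error_info": ""},
    {"transaction_code": "A0183A200", "error_code": "YBLA01823135", "error_info": "incorrect input currency"},
]
first_log = group_abnormal_logs(log_entries)
for behavior, entries in first_log.items():
    print(behavior, len(entries))
```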
In a possible implementation manner, the alarm identification module is configured to obtain a corresponding statistical indicator according to a first log and a second log of the same abnormal service behavior, and determine that an alarm condition is satisfied according to a change of the statistical indicator, where the method includes:
obtaining a first average processing time of the abnormal business behaviors according to the times of the same abnormal business behavior appearing in the first log and the corresponding processing time of each time;
obtaining a second average processing time of the abnormal business behaviors according to the times of the same abnormal business behavior appearing in the second log and the corresponding processing time of each time;
and when the first average processing time is determined to be greater than the second average processing time and the difference value exceeds a preset threshold value, determining that an alarm condition is met.
In a possible implementation manner, the device further includes an alarm information sending module configured to, after the alarm identification is performed on the abnormal business behavior:
determining a program module corresponding to the abnormal business behavior with the alarm identifier;
according to a program module comparison table with an updated version compared with a previous version, when the program module is determined to be in the program module comparison table, identifying the program module as a new version program module, otherwise, identifying the program module as an old version program module;
and send the abnormal business behavior with the alarm identifier and the corresponding identification result to the corresponding debugging client.
In a possible implementation manner, the alarm information sending module is configured to determine a program module corresponding to an abnormal business behavior with an alarm identifier, and includes:
establishing a mapping table between each abnormal business behavior and its corresponding program module according to the first log corresponding to each abnormal business behavior;
and inquiring the program module corresponding to the abnormal business behavior with the alarm identification from the mapping relation table.
In a possible implementation manner, the business handled by the log data obtaining module includes transaction information, and the log information includes a transaction code identifying the business behavior, an error code identifying whether the business is abnormal, error information describing the error code, a transaction start time, and a transaction end time.
In a possible implementation manner, the module for obtaining log data is configured to, when detecting that the version of the service system is updated, obtain log information generated when a new service system processes a service, and includes:
when the version updating is determined to be completed, a timing analysis switch is turned on;
when the timing analysis switch is determined to be on, acquiring, through the SDK, the log information generated when each business client deployed by the business system processes business;
and when the stable operation index is reached after the version is determined to be updated, closing the timing analysis switch.
In one possible embodiment, the data statistics module is configured to summarize log information in a current first summarization time period in response to a data statistics instruction, and includes:
generating a data statistics instruction at a fixed time interval T, and determining a time period with the time length T before the current time as a current first summary time period;
and summarizing the log information in the current first summarizing time period.
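The fixed-interval summary window can be sketched as below, together with the matching window one cycle earlier used for the second summary time period. The function names are hypothetical, and the one-week cycle is taken from the worked example rather than being a required value.

```python
from datetime import datetime, timedelta


def first_summary_period(now, interval):
    """The period of length T that ends at the current time."""
    return now - interval, now


def second_summary_period(first_period, cycle=timedelta(weeks=1)):
    """The same period shifted back by one time cycle (one week in the example)."""
    start, end = first_period
    return start - cycle, end - cycle


def in_period(timestamp, period):
    """True when a log timestamp falls inside the half-open period [start, end)."""
    start, end = period
    return start <= timestamp < end


now = datetime(2021, 12, 15, 3, 0)
current = first_summary_period(now, timedelta(hours=1))
previous = second_summary_period(current)
print(current, previous)
```

A half-open interval is used so that a log entry stamped exactly on a boundary is counted in only one period when instructions fire every T.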
Based on the same inventive concept, the present application provides an apparatus for detecting an anomaly of a service system, as shown in fig. 5, where the apparatus includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform any of the above embodiments of the method for detecting a business system anomaly.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 5. The electronic device 130 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
The processor 131 is configured to read and execute the instructions in the memory 132, so that the at least one processor can execute a method for detecting an exception to a service system provided in the foregoing embodiments.
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the aspects of a method for detecting business system anomalies provided by the present application can also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps of a method for detecting business system anomalies according to various exemplary embodiments of the present application, described above in this specification, when the program product is run on the computer device.
In addition, the present application also provides a computer-readable storage medium storing a computer program for causing a computer to execute the method of any one of the above embodiments.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. A method for detecting an abnormality of a service system, the method comprising:
when detecting that the version of the service system is updated, acquiring log information generated when the new service system processes the service, wherein the log information comprises service behaviors, whether the service is abnormal or not and abnormal types;
in response to a data statistics instruction, collecting log information in a current first collection time period, and determining first logs corresponding to various abnormal business behaviors;
acquiring the summary result of the log information in a second summary time period, as previously counted by the pre-update business system, and determining a second log corresponding to each abnormal business behavior, wherein the second summary time period and the first summary time period are the same time period in different time cycles;
and obtaining corresponding statistical indexes according to the first log and the second log of the same abnormal business behavior, and carrying out alarm identification on the abnormal business behavior when the alarm condition is determined to be met according to the change of the statistical indexes.
2. The method of claim 1, wherein obtaining corresponding statistical indexes according to a first log and a second log of a same abnormal business behavior, and determining that an alarm condition is satisfied according to a change of the statistical indexes comprises:
and determining that the alarm condition is met when it is determined that the first log contains a new abnormal business behavior that does not appear in the second log.
3. The method of claim 1, wherein obtaining corresponding statistical indexes according to a first log and a second log of a same abnormal business behavior, and determining that an alarm condition is satisfied according to a change of the statistical indexes comprises:
and when it is determined that the same abnormal business behavior appears more times in the first log than in the second log, and the growth rate exceeds a preset first threshold, determining that the preset alarm condition is met.
4. The method of claim 1, wherein obtaining corresponding statistical indexes according to a first log and a second log of a same abnormal business behavior, and determining that an alarm condition is satisfied according to a change of the statistical indexes comprises:
and determining that the alarm condition is met when, in more than a set number of summary time periods within the time cycle, the growth rate of the number of occurrences of the same abnormal business behavior in the first log relative to the second log does not exceed the preset first threshold but exceeds a preset second threshold.
5. The method according to any one of claims 1 to 4, wherein determining the first log/second log corresponding to each abnormal business behavior comprises:
and according to the information item of whether the service in the log information is abnormal, when the information item indicates that the service is abnormal, obtaining the corresponding abnormal service behavior according to the abnormal type.
6. The method of claim 1, wherein obtaining corresponding statistical indexes according to a first log and a second log of a same abnormal business behavior, and determining that an alarm condition is satisfied according to a change of the statistical indexes comprises:
obtaining a first average processing time of the abnormal business behaviors according to the times of the same abnormal business behavior appearing in the first log and the corresponding processing time of each time;
obtaining a second average processing time of the abnormal business behaviors according to the times of the same abnormal business behavior appearing in the second log and the corresponding processing time of each time;
and when the first average processing time is determined to be greater than the second average processing time and the difference value exceeds a preset threshold value, determining that an alarm condition is met.
7. The method of claim 1, wherein after the abnormal traffic behavior is identified by the alarm, the method further comprises:
determining a program module corresponding to the abnormal business behavior with the alarm identifier;
according to a program module comparison table with an updated version compared with a previous version, when the program module is determined to be in the program module comparison table, identifying the program module as a new version program module, otherwise, identifying the program module as an old version program module;
and sending the abnormal business behavior with the alarm identifier and the corresponding identification result to the corresponding debugging client.
8. The method of claim 7, wherein determining the program module corresponding to the abnormal business behavior with the alarm flag comprises:
establishing a mapping table between each abnormal business behavior and its corresponding program module according to the first log corresponding to each abnormal business behavior;
and inquiring the program module corresponding to the abnormal business behavior with the alarm identification from the mapping relation table.
9. The method of claim 7, wherein the service comprises transaction information, and wherein the log information comprises a transaction code identifying a behavior of the service, an error code identifying whether the service is abnormal, and error information describing the error code, a transaction start time, and a transaction end time.
10. The method of claim 1, wherein acquiring log information generated when a new business system processes a business when a version update of the business system is detected comprises:
when the version updating is determined to be completed, a timing analysis switch is turned on;
when the timing analysis switch is determined to be on, acquiring, through the SDK, the log information generated when each business client deployed by the business system processes business;
and when the stable operation index is reached after the version is determined to be updated, closing the timing analysis switch.
11. The method of claim 1, wherein aggregating log information for a current first aggregation time period in response to a data statistics instruction comprises:
generating a data statistics instruction at a fixed time interval T, and determining a time period with the time length T before the current time as a current first summary time period;
and summarizing the log information in the current first summarizing time period.
12. An apparatus for detecting an anomaly in a business system, the apparatus comprising:
the log data obtaining module is used for acquiring log information generated when a new business system processes business when a version update of the business system is detected, wherein the log information comprises business behaviors, whether the business is abnormal, and abnormality types;
the data statistics module is used for responding to a data statistics instruction, summarizing the log information in the current first summarizing time period and determining first logs corresponding to various abnormal business behaviors;
the summary log module is used for acquiring a summary result of log information in a second summary time period counted by the business system before updating in history, and determining a second log corresponding to each abnormal business behavior, wherein the second summary time period and the first summary time period are the same time period in different time periods;
and the alarm identification module is used for obtaining corresponding statistical indexes according to the first log and the second log of the same abnormal business behavior, and carrying out alarm identification on the abnormal business behavior when the alarm condition is met according to the change of the statistical indexes.
13. An apparatus for detecting an abnormality in a business system, the apparatus comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
14. A computer program product for causing a computer to perform the method of any one of claims 1 to 11.
15. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-11.
CN202111536537.6A 2021-12-15 2021-12-15 Method, device and equipment for detecting abnormity of business system Pending CN114201201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536537.6A CN114201201A (en) 2021-12-15 2021-12-15 Method, device and equipment for detecting abnormity of business system

Publications (1)

Publication Number Publication Date
CN114201201A true CN114201201A (en) 2022-03-18

Family

ID=80654135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536537.6A Pending CN114201201A (en) 2021-12-15 2021-12-15 Method, device and equipment for detecting abnormity of business system

Country Status (1)

Country Link
CN (1) CN114201201A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166501A (en) * 2023-02-03 2023-05-26 上海擎创信息技术有限公司 Log verification method and device, electronic equipment and storage medium
CN116166501B (en) * 2023-02-03 2024-03-08 上海擎创信息技术有限公司 Log verification method and device, electronic equipment and storage medium
CN115883346A (en) * 2023-02-23 2023-03-31 广州嘉为科技有限公司 FDEP log-based anomaly detection method and device and storage medium
CN116383083A (en) * 2023-04-23 2023-07-04 中航信移动科技有限公司 Multi-interface connection-based abnormal data source determining method and storage medium
CN116383083B (en) * 2023-04-23 2024-01-12 中航信移动科技有限公司 Multi-interface connection-based abnormal data source determining method and storage medium
CN117076953A (en) * 2023-09-20 2023-11-17 深圳市小赢信息技术有限责任公司 Asynchronous service exception handling method, electronic device and computer readable storage medium
CN117076953B (en) * 2023-09-20 2024-01-16 深圳市小赢信息技术有限责任公司 Asynchronous service exception handling method, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination