CA3141565A1

CA3141565A1 - Method and system for automatically monitoring business systems

Info

Publication number: CA3141565A1
Application number: CA3141565A
Authority: CA
Inventors: Yuxue Bao; Sikai Wang; Zhiliang Geng
Original assignee: 10353744 Canada Ltd
Current assignee: 10353744 Canada Ltd
Priority date: 2020-12-09
Filing date: 2021-12-09
Publication date: 2022-06-09
Also published as: CN112615737B; CN112615737A

Abstract

A method and a system for automatically monitoring business systems are disclosed. The method includes: automatically collecting dependency property information of a business system through JAR packages of a monitoring and inspecting component and reporting the dependency property information to a service end; after the service end configures the relevant business system according to the dependency property information, automatically generating inspection case tasks and issuing the inspection case tasks to the monitoring and inspecting component; executing the inspection case tasks and reporting inspection results by the monitoring and inspecting component; and locating problems and giving alerts by the service end according to the inspection results. The method and the system solve the technical problems arising from the complexity and ineffectiveness of existing methods for monitoring business systems in recognizing and locating abnormalities of business services.

Description

METHOD AND SYSTEM FOR AUTOMATICALLY MONITORING BUSINESS SYSTEMS
BACKGROUND OF THE INVENTION
Technical Field

[0001] The present invention relates to the technical field of E-business, and more particularly to a method and a system for automatically monitoring business systems.

Description of Related Art

[0002] In the early days, when systems were relatively small in scale, their maintenance and operation relied mainly on manual work by maintenance and operation staff. With the rapid growth of business and the increasing diversity of services, network architectures have become more and more complicated. A single system may involve multiple devices and require multiple deployed instances, which makes manual checking and problem locating by maintenance and operation staff more difficult and less efficient. The growing number of devices also makes communication between maintenance and operation teams more costly. Since the various logs and alerts are distributed over different devices, checking statistical data and then prompting the relevant businesses to take action is time- and effort-consuming.

[0003] Traditional business systems lack a consistent monitoring solution and usually rely on third-party systems to measure various indicators, which makes adoption complicated for businesses. Moreover, because indicator monitoring data and alert-related configuration are scattered over different systems, such known systems can configure several monitoring alerts but can hardly reveal the root causes of problems or estimate the affected scope. Even when such known systems do give alerts, the alert information is too vague for maintenance and operation staff to locate problems directly. In this case, businesses have to check every suspected part one by one, making troubleshooting time- and effort-consuming.

[0004] Hence, there is a pressing need for a method or a system that is easy to adopt and can provide business systems with automated inspection so that problems are identified quickly.

Date recue / Date received 2021-12-09

SUMMARY OF THE INVENTION

[0005] To address the defects of the prior art, the objective of the present invention is to provide a method and a system for automatically monitoring business systems that solve the technical problems arising from the complexity and ineffectiveness of existing methods for monitoring business systems in recognizing and locating abnormalities of business services.

[0006] In a first aspect, the present invention provides a method for automatically monitoring business systems. The method comprises: automatically collecting dependency property information of a business system through JAR packages of a monitoring and inspecting component and reporting the dependency property information to a service end;

[0007] after the service end configures the relevant business system according to the dependency property information, automatically generating inspection case tasks and issuing the inspection case tasks to the monitoring and inspecting component;

[0008] executing the inspection case tasks and reporting inspection results by the monitoring and inspecting component; and

[0009] locating problems and giving alerts by the service end according to the inspection results.

[0010] Further, the configuration comprises: custom configuration of business system cases, configuration of user access authority, configuration of sensitivity of early warning and alerts, and configuration of inspection task strategies.

[0011] Further, the monitoring and inspecting component, according to different business system services, configures different parameters and task strategies and performs task scheduling on the monitoring and inspecting component, and the monitoring and inspecting component periodically executes inspection case tasks.

[0012] Further, the custom configuration of business system cases comprises mapping rules for inspection tasks, and service names of service machines are acquired through the mapping rules, thereby locating inspected abnormal services.

[0013] Further, the service end actively issues troubleshooting instructions to business nodes determined to have failed, so as to acquire the information required for troubleshooting, which includes acquiring memory dump documents and acquiring JVM thread running states.

[0014] Further, the service end provides component expansion service ports. According to different business system services, the service end provides parameters to configure the component expansion service ports and control a monitoring and inspecting component to monitor different case tasks.

[0015] Further, for indicator-type inspection results, calculation is made based on baseline values from historical inspections and the configured sensitivity so as to determine whether an early warning and an alert are to be triggered.

[0016] Further, for middleware-type inspection results, fault sources are identified by means of multi-machine cross inspection.

[0017] Further, the monitoring and inspecting component verifies a business problem that has been handled after an alert and, after verification, gives a feedback instruction to the service end, so that the service end cancels the early warning or the alert for that business problem.

[0018] In another aspect, the present invention provides a system for automatically monitoring business systems. The system comprises:

[0019] a monitoring and inspecting component, automatically collecting dependency property information of a business system through component JAR packages and reporting the dependency property information to a configuration service module, executing inspection case tasks issued by the configuration service module, and reporting inspection results; and

[0020] the configuration service module, configuring relevant business systems according to the dependency property information, generating the inspection case tasks issued to the monitoring and inspecting component, and receiving the inspection results reported by the monitoring and inspecting component so as to locate problems and give an alert.

[0021] Further, the system comprises an inspection query module for querying inspection case information through a query interface provided by the configuration service module.

[0022] As compared to the prior art, the method and system for automatically monitoring business systems of the present invention accomplish the following technical effects:

[0023] 1. Adoption is easy and less intrusive to existing business systems. It only requires introducing the JAR packages of the component, through which information related to service-dependency link properties is automatically collected and reported to the service end.

[0024] 2. The present invention supports custom inspection tasks, can report the operational states of the process nodes handled by the business system itself in a quasi-real-time manner, and accurately identifies which process node has failed and the underlying reason according to predetermined rules.

[0025] 3. By using system sensitivity parameters as the benchmark values for early warning and alerts, the present invention eliminates the need for separately configuring thresholds for early warning and alerts, while enhancing the accuracy of early warning and alerts, thereby reducing the risk of false positives.
BRIEF DESCRIPTION OF THE DRAWINGS

[0026] FIG. 1 is a flowchart of a method for automatically monitoring business systems according to one embodiment of the present invention.

[0027] FIG. 2 is a structural diagram of a system for automatically monitoring business systems according to one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION

[0028] The following description details some embodiments of the present invention, and illustrations of these embodiments are shown in the accompanying drawings. Therein, the embodiments described with reference to the drawings are exemplary and intended to explain the present invention, and shall by no means be understood as limiting the present invention.

[0029] FIG. 1 is a flowchart of a method for automatically monitoring business systems according to one embodiment of the present invention.

[0030] As shown in FIG. 1, a method for automatically monitoring business systems comprises the following steps.

[0031] The step S11 involves automatically collecting dependency property information of a business system through JAR packages of a monitoring and inspecting component and reporting the dependency property information to a service end.

[0032] The monitoring and inspecting component is introduced into a business system to collect dependency service link property information through JAR packages and to report the dependency property information to a service end. The existing solutions for business monitoring, system monitoring, and custom monitoring have to connect multiple client ends, and have to be upgraded separately whenever these client ends are upgraded, so the operation is relatively complicated. The present invention uses a single component that can be easily introduced into business systems in a consistent, less intrusive manner, so that the business system needs no additional configuration of information related to its service-dependency link properties. Instead, adoption is accomplished simply by using the JAR packages of the monitoring and inspecting component. The JAR packages, designed as plug-in elements, support business monitoring as well as system monitoring and custom expansion monitoring. Therein, the dependency property information refers to two things, namely dependency services and property information. The dependency services mainly refer to the data exchanges between the present system and the outside, including the databases, caches, and external system ports accessed. The property information refers to the address or authorization information required for linking to databases, such as domain names, IP addresses, ports, instance names, and URLs of services. These details are sent to the service end for storage. The monitoring and inspecting component, based on agreed rules, automatically detects systems, services, and middleware, including Java virtual machines and Docker containers. For example, it recognizes whether MySQL, Redis, or MQ is used, initiates a connection according to the corresponding property information, and periodically executes the default detection instructions.
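The collected dependency property information can be pictured as a small record per dependency service. The following is a minimal Python sketch, not the component's actual format: the field names, the URL-prefix detection rule, and the JSON layout are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DependencyProperty:
    """One dependency service plus the properties needed to connect to it."""
    system: str         # abbreviation of the business system (hypothetical field)
    kind: str           # e.g. "mysql", "redis", "mq"
    host: str           # domain name or IP address
    port: int
    instance: str = ""  # instance name, if any


def detect_kind(url: str) -> str:
    """Guess the middleware type from a connection URL prefix (assumed rule)."""
    prefixes = {"jdbc:mysql:": "mysql", "redis:": "redis", "amqp:": "mq"}
    for prefix, kind in prefixes.items():
        if url.startswith(prefix):
            return kind
    return "unknown"


def build_report(deps):
    """Serialize the collected dependencies into a JSON payload for the service end."""
    return json.dumps({"dependencies": [asdict(d) for d in deps]}, sort_keys=True)
```

A report built this way carries exactly the two kinds of information named above: which services the system depends on, and the address properties needed to reach them.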

[0033] At the step S12, the method involves, after the service end configures the relevant business system according to the dependency property information, automatically generating inspection case tasks and issuing the inspection case tasks to the monitoring and inspecting component.

[0034] The monitoring and inspecting component sends the dependency property information collected from the individual business systems to the service end in a real-time manner. After receiving the dependency property information from the monitoring and inspecting component, the service end first verifies consistency. The latest information received by the service end is compared with the previously stored information, and if the information concerns a newly added dependency service link, the service end updates its records, for example by updating the IP address of a database. After verification, the relevant business system is configured. With the received service link property information of an introduced system, the service end can learn which services the system depends on, and generate corresponding inspection cases for these services.
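The consistency check and case generation described in this paragraph can be sketched as a comparison between the stored and the newly reported records. This is an illustrative Python sketch only: the dictionary layout, the treatment of changed properties, and the `check:` case naming are assumptions, not details given in the source.

```python
def diff_dependencies(stored, latest):
    """Compare stored and newly reported dependency info, keyed by service name."""
    added = {k: v for k, v in latest.items() if k not in stored}
    changed = {k: v for k, v in latest.items() if k in stored and stored[k] != v}
    return added, changed


def apply_report(stored, latest):
    """Update the stored records in place and return one inspection case per service."""
    added, changed = diff_dependencies(stored, latest)
    stored.update(added)
    stored.update(changed)
    # Hypothetical naming scheme: one inspection case for each dependency service.
    return [f"check:{name}" for name in sorted(stored)]
```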

[0035] The configuration as described above comprises: custom configuration of business system cases, configuration of sensitivity of early warning and alerts, access authority configuration of the systems, and inspection strategies of the monitoring and inspecting component, etc. Therein, different business service ports are configured with different contents.

[0036] The custom configuration is mainly about configuring the services, ports, or local business code logic that the relevant business system designates for inspection, so as to achieve custom inspection. Therein, the custom configuration further covers mapping rules of different business services, separation into standalone machines according to rules, and their service names. Different parameters can be configured for different business service ports. For example, the parameters required by a money-transfer port include two test bank account numbers and the amount to be transferred, and the parameters required by an order query service port include a user ID and start and end times. These configuration operations facilitate the subsequent detection queries and information feedback of the monitoring and inspecting component.
The sensitivity configuration for early warning and alerts sets different sensitivities for different business systems according to practical needs, with the aim of reducing false positives in subsequent determination, and is particularly effective for determinations on indicator-type results.
The access authority configuration for the system sets the access authorities of maintenance and operation staff. The inspection strategies provide the combinations of inspection items for different business types and the inspection frequencies with which the monitoring and inspecting component performs inspection. For example, in order to check whether a certain system is normal, inspection of all items may be set to run automatically at a set interval, and cross inspection may also be set.
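As an illustration of per-port parameters, the sketch below checks a custom case configuration against the parameters each port requires. It is a hypothetical Python example: the port names, parameter keys, and the shape of `case_config` are invented for illustration, and only the money-transfer and order-query examples come from the text.

```python
# Required parameters per business service port (names are assumptions).
REQUIRED_PARAMS = {
    "money_transfer": {"from_account", "to_account", "amount"},
    "order_query": {"user_id", "start_time", "end_time"},
}


def missing_params(port, params):
    """Return the required parameter names absent from a case configuration."""
    return sorted(REQUIRED_PARAMS[port] - set(params))


# A hypothetical custom case configuration combining the four kinds of settings.
case_config = {
    "port": "money_transfer",
    "params": {"from_account": "TEST-A", "to_account": "TEST-B", "amount": "1.00"},
    "sensitivity": 0.3,   # sensitivity of early warning and alerts
    "interval_s": 60,     # inspection strategy: run every 60 seconds
}
```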

[0037] After custom configuration of inspection tasks of business systems, the disclosed system can automatically generate inspection case tasks based on the set inspection rules and task strategies, and issue the inspection case tasks to the monitoring and inspecting component.

[0038] Therein, the set rules can be briefed as below:
1) The JAR packages of the monitoring and inspecting component automatically acquire the JNDI name of a database, and automatically partition and extract a read JNDI and a write JNDI according to the abbreviation of the relevant system and the JNDI naming rules. For example, assuming that the system abbreviation is pgmcms, the corresponding read and write JNDI names are pgmcmsRDS and pgmcmsWDS, respectively. The contents extracted from the partitioning rules are: pgmcmsRDS 1, pgmcmsRDS 2, ... and pgmcmsWDS 1, pgmcmsWDS 2, ... After the data source is determined, the default verification SQL for the dependency is executed according to the database type. For example, SELECT VERSION() is used for MySQL, and SELECT * FROM DUAL is used for Oracle.
2) The JAR packages of the monitoring and inspecting component automatically acquire the Redis sharding name, and configure similar information, such as links in documents and keys, according to the abbreviation of the relevant system and the Redis redis.conf. For example, assuming that the system abbreviation is pgmcms, the corresponding Redis shards are pgmcms 1, pgmcms 2, ...
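Rule 1) can be sketched as a prefix match on JNDI names followed by a lookup of the verification SQL. The Python below is a minimal sketch under assumptions: the exact shard suffix format is not specified in the source, so the code only matches the `<abbr>RDS` and `<abbr>WDS` prefixes, and the function names are hypothetical.

```python
# Default verification SQL per database type, as named in the rule above.
VERIFICATION_SQL = {"mysql": "SELECT VERSION()", "oracle": "SELECT * FROM DUAL"}


def classify_jndi(system_abbr, jndi_name):
    """Return "read", "write", or None according to the naming rule."""
    if jndi_name.startswith(system_abbr + "RDS"):
        return "read"
    if jndi_name.startswith(system_abbr + "WDS"):
        return "write"
    return None


def partition_jndi(system_abbr, names):
    """Split JNDI names into read and write groups, ignoring unrelated names."""
    groups = {"read": [], "write": []}
    for name in names:
        role = classify_jndi(system_abbr, name)
        if role is not None:
            groups[role].append(name)
    return groups
```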

[0039] Setting such rules enables accurate problem locating. Assume that five databases, each installed on its own machine, provide service externally through a VIP. Inspecting the VIP directly cannot tell which machine hosts the database having problems. Therefore, the names of the services provided on the individual machines have to be found according to the VIP service name, and this is achieved using the mapping rules. Without rule mapping, inspection can only depend on publicly known domain names. In that case, even if it is certain that there are problems, it is impossible to tell the faulty machine from all the machines providing services to the site having that domain name.
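The role of the mapping rules can be illustrated as a lookup from a VIP service name to the per-machine service names, each of which is then inspected individually. This Python sketch uses invented service names and a caller-supplied health check; neither is taken from the source.

```python
# Hypothetical mapping rule: VIP service name -> service names on individual machines.
MAPPING_RULES = {
    "orders-db-vip": ["orders-db-01", "orders-db-02", "orders-db-03"],
}


def locate_faulty_machines(vip_service, is_healthy, rules=MAPPING_RULES):
    """Inspect each machine behind a VIP and return the ones that fail the check."""
    return [name for name in rules.get(vip_service, []) if not is_healthy(name)]
```

Inspecting only `orders-db-vip` would report at most that something behind it is wrong; the mapping is what narrows the result to a specific machine.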


[0040] The set task strategies can be briefed as below:

[0041] The monitoring and inspecting component automatically generates an inspection time interval according to the level of the relevant business system (e.g., Level 1, 2, or 3, in which Level 1 has the top priority, and a problem in a Level 1 system interrupts service provision) and whether the relevant system is on a core link. Therein, the default inspection interval is the mean of the inspection intervals of the connected systems having the same system level, the same link, and the same inspection task dimensions. The inspection intervals may also be custom-configured at the service end.
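The default interval rule amounts to averaging over comparable systems. The following Python sketch assumes each connected system is described by a dictionary with `level`, `link`, `dimension`, and `interval_s` keys; these names and the sample values are illustrative only.

```python
from statistics import mean


def default_interval(peers, level, link, dimension):
    """Mean inspection interval (seconds) of comparable connected systems, or None."""
    matching = [
        p["interval_s"]
        for p in peers
        if (p["level"], p["link"], p["dimension"]) == (level, link, dimension)
    ]
    return mean(matching) if matching else None


# Hypothetical connected systems already configured at the service end.
peers = [
    {"level": 1, "link": "checkout", "dimension": "db", "interval_s": 30},
    {"level": 1, "link": "checkout", "dimension": "db", "interval_s": 60},
    {"level": 2, "link": "checkout", "dimension": "db", "interval_s": 300},
]
```

With these sample values, a new Level 1 system on the same link and task dimension would default to the mean of 30 and 60 seconds.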

[0042] In addition, the service end further performs task scheduling on the monitoring and inspecting component, so as to control plural monitoring and inspecting components to conduct inspections for different task instances. The task scheduling mainly includes:
1. The monitoring and inspecting component is loaded with all available case tasks of the introduced system, and executes them at a set frequency; and
2. The service end may alternatively initiate a case inspection on a certain machine in the business system, or it may initiate case inspections on all the machines in this system.
Since the service end further provides component expansion service ports that allow different business systems to be introduced and services of various types to be provided, the only thing that has to be done at this point is to issue new parameters through the ports, or to adjust parameters and reschedule tasks, so that the operations of different business services can be easily recognized.
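The two scheduling modes can be sketched as two small selection functions: one picks the case tasks that are due at a given time, the other picks the machines for an inspection initiated by the service end. This is a Python sketch with hypothetical names and data shapes, not the scheduler described in the source.

```python
def due_cases(cases, now):
    """Names of case tasks whose configured interval has elapsed at time `now`."""
    return [c["name"] for c in cases if now - c["last_run"] >= c["interval_s"]]


def select_targets(machines, machine=None):
    """Service-end-initiated inspection: one named machine, or all machines."""
    if machine is None:
        return list(machines)
    if machine not in machines:
        raise ValueError(f"unknown machine: {machine}")
    return [machine]
```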

[0043] In the step S13, the monitoring and inspecting component executes the inspection case tasks and reports the inspection results.

[0044] The monitoring and inspecting component executes an inspection case and reports the results of its execution through a RESTful interface. The RESTful interface connects to different services through ports defined in advance, and is therefore highly extensible.
When a dependency service inspection reveals an abnormality, the problematic dependency service can be identified accurately. For example, when Redis has a fault, the inspecting component informs the service end which machine has problems with its Redis services or is unable to connect. When the inspecting component finds an early warning about an abnormality of a JVM (Java virtual machine), such as an early warning for memory usage, it can automatically generate a dump document for memory usage, thereby facilitating fast locating of the problem. The objects of inspection may be the usability of an entire cluster, or the services of every machine in a cluster.

[0045] At the step S14, the service end locates problems and gives alerts according to the inspection results.

[0046] According to the case results reported in a real-time manner, the service end records and stores log information and gives alerts. Specifically, the service end analyzes the log of inspection results, and calculates early warnings and alerts according to the sensitivity settings of the individual business systems. The results are then sent to the user. Besides, the service end can accurately locate problems. With the previously set rules, the specific machine providing a given service can be identified through mapping, so that the failed machine can be located accurately.

[0047] Another example is about middleware-type inspection results. Herein, multi-machine cross inspection is performed on middleware to identify the final fault source. For example, when three JBOSS machines are separately inspected for database usability, if two of them are normal and the other one is abnormal, it can be certain that the abnormal JBOSS machine has problems.
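The cross-inspection reasoning in the JBOSS example can be written as a small decision function. This Python sketch covers the case stated in the text (some machines pass, some fail); the handling of the case where every machine fails, which points at the shared dependency instead, is an added assumption and not stated in the source.

```python
def cross_inspect(results):
    """Classify a fault from per-machine check results.

    `results` maps a machine name to True when its dependency check passed.
    Returns a (verdict, machines) tuple.
    """
    failed = sorted(m for m, ok in results.items() if not ok)
    if not failed:
        return ("ok", [])
    if len(failed) == len(results):
        # Assumption: if every machine fails, suspect the shared dependency.
        return ("dependency", [])
    return ("machine", failed)
```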

[0048] A further example is about indicator-type inspection results. To inspect indicators such as memory usage or threads, the baseline values based on historical inspections and the sensitivity level set at the system are combined to get criteria for giving early warning and alerts.
For example, assuming that the baseline value of memory usage is 60%, and the sensitivity level set for the system is 0.3, early warning will be issued when the memory usage reaches 78%.
Different systems may have different sensitivity levels to errors. By introducing system sensitivity parameters as basic values for early warning and alerts, the need of directly configuring thresholds for early warning and alerts is eliminated, and the accuracy of early warning and alerts is improved, with reduced false positives.
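The numbers in this example are consistent with a threshold of baseline × (1 + sensitivity), since 60% × 1.3 = 78%. The source does not state the formula explicitly, so the Python sketch below treats that relationship as an assumption inferred from the example; the function names are hypothetical.

```python
def warning_threshold(baseline, sensitivity):
    """Early-warning threshold, assuming threshold = baseline * (1 + sensitivity)."""
    return baseline * (1 + sensitivity)


def should_warn(current, baseline, sensitivity):
    """True when the current indicator value reaches the derived threshold."""
    return current >= warning_threshold(baseline, sensitivity)
```

Under this reading, a lower sensitivity value produces a threshold closer to the baseline, so the same system raises early warnings sooner.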

[0049] Referring to FIG. 2, another embodiment of the present invention provides a system for automatically monitoring business systems. The system comprises:

[0050] a monitoring and inspecting component, automatically collecting dependency property information of a business system through component JAR packages and reporting the dependency property information to a configuration service module, executing inspection case tasks issued by the configuration service module, and reporting inspection results; and

[0051] the configuration service module, configuring relevant business systems according to the dependency property information, generating the inspection case tasks issued to the monitoring and inspecting component, and receiving the inspection results reported by the monitoring and inspecting component so as to locate problems and give an alert. In addition, the configuration service module further stores configuration information and inspection result records of the individual business systems.

[0052] There is also an inspection query module for making queries for inspection case information according to a query interface provided by the configuration service module. The queries may include queries for alert notification information, inspection records, and analysis records.

[0053] The disclosed system may be introduced through simulating a monitoring and inspecting component in a business system, while expanding custom inspection case configuration and system sensitivity setting. Then the information of results of the inspection cases of the business system can be reported through the monitoring and inspecting component in a real-time manner, for the disclosed system to analyze the result information, thereby accurately detecting whether any business inspected needs early warning or alert.

[0054] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims.
Hence, the scope of the present invention shall only be defined by the appended claims.

Claims

What is claimed is:

1. A method for automatically monitoring business systems, the method comprising:
automatically collecting dependency property information of a business system through JAR packages of a monitoring and inspecting component and reporting the dependency property information to a service end;
after the service end configures the relevant business system according to the dependency property information, automatically generating inspection case tasks and issuing the inspection case tasks to the monitoring and inspecting component;
executing the inspection case tasks and reporting inspection results by the monitoring and inspecting component; and locating problems and giving alerts by the service end according to the inspection results.

2. The method of claim 1, wherein the configuration comprises: custom configuration of business system cases, configuration of user access authority, configuration of sensitivity of early warning and alerts, and configuration of inspection task strategies.

3. The method of claim 2, wherein the monitoring and inspecting component according to different business system services, configures different parameters and task strategies and performs task scheduling on the monitoring and inspecting component, the monitoring and inspecting component periodically executes inspection case tasks.

4. The method of claim 2 or 3, wherein the custom configuration of business system cases comprises mapping rules for inspection tasks, and service names of service machines are acquired through the mapping rules, thereby locating inspected abnormal services.

5. The method of claim 4, wherein the service end actively gives troubleshooting instructions to business nodes determined as having failures, so as to acquire information required by relevant troubleshooting, which includes acquiring memory dump documents and acquiring JVM thread running states.

6. The method of claim 3, wherein the service end further provides component expansion service ports.

7. The method of claim 4, wherein for indicator-type inspection results, calculation is made based on baseline values from historical inspections and the configured sensitivity so as to determine whether an early warning and an alert are to be triggered; and for middleware-type inspection results, fault sources are identified by means of multi-machine cross inspection.

8. The method of claim 4, wherein the monitoring and inspecting component further verifies a post-alert processed business problem, and, after verification, gives a feedback instruction to the service end, to make the service end cancel the early warning or the alert for the business problem.

9. A system for automatically monitoring business systems, the system comprising:
a monitoring and inspecting component, automatically collecting dependency property information of a business system through component JAR packages and reporting the dependency property information to a configuration service module, executing inspection case tasks issued by the configuration service module, and reporting inspection results; and the configuration service module, configuring relevant business systems according to the dependency property information, generating the inspection case tasks issued to the monitoring and inspecting component, and receiving the inspection results reported by the monitoring and inspecting component so as to locate problems and give an alert.

10. The system of claim 9, further comprising: an inspection query module, for making queries for inspection case information according to a query interface provided by the configuration service module.
