WO2024002327A1 - Cloud monitoring and analysis method and system - Google Patents


Info

Publication number
WO2024002327A1
Authority
WO
WIPO (PCT)
Prior art keywords
database
analysis
sql
monitoring
dimension
Application number
PCT/CN2023/104509
Inventor
张晓磊 (Zhang Xiaolei)
王昊玄 (Wang Haoxuan)
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Priority claimed from CN202211172568.2A (external priority; related publication CN117370128A)
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2024002327A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment

Definitions

  • This application relates to the field of cloud technology, and in particular to a cloud monitoring and analysis method and system.
  • Existing monitoring and analysis methods are relatively simplistic: monitoring functions are separated from analysis functions, and the analysis dimensions are insufficient. An end-to-end monitoring and analysis system is therefore needed to quickly identify the causes of abnormal database load events and quickly restore normal business operations.
  • The embodiments of the present application provide a cloud monitoring and analysis method and system that help quickly identify the causes of abnormal load events in a database, so that normal business operations can be restored quickly.
  • According to a first aspect, the embodiments of the present application provide a cloud monitoring and analysis method, which can be implemented by a cloud monitoring and analysis system.
  • The method can include: obtaining configuration information input or selected by a tenant on a cloud management platform.
  • The configuration information represents the tenant's monitoring and analysis requirements for a database.
  • The database is used to store business data of the tenant's business system. The method further includes: analyzing, according to the configuration information, a target structured query language (SQL) statement to be optimized, to obtain an analysis report, where the target SQL statement is a statement that has historically operated on the database.
  • The cloud monitoring and analysis system thus provides a new form of service that performs cloud monitoring and analysis on the database, so that the causes of abnormal load events in the database can be identified quickly and normal business operations restored quickly.
  • The method further includes: receiving the target SQL statement input or selected by the tenant on the cloud management platform.
  • The cloud monitoring and analysis system therefore lets tenants enter their own target SQL statements to be optimized, which broadens the application scenarios of the cloud monitoring and analysis system in the embodiments of the present application.
  • Analyzing the target SQL statement to be optimized according to the configuration information includes: when an abnormal load event occurs in the database, analyzing the target SQL statement to be optimized according to the configuration information.
  • In this way, the cloud monitoring and analysis system can provide cloud monitoring services for the database to detect its abnormal load events.
  • The configuration information indicates at least one monitoring indicator associated with the abnormal load event and an indicator threshold corresponding to each monitoring indicator.
  • The method further includes: obtaining indicator data of the at least one monitoring indicator; and when the indicator data is greater than or equal to the corresponding indicator threshold, determining that the abnormal load event has occurred in the database.
  • The cloud monitoring and analysis system can thus receive monitoring indicators entered or selected by the tenant, and collect and check data for those indicators to determine whether an abnormal load event has occurred.
  • The monitoring indicators include system indicators and/or business indicators. The system indicators include at least one of the following, measured on the computing device hosting the service that accesses the database: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network delay. The business indicators include at least one of the following: total number of database connections, number of active database connections, table expansion rate, master-slave synchronization rate, or number of slow SQL statements per minute.
  • The monitoring indicators can also be adjusted according to the requirements of the business system or application; details are not described here.
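The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not the application's implementation: the indicator names, the `IndicatorRule` type, and the functions `category_of` and `detect_load_anomaly` are all hypothetical, and the check simply applies the "greater than or equal to the threshold" rule stated in the text to every configured indicator.

```python
from dataclasses import dataclass

# Illustrative indicator names grouped as in the text; the actual
# catalogue is chosen by the tenant and is not fixed by the application.
SYSTEM_INDICATORS = {"cpu_load", "memory_load", "disk_io_load",
                     "network_packet_loss_rate", "network_delay"}
BUSINESS_INDICATORS = {"total_db_connections", "active_db_connections",
                       "table_expansion_rate", "master_slave_sync_rate",
                       "slow_sql_per_minute"}


@dataclass(frozen=True)
class IndicatorRule:
    """One monitoring indicator plus the threshold configured for it."""
    name: str
    threshold: float


def category_of(name: str) -> str:
    """Classify an indicator as a system, business, or custom indicator."""
    if name in SYSTEM_INDICATORS:
        return "system"
    if name in BUSINESS_INDICATORS:
        return "business"
    return "custom"  # indicators added for a specific business system


def detect_load_anomaly(rules: list[IndicatorRule],
                        samples: dict[str, float]) -> list[str]:
    """Return the names of indicators whose sample is >= its threshold.

    A non-empty result means an abnormal load event is deemed to have
    occurred. Indicators with no collected sample are skipped.
    """
    return [rule.name for rule in rules
            if rule.name in samples and samples[rule.name] >= rule.threshold]
```

For example, with thresholds of 0.8 for `cpu_load` and 10 for `slow_sql_per_minute`, the sample `{"cpu_load": 0.85, "slow_sql_per_minute": 3}` yields `["cpu_load"]`, so an abnormal load event would be reported.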
  • The configuration information indicates at least one log analysis dimension of the database.
  • The method further includes: performing log analysis on a target log of the database in the at least one log analysis dimension; and determining the target SQL statement to be optimized based on the analysis result of the target log in the at least one log analysis dimension.
  • The at least one log analysis dimension includes: execution time of a single SQL statement, total execution time of a single category of SQL statements, proportion of SQL executions by business, proportion of SQL connections by host, SQL latency distribution, or SQL queries per second (QPS).
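As a rough sketch of how a target log could be aggregated along these dimensions, the Python below works on already-parsed log records. The record fields, the latency buckets, and the rule used to pick the target SQL (the category with the largest total execution time) are assumptions for illustration; the text does not specify how categories are formed or how the target statement is chosen.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SqlLogRecord:
    """Hypothetical parsed entry of the database's SQL log."""
    sql: str            # statement text as executed
    template: str       # normalized statement, used as its "category"
    duration_ms: float  # execution time
    host: str           # client host of the connection
    service: str        # business service that issued the statement
    ts: float           # start time in seconds


def shares(counter: Counter) -> dict:
    """Convert counts into proportions of the total."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def analyze_log(records: list[SqlLogRecord],
                bucket_bounds_ms: tuple = (10, 100, 1000)) -> dict:
    """Aggregate the log along the six dimensions listed above."""
    total_by_template: dict = defaultdict(float)
    latency: Counter = Counter()
    for r in records:
        total_by_template[r.template] += r.duration_ms
        label = next((f"<{b}ms" for b in bucket_bounds_ms if r.duration_ms < b),
                     f">={bucket_bounds_ms[-1]}ms")
        latency[label] += 1

    window_s = (max(r.ts for r in records) - min(r.ts for r in records)) if records else 0.0
    return {
        "slowest_single_sql": max(records, key=lambda r: r.duration_ms, default=None),
        "total_time_by_template": dict(total_by_template),
        "service_share": shares(Counter(r.service for r in records)),
        "host_share": shares(Counter(r.host for r in records)),
        "latency_distribution": dict(latency),
        # Windows shorter than one second are treated as one second.
        "qps": len(records) / max(window_s, 1.0),
    }


def pick_target_sql(result: dict) -> Optional[str]:
    """One possible selection rule: the category with the largest total time."""
    totals = result["total_time_by_template"]
    return max(totals, key=totals.get) if totals else None
```

Ranking categories by total time rather than by the single slowest execution favors statements that are individually fast but run very often, which is a common source of database load; other ranking rules are equally plausible.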
  • Analyzing the target SQL statement to be optimized according to the configuration information to obtain the analysis report includes: restoring a backup data package of the database on a replica database node; and executing the target SQL statement on the replica database node to obtain the analysis report.
  • The configuration information further indicates information describing a load scenario of the database.
  • Executing the target SQL statement on the replica database node to obtain the analysis report includes: after constructing the load scenario on the replica database node according to the configuration information, executing the target SQL statement on the replica database node to obtain the analysis report.
  • The cloud monitoring and analysis system can provide tenants with a custom configuration channel through which they enter background parameters. The system uses these parameters to construct a pressure (load) background and executes SQL statements under it, which improves the analysis efficiency of the cloud monitoring and analysis system.
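The replica workflow can be illustrated with SQLite from the Python standard library, where `Connection.backup` copies one database into another. This is only an analogy under stated assumptions: an in-memory SQLite database stands in for both the tenant's database and the replica database node, the copy stands in for restoring a backup data package, and constructing the load scenario is left out. The helper names `restore_replica` and `run_target_sql` are hypothetical.

```python
import sqlite3
import time


def restore_replica(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the production database into a fresh in-memory replica.

    Stands in for restoring the database's backup data package on the
    replica database node.
    """
    replica = sqlite3.connect(":memory:")
    production.backup(replica)
    return replica


def run_target_sql(replica: sqlite3.Connection, target_sql: str,
                   params: tuple = (), repeat: int = 5) -> dict:
    """Execute the target SQL on the replica several times and time it."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    timings_ms = []
    rows: list = []
    for _ in range(repeat):
        start = time.perf_counter()
        rows = replica.execute(target_sql, params).fetchall()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "target_sql": target_sql,
        "row_count": len(rows),
        "avg_ms": sum(timings_ms) / len(timings_ms),
        "max_ms": max(timings_ms),
    }


# Usage: a hypothetical "orders" table plays the role of production data.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant TEXT, amount REAL)")
prod.executemany("INSERT INTO orders (tenant, amount) VALUES (?, ?)",
                 [(f"t{i % 10}", float(i)) for i in range(1000)])
prod.commit()

replica = restore_replica(prod)
report = run_target_sql(replica, "SELECT * FROM orders WHERE tenant = ?", ("t3",))
replica.execute("DELETE FROM orders")  # experiment freely on the replica
```

Because the statement runs on the replica, the production data is unaffected, even by a destructive experiment such as the `DELETE` at the end.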
  • The information describing the load scenario of the database includes at least one of the following: pressure mode name, service name, database name, SQL execution concurrency, or the SQL statements to be executed concurrently.
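One way to model these settings is a small configuration object plus a helper that replays the configured SQL statements at the requested concurrency. The field names mirror the list above, while `LoadScenario`, `apply_background_load`, and the caller-supplied `execute` callback are hypothetical. A real pressure tool would keep the background load running while the target SQL statement is measured; this sketch simply runs a fixed number of rounds.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoadScenario:
    """Load-scenario settings; field names mirror the items listed above."""
    pressure_mode_name: str
    service_name: str
    database_name: str
    concurrency: int                       # concurrent number of SQL executions
    concurrent_sqls: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.concurrency < 1:
            raise ValueError("concurrency must be >= 1")


def apply_background_load(scenario: LoadScenario,
                          execute: Callable[[str], None],
                          rounds: int = 1) -> int:
    """Run the scenario's SQL statements with the configured concurrency.

    `execute` is supplied by the caller and sends one statement to the
    replica database node; it is a placeholder, not a real driver API.
    Returns the number of statements executed.
    """
    jobs = [sql for _ in range(rounds) for sql in scenario.concurrent_sqls]
    with ThreadPoolExecutor(max_workers=scenario.concurrency) as pool:
        # list() waits for completion and re-raises any exception from execute().
        list(pool.map(execute, jobs))
    return len(jobs)
```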
  • The analysis report includes at least one of the following: analysis results in at least one log analysis dimension of the database; indicator data of at least one monitoring indicator of the database; the target SQL statement; the execution plan of the target SQL statement; or a rendering of the execution plan of the target SQL statement.
  • The content of the analysis report is presented in at least one of the following forms: progress bar, percentage, pie chart, list, line chart, or dashboard.
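A text-only rendering can illustrate how report content might map to some of these presentation forms: a list, percentages, and progress bars. The `AnalysisReport` fields follow the items listed above; the class and the `progress_bar` and `render_text` helpers are hypothetical, and charts or dashboards would need a plotting or front-end library instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalysisReport:
    """Report content; every field is optional ("at least one of" in the text)."""
    dimension_results: dict[str, dict[str, float]] = field(default_factory=dict)
    indicator_data: dict[str, float] = field(default_factory=dict)
    target_sql: Optional[str] = None
    execution_plan: Optional[str] = None


def progress_bar(fraction: float, width: int = 20) -> str:
    """Render a share in [0, 1] as a text progress bar plus a percentage."""
    fraction = min(max(fraction, 0.0), 1.0)
    filled = round(fraction * width)
    return f"[{'#' * filled}{' ' * (width - filled)}] {fraction * 100:5.1f}%"


def render_text(report: AnalysisReport) -> str:
    """Lay the report out as a plain-text list."""
    lines = []
    if report.target_sql:
        lines.append(f"Target SQL: {report.target_sql}")
    for dimension, shares in report.dimension_results.items():
        lines.append(f"{dimension}:")
        # Largest share first, one progress bar per item.
        for item, frac in sorted(shares.items(), key=lambda kv: -kv[1]):
            lines.append(f"  {item:<12} {progress_bar(frac)}")
    for name, value in report.indicator_data.items():
        lines.append(f"{name} = {value}")
    if report.execution_plan:
        lines.append("Execution plan:")
        lines.append(report.execution_plan)
    return "\n".join(lines)
```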
  • The embodiments of the present application further provide a cloud monitoring and analysis system, including: a cloud management platform, configured to receive configuration information input or selected by a tenant, where the configuration information represents the tenant's monitoring and analysis requirements for a database, and the database is used to store business data of the tenant's business system; and an analysis device, configured to analyze, according to the configuration information, a target SQL statement to be optimized to obtain an analysis report, where the target SQL statement is a statement that has historically operated on the database.
  • The cloud management platform is configured to receive the target SQL statement input or selected by the tenant on the cloud management platform.
  • The analysis device is configured to analyze the target SQL statement to be optimized according to the configuration information when an abnormal load event occurs in the database.
  • The configuration information indicates at least one monitoring indicator associated with the abnormal load event and an indicator threshold corresponding to each monitoring indicator.
  • The analysis device is further configured to: obtain indicator data of the at least one monitoring indicator; and when the indicator data is greater than or equal to the corresponding indicator threshold, determine that the abnormal load event has occurred in the database.
  • The monitoring indicators include system indicators and/or business indicators. The system indicators include at least one of the following, measured on the computing device hosting the service that accesses the database: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network delay. The business indicators include at least one of the following: total number of database connections, number of active database connections, table expansion rate, master-slave synchronization rate, or number of slow SQL statements per minute.
  • The configuration information indicates at least one log analysis dimension of the database.
  • The analysis device is further configured to: perform log analysis on a target log of the database in the at least one log analysis dimension; and determine the target SQL statement to be optimized based on the analysis result of the target log in the at least one log analysis dimension.
  • The at least one log analysis dimension includes: execution time of a single SQL statement, total execution time of a single category of SQL statements, proportion of SQL executions by business, proportion of SQL connections by host, SQL latency distribution, or SQL queries per second (QPS).
  • The analysis device is configured to: restore a backup data package of the database on a replica database node; and execute the target SQL statement on the replica database node to obtain the analysis report.
  • The configuration information further indicates information describing a load scenario of the database.
  • The analysis device is configured to: after constructing the load scenario on the replica database node according to the configuration information, execute the target SQL statement on the replica database node to obtain the analysis report.
  • The information describing the load scenario of the database includes at least one of the following: pressure mode name, service name, database name, SQL execution concurrency, or the SQL statements to be executed concurrently.
  • The analysis report includes at least one of the following: analysis results in at least one log analysis dimension of the database; indicator data of at least one monitoring indicator of the database; the target SQL statement; the execution plan of the target SQL statement; or a rendering of the execution plan of the target SQL statement.
  • The content of the analysis report is presented in at least one of the following forms: progress bar, percentage, pie chart, list, line chart, or dashboard.
  • The embodiments of the present application further provide a computing device cluster, including at least one computing device, each including a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to the first aspect or any possible design of the first aspect.
  • The embodiments of the present application further provide a computer program product containing instructions. When the instructions are run by a computing device cluster, the computing device cluster performs the method according to the first aspect or any possible design of the first aspect.
  • The embodiments of the present application further provide a computer-readable storage medium, including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to the first aspect or any possible design of the first aspect.
  • Figure 1 shows a schematic diagram of a system architecture applicable to an embodiment of the present application.
  • Figure 2 shows a schematic diagram of a management interface for service registration according to an embodiment of the present application.
  • Figure 3 shows a schematic diagram of an interface for managing SQL templates according to an embodiment of the present application.
  • Figure 4 shows a schematic diagram of a management interface for configuring monitoring indicators according to an embodiment of the present application.
  • Figure 5 shows a schematic diagram of an interface for managing system indicator monitoring threshold templates according to an embodiment of the present application.
  • Figure 6 shows a schematic diagram of an interface for managing a database according to an embodiment of the present application.
  • Figure 7 shows a schematic diagram of an interface for managing business indicator monitoring threshold templates according to an embodiment of the present application.
  • Figures 8-10 show schematic diagrams of analysis results in at least one log analysis dimension according to an embodiment of the present application.
  • Figure 11 shows a schematic diagram of an interface for managing stress test templates according to an embodiment of the present application.
  • Figure 12 shows a schematic diagram of an interface for configuring SQL parameters according to an embodiment of the present application.
  • Figure 13 shows a schematic flowchart of a cloud monitoring and analysis method according to an embodiment of the present application.
  • Figure 14 shows a schematic flowchart of a cloud monitoring and analysis method according to an embodiment of the present application.
  • Figure 15 shows a schematic diagram of an output interface according to an embodiment of the present application.
  • Figure 16 shows a schematic diagram of a computing device according to an embodiment of the present application.
  • Figures 17-18 show schematic diagrams of computing device clusters according to embodiments of the present application.
  • Database: a collection of data that is as free of duplication as possible, serves multiple applications of a specific organization in an optimal way, and has a data structure independent of the applications that use it (such as the tenant's business system); additions, deletions, modifications, and retrievals of the data can be managed and controlled uniformly.
  • To ensure system stability and data availability, the database can adopt a master-standby architecture that includes a primary node and a standby node.
  • The business system can write data to the primary node of the database and query data through the primary node.
  • Under normal circumstances, the standby node only backs up data. Only when the primary node goes down does the standby node provide read and write services to the business system.
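The routing rule of a master-standby architecture can be shown with a toy in-memory model. This is an illustrative sketch, not how any particular database implements replication: `MasterStandbyStore` is hypothetical, replication is synchronous here for simplicity, and failure detection is reduced to an explicit `fail_primary()` call.

```python
from typing import Optional


class MasterStandbyStore:
    """Toy key-value model of the master-standby behaviour (illustrative only)."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.standby: dict[str, str] = {}
        self.primary_up = True

    def _serving_node(self) -> dict[str, str]:
        # The standby serves reads and writes only after the primary is down.
        return self.primary if self.primary_up else self.standby

    def write(self, key: str, value: str) -> None:
        self._serving_node()[key] = value
        if self.primary_up:
            # Normal operation: the standby just receives a backup copy.
            self.standby[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._serving_node().get(key)

    def fail_primary(self) -> None:
        self.primary_up = False
```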
  • Fork: a clone of a database. Cloning a database makes it possible to experiment freely with changes without affecting the original database.
  • fork_DB is used to obtain the backup package of the production (live-network) database and restore it for testing.
  • SQL: structured query language.
  • An SQL statement is a statement, written in SQL, that operates on a database.
  • Structured query language is a high-level, non-procedural programming language that lets users work with high-level data structures. It requires users neither to specify nor to understand how the data is physically stored, so database systems with completely different underlying structures can use the same structured query language as their interface for data input and management. SQL statements can be nested, which makes the language highly flexible and powerful.
  • OBS: object storage service.
  • OBS buckets are containers for storing objects in OBS. Each bucket has its own storage class, access permissions, region, and other attributes. Users can locate a bucket on the Internet through its access domain name.
  • An object is the basic unit of data storage in OBS. An object is a collection of file data and related attribute information, consisting of three parts: key, metadata, and data.
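The bucket and object concepts can be captured with two small data classes. This is a conceptual in-memory model only, not the OBS SDK or API; the class names, the default storage class and region values, and the `put`/`get` methods are placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: key + metadata + data."""
    key: str                                         # unique within its bucket
    metadata: dict[str, str] = field(default_factory=dict)
    data: bytes = b""


@dataclass
class Bucket:
    """A container of objects with its own attributes (values are placeholders)."""
    name: str
    storage_class: str = "STANDARD"
    region: str = "example-region"
    objects: dict[str, StoredObject] = field(default_factory=dict)

    def put(self, key: str, data: bytes, **metadata: str) -> StoredObject:
        obj = StoredObject(key, dict(metadata), data)
        self.objects[key] = obj  # writing an existing key replaces the object
        return obj

    def get(self, key: str) -> StoredObject:
        return self.objects[key]
```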
  • The cloud management platform provides cloud resources, including cloud services and cloud instances, to tenants (users who purchase cloud resources).
  • Cloud services include virtual private cloud (VPC) network services, gateway services, firewall services, network address translation (NAT) services, cloud disks, elastic public IP addresses (EIP), cloud monitoring services, and other cloud services provided by cloud vendors. Cloud instances include virtual machines, containers, and bare metal servers, which cloud vendors provide in their data centers for tenants to use.
  • The embodiments of the present application do not limit the product form of cloud resources.
  • APIG: application programming interface (API) gateway.
  • APIG integrates various microservices while remaining client-friendly, shielding clients from system complexity and differences.
  • SQL execution plan (explain): a description of the execution process of a SQL statement in the database.
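As a concrete illustration of an execution plan, SQLite (available through Python's standard `sqlite3` module) exposes one via `EXPLAIN QUERY PLAN`. The table, index, and query are hypothetical, and other databases use their own `EXPLAIN` syntax and output formats. The sketch shows the plan changing from a full table scan to an index search once a suitable index exists, which is the kind of difference an execution plan in an analysis report can reveal.

```python
import sqlite3


def explain(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant TEXT, amount REAL)")
query = "SELECT * FROM orders WHERE tenant = ?"

before = explain(conn, query, ("t3",))  # expected: a full-table SCAN of orders
conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant)")
after = explain(conn, query, ("t3",))   # expected: a SEARCH using idx_orders_tenant
```

The exact wording of the `detail` column varies between SQLite versions (for example, `SCAN orders` versus `SCAN TABLE orders`), so plans are best compared by key terms rather than exact strings.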
  • Figure 1 shows a schematic diagram of the system architecture applicable to the embodiment of the present application.
  • The system architecture may include a cloud monitoring and analysis system 100 and a tenant's business system 200.
  • The tenant's business system 200 can implement its business by providing at least one service.
  • The cloud monitoring and analysis system 100 can connect to the tenant's business system 200 and provide cloud monitoring services for it, to monitor and analyze the operation of the business system 200.
  • The business system 200 may be a cloud service system, and the at least one service it provides may be a cloud service.
  • The embodiments of the present application do not limit how the business system 200 is implemented.
  • The tenant's business system may include a database.
  • The cloud monitoring and analysis system 100 may provide cloud monitoring services for the database of the tenant's business system 200 by monitoring and analyzing the database load, so that when an abnormal load event occurs in the database, the cause of the anomaly is quickly identified and normal business operations are quickly restored.
  • The cloud monitoring and analysis system 100 may include, but is not limited to, the following functional modules: registration device 110, monitoring device 120, log device 130, log analysis and processing cluster 140, SQL statement analysis device 150, replica database node 160, and cloud management platform 170.
  • The tenant's business system 200 may include, but is not limited to, the following functional modules: APIG 210, service device 220, database 230, object storage service (OBS) 240, and internal management service 250.
  • The functional modules included in the cloud monitoring and analysis system 100 and in the tenant's business system 200 may vary with the services that the business system 200 provides and with the cloud monitoring services that the cloud monitoring and analysis system 100 provides; this is not limited in the embodiments of the present application.
  • The functional modules of the cloud monitoring and analysis system 100 and the tenant's business system 200 can be implemented in software or hardware. The implementation of the monitoring device 120 is described below as an example.
  • For the implementation of the internal management service 250, refer to that of the monitoring device 120.
  • As an example of a software implementation, the monitoring device 120 may include code running on a computing instance.
  • The computing instance may be at least one of a physical host (computing device), a virtual machine, a container, or another computing device. There may be one or more such computing devices.
  • For example, the monitoring device 120 may include code running on multiple hosts, virtual machines, or containers. These hosts, virtual machines, or containers can be distributed in the same region or in different regions, and in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers.
  • AZ: availability zone.
  • As an example of a hardware implementation, the monitoring device 120 may include at least one computing device, such as a server. Alternatively, the monitoring device 120 may be implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • ASIC: application-specific integrated circuit.
  • PLD: programmable logic device.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • Multiple computing devices included in the monitoring device 120 may be distributed in the same region or in different regions. Multiple computing devices included in the monitoring device 120 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the monitoring device 120 may be distributed in the same VPC or in multiple VPCs.
  • The plurality of computing devices may be any combination of servers, ASICs, PLDs, CPLDs, FPGAs, GALs, and other computing devices.
  • The functions of the modules included in the cloud monitoring and analysis system 100 and the tenant's business system 200 shown in Figure 1 are described in detail below.
  • The tenant's business system 200 may include, but is not limited to, the following functional modules: APIG 210, service device 220, database 230, object storage service 240, and internal management service 250.
  • APIG 210 is a communication interface between the tenant's business system 200 and the tenant's electronic device.
  • The tenant's electronic device can connect through APIG 210 to the at least one service provided by the business system 200, and the services can collaborate to implement the business.
  • The service device 220 of the tenant's business system 200 may provide at least one service to the tenant, for example, service 221, service 222, and service 223.
  • The at least one service can interact with the primary node 231 of the database 230, to store the service data it generates on the primary node 231 or to read the data it needs from the primary node 231.
  • The primary node 231 and the standby node 232 of the database 230 can communicate with each other, so that data saved on the primary node 231 is backed up to the standby node 232.
  • When the primary node 231 goes down, the standby node 232 can take its place and interact with the at least one service of the service device 220, providing data storage services that support business operations.
  • The object storage service 240 can be used together with the database 230.
  • The object storage service 240 can store the backup packages of the database 230. Users need not worry about capacity limits when using it, and it offers a variety of storage types that meet tenants' needs in various business scenarios.
  • The internal management service 250 can connect to APIG 210 and is used to implement internal management of the tenant's business system 200 through APIG 210, including but not limited to recruitment, onboarding, resignation, personnel management, and IT service management of the tenant enterprise.
  • Figure 1 is only an illustrative, non-limiting example of the functional modules of the business system 200 in the embodiments of the present application. In other embodiments, the devices or functional modules used in the business system 200 can be changed according to specific application scenarios or business requirements; details are not described here.
  • Cloud monitoring and analysis system 100
  • The cloud monitoring and analysis system 100 can connect to the tenant's business system 200 and provide cloud monitoring and analysis services for it, to monitor the operation of the business system 200 and analyze the causes of its abnormal load events, so that normal operation of the business system 200 can be quickly restored.
  • The cloud monitoring and analysis system 100 may include, but is not limited to, the following functional modules: registration device 110, monitoring device 120, log device 130, log analysis and processing cluster 140, SQL statement analysis device 150, replica database node 160, and cloud management platform 170.
  • The console management interface is an example, not a limitation, of the communication interface that the cloud management platform provides for tenants. In other embodiments, the communication interface may vary; this is not limited in the embodiments of the present application.
  • The devices of the cloud monitoring and analysis system 100 other than the cloud management platform 170 may be collectively referred to as the analysis device.
  • (1) Registration device 110
  • The registration device 110 can be used to manage the services (such as service 221, service 222, and service 223) that access the database 230 and the business scenarios of those services, and to collect and record service information.
  • The service management function may include managing information about services connected to the cloud monitoring and analysis system 100, for example, registering, updating, and deregistering service information.
  • When registering a service with the registration device 110, the tenant or the tenant's business system 200 may need to provide information about the service including, but not limited to: product department, service domain, service name, microservice name, accessed database, person in charge of the service, and service team members.
  • The registration device 110 can receive updated service information from the tenant's business system 200 and store it.
  • The registration device 110 may delete the registration information of a service after receiving a deregistration instruction from the business system 200.
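The register, update, and deregister operations can be sketched as an in-memory registry keyed by service and microservice name. `ServiceRegistration` and `ServiceRegistry` are hypothetical stand-ins for the registration device 110, and the choice of key is an assumption; the fields follow the information listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServiceRegistration:
    """Information supplied when a service is registered (field names illustrative)."""
    product_department: str
    service_domain: str
    service_name: str
    microservice_name: str
    accessed_database: str
    owner: str                          # person in charge of the service
    team_members: tuple[str, ...] = ()


class ServiceRegistry:
    """Hypothetical in-memory stand-in for the registration device 110."""

    def __init__(self) -> None:
        self._services: dict[tuple[str, str], ServiceRegistration] = {}

    def register(self, reg: ServiceRegistration) -> None:
        key = (reg.service_name, reg.microservice_name)
        if key in self._services:
            raise ValueError(f"{key} is already registered")
        self._services[key] = reg

    def update(self, service: str, microservice: str, **changes) -> ServiceRegistration:
        # Renaming would change the key, so it is handled as deregister + register.
        if changes.keys() & {"service_name", "microservice_name"}:
            raise ValueError("deregister and re-register to rename a service")
        key = (service, microservice)
        updated = replace(self._services[key], **changes)
        self._services[key] = updated
        return updated

    def deregister(self, service: str, microservice: str) -> None:
        self._services.pop((service, microservice), None)

    def services_using(self, database: str) -> list[ServiceRegistration]:
        """List the registered services that access a given database."""
        return [r for r in self._services.values() if r.accessed_database == database]
```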
  • The business scenario management function covers the management and maintenance of the business scenarios that a service may involve, and of the correspondence (for example, one-to-one or one-to-many) between each business scenario and the SQL templates executed in the database 230 for that scenario.
  • The registration device 110 can provide tenants with visual registration and management functions through the cloud management platform.
  • Through the tenant's electronic device and the corresponding console management interface, the tenant can register with the cloud monitoring and analysis system 100 the services of the business system 200 that implement the business and need to access the database 230. Then, when the cloud monitoring and analysis system 100 provides cloud monitoring services to the tenant's business system 200, it can detect abnormal load events of the database 230, quickly analyze and identify the source of the concurrency or the business scenario involved, and apply response strategies in time to keep the business running stably and reduce database service costs.
  • The tenant can bind the business scenarios involved in a service to be registered with SQL templates, thereby registering with the cloud monitoring and analysis system 100 both the services that access the database 230 and the correspondence between the business scenarios implemented by each registered service and the SQL templates.
  • Tenants can click the "Add" button to add a new SQL template to a business scenario, or click the "Delete" button to delete the selected SQL template.
  • The interface for adding a new SQL template, shown in Figure 3, can pop up over the console management interface being presented.
  • On this interface, the tenant can enter or select parameters for the relevant attribute configuration items to bind (or associate) the business scenario with the SQL template.
  • Attribute configuration items available to the tenant may include, but are not limited to: a service attribute, indicating the service associated with the business scenario, such as Elastic Cloud Server (ECS); a microservice attribute, indicating the microservice associated with the business scenario, such as nova; a business scenario attribute, indicating and describing the business scenario, such as node specification (flavor) management; an interface attribute, indicating the interface associated with the business scenario, such as querying node specifications; an interface description attribute, describing the purpose of that interface, such as querying node specification information; and related SQL operation attributes, indicating the SQL template, which may include an SQL attribute and an SQL description attribute.
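The binding between business scenarios and SQL templates, including one-to-many relationships, can be modeled as records that carry the attribute configuration items above. The types and the `scenarios_by_template` helper are hypothetical; the helper inverts the bindings so that, once a SQL template has been flagged (for example by log analysis), the business scenarios that issue it can be looked up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlOperation:
    sql: str           # SQL template text
    description: str   # SQL description attribute


@dataclass(frozen=True)
class ScenarioBinding:
    """One business scenario and the SQL templates bound to it."""
    service: str                 # e.g. "ECS"
    microservice: str            # e.g. "nova"
    business_scenario: str       # e.g. "flavor management"
    interface: str               # e.g. "query flavors"
    interface_description: str
    sql_operations: tuple[SqlOperation, ...]


def scenarios_by_template(bindings: list[ScenarioBinding]) -> dict[str, set[str]]:
    """Invert the bindings: for each SQL template, the scenarios that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for b in bindings:
        for op in b.sql_operations:
            index[op.sql].add(b.business_scenario)
    return dict(index)
```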
  • Tenants can add or remove related SQL operations of the new SQL template by clicking the plus-sign "+" button or the minus-sign "-" button. After the attributes of the new SQL template are configured, the tenant can click the "Confirm" button to finish the configuration, or click the "Cancel" button to abandon it. After the tenant confirms the configuration, the registered business scenario and its associated SQL templates are displayed on the console management interface shown in Figure 2. In an optional implementation, the tenant can view the registration information of each registered service through the console management interface and, if necessary, modify the registration information of a business scenario and its correspondence with SQL templates through the "Modify" button. The modification interface is similar to the interface shown in Figure 3, so details are not repeated here.
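The registration step above amounts to storing, for each business scenario, the service, microservice, interface, and SQL templates bound to it. The following Python sketch models that record with hypothetical class and field names (`ScenarioRegistration`, `SqlOperation`, `Registry`); it is not the system's actual data model, only an illustration of how a binding could later be looked up from an SQL template back to its business scenario.

```python
from dataclasses import dataclass, field


@dataclass
class SqlOperation:
    """One related SQL operation: an SQL template plus its description."""
    sql: str
    description: str = ""


@dataclass
class ScenarioRegistration:
    """Hypothetical record binding a business scenario to its SQL templates."""
    service: str            # e.g. "ECS"
    microservice: str       # e.g. "nova"
    business_scenario: str  # e.g. "flavor management"
    interface: str          # e.g. "query node specifications"
    interface_description: str = ""
    sql_operations: list[SqlOperation] = field(default_factory=list)


class Registry:
    """In-memory stand-in for the registration device 110 (an assumption)."""

    def __init__(self) -> None:
        self._by_scenario: dict[str, ScenarioRegistration] = {}

    def register(self, reg: ScenarioRegistration) -> None:
        # Equivalent of clicking "Confirm": store (or overwrite) the binding.
        self._by_scenario[reg.business_scenario] = reg

    def scenarios_for_template(self, sql: str) -> list[str]:
        # Reverse lookup: which registered business scenarios use this SQL template?
        return [name for name, reg in self._by_scenario.items()
                if any(op.sql == sql for op in reg.sql_operations)]
```

A reverse lookup like `scenarios_for_template` is what lets a slow SQL statement found in the logs be attributed to a business scenario.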
  • the monitoring device 120 is used to collect, based on preconfigured information, the monitoring information (or indicator data) generated when the cloud monitoring service provided by the cloud monitoring and analysis system 100 monitors the tenant's business system 200, display the collected monitoring information, and provide automated analysis capabilities.
  • the key indicators monitored by the cloud monitoring service for the tenant's business system 200 may include two categories: system indicators and business indicators.
  • the system indicators may include, for example, CPU load, memory load, disk read/write (IO) load, network packet loss rate, and network latency (ping).
  • Business indicators can include the total number of database connections, the number of active database connections, the table expansion rate, the master-slave synchronization rate, the number of slow SQL statements per minute, and so on.
  • the monitoring device 120 can provide tenants with visual configuration and management functions through a cloud management platform. Based on the interface definition of the cloud monitoring service, the tenant can use its terminal device and the corresponding console management interface to configure the cloud resources and monitoring indicators to be monitored, so that the monitoring device 120 can monitor the tenant's business system (for example, its database) in real time or periodically. When an abnormal load event (such as a load surge) occurs in the database, the monitoring device 120 provides automated analysis capabilities that help quickly identify the source of concurrency or the business scenario, implement response strategies in a timely manner, keep the business running stably, and reduce the cost of database services.
  • abnormal load events can be detected by configuring relevant indicator monitoring thresholds for the database.
  • when a relevant indicator associated with the database is detected to be greater than or equal to its configured threshold, an abnormal load event is considered to have occurred.
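The threshold rule described here (an indicator at or above its configured threshold signals an abnormal load event) can be sketched in a few lines of Python. The indicator key names are hypothetical; the threshold values mirror the example template 001 described for Figure 4.

```python
def detect_abnormal_load(metrics: dict[str, float],
                         thresholds: dict[str, float]) -> list[str]:
    """Return the names of indicators whose value is >= the configured threshold.

    A non-empty result means an abnormal load event is considered to have
    occurred. Indicators with no collected value are skipped.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] >= limit
    )


# Example thresholds modelled on system indicator template 001 (Figure 4);
# the indicator key names are hypothetical.
TEMPLATE_001 = {
    "cpu_load": 0.9,
    "memory_load": 0.9,
    "io_load": 0.8,
    "packet_loss_rate": 0.2,
}
```

With metrics `{"cpu_load": 0.95, "memory_load": 0.5, "io_load": 0.8}` and `TEMPLATE_001`, the function returns `["cpu_load", "io_load"]`: the IO load equals its threshold, and the comparison is inclusive.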
  • the "New System Indicator Monitoring Threshold Template" interface shown in Figure 5 can be displayed as a floating window over the console management interface. Similar to the configuration interface shown in Figure 3, the tenant can enter or select parameters in the attribute configuration items provided in the interface shown in Figure 5 to configure the system indicator monitoring threshold template.
  • Relevant attribute configuration items may include, for example, threshold configuration items for key indicators such as CPU, memory, IO, network packet loss rate, and network delay.
  • the parameters of the corresponding system indicator monitoring threshold template are then displayed on the console management interface shown in Figure 4, such as the template identifier (ID) (for example, 001), the template's CPU load threshold (for example, 0.9), memory load threshold (for example, 0.9), IO load threshold (for example, 0.8), network packet loss rate threshold (for example, 0.2), creator (for example, Zhang San 00123456), and related operation buttons (such as Bind and Modify).
  • tenants can click the "Modify" button associated with a system indicator monitoring threshold template to modify the template's parameters.
  • the modification interface is similar to the interface shown in Figure 5, so details are not repeated here.
  • the configuration interface for associating a service's business scenario with the system indicator monitoring threshold template can also be displayed as a floating window over the console management interface.
  • Tenants can enter or select corresponding parameters in the relevant attribute configuration items provided on this interface to associate the target objects to be monitored with the selected system indicator monitoring threshold template.
  • the related attribute configuration items may include, for example, the region, availability zone (AZ), and performance optimization datacenter (POD) to which the database belongs, the database name, and the IP address of the database master node. Tenants can likewise enter or select the corresponding parameters in each attribute configuration item.
  • the database information associated with the system indicator monitoring threshold template is displayed on the interface shown in Figure 4, such as the region to which the database belongs (for example, North China-Beijing IV), AZ (for example, AZ1), POD (for example, pod15), database name (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), and related operation buttons (such as Bind, Unbind, and Modify).
  • tenants can modify the database information by clicking the "Modify" button, or change the association between the template and the database by clicking the "Unbind" or "Bind" button.
  • the "Modify" button is used to modify the bound database information.
  • the "Unbind" button is used to delete database information bound to the template.
  • the "Bind" button is used to add new database information to be bound to the template. It should be understood that this is only an example and not a limitation. In actual applications, tenants can adjust the attribute configuration methods or attribute configuration items according to their own business systems or application requirements, which is not described further here.
  • the console management interface shown in Figure 4 can also be used to configure the business indicator monitoring threshold template for the database (not shown in the figure).
  • the configuration interface of the business indicator monitoring threshold template is shown in Figure 7. Tenants can enter or select the corresponding parameters in the attribute configuration items provided on this interface to configure the business indicator monitoring threshold template.
  • Relevant attribute configuration items may include, for example, the total number of database connections, the number of active connections, the table expansion rate, the master-slave synchronization rate, and the number of slow SQL statements per minute.
  • Tenants can enter or select the corresponding parameters in each attribute configuration item provided on the interface, such as the total number of connections (for example, 2000), the number of active connections (for example, 85), the table expansion rate (for example, 0.3), the master-backup synchronization rate (for example, 0.2), and the number of slow SQL statements per minute (for example, 100).
  • the parameters of the corresponding business indicator monitoring threshold template will be displayed on the interface shown in Figure 4 (not shown in Figure 4).
  • tenants can also modify the relevant parameters of the business indicator monitoring threshold template by clicking the "Modify" button associated with the template on the interface shown in Figure 4.
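A business indicator monitoring threshold template can be treated as a small typed record. The Python sketch below assumes hypothetical field names and uses the example values given for Figure 7 as defaults; it applies the same greater-than-or-equal rule to every indicator, as the text describes for threshold detection.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BusinessThresholdTemplate:
    """Business indicator thresholds; defaults are the example values in the text."""
    total_connections: float = 2000
    active_connections: float = 85
    table_expansion_rate: float = 0.3
    master_backup_sync_rate: float = 0.2
    slow_sql_per_minute: float = 100

    def exceeded(self, sample: dict[str, float]) -> list[str]:
        """Indicators in `sample` whose value is >= the template threshold,
        listed in field-definition order."""
        return [f.name for f in fields(self)
                if f.name in sample and sample[f.name] >= getattr(self, f.name)]
```

Because the template is frozen, a modification made through the "Modify" button would naturally produce a new template object rather than mutating one that databases are already bound to.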
  • the monitoring device 120 can collect various indicator data of the objects to be monitored (such as databases) while the tenant's business system 200 provides services, display the collected monitoring information, and provide automated analysis capabilities.
  • the automated analysis capability works as follows: thresholds are set for one key indicator or a group of key indicators of concern. When the value of a collected indicator exceeds the preset threshold, the monitored object is considered to have an abnormal load event, and the monitoring device can then trigger a call to the log analysis and processing cluster 140.
  • the log analysis and processing cluster 140 collects SQL statistics in the time range around the moment the abnormal load event occurs (the abnormal moment), and outputs SQL statistical results and SQL execution plans according to the preconfigured log analysis dimensions, so as to quickly identify the source of concurrency or the business scenario and promptly implement response strategies that keep the business running stably and reduce the cost of database services.
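Collecting SQL statistics near the abnormal moment requires choosing a concrete window. The text does not specify its width, so the 15-minutes-before and 5-minutes-after defaults in this Python sketch are assumptions, as are the record layout (a dict with a `ts` key) and the function names.

```python
from datetime import datetime, timedelta


def analysis_window(abnormal_at: datetime,
                    before: timedelta = timedelta(minutes=15),
                    after: timedelta = timedelta(minutes=5)) -> tuple[datetime, datetime]:
    """Time range around the abnormal moment; the default widths are assumptions."""
    return abnormal_at - before, abnormal_at + after


def records_in_window(records: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep SQL log records whose timestamp falls within [start, end]."""
    return [r for r in records if start <= r["ts"] <= end]
```

Including a few minutes after the abnormal moment is a design choice: statements that started just before a surge may only be logged once they finish.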
  • the automated analysis capability is described in detail below together with the functions of the log analysis and processing cluster 140.
  • the log device 130 is used to collect log records of the target object to be monitored (for example, the database 230).
  • the log device 130 can be connected to the log analysis and processing cluster 140, and can provide the collected log records to the log analysis and processing cluster 140, so that the log analysis and processing cluster 140 performs log analysis.
  • the log analysis and processing cluster 140 is a cluster of nodes that performs log analysis.
  • the log analysis and processing cluster 140 may include multiple nodes, for example, represented as node 01, node 02, node 03, etc. Among them, at least one node among the plurality of nodes can cooperate according to application scenarios, business requirements, etc., and collaboratively execute relevant log analysis steps.
  • the embodiments of this application do not limit the division method of execution nodes of specific log analysis steps.
  • the log analysis and processing cluster 140 can be connected to the registration device 110 to obtain the registered service information of the business system 200 from the registration device 110.
  • the log analysis and processing cluster 140 can be connected to the monitoring device 120 to learn the information about the objects to be monitored from the monitoring device 120.
  • the log analysis and processing cluster 140 can be connected to the log device 130 to obtain the log records to be analyzed from the log device 130.
  • the log analysis and processing cluster 140 can determine the scope of the log records to be analyzed through the following information: region, POD, schema, database master node IP, database, log analysis time range (including start time, end time, etc.), SQL template, service, component, service node IP, and so on.
  • the log analysis and processing cluster 140 can provide visual configuration and management functions for tenants through the cloud management platform 170.
  • the tenant can configure a log analysis task through the tenant's electronic device and the corresponding console management interface to indicate the scope of the log records to be analyzed, so that the log analysis and processing cluster 140 can automatically obtain the log records within that scope from the log device 130, analyze them, and obtain log analysis results.
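The scope fields listed above (region, POD, schema, database master node IP, database, time range, SQL template, service, component, service node IP) can be expressed as a filter over log records. This Python sketch assumes each log record is a dict using the same hypothetical key names plus a `ts` timestamp; a field left as `None` matches any value.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class LogScope:
    """Scope of the log records to analyze; None means 'any value'."""
    region: Optional[str] = None
    pod: Optional[str] = None
    schema: Optional[str] = None
    master_ip: Optional[str] = None
    database: Optional[str] = None
    sql_template: Optional[str] = None
    service: Optional[str] = None
    component: Optional[str] = None
    service_node_ip: Optional[str] = None
    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def matches(self, rec: dict) -> bool:
        # Exact-match every configured attribute except the time bounds.
        for f in fields(self):
            if f.name in ("start", "end"):
                continue
            want = getattr(self, f.name)
            if want is not None and rec.get(f.name) != want:
                return False
        # Apply the time range only if one was configured.
        ts = rec.get("ts")
        if self.start is not None and (ts is None or ts < self.start):
            return False
        if self.end is not None and (ts is None or ts > self.end):
            return False
        return True
```

Usage: `[r for r in records if scope.matches(r)]` yields the records within the configured scope.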
  • the log analysis and processing cluster 140 can be connected to other devices and called by them to provide automated analysis capabilities. Taking the case where the monitoring device 120 calls the log analysis and processing cluster 140 to perform automated analysis as an example, the monitoring device 120 can monitor relevant indicators of the service device 220 or the database 230 to determine whether an abnormal load event occurs, and calls the log analysis and processing cluster 140 when one does (for example, when the value of one or more system indicators or business indicators exceeds the preset threshold). Through communication with the monitoring device 120, the log analysis and processing cluster 140 can learn the indicator anomaly information associated with the abnormal load event of the service device 220 or the database 230, such as the abnormal indicators and the abnormal time range.
  • the log analysis and processing cluster 140 may learn the business scenarios registered in the cloud monitoring and analysis system 100, the services that access the database, the SQL templates associated with the business scenarios/services, etc., through communication interaction with the registration device 110.
  • the log analysis and processing cluster 140 can determine the business scenarios, services, SQL templates, etc. associated with the abnormal load event. Further, the log analysis and processing cluster 140 can obtain the log records to be analyzed from the log device 130 based on the determined business scenarios, services, SQL templates, abnormal time range, etc., and perform an automated analysis process on the obtained log records to obtain log analysis results.
  • the log analysis and processing cluster 140 can also be triggered to execute the log analysis and processing function in other ways, which are not described further here.
  • the tenant can configure the log analysis and processing capabilities of the log analysis and processing cluster 140 through its terminal device and the corresponding console management interface, including but not limited to configuring the log analysis dimensions and the number of SQL statements to be analyzed.
  • the cloud management platform 170 can also provide tenants with a visual display interface on which log analysis results can be presented.
  • the at least one log analysis dimension may include: the execution-time dimension of a single SQL statement, the total execution-time dimension of a single SQL category, the SQL execution business proportion dimension, the SQL connection host proportion dimension, the SQL latency distribution dimension, or the SQL queries-per-second (QPS) dimension.
  • the log analysis dimensions custom-configured by the tenant can be as shown in Table 1 below:
  • the "View Type" column gives the statistical view for each log analysis dimension, including but not limited to: ranking of single-SQL execution time; ranking of total execution time for a category of SQL; ranking of execution frequency for a category of SQL; statistics of SQL proportion by business; statistics of SQL proportion by host; SQL latency distribution statistics; and SQL QPS statistics.
  • the "Detailed Description" column describes the SQL execution indicators involved in the corresponding log analysis dimension, including but not limited to SQL execution time (total, average, shortest, longest, etc.), the business scenario and SQL template related to the SQL, the number of SQL statements, and the SQL execution timestamps.
  • the "Presentation Type" column indicates how the statistical results are presented, including but not limited to tables, pie charts, bar charts, and line charts, which is not limited in the embodiments of this application.
  • the log analysis dimensions can be input by the tenant through the visual configuration interface provided by the cloud management platform 170.
  • the configuration method is the same as or similar to the configuration method described above in conjunction with Figures 2-7. For the detailed implementation, see the relevant descriptions above; details are not repeated here.
  • the at least one log analysis dimension can be changed according to changes in business systems or application requirements, which will not be described again here.
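Two of the dimensions in Table 1, SQL latency distribution and SQL QPS, reduce to simple counting. The Python sketch below assumes latencies in milliseconds and timestamps in epoch seconds; the bucket edges are illustrative, not taken from the text.

```python
from bisect import bisect_right
from collections import Counter


def latency_distribution(latencies_ms, bounds=(10, 50, 100, 500, 1000)):
    """Count SQL executions per latency bucket (bucket edges in ms are assumptions)."""
    labels = [f"<{bounds[0]}ms"]
    labels += [f"{lo}-{hi}ms" for lo, hi in zip(bounds, bounds[1:])]
    labels.append(f">={bounds[-1]}ms")
    # bisect_right maps a latency to the index of the first edge above it,
    # so each bucket includes its lower edge and excludes its upper edge.
    counts = Counter(labels[bisect_right(bounds, x)] for x in latencies_ms)
    return {label: counts.get(label, 0) for label in labels}


def qps_series(timestamps_s):
    """Number of SQL executions started in each whole second (epoch seconds)."""
    per_second = Counter(int(t) for t in timestamps_s)
    return dict(sorted(per_second.items()))
```

For example, `latency_distribution([5, 10, 60, 700, 1500])` places one execution in each of the `<10ms`, `10-50ms`, `50-100ms`, `500-1000ms`, and `>=1000ms` buckets and none in `100-500ms`; the result can feed a bar chart directly.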
  • range information can be presented, such as region (for example, North China-Beijing IV), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (including start time, end time, last 30 minutes, etc.), SQL template (not fully shown in the figure), service (for example, ECS), component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure).
  • the analysis and statistics results of single SQL executions within the scope can be presented as a list, including, for example, SQL ID, SQL template, business scenario, total time spent, number of executions, shortest time (ms), longest time (ms), and average time (ms).
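The per-SQL list described above (total, count, shortest, longest, and average time) can be produced by grouping log records by SQL ID. In this Python sketch the record keys (`sql_id`, `template`, `scenario`, `elapsed_ms`) are hypothetical, and sorting by total time in descending order is assumed to match the ranking view.

```python
from collections import defaultdict


def per_sql_stats(records):
    """Aggregate execution-time statistics per SQL ID, sorted by total time."""
    times = defaultdict(list)
    meta = {}
    for r in records:
        times[r["sql_id"]].append(r["elapsed_ms"])
        meta[r["sql_id"]] = (r["template"], r["scenario"])
    rows = []
    for sql_id, ms in times.items():
        template, scenario = meta[sql_id]
        rows.append({
            "sql_id": sql_id,
            "template": template,
            "scenario": scenario,
            "total_ms": sum(ms),
            "count": len(ms),
            "min_ms": min(ms),
            "max_ms": max(ms),
            "avg_ms": sum(ms) / len(ms),
        })
    # Largest total execution time first, as in a "top SQL" ranking.
    rows.sort(key=lambda row: row["total_ms"], reverse=True)
    return rows
```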
  • information can be presented such as region (for example, North China-Beijing IV), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (including start time, end time, last 30 minutes, etc.), SQL template (not fully shown in the figure), service (for example, ECS), component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure).
  • the statistical results of the proportion of SQL execution services and the proportion of SQL connection IPs involved in this scope can be presented in the form of a pie chart.
  • the statistical results of SQL execution business proportion can include ECS business accounting for 56%, EVS business accounting for 20%, laasdeploy business accounting for 14%, and VPC business accounting for 10%.
  • the statistical results of the proportion of SQL connection IPs can include ECS: 26.22.240.32 70%, EVS: 26.22.240.15 15%, laasdeploy: 26.22.240.21 10%, VPC: 26.22.240.23 5%, and so on.
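The pie-chart proportions are shares of SQL executions grouped by business or by connection IP. The Python sketch below computes percentages from a list of labels; the input in the usage note is constructed so that the output matches the example business proportions above and is not real data.

```python
from collections import Counter


def proportions(labels):
    """Percentage share of SQL executions per label (business name or connection IP)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the slices from largest to smallest.
    return {k: round(100 * v / total, 1) for k, v in counts.most_common()}
```

For example, `proportions(["ECS"] * 56 + ["EVS"] * 20 + ["laasdeploy"] * 14 + ["VPC"] * 10)` returns `{"ECS": 56.0, "EVS": 20.0, "laasdeploy": 14.0, "VPC": 10.0}`. Passing connection IPs instead of business names gives the host proportion view.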
  • information representing the scope of the analyzed log records can be presented, such as region (for example, North China-Beijing IV), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (including start time, end time, last 30 minutes, etc.), SQL template (not completely shown in the figure), service (for example, ECS), component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure).
  • the interface diagrams shown in Figures 8 to 10 above are only examples of how log analysis results are presented and are not limiting in any way.
  • the tenant or the cloud monitoring and analysis system 100 can change the log analysis dimensions, the indicators involved in each log analysis dimension, and the presentation methods of different analysis results according to application scenarios or business requirements, which will not be described again here.
  • the log analysis and processing cluster 140 can periodically analyze the log records collected by the log device 130 and display the analysis results.
  • the log analysis processing cluster 140 may analyze the log records specified by the log analysis task and display the analysis results.
  • the log analysis and processing cluster 140 can automatically trigger the analysis of log records in the time range near the abnormal time and display the analysis results when the monitoring device 120 detects an abnormal load event of the business system.
  • the embodiments of this application do not limit the timing or triggering method of log analysis.
  • the SQL statement analysis device 150 can provide online SQL analysis and testing capabilities.
  • the SQL statement analysis device 150 can be connected to the log analysis and processing cluster 140. After the log analysis and processing cluster 140 identifies the SQL statements that need to be optimized, their execution plans can be obtained through the SQL statement analysis device 150 to clarify the optimization direction.
  • the SQL statement analysis device 150 can have SQL analysis and testing functional components, background pressure structure management functional components and data management functional components.
  • the SQL analysis and testing functional component, the background pressure structure management functional component, and the data management functional component can provide tenants with visual configuration and management functions through the cloud management platform 170.
  • based on the interface definition of the SQL statement analysis function, the tenant can provide configuration information to the SQL analysis and testing functional component, the background pressure structure management functional component, and the data management functional component through its terminal device and the corresponding console management interface.
  • the tenant can, through the cloud management platform 170, provide SQL optimization attribute configuration items to the SQL analysis and testing functional component, input background pressure policies (including stress test templates) to the background pressure structure management functional component, or provide the data management functional component with data synchronization commands, data synchronization cycles, and so on.
  • the data management functional component is used to obtain the live-network backup data of the business system 200 from OBS and restore it to the replica database node 160 for SQL analysis and testing.
  • the default policy of the data management functional component can perform data synchronization once a week, and can also support manual triggering of data synchronization, which is not limited in the embodiments of this application.
  • the SQL analysis and testing functional component is responsible for executing the SQL that needs to be analyzed on the replica database node 160, analyzing the execution plan of the SQL statement, and rendering the analysis results of the SQL execution plan.
  • the functions provided by the SQL analysis and testing functional component may include, for example: (1) testing the time taken to execute SQL statements concurrently; and (2) analyzing the SQL execution plan under background pressure (optionally configured) and rendering the analysis results.
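To illustrate function (2), the sketch below obtains an execution plan and times one execution using Python's built-in `sqlite3` module. SQLite is only a stand-in for the replica database node (the text's examples use GaussDB); its `EXPLAIN QUERY PLAN` output format is specific to SQLite, and the table and index names in the usage are hypothetical.

```python
import sqlite3
import time


def explain_and_time(conn: sqlite3.Connection, sql: str, params=()):
    """Return (plan detail strings, elapsed seconds) for one SQL statement.

    EXPLAIN QUERY PLAN is SQLite-specific; the last column of each result
    row holds the human-readable description of one plan step.
    """
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    start = time.perf_counter()
    conn.execute(sql, params).fetchall()
    return plan, time.perf_counter() - start
```

On a table `flavors(id, name)` with no index, the plan for `SELECT name FROM flavors WHERE id = ?` reports a full scan; after `CREATE INDEX idx_flavors_id ON flavors(id)`, it reports a search using that index. This before-and-after comparison is the kind of optimization direction an execution plan is meant to reveal.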
  • the background pressure structure management functional component is used to manage and construct various load scenarios of the database, such as concurrent queries, concurrent updates, etc., and is responsible for the management of background pressure patterns and the injection of background pressure parameters.
  • the dimension information to be provided for setting a background pressure mode may include but is not limited to at least one of the following: the pressure mode name, service name, database name, number of concurrent SQL executions, or the SQL statements to be executed concurrently.
  • the interface shown in Figure 11 is displayed as a floating window over the console management interface.
  • the stress test template may include, for example, the schema name, service, service name, database master node IP, and related stress test parameters (including parameters and stress values) involved in the background stress policy.
  • the interface shown in Figure 12 can be displayed as a floating window over the console management interface.
  • the attribute configuration items of this interface can include, for example, the database name, database master node IP, pressure mode options, and an SQL statement input box.
  • Tenants can enter or select corresponding parameters in each attribute configuration item.
  • the cloud management platform 170 issues the parameters entered or selected by the tenant on this interface to the replica database node, so that SQL statements can be analyzed and tested online against real data, which improves analysis efficiency.
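A background pressure mode bundles the dimensions listed above (mode name, service, database, concurrency, SQL statements). The Python sketch below runs each statement the configured number of times in parallel and reports latency. The `execute` callback is a placeholder for whatever client runs SQL on the replica database node; no real driver API is assumed, and the class and field names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class PressureMode:
    """Background pressure mode; fields follow the dimensions listed in the text."""
    name: str
    service: str
    database: str
    concurrency: int  # must be >= 1
    sqls: list[str]


def run_pressure(mode: PressureMode, execute: Callable[[str], None]) -> dict[str, float]:
    """Run every SQL `concurrency` times on a thread pool and summarize latency (ms)."""
    def timed(sql: str) -> float:
        start = time.perf_counter()
        execute(sql)  # placeholder: runs one statement on the replica node
        return (time.perf_counter() - start) * 1000

    jobs = [sql for sql in mode.sqls for _ in range(mode.concurrency)]
    with ThreadPoolExecutor(max_workers=mode.concurrency) as pool:
        latencies = list(pool.map(timed, jobs))
    return {"executions": len(latencies),
            "avg_ms": mean(latencies),
            "max_ms": max(latencies)}
```

Threads suit this sketch because database calls spend most of their time waiting on I/O; a real load generator might instead use separate processes or hosts to reach higher concurrency.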
  • Figures 11 and 12 are only examples of the functional configuration of the SQL statement analysis device 150 in the embodiment of the present application and are not limiting in any way.
  • the console management interface can realize linkage between the SQL statement analysis device 150, the registration device 110, the monitoring device 120, the log device 130, and the log analysis processing cluster 140, and support automated and customized SQL analysis capabilities.
  • SQL statement analysis can be automatically executed based on the relevant information of the registration device 110, the monitoring device 120, the log device 130, and the log analysis and processing cluster 140, so as to reduce the operation and maintenance burden of the cloud monitoring and analysis system 100.
  • tenants can configure the functional modules of the cloud monitoring and analysis system 100 when registering the business scenarios or services of the business system, including but not limited to configuring monitoring indicators, monitoring indicator thresholds, log analysis dimensions, SQL templates associated with business scenarios/services, and background pressure policies associated with SQL templates.
  • upon detecting an abnormal load event, the monitoring device 120 may notify the log analysis and processing cluster 140.
  • the log analysis and processing cluster 140 can automatically execute a log analysis process in response to abnormal load events to sort out SQL statements that need optimization.
  • after the log analysis and processing cluster 140 performs log analysis and identifies the SQL statements that need to be optimized, it can provide those statements to the SQL statement analysis device 150 by calling the relevant console management interface.
  • the SQL statement analysis device 150 can automatically execute the SQL statements.
  • the cloud management platform 170 can provide a visual interface to display the execution results of the SQL statement and the rendered execution plan.
  • the replica database node 160 is used to restore the backup data of the existing network of the database 230 of the business system 200 for SQL statement analysis and testing.
  • Cloud management platform 170
  • the cloud management platform 170 can provide tenants with customized channels through the relevant console management interfaces, so that tenants can configure each functional module of the cloud monitoring and analysis system 100 through the console management interface, and the functional modules can coordinate monitoring and automated analysis according to the tenant's monitoring and analysis requirements and provide feedback to the tenant.
  • the console management interface can provide customized channels for tenants by providing a visual interface, so that tenants can configure functions and view relevant information on the relevant interfaces.
  • the console management interface is only an example of the configuration method of the embodiment of the present application and is not limiting in any way.
  • the cloud monitoring and analysis system 100 may also provide tenants with a custom configuration channel based on an API format.
  • the cloud monitoring and analysis system 100 can display the API format on a web page accessible over the Internet and explain the usage of each field. After viewing the API format, the tenant enters the corresponding parameters according to it to complete the configuration.
  • the tenant's electronic device can send the API with the entered parameters to the cloud monitoring and analysis system 100 over the Internet in the form of a template.
  • the cloud monitoring and analysis system 100 detects the parameters corresponding to the different fields in the API, thereby obtaining the tenant's configuration for each field.
  • the tenant's configuration information for the cloud monitoring and analysis system 100 may also include the API fields and parameters entered by the tenant. Furthermore, the cloud monitoring and analysis system 100 may store the tenant's configuration information in the corresponding memory, so that it can obtain the configuration information from the memory when needed to complete the functional configuration of the relevant functional modules and provide cloud monitoring services for the tenant's business system 200.
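The API-based configuration path can be sketched as parsing a JSON request, validating its fields, and storing the result per tenant. The text does not define the API format, so the field names (`database`, `monitoring_thresholds`, `log_analysis_dimensions`) and the in-memory `CONFIG_STORE` in this Python sketch are assumptions for illustration.

```python
import json

REQUIRED_FIELDS = {"database", "monitoring_thresholds", "log_analysis_dimensions"}

# Stand-in for "the corresponding memory" where configurations are kept.
CONFIG_STORE: dict[str, dict] = {}


def parse_config(payload: str) -> dict:
    """Parse and validate one tenant configuration request (JSON text)."""
    config = json.loads(payload)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    missing = REQUIRED_FIELDS - set(config)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    thresholds = config["monitoring_thresholds"]
    if not isinstance(thresholds, dict):
        raise ValueError("monitoring_thresholds must be an object")
    for name, value in thresholds.items():
        if not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"invalid threshold for {name}: {value!r}")
    return config


def save_config(tenant_id: str, payload: str) -> None:
    """Validate the request and store it so modules can read it when needed."""
    CONFIG_STORE[tenant_id] = parse_config(payload)
```

Validating before storing means a malformed request is rejected up front instead of surfacing later, when a functional module reads its configuration.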
  • the cloud monitoring and analysis method can include the following steps:
  • the cloud management platform obtains the configuration information entered or selected by the tenant on the cloud management platform.
  • the configuration information is used to represent the tenant's monitoring and analysis requirements for the database.
  • the database is used to store the business data of the tenant's business system.
  • the configuration information may be used to indicate at least one monitoring indicator associated with the abnormal load event and an indicator threshold corresponding to the monitoring indicator.
  • the monitoring indicators include system indicators and/or business indicators. The system indicators include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency. The business indicators include at least one of the following: total number of database connections, number of active database connections, table expansion rate, master-slave synchronization rate, or number of slow SQL statements per minute.
  • the configuration information may be used to indicate at least one log analysis dimension of the database.
  • the at least one log analysis dimension may include, for example: the execution-time dimension of a single SQL statement, the total execution-time dimension of a single SQL category, the SQL execution business proportion dimension, the SQL connection host proportion dimension, the SQL latency distribution dimension, or the SQL queries-per-second (QPS) dimension.
  • the configuration information may be used to indicate information describing the load scenario of the database.
  • the information describing the load scenario of the database includes, for example, at least one of the following: pressure mode name, service name, database name, number of concurrent SQL executions, or the SQL statements executed concurrently.
  • the cloud management platform can refer to the interface diagrams shown in Figures 2 to 12 to receive relevant configuration information input or selected by tenants on the cloud management platform.
  • the analysis device analyzes the target Structured Query Language (SQL) statement to be optimized according to the configuration information, and obtains an analysis report.
  • the target SQL statement is a statement that has previously operated on the database.
  • the analysis device is a back-end device connected to the cloud management platform, which may include all devices in the cloud monitoring and analysis system 100 in FIG. 1 except the cloud management platform 170.
  • the analysis device may receive the target SQL statement input or selected by the tenant on the cloud management platform from the cloud management platform.
  • the analysis device can obtain the target SQL statement according to an internal preset method.
  • when an abnormal load occurs in the database, the analysis device can perform log analysis on the target log of the database from the at least one log analysis dimension, and determine the target SQL statement to be optimized based on the analysis results of the target log in the at least one log analysis dimension.
  • the embodiment of this application does not limit the acquisition method of the target SQL statement.
  • the cloud monitoring and analysis method can include the following steps:
  • the tenant's terminal device can send configuration information to the monitoring device through the cloud management platform.
  • the configuration information can be used, for example, to configure at least one monitoring indicator associated with the load abnormal event and the indicator threshold corresponding to the monitoring indicator.
  • the monitoring indicators may include, for example, system indicators and/or business indicators.
  • the system indicators include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of the computing device to which the service that accesses the database belongs;
  • the business indicators include at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, or number of slow SQL statements per minute. For configuration details, refer to the relevant description above; details are not repeated here.
  • the tenant's terminal device can send configuration information to the log analysis processing cluster through the cloud management platform.
  • the configuration information can be used, for example, to configure at least one log analysis dimension of the database.
  • the at least one log analysis dimension may include, for example: the execution time dimension of a single SQL statement, the total execution time dimension of a single category of SQL statements, the SQL-by-business share dimension, the SQL-by-connecting-host share dimension, the SQL latency distribution dimension, or the SQL queries-per-second (QPS) dimension.
  • for configuration details, refer to the description above; details are not repeated here.
  • the tenant's terminal device can send configuration information to the SQL analysis and testing functional component through the cloud management platform.
  • This configuration information can be used, for example, to configure the number of SQL statements to be analyzed and/or the load background parameters of the database.
  • the SQL analysis and testing function component can deliver (or inject) the load background parameters to the replica database node.
  • the replica database node or SQL analysis and testing functional component can also feed back the load background parameter injection response information to the cloud management platform.
  • the monitoring device obtains index data of at least one monitoring index, and determines that the abnormal load event occurs in the database when the index data satisfies the preset analysis trigger condition (for example, the index data is greater than or equal to the corresponding index threshold).
  • S1404 may be replaced by: the tenant's terminal device sends instruction information to the SQL analysis and testing functional component through the cloud management platform, where the instruction information is used to indicate the target SQL statement to be optimized. That is, the online analysis process of SQL can be triggered manually, and the embodiment of the present application does not limit this triggering method.
  • the monitoring device indicates the range of log records to be analyzed, and notifies the log analysis and processing cluster to perform log analysis.
  • the log analysis and processing cluster can obtain the target log of the database, perform log analysis on it from the at least one log analysis dimension, and feed the analysis results in the at least one log analysis dimension back to the monitoring device.
  • S1408: the replica database node restores the backup data package of the database of the tenant's business system, executes the target SQL statement, and feeds the SQL analysis results back to the monitoring device.
  • the monitoring device generates an analysis report based on its own information and on information from the log analysis and processing cluster, the SQL analysis and testing functional component, or the replica database node.
  • S1410 (optional step): The tenant's terminal device requests to download the analysis report through the cloud management platform.
  • S1411 (optional step): The monitoring device feeds back the analysis report to the cloud management platform.
  • the cloud management platform can output analysis reports.
  • the output interface of the analysis report can be as shown in Figure 15, presenting at least one report in a list.
  • the list entry for each analysis report can include the report's identifier (for example, a serial number), type, inspection results, creation time, start and end time, progress, triggering method, and related operations (such as View, Email, and Modify).
  • the tenant can delete an analysis report by clicking the "Delete" button, or download it by clicking the "Export" button.
  • after selecting a specific analysis report, the tenant can click the "View", "Email", or "Modify" button associated with it to view the report, create a new email with it, or modify it.
  • the analysis report may include, for example: the analysis results of at least one log analysis dimension of the database; the indicator data of at least one monitoring indicator of the database; the target SQL statement; the execution plan of the target SQL statement; or a rendered diagram of the execution plan of the target SQL statement.
  • the content of the analysis report can be presented in at least one of the following presentation methods: progress bar, percentage, pie chart, list, line chart, or dashboard.
  • after obtaining the tenant's relevant configuration information, the cloud monitoring and analysis system can automatically detect abnormal load events of the database. When such an event occurs, it automatically collects statistics on and analyzes the target logs of the database based on the preconfigured information, so as to quickly identify the source of concurrency or the business scenario, apply countermeasures in time, keep the business running stably, and limit the impact on the database service.
  • computing device 1600 includes: bus 1602, processor 1604, memory 1606, and communication interface 1608.
  • the processor 1604, the memory 1606 and the communication interface 1608 communicate through a bus 1602.
  • Computing device 1600 may be a server or terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1600.
  • Bus 1602 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 16, but it does not mean that there is only one bus or one type of bus.
  • Bus 1602 may include a path that carries information between the components of computing device 1600 (for example, memory 1606, processor 1604, and communication interface 1608).
  • the processor 1604 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • Memory 1606 may include volatile memory, such as random access memory (RAM).
  • the memory 1606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 1606 stores executable program codes
  • the processor 1604 executes the executable program codes to respectively realize the functions of the devices included in the foregoing business system or to realize the functions of the devices included in the foregoing cloud monitoring and analysis system, thereby achieving Cloud monitoring and analysis method according to the embodiment of this application. That is, instructions for executing cloud monitoring and analysis methods are stored on the memory 1606 .
  • the communication interface 1608 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1600 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 1600.
  • the same instructions for performing cloud monitoring and analysis methods may be stored in memory 1606 in one or more computing devices 1600 in a cluster of computing devices.
  • the memory 1606 of one or more computing devices 1600 in the computing device cluster may also store part of the instructions for executing the cloud monitoring and analysis method respectively.
  • a combination of one or more computing devices 1600 may collectively execute instructions for performing cloud monitoring and analysis methods.
  • the memory 1606 in different computing devices 1600 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the cloud management platform or the analysis device. That is, the instructions stored in the memory 1606 in different computing devices 1600 can implement the functions of one or more modules in the cloud management platform or analysis device mentioned above.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 18 shows a possible implementation. As shown in Figure 18, two computing devices 1600A and 1600B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • the memory 1606 in the computing device 1600A stores instructions for performing the functions of the analysis device.
  • the memory 1606 in computing device 1600B stores instructions for performing the functions of the cloud management platform.
  • the connection method between the computing devices shown in Figure 18 takes into account that the cloud monitoring and analysis method provided by this application involves a large amount of data storage and analysis computation; the functions of the analysis device are therefore assigned to computing device 1600A.
  • the functions of computing device 1600A shown in Figure 18 may also be performed by multiple computing devices 1600.
  • the functions of computing device 1600B may also be performed by multiple computing devices 1600.
  • the memory 1606 in different computing devices 1600 in the computing device cluster can store different instructions for executing some functions of the cloud monitoring and analysis system. That is, the instructions stored in the memory 1606 in different computing devices 1600 can implement the functions of one or more devices in the cloud monitoring and analysis system.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • when run on at least one computing device, the computer program product causes the at least one computing device to perform the cloud monitoring and analysis method.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can access, or a data storage device, such as a data center, that contains one or more available media.
  • the available media may be magnetic media (for example, floppy disks, hard disks, or tapes), optical media (for example, DVDs), or semiconductor media (for example, solid state drives).
  • the computer-readable storage medium includes instructions that instruct a computing device or a computing device cluster to perform the cloud monitoring and analysis method.
  • the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects.
  • the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, causing a series of operating steps to be performed on it to produce computer-implemented processing, so that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

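A cloud monitoring and analysis method and system, relating to the field of cloud technology. The method may include: obtaining configuration information input or selected by a tenant on a cloud management platform, where the configuration information represents the tenant's monitoring and analysis requirements for a database, and the database stores business data of the tenant's business system; and analyzing, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized to obtain an analysis report, where the target SQL statement is a statement that has previously operated on the database. The method helps quickly identify the cause of an abnormal load event in the database, so that normal business operation can be quickly restored.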

Description

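Cloud Monitoring and Analysis Method and System
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202210771078.8, filed with the China National Intellectual Property Administration on June 30, 2022 and entitled "Cloud Technology-Based Service Provision Method and Cloud Management Platform", which is incorporated herein by reference in its entirety; and claims priority to Chinese Patent Application No. 202211172568.2, filed with the China National Intellectual Property Administration on September 26, 2022 and entitled "Cloud Monitoring and Analysis Method and System", which is incorporated herein by reference in its entirety.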
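Technical Field
This application relates to the field of cloud technology, and in particular, to a cloud monitoring and analysis method and system.
Background
As clouds keep growing in scale, the volume and complexity of the business they carry keep increasing, and so do the requirements on the databases that support business operation. These requirements concern not only performance but also the operation and maintenance (O&M) capabilities that accompany the database. In particular, when a database has a performance bottleneck and the business architecture cannot yet be adjusted quickly (for example, by splitting databases or tables), a surge in database load makes it critical to quickly identify the source of the concurrency or the business scenario responsible and to apply a countermeasure in time, so as to keep the business running stably and limit the impact on the database service.
Existing monitoring and analysis means are relatively simple: monitoring is separated from analysis, and the analysis dimensions are not comprehensive enough. Therefore, an end-to-end monitoring and analysis system is needed to quickly identify the cause of an abnormal load event in a database and quickly restore normal business operation.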
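Summary
Embodiments of this application provide a cloud monitoring and analysis method and system, which help quickly identify the cause of an abnormal load event in a database so that normal business operation can be quickly restored.
According to a first aspect, an embodiment of this application provides a cloud monitoring and analysis method. The method may be implemented by a cloud monitoring and analysis system and may include: obtaining configuration information input or selected by a tenant on a cloud management platform, where the configuration information represents the tenant's monitoring and analysis requirements for a database, and the database stores business data of the tenant's business system; and analyzing, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized to obtain an analysis report, where the target SQL statement is a statement that has previously operated on the database.
With this method, the cloud monitoring and analysis system provides a new form of service for cloud monitoring and analysis of the database, so as to quickly identify the cause of an abnormal load event in the database and quickly restore normal business operation.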
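With reference to the first aspect, in a possible design, the method further includes: receiving the target SQL statement input or selected by the tenant on the cloud management platform.
With this method, the cloud monitoring and analysis system allows the tenant to input a custom target SQL statement to be optimized, which helps broaden the application scenarios of the cloud monitoring and analysis system in the embodiments of this application.
With reference to the first aspect, in a possible design, the analyzing, according to the configuration information, a target SQL statement to be optimized includes: when an abnormal load event occurs in the database, analyzing the target SQL statement to be optimized according to the configuration information.
With this method, the cloud monitoring and analysis system can provide a cloud monitoring service for the database to detect abnormal load events of the database.
With reference to the first aspect, in a possible design, the configuration information indicates at least one monitoring indicator associated with the abnormal load event and an indicator threshold corresponding to the monitoring indicator, and the method further includes: obtaining indicator data of the at least one monitoring indicator; and when the indicator data is greater than or equal to the corresponding indicator threshold, determining that the abnormal load event occurs in the database.
With this method, the cloud monitoring and analysis system can receive monitoring indicators input or selected by the tenant, collect and check data based on those indicators, and determine whether an abnormal load event occurs.
With reference to the first aspect, in a possible design, the monitoring indicators include system indicators and/or business indicators. The system indicators include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of the computing device to which a service that accesses the database belongs. The business indicators include at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, or number of slow SQL statements per minute.
It should be understood that these monitoring indicators are merely examples and not a limitation. In other embodiments, the monitoring indicators may be adjusted according to the business system or application requirements. Details are not described herein again.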
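With reference to the first aspect, in a possible design, the configuration information indicates at least one log analysis dimension of the database, and the method further includes: performing log analysis on a target log of the database from the at least one log analysis dimension; and determining the target SQL statement to be optimized based on the analysis results of the target log in the at least one log analysis dimension.
With reference to the first aspect, in a possible design, the at least one log analysis dimension includes: an execution time dimension of a single SQL statement, a total execution time dimension of a single category of SQL statements, an SQL-by-business share dimension, an SQL-by-connecting-host share dimension, an SQL latency distribution dimension, or an SQL queries-per-second (QPS) dimension.
With reference to the first aspect, in a possible design, the analyzing, according to the configuration information, a target SQL statement to be optimized to obtain an analysis report includes: restoring a backup data package of the database on a replica database node; and executing the target SQL statement on the replica database node to obtain the analysis report.
With reference to the first aspect, in a possible design, the configuration information further indicates information describing a load scenario of the database, and the executing the target SQL statement on the replica database node to obtain the analysis report includes: after constructing the load scenario on the replica database node according to the configuration information, executing the target SQL statement on the replica database node to obtain the analysis report.
With this method, the cloud monitoring and analysis system can provide a custom configuration channel through which the tenant can input background parameters, so that the system can construct a pressure background and execute the SQL statement under it, improving the analysis efficiency of the system.
With reference to the first aspect, in a possible design, the information describing the load scenario of the database includes at least one of the following: pressure mode name, service name, database name, SQL execution concurrency, or the SQL statements executed concurrently.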
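The load-scenario information above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the system's implementation: the `LoadScenario` field names, the `run_under_load` helper, and the `execute` callback (standing in for whatever client runs SQL on the replica database node) are all hypothetical. Background SQL is submitted to a thread pool sized by the configured concurrency, and the target SQL statement is timed while that load is in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoadScenario:
    # Mirrors the load-scenario information listed above; field names are hypothetical.
    pressure_mode_name: str
    service_name: str
    database_name: str
    concurrency: int                      # SQL execution concurrency (must be >= 1)
    concurrent_sqls: List[str] = field(default_factory=list)


def run_under_load(scenario: LoadScenario,
                   execute: Callable[[str], None],
                   target_sql: str,
                   rounds: int = 1) -> float:
    """Run the background SQL concurrently, time target_sql meanwhile, return seconds."""
    background = [sql for _ in range(rounds) for sql in scenario.concurrent_sqls]
    with ThreadPoolExecutor(max_workers=scenario.concurrency) as pool:
        # Start the pressure background first ...
        futures = [pool.submit(execute, sql) for sql in background]
        # ... then execute the target statement while that load is in flight.
        start = time.perf_counter()
        execute(target_sql)
        elapsed = time.perf_counter() - start
        wait(futures)
        for future in futures:
            future.result()               # re-raise any background failure
    return elapsed


# Usage with a stand-in executor that only records what it was asked to run.
executed: List[str] = []
scenario = LoadScenario("steady", "ECS", "gaussdb_nova", concurrency=4,
                        concurrent_sqls=["SELECT 1", "SELECT 2"])
seconds = run_under_load(scenario, executed.append, "SELECT target", rounds=3)
```

Because the background jobs are submitted before the target statement starts, the measured time reflects execution under the constructed pressure. A real pressure mode would probably run for a fixed duration rather than a fixed number of rounds.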
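With reference to the first aspect, in a possible design, the analysis report includes at least one of the following: analysis results of at least one log analysis dimension of the database; indicator data of at least one monitoring indicator of the database; the target SQL statement; the execution plan of the target SQL statement; or a rendered diagram of the execution plan of the target SQL statement.
With reference to the first aspect, in a possible design, the content of the analysis report is presented in at least one of the following forms: progress bar, percentage, pie chart, list, line chart, or dashboard.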
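According to a second aspect, an embodiment of this application provides a cloud monitoring and analysis system, including: a cloud management platform, configured to receive configuration information input or selected by a tenant, where the configuration information represents the tenant's monitoring and analysis requirements for a database, and the database stores business data of the tenant's business system; and an analysis device, configured to analyze, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized to obtain an analysis report, where the target SQL statement is a statement that has previously operated on the database.
With reference to the second aspect, in a possible design, the cloud management platform is configured to receive the target SQL statement input or selected by the tenant on the cloud management platform.
With reference to the second aspect, in a possible design, the analysis device is configured to: when an abnormal load event occurs in the database, analyze the target SQL statement to be optimized according to the configuration information.
With reference to the second aspect, in a possible design, the configuration information indicates at least one monitoring indicator associated with the abnormal load event and an indicator threshold corresponding to the monitoring indicator, and the analysis device is further configured to: obtain indicator data of the at least one monitoring indicator; and when the indicator data is greater than or equal to the corresponding indicator threshold, determine that the abnormal load event occurs in the database.
With reference to the second aspect, in a possible design, the monitoring indicators include system indicators and/or business indicators. The system indicators include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of the computing device to which a service that accesses the database belongs. The business indicators include at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, or number of slow SQL statements per minute.
With reference to the second aspect, in a possible design, the configuration information indicates at least one log analysis dimension of the database, and the analysis device is further configured to: perform log analysis on a target log of the database from the at least one log analysis dimension; and determine the target SQL statement to be optimized based on the analysis results of the target log in the at least one log analysis dimension.
With reference to the second aspect, in a possible design, the at least one log analysis dimension includes: an execution time dimension of a single SQL statement, a total execution time dimension of a single category of SQL statements, an SQL-by-business share dimension, an SQL-by-connecting-host share dimension, an SQL latency distribution dimension, or an SQL queries-per-second (QPS) dimension.
With reference to the second aspect, in a possible design, the analysis device is configured to: restore a backup data package of the database on a replica database node; and execute the target SQL statement on the replica database node to obtain the analysis report.
With reference to the second aspect, in a possible design, the configuration information further indicates information describing a load scenario of the database, and the analysis device is configured to: after constructing the load scenario on the replica database node according to the configuration information, execute the target SQL statement on the replica database node to obtain the analysis report.
With reference to the second aspect, in a possible design, the information describing the load scenario of the database includes at least one of the following: pressure mode name, service name, database name, SQL execution concurrency, or the SQL statements executed concurrently.
With reference to the second aspect, in a possible design, the analysis report includes at least one of the following: analysis results of at least one log analysis dimension of the database; indicator data of at least one monitoring indicator of the database; the target SQL statement; the execution plan of the target SQL statement; or a rendered diagram of the execution plan of the target SQL statement.
With reference to the second aspect, in a possible design, the content of the analysis report is presented in at least one of the following forms: progress bar, percentage, pie chart, list, line chart, or dashboard.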
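According to a third aspect, an embodiment of this application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to the first aspect or any possible design of the first aspect.
According to a fourth aspect, an embodiment of this application provides a computer program product containing instructions. When the instructions are run by a computing device cluster, the computing device cluster is caused to perform the method according to the first aspect or any possible design of the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium including computer program instructions. When the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to the first aspect or any possible design of the first aspect.
Based on the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.
Brief Description of Drawings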
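Figure 1 is a schematic diagram of a system architecture to which embodiments of this application are applicable;
Figure 2 is a schematic diagram of a management interface for service registration according to an embodiment of this application;
Figure 3 is a schematic diagram of an interface for managing SQL templates according to an embodiment of this application;
Figure 4 is a schematic diagram of a management interface for configuring monitoring indicators according to an embodiment of this application;
Figure 5 is a schematic diagram of an interface for managing system indicator monitoring threshold templates according to an embodiment of this application;
Figure 6 is a schematic diagram of an interface for managing databases according to an embodiment of this application;
Figure 7 is a schematic diagram of an interface for managing business indicator monitoring threshold templates according to an embodiment of this application;
Figures 8 to 10 are schematic diagrams of analysis results in at least one log analysis dimension according to an embodiment of this application;
Figure 11 is a schematic diagram of an interface for managing pressure test templates according to an embodiment of this application;
Figure 12 is a schematic diagram of an interface for configuring SQL parameters according to an embodiment of this application;
Figure 13 is a schematic flowchart of a cloud monitoring and analysis method according to an embodiment of this application;
Figure 14 is a schematic flowchart of a cloud monitoring and analysis method according to an embodiment of this application;
Figure 15 is a schematic diagram of an output interface according to an embodiment of this application;
Figure 16 is a schematic diagram of a computing device according to an embodiment of this application;
Figures 17 and 18 are schematic diagrams of computing device clusters according to an embodiment of this application.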
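Detailed Description of Embodiments
To facilitate understanding of the technical solutions in the embodiments of this application, some terms used in the embodiments are first introduced.
1. Database (DB):
A collection of data organized according to a data model and stored in secondary storage. Such a data collection has the following characteristics: it is as free of duplication as possible, it serves multiple applications of a particular organization in an optimal way, its data structure is independent of the applications that use it (for example, the tenant's business system), and adding, deleting, modifying, and retrieving data can be managed and controlled uniformly.
In the embodiments of this application, to ensure system stability and data availability, the database may use a primary/standby architecture including a primary node and a standby node. The business system can write data to the primary node of the database and query data through the primary node. Under normal conditions the standby node only backs up data; only when the primary node goes down does the standby node provide read and write services to the business system.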
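2. Replica database (fork_DB):
A fork is a clone of a database. Cloning a database allows various changes to be tried freely without affecting the original database.
In the embodiments of this application, fork_DB is used to obtain and restore a backup package of the live-network database for testing.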
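3. Structured Query Language (SQL):
A database query and programming language used to access data and to query, update, and manage relational database systems. An SQL statement is a statement in this language that operates on a database.
SQL is a high-level, non-procedural programming language that lets users work on high-level data structures. It requires neither that users specify how data is stored nor that they know the specific storage layout, so different database systems with entirely different underlying structures can use the same SQL as the interface for data input and management. SQL statements can be nested, which makes the language highly flexible and powerful.
4. Object Storage Service (OBS):
An object-based mass storage service that provides users with massive, secure, highly reliable, and low-cost data storage capabilities.
The basic components of OBS are buckets and objects. A bucket is a container for storing objects in OBS; each bucket has its own attributes, such as storage class, access permissions, and region, and users locate a bucket on the Internet by its access domain name. An object is the basic unit of data storage in OBS; it is in effect a file's data together with its related attribute information, and consists of three parts: key, metadata, and data.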
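5. Cloud resources:
Resources on the cloud that the cloud management platform provides to tenants (users who purchase cloud resources), including cloud services and cloud instances. Cloud services include, for example, Virtual Private Cloud (VPC) network services, gateway services, firewall services, Network Address Translation (NAT) services, cloud disks, Elastic IP (EIP), cloud monitoring services, and various other cloud services provided by cloud vendors. Cloud instances include, for example, virtual machines, containers, or bare metal servers, all of which are virtual instances that a cloud vendor provides to tenants in the vendor's data center. The embodiments of this application do not limit the product form of cloud resources.
6. Application programming interface gateway (APIG):
A very common pattern in microservice architectures. As the unified entry point of the system, APIG can integrate the microservices while remaining client-friendly and hiding the complexity and heterogeneity of the system.
7. Cloud log service (CLS):
Provides a one-stop log data solution with stable, reliable log services covering log collection, log storage, log content search, and statistical analysis. It helps with log-related tasks such as locating business problems, monitoring indicators, and security auditing, and lowers the barrier to log O&M.
8. SQL execution plan (explain): a description of how an SQL statement is executed in the database.
Users can use the EXPLAIN command to view the logical execution plan that the optimizer generates for a given SQL statement. To analyze the performance problem of an SQL statement, one usually first examines its execution plan and checks each execution step for problems. Reading execution plans is therefore a prerequisite for SQL optimization, and understanding the operators in an execution plan is the key to understanding the EXPLAIN command.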
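The following describes this application in detail with reference to the accompanying drawings and embodiments.
Figure 1 is a schematic diagram of a system architecture to which embodiments of this application are applicable.
As shown in Figure 1, the system architecture may include a cloud monitoring and analysis system 100 and a tenant's business system 200.
The tenant's business system 200 may implement its business by providing at least one service. The cloud monitoring and analysis system 100 may be connected to the business system 200 and provide cloud monitoring services for it, to monitor and analyze how the business system 200 is running. In an optional implementation, the business system 200 may be a cloud service system and the at least one service it provides may be a cloud service; the embodiments of this application do not limit the implementation of the business system 200.
For example, the tenant's business system may include a database, and the cloud monitoring and analysis system 100 may provide cloud monitoring services for that database: by monitoring and analyzing the load of the database, it can quickly identify the cause when an abnormal load event occurs, so that normal business operation is quickly restored. The cloud monitoring and analysis system 100 may include, for example but not limited to, the following functional modules: a registration device 110, a monitoring device 120, a log device 130, a log analysis and processing cluster 140, an SQL statement analysis device 150, a replica database node 160, and a cloud management platform 170. The tenant's business system 200 may include, but is not limited to, the following functional modules: APIG 210, a service device 220, a database 230, an object storage service (OBS) 240, and an internal management service 250.
It should be noted that, in the embodiments of this application, the functional modules included in the cloud monitoring and analysis system 100 and in the tenant's business system 200 may vary with the at least one service specifically provided by the business system 200 or the cloud monitoring services specifically provided by the system 100; the embodiments of this application do not limit this. In addition, each of these functional modules may be implemented in software or in hardware. The following uses the monitoring device 120 as an example to describe the implementation; the implementations of the registration device 110, log device 130, log analysis and processing cluster 140, SQL statement analysis device 150, replica database node 160, cloud management platform 170, APIG 210, service device 220, database 230, object storage service 240, and internal management service 250 can refer to that of the monitoring device 120.
As an example of a software functional unit, the monitoring device 120 may include code running on a computing instance. The computing instance may be at least one of a physical host (computing device), a virtual machine, a container, or the like, and there may be one or more such computing devices. For example, the monitoring device 120 may include code running on multiple hosts/virtual machines/containers. These hosts/virtual machines/containers may be distributed in the same region or in different regions, and in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers and a region usually includes multiple AZs. Likewise, they may be distributed in the same virtual private cloud (VPC) or in multiple VPCs; a VPC is usually set up within one region. Communication between two VPCs in the same region, and cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, through which the VPCs are interconnected. As an example of a hardware functional unit, the monitoring device 120 may include at least one computing device, such as a server. Alternatively, the monitoring device 120 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex PLD (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The multiple computing devices included in the monitoring device 120 may be distributed in the same region or in different regions, in the same AZ or in different AZs, and in the same VPC or in multiple VPCs. These computing devices may be any combination of servers, ASICs, PLDs, CPLDs, FPGAs, GALs, and the like.
For ease of understanding, the following takes providing cloud monitoring services for the database of a business system as an example and describes in detail, with reference to the drawings and embodiments, the functions of the functional modules of the cloud monitoring and analysis system 100 and the tenant's business system 200 shown in Figure 1.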
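1. Tenant's business system 200:
In the embodiments of this application, the tenant's business system 200 may include, but is not limited to, the following functional modules: APIG 210, service device 220, database 230, object storage service 240, and internal management service 250.
APIG 210 is the communication interface between the tenant's business system 200 and the tenant's electronic devices. Through APIG 210, a tenant's electronic device can connect to the at least one service provided by the business system 200, and these services can cooperate to implement the business.
The service device 220 of the tenant's business system 200 may provide at least one service to the tenant, denoted for example as service 221, service 222, service 223, and so on.
While implementing the business, the at least one service may interact with the primary node 231 of the database 230 to store generated service data on the primary node 231 or to read data required by the service from it. The primary node 231 and the standby node 232 of the database 230 can communicate with each other so that data stored on the primary node 231 is backed up to the standby node 232. When the primary node 231 fails, the standby node 232 can take over from it, interacting with the services of the service device 220 and providing data storage for them to support business operation.
The object storage service 240 can be used together with the database 230 to store backup packages of the database 230. It requires no consideration of capacity limits in use and offers multiple storage classes, meeting the requirements of the tenant's various business scenarios.
The internal management service 250 may be connected to APIG 210 and is used to perform internal management of the tenant's business system 200 through APIG 210, including but not limited to recruitment, onboarding, offboarding, personnel management, and IT service management of the tenant enterprise.
It should be noted that Figure 1 merely illustrates, rather than limits, the functional modules of the business system 200 in the embodiments of this application. In other embodiments, the devices or functional modules used by the business system 200 may be changed according to specific application scenarios or business requirements. Details are not described herein again.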
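2. Cloud monitoring and analysis system 100:
In the embodiments of this application, the cloud monitoring and analysis system 100 may be connected to the tenant's business system 200 and provide cloud monitoring and analysis services for it, to monitor how the business system 200 is running, analyze the causes of its abnormal load events, and so on, so that normal operation of the business system 200 is quickly restored.
For example, the cloud monitoring and analysis system 100 may include, but is not limited to, the following functional modules: registration device 110, monitoring device 120, log device 130, log analysis and processing cluster 140, SQL statement analysis device 150, replica database node 160, and cloud management platform 170. It should be understood that, in the embodiments of this application, the console management interface is an example, not a limitation, of the communication interface that the cloud management platform provides to tenants; in other embodiments this interface may take various forms, which the embodiments of this application do not limit. In addition, the devices of the cloud monitoring and analysis system 100 other than the cloud management platform 170 may collectively be referred to as the analysis device.
(1) Registration device 110:
In the embodiments of this application, the registration device 110 may be used to manage the services that access the database 230 (for example, service 221, service 222, and service 223) and the business scenarios of those services, and to collect and record the correspondence between the business scenarios of the services and the corresponding database SQL templates.
Service management may include managing information about the services connected to the cloud monitoring and analysis system 100, for example registering, updating, and deregistering service information. The information that a tenant or the tenant's business system 200 provides when registering a service with the registration device 110 may include, but is not limited to: product department, service domain, service name, microservice name, accessed database, service owner, and service team members. When information about a registered service needs to be updated, the registration device 110 may receive and store the updated service information from the business system 200. When a registered service needs to be deregistered, the registration device 110 may delete the service's registration information after receiving a deregistration instruction from the business system 200.
Business scenario management includes managing and maintaining the various business scenarios that a service may involve, as well as the correspondence between each business scenario and the SQL templates it executes in the database 230, for example a one-to-one or one-to-many relationship.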
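In an optional implementation, the registration device 110 may provide tenants with visual registration and management functions through the cloud management platform. Based on the service registration interface definition, a tenant can use its electronic device and the corresponding console management interface to register, with the cloud monitoring and analysis system 100, the services of the business system 200 that implement the business and need to access the database 230. Then, when providing cloud monitoring services to the business system 200, the cloud monitoring and analysis system 100 can detect abnormal load events of the database 230 and quickly identify the source of concurrency or the business scenario involved, so that countermeasures are applied in time to keep the business running stably and limit the impact on the database service.
For example, as shown in Figure 2, in the console management interface provided by the registration device 110, the tenant can bind the business scenarios of a service to be registered to SQL templates, thereby registering with the cloud monitoring and analysis system 100 the service that accesses the database 230 and the correspondence between the service's business scenarios and SQL templates. The tenant can click the "Add" button to add an SQL template for a business scenario, or click the "Delete" button to delete a selected SQL template. When the tenant clicks "Add", the interface for adding an SQL template shown in Figure 3 may pop up over the console management interface, and the tenant can enter or select parameters in the attribute configuration items provided there to bind (or associate) the business scenario with the SQL template.
As shown in Figure 3, the attribute configuration items available to the tenant may include, but are not limited to: a service attribute, indicating the service associated with the business scenario, for example Elastic Cloud Server (ECS); a microservice attribute, indicating the associated microservice, for example nova; a business scenario attribute, describing the business scenario, for example flavor management; an interface attribute, indicating the interface associated with the business scenario, for example query flavor; an interface description attribute, describing the purpose of that interface, for example querying flavor information; and a related SQL operation attribute, indicating the SQL template, which may include an SQL attribute and an SQL description attribute. The tenant can click the plus "+" or minus "-" button to add or remove related SQL operations of the new SQL template. After configuring the attributes of the new SQL template, the tenant clicks "Confirm" to finish; to abandon the configuration, the tenant clicks "Cancel". After the tenant confirms, the console management interface shown in Figure 2 displays the registered business scenario and its associated SQL templates. In an optional implementation, the tenant can view the registration information of each registered service through this console management interface. If needed, the tenant can click "Modify" to modify the registration information of a business scenario and the correspondence between business scenarios and SQL templates; the modification interface is similar to the one shown in Figure 3 and is not described again.
It should be noted that the interface diagrams in Figures 2 and 3 use only the ECS business scenario and its bound SQL template as an example to illustrate, not limit, the management of services accessing the database 230, of their business scenarios, and of the correspondence between business scenarios and SQL templates. In practice, tenants can, according to their own business systems or application requirements, use management interfaces similar to those in Figure 2 or Figure 3 to register services that access the database 230 with the cloud monitoring and analysis system 100 and to manage those services and their business scenarios; the corresponding attribute configuration items may likewise be adjusted. Details are not described herein again.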
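(2) Monitoring device 120:
In the embodiments of this application, the monitoring device 120 is configured to collect, according to preconfigured information, the monitoring information (also called indicator data) generated when the cloud monitoring service provided by the cloud monitoring and analysis system 100 monitors the tenant's business system 200, display the collected monitoring information, and provide automated analysis capabilities.
For example, the key indicators monitored by the cloud monitoring service for the business system 200 fall into two categories: system indicators and business indicators. Taking a database as the target object to be monitored, system indicators may include the CPU load, memory load, disk read/write (IO) load, network packet loss rate, and network latency (ping) of the computing device to which a service that accesses the database belongs. Business indicators may include the total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, number of slow SQL statements per minute, and the like.
In an optional implementation, the monitoring device 120 may provide tenants with visual configuration and management functions through the cloud management platform. Based on the interface definition of the cloud monitoring service, a tenant can use its terminal device and the corresponding console management interface to configure the cloud resources and monitoring indicators to be monitored, so that the monitoring device 120 can monitor the operation of the tenant's business system (for example, the database) in real time or periodically and provide automated analysis when an abnormal load event (for example, a load surge) occurs in the database, helping quickly identify the source of concurrency or the business scenario, apply countermeasures in time, keep the business running stably, and limit the impact on the database service.
For example, abnormal load events can be detected by configuring indicator monitoring thresholds for the database: when an indicator associated with the database is detected to be greater than or equal to its configured threshold, an abnormal load event is considered to have occurred.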
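As shown in Figure 4, taking the configuration of a system indicator monitoring threshold template for a database in the console management interface as an example, the tenant can click the "Add" button to add a system indicator monitoring threshold template, or click the "Delete" button to delete a selected template.
When the tenant clicks "Add", the "Add System Indicator Monitoring Threshold Template" interface shown in Figure 5 may pop up over the console management interface. Similar to the configuration interface in Figure 3, the tenant can enter or select parameters in the attribute configuration items of the interface in Figure 5 to configure the template. These items may include threshold settings for the key indicators to be monitored, such as CPU, memory, IO, network packet loss rate, and network latency. Likewise, after the tenant clicks "Confirm", the console management interface in Figure 4 displays the parameters of the template, for example the template identity document (ID) (for example, 001), CPU load threshold (for example, 0.9), memory load threshold (for example, 0.9), IO load threshold (for example, 0.8), network packet loss rate threshold (for example, 0.2), creator (for example, Zhang San 00123456), and related operation buttons (for example, Bind and Modify). If needed, the tenant can click the "Modify" button associated with a template to modify its parameters; the modification interface is similar to that in Figure 5 and is not described again.
When the tenant clicks the "Bind" button of a specific system indicator monitoring threshold template, a configuration interface for the business scenarios of the services to be associated with the template may also pop up over the console management interface; the tenant can enter or select parameters in the attribute configuration items of that interface to associate the selected template with a target object to be monitored.
Taking a database as the target object, as shown in the "Associated Database Information" interface in Figure 6, the attribute configuration items may include the region, AZ, data center product (performance optimization datacenter, POD), database name, and database primary node IP to which the database belongs. Similarly, the tenant can enter or select parameters in each attribute configuration item. After the tenant clicks "Confirm", the interface in Figure 4 displays the database information associated with the template, for example the region (for example, CN North-Beijing4), AZ (for example, AZ1), POD (for example, pod15), database name (for example, gaussdb nova), primary node IP (for example, 10.77.24.177), and related operation buttons (for example, Bind, Unbind, and Modify). If needed, the tenant can click "Modify" to modify the database information, or click "Unbind" or "Bind" to change the templates associated with the database. For example, "Modify" can be used to modify bound database information, "Unbind" deletes the database information bound to the template, and "Bind" adds database information to be bound to the template. It should be understood that this is merely an example and not a limitation; in practice, tenants can adjust the attribute configuration method or items according to their own business systems or application requirements. Details are not described herein again.
Similar to configuring a system indicator monitoring threshold template, the console management interface shown in Figure 4 can also be used to configure a business indicator monitoring threshold template for the database (not shown). The configuration interface of this template may be as shown in Figure 7, where the tenant can enter or select parameters in the attribute configuration items to configure the template. These items may include the database's total connections, active connections, table bloat rate, primary/standby synchronization rate, and slow SQL statements per minute, for example total connections (for example, 2000), active connections (for example, 85), table bloat rate (for example, 0.3), primary/standby synchronization rate (for example, 0.2), and slow SQL statements per minute (for example, 100). After the tenant clicks "Confirm", the interface in Figure 4 displays the parameters of the business indicator monitoring threshold template (not shown in Figure 4). If needed, the tenant can also click the "Modify" button associated with the template in the interface in Figure 4 to modify its parameters. For similar configuration details, refer to the descriptions of Figures 4 to 6 above; details are not described herein again.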
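After the tenant completes the functional configuration of the monitoring device 120 through the interfaces in Figures 4 to 7, the monitoring device 120 can, while the tenant's business system 200 runs its business, collect various indicator data of the object to be monitored (for example, the database), display the collected monitoring information, and provide automated analysis capabilities.
Automated analysis means that thresholds can be set for one key indicator or a group of key indicators of interest. When a collected indicator value exceeds the preset threshold, the monitored object is considered to have an abnormal load event, which can trigger a call to the log analysis and processing cluster 140 to compute SQL statistics for the time range around the moment the abnormal load event occurred (the abnormal moment for short), and to output the SQL statistics and SQL execution plans according to the preconfigured log analysis dimensions, so as to quickly identify the source of concurrency or the business scenario, apply countermeasures in time, keep the business running stably, and limit the impact on the database service. This automated analysis capability is described in detail below together with the functions of the log analysis and processing cluster 140.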
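It should be noted that the descriptions with reference to Figures 4 to 7 above merely illustrate, rather than limit, the system indicator and business indicator monitoring threshold templates of the database. In other embodiments, the system or business indicators may be changed according to specific application scenarios or business requirements; the embodiments of this application do not limit this. In addition, when a system or business indicator monitoring threshold template is associated with a specific business, the configuration steps are not limited to the order described above; for example, a database (or another service) may be specified first and then associated with a system or business indicator monitoring threshold template. Details are not described herein again.
(3) Log device 130:
In the embodiments of this application, the log device 130 is configured to collect log records of the target object to be monitored (for example, the database 230). The log device 130 may be connected to the log analysis and processing cluster 140 and provide the collected log records to it for log analysis.
(4) Log analysis and processing cluster 140:
In the embodiments of this application, the log analysis and processing cluster 140 consists of the cluster nodes that perform log analysis. It may include multiple nodes, denoted for example as node 01, node 02, node 03, and so on. At least one of these nodes may cooperate according to application scenarios, business requirements, and the like to jointly perform the relevant log analysis steps; the embodiments of this application do not limit how the log analysis steps are divided among nodes.
The log analysis and processing cluster 140 may be connected to the registration device 110 to obtain the registered service information of the business system 200; connected to the monitoring device 120 to learn about the objects to be monitored; and connected to the log device 130 to obtain the log records to be analyzed. Taking a database as the target object to be monitored, the cluster 140 may determine the range of log records to be analyzed using the following information: region, POD, schema, database primary node IP, database, log analysis time range (including start time and end time), SQL template, owning service, owning component, service node IP, and the like.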
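In a specific implementation, in an optional manner, the log analysis and processing cluster 140 may provide tenants with visual configuration and management functions through the cloud management platform 170. Based on the log analysis interface definition, a tenant can use its electronic device and the corresponding console management interface to configure a log analysis task that specifies the range of log records to be analyzed, so that the cluster 140 automatically obtains the log records in that range from the log device 130 and analyzes them to obtain log analysis results.
In another optional implementation, the log analysis and processing cluster 140 may be connected to and invoked by other devices to provide them with automated analysis capabilities. For example, when the monitoring device 120 invokes the cluster 140, the monitoring device 120 monitors the relevant indicators of the service device 220 or the database 230 and invokes the cluster 140 when it determines that an abnormal load event has occurred (for example, the value of one or more system/business indicators exceeds its preset threshold). Through communication with the monitoring device 120, the cluster 140 learns the indicator anomaly information associated with the abnormal load event of the service device 220 or the database 230, such as the abnormal indicator and the abnormal time range. Through communication with the registration device 110, it learns the business scenarios registered with the cloud monitoring and analysis system 100, the services that access the database, and the SQL templates associated with the business scenarios/services. With the indicator anomaly information known, the cluster 140 can determine the business scenarios, services, SQL templates, and so on associated with the abnormal load event. Further, based on the determined business scenarios, services, SQL templates, abnormal time range, and so on, it obtains the log records to be analyzed from the log device 130 and performs automated analysis on them to obtain log analysis results.
It should be understood that the above merely illustrates, rather than limits, when the cluster 140 runs; in other embodiments, the cluster 140 may be triggered to perform log analysis and processing in other ways. Details are not described herein again.
In an optional implementation, the log analysis and processing cluster 140 may provide tenants with visual configuration and management functions through the cloud management platform 170. Based on the interface definition of the log analysis and processing function, a tenant can use its terminal device and the corresponding console management interface to configure the log analysis and processing capabilities of the cluster 140, including but not limited to the log analysis dimensions and the number of SQL statements to be analyzed. The cloud management platform 170 may also provide tenants with a visual display interface that presents the log analysis results.
For example, the at least one log analysis dimension may include: an execution time dimension of a single SQL statement, a total execution time dimension of a single category of SQL statements, an SQL-by-business share dimension, an SQL-by-connecting-host share dimension, an SQL latency distribution dimension, or an SQL queries-per-second (QPS) dimension. The log analysis dimensions customized by the tenant may be as shown in Table 1 below:
Table 1
In Table 1, the "View type" column gives the statistical result for each log analysis dimension, including but not limited to: ranking of single-SQL execution time; ranking of total execution time for an SQL category; ranking of execution frequency for an SQL category; SQL share by business; SQL share by host; SQL latency distribution; and SQL QPS. The "Detailed description" column describes the indicators associated with the SQL execution operations of each dimension, including but not limited to SQL execution time (total, average, minimum, and maximum), the business scenario associated with the SQL statement, the SQL template, the number of SQL statements, and the time at which the SQL statement was executed. The "Presentation type" column indicates how the statistical result is presented, including but not limited to table, pie chart, bar chart, and line chart; the embodiments of this application do not limit this.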
应理解,本申请实施例中,日志分析维度可以是租户通过云管理平台170提供的可视化的配置界面输入的,配置方式与前文结合图2-图7的配置方式相同或相似,详细实现可以参见前文的相关描述,在此不再赘述。该至少一个日志分析维度可以根据业务系统或者应用需求的变化而更改,在此不再赘述。
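For presenting log analysis results, take the single-SQL execution time ranking as an example. As shown in FIG. 8, the visual output interface provided by the cloud management platform 170 may present information describing the range of analyzed log records, such as region (for example, CN North-Beijing4), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (start time, end time, last 30 minutes, and so on), SQL template (not fully shown in the figure), owning service (for example, ECS), owning component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure). The analysis statistics for individual SQL executions within that range may be presented as a list including, for example, SQL ID, SQL template, business scenario, total time, number of executions, minimum time (ms), maximum time (ms), and average time (ms).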
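Alternatively, take the SQL share-by-business and SQL share-by-connection-IP statistics as an example. As shown in FIG. 9, the visual output interface provided by the cloud management platform 170 may present information describing the range of analyzed log records, such as region (for example, CN North-Beijing4), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (start time, end time, last 30 minutes, and so on), SQL template (not fully shown in the figure), owning service (for example, ECS), owning component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure). The share-by-business and share-by-connection-IP statistics within that range may be presented as pie charts. For example, the share by business may be ECS 56%, EVS 20%, laasdeploy 14%, and VPC 10%. As another example, the share by connection IP may be ECS: 26.22.240.32 70%, EVS: 26.22.240.15 15%, laasdeploy: 26.22.240.21 10%, and VPC: 26.22.240.23 5%.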
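The share percentages shown in FIG. 9 can be computed by counting occurrences per key. The sketch below assumes that shares are taken over the number of SQL executions (rather than, for instance, total execution time); the function name `share` is hypothetical. The same function covers both the business share (keys such as `"ECS"`) and the connection-IP share (keys such as `"ECS:26.22.240.32"`).

```python
from collections import Counter
from typing import Dict, Iterable


def share(keys: Iterable[str]) -> Dict[str, float]:
    """Percentage of SQL executions per key, rounded to one decimal place."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {}   # no executions in range: nothing to chart
    return {k: round(100.0 * c / total, 1) for k, c in counts.most_common()}
```

With execution counts of 56, 20, 14, and 10 for ECS, EVS, laasdeploy, and VPC, the function yields 56.0, 20.0, 14.0, and 10.0 respectively, matching the example percentages.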
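Alternatively, take the SQL latency distribution and SQL QPS statistics as an example. As shown in FIG. 10, the visual output interface provided by the cloud management platform 170 may present information describing the range of analyzed log records, such as region (for example, CN North-Beijing4), POD (for example, pod15), schema (for example, standard schema), database (for example, gaussdb_nova), database master node IP (for example, 10.77.24.177), log analysis time range (start time, end time, last 30 minutes, and so on), SQL template (not fully shown in the figure), owning service (for example, ECS), owning component (for example, nova), and service node IP (for example, the service source IP, not fully shown in the figure). The SQL latency distribution and SQL QPS statistics within that range may be presented as line charts.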
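As an illustration of the latency-distribution and QPS views, the sketch below counts latencies into buckets with assumed upper bounds and counts executions per whole second. The bucket bounds and function names are hypothetical; seconds in which no SQL ran are simply absent from the QPS series.

```python
from bisect import bisect_left
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Sequence, Tuple


def latency_histogram(latencies_ms: Iterable[float],
                      bounds: Sequence[float] = (10, 50, 100, 500, 1000)) -> List[int]:
    """Bucket i counts values in (bounds[i-1], bounds[i]]; the extra last
    bucket counts values above bounds[-1]."""
    buckets = [0] * (len(bounds) + 1)
    for v in latencies_ms:
        buckets[bisect_left(bounds, v)] += 1
    return buckets


def qps_series(timestamps: Iterable[datetime]) -> List[Tuple[datetime, int]]:
    """Number of SQL executions per whole second, in time order."""
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    return sorted(per_second.items())
```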
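It should be noted that the interfaces shown in FIG. 8 to FIG. 10 merely illustrate how log analysis results may be presented and are not limiting. In other embodiments, the tenant or the cloud monitoring and analysis system 100 may change, according to the application scenario or service requirements, the log analysis dimensions, the metrics involved in each dimension, and how different analysis results are presented.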
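It should be noted that, in this embodiment, the log analysis processing cluster 140 may periodically analyze the log records collected by the log device 130 and display the analysis results. Alternatively, upon receiving a log analysis task from the cloud management platform 170, it may analyze the log records specified by the task and display the results. Alternatively, when the monitoring device 120 detects a load anomaly event in the business system, analysis of the log records in a time range around the moment of the anomaly may be triggered automatically and the results displayed. This embodiment does not limit when or how log analysis is triggered.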
(5) SQL statement analysis device 150:
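In this embodiment of this application, the SQL statement analysis device 150 provides online SQL analysis and testing. It may be connected to the log analysis processing cluster 140; after the cluster identifies the SQL statements that need optimization, the device 150 obtains their execution plans to clarify the direction of optimization.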
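To improve the efficiency and accuracy of online SQL analysis and testing, as shown in FIG. 1, the SQL statement analysis device 150 may include an SQL analysis and testing component, a background pressure construction and management component, and a data management component. These three components may provide tenants with visual configuration and management functions through the cloud management platform 170. Based on the interface definition of the SQL statement analysis function, a tenant can supply configuration information to the three components through the tenant's terminal device and the corresponding console management interface. For example, through the cloud management platform 170, the tenant may provide SQL optimization attribute settings to the SQL analysis and testing component, enter background pressure policies (including pressure test templates) into the background pressure construction and management component, or provide data synchronization commands and synchronization periods to the data management component.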
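The data management component retrieves the production backup data of the business system 200 from OBS and restores it to the fork database node 160 for use in SQL analysis and testing. By default, the data management component may synchronize data once a week, and it may also support manually triggered synchronization; this embodiment does not limit this.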
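The default weekly synchronization with a manual override could be expressed as the following check. This is a sketch under the assumption that the data management component records the time of its last successful synchronization; `sync_due` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

WEEKLY = timedelta(days=7)


def sync_due(last_sync: Optional[datetime], now: datetime,
             period: timedelta = WEEKLY, manual: bool = False) -> bool:
    """Decide whether to refresh the fork database from the latest backup."""
    if manual or last_sync is None:   # manual trigger, or never synchronized
        return True
    return now - last_sync >= period  # default policy: once per period
```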
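The SQL analysis and testing component executes the SQL to be analyzed on the fork database node 160, analyzes the execution plan of the SQL statement, and renders the analysis result of the execution plan. Its functions may include, for example: (1) testing the time taken to execute SQL concurrently; and (2) analyzing the execution plan of SQL under background pressure (optionally configured) and rendering the analysis result.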
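Execution-plan analysis on a forked copy can be illustrated with SQLite, which ships with Python. SQLite is only a stand-in here: in this application the production database is restored from an OBS backup (the figures show a GaussDB instance), whereas the sketch copies a database with `sqlite3.Connection.backup` and reads the plan with SQLite's `EXPLAIN QUERY PLAN`. The exact wording of plan lines varies between SQLite versions. `fork_database` and `explain` are hypothetical helper names.

```python
import sqlite3
from typing import List, Sequence


def fork_database(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the source database into a separate in-memory fork, so that
    analysis never touches the source itself."""
    fork = sqlite3.connect(":memory:")
    source.backup(fork)
    return fork


def explain(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the detail text of each step in SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]   # last column holds the readable detail
```

On such a fork, a query filtering on an indexed column typically reports a SEARCH step that names the index, while a filter on an unindexed column reports a full SCAN, which is the kind of signal that points to an optimization direction.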
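The background pressure construction and management component manages and constructs various database load scenarios, such as concurrent queries and concurrent updates, and is responsible for managing background pressure modes and injecting background pressure parameters. Setting up a background pressure mode may require dimension information including but not limited to at least one of the following: pressure mode name, service name, database name, number of concurrent SQL executions, or the SQL statements to be executed concurrently.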
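Timing a statement under a background pressure mode can be sketched as below: worker threads repeatedly run a background workload while the target statement is timed. The `PressureMode` fields mirror the dimension information listed above, but the class, `timed_under_pressure`, and the use of Python callables in place of SQL statements are assumptions made for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PressureMode:
    name: str                      # pressure mode name
    service: str                   # service name
    database: str                  # database name
    concurrency: int               # number of concurrent SQL executions
    workload: Callable[[], None]   # stands in for the concurrently executed SQL


def timed_under_pressure(target: Callable[[], None], mode: PressureMode,
                         runs: int = 5) -> List[float]:
    """Run `target` `runs` times while `mode.concurrency` workers loop on
    `mode.workload`; return the target's latencies in milliseconds."""
    stop = threading.Event()

    def worker() -> None:
        # Keep issuing background load until the measurement is finished.
        while not stop.is_set():
            mode.workload()

    latencies: List[float] = []
    with ThreadPoolExecutor(max_workers=max(1, mode.concurrency)) as pool:
        for _ in range(mode.concurrency):
            pool.submit(worker)
        try:
            for _ in range(runs):
                start = time.perf_counter()
                target()
                latencies.append((time.perf_counter() - start) * 1000.0)
        finally:
            stop.set()   # lets the workers exit so the pool can shut down
    return latencies
```

Because Python threads release the GIL while waiting on I/O, this pattern is reasonable when the workload and target are database calls; CPU-bound stand-ins would not run truly in parallel.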
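For example, when a tenant supplies a background pressure policy on the console management interface and needs to add a new one, the interface shown in FIG. 11 pops up on the console management interface, and the tenant adds the policy by adding a pressure test template. The template may include, for example, the mode name, service, service name, and database master node IP of the policy, together with the related pressure test parameters (parameters and pressure values). After the tenant clicks the "Confirm" button, the cloud management platform 170 passes the background pressure policy entered or selected by the tenant to the background pressure construction and management component.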
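As another example, when the tenant enters SQL optimization parameters on the console management interface because SQL analysis and testing is needed, the interface shown in FIG. 12 may pop up. Its attribute settings may include, for example, database name, database master node IP, pressure mode option, and an SQL statement input box. The tenant enters or selects a parameter for each attribute; after the tenant finishes and clicks the "Execute" button, the cloud management platform 170 delivers the entered or selected parameters to the fork database node, so that the SQL can be analyzed and tested online against real data, improving analysis efficiency.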
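It should be noted that FIG. 11 and FIG. 12 merely illustrate the functional configuration of the SQL statement analysis device 150 in this embodiment and are not limiting. In some embodiments, the console management interface may link the SQL statement analysis device 150 with the registration device 110, the monitoring device 120, the log device 130, and the log analysis processing cluster 140, and support automated, customized SQL analysis: SQL statement analysis is performed automatically based on the relevant information from these devices, reducing the operations and maintenance burden of the cloud monitoring and analysis system 100.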
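For example, at the stage of registering the business scenarios or services of the business system, the tenant can configure each functional module of the cloud monitoring and analysis system 100, including but not limited to the monitoring metrics, metric thresholds, log analysis dimensions, the SQL templates associated with each business scenario or service, and the background pressure policy associated with each SQL template. After detecting a load anomaly event, the monitoring device 120 may notify the log analysis processing cluster 140, which may automatically run the log analysis procedure in response to identify the SQL statements that need optimization. Having identified them, the cluster 140 can provide these SQL statements to the SQL statement analysis device 150 by calling the relevant console management interface; the device 150 can then automatically execute the SQL statements and obtain their execution results and rendered execution plans. The cloud management platform 170 may provide a visual interface to display the execution results and rendered execution plans. Implementation details are similar to those above; see the description with reference to FIG. 2 to FIG. 10.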
(6) Fork database node 160:
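In this embodiment of this application, the fork database node 160 restores the production backup data of the database 230 of the business system 200 for use in SQL statement analysis and testing.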
(7) Cloud management platform 170:
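In this embodiment of this application, the cloud management platform 170 may provide tenants with a customization channel through relevant console management interfaces. Through these interfaces, the tenant can configure each functional module of the cloud monitoring and analysis system 100, so that the modules cooperate to monitor and automatically analyze according to the tenant's monitoring and analysis requirements, and feed the analysis results back to the tenant.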
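As described above with reference to FIG. 2 to FIG. 12, the console management interface may provide this customization channel in the form of visual interfaces on which the tenant can configure functions and view related information.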
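It can be understood that the console management interface is merely an example of the configuration method in this embodiment and is not limiting. In some embodiments, the cloud monitoring and analysis system 100 may also provide tenants with a customization channel based on an API format. For example, the system 100 may display the API format on a web page, noting the usage of each field. After seeing the API format, the tenant enters the corresponding parameters according to it to complete the configuration. The tenant's electronic device may send the API with the entered parameters, as a template, over the Internet to the system 100, which detects the parameters corresponding to the different fields of the API and thereby obtains the tenant's requirements for each field. Therefore, in this embodiment, the tenant's configuration information for the system 100 may also include the API fields and the parameters entered by the tenant. Further, the system 100 may store the tenant's configuration information in a corresponding memory so that, when needed, it can retrieve the configuration information from that memory to configure the relevant functional modules and provide a cloud monitoring service for the tenant's business system 200.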
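The API-based configuration channel can be illustrated by validating a tenant-submitted JSON template against a fixed set of fields. The field names below are hypothetical, since the application does not publish the actual API format; `parse_config` is likewise an illustrative name.

```python
import json
from typing import Any, Dict

# Hypothetical API fields; the application does not publish its actual schema.
ALLOWED: Dict[str, type] = {
    "metric_thresholds": dict,   # e.g. {"cpu_load": 80}
    "log_dimensions": list,      # e.g. ["single_sql_elapsed", "qps"]
    "sql_count": int,            # number of SQL statements to analyze
    "pressure_mode": str,        # name of a background pressure mode
}


def parse_config(payload: str) -> Dict[str, Any]:
    """Parse a tenant-submitted JSON template and validate each field."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("configuration must be a JSON object")
    for key, value in data.items():
        expected = ALLOWED.get(key)
        if expected is None:
            raise ValueError(f"unknown field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"field {key!r} must be of type {expected.__name__}")
    return data
```

The validated dictionary is what would then be persisted and later read back to configure the functional modules.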
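The following describes, with reference to method flowcharts, a cloud monitoring and analysis method implemented based on the system architecture shown in FIG. 1.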
As shown in FIG. 13, the cloud monitoring and analysis method may include the following steps:
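S1310: The cloud management platform obtains configuration information entered or selected by the tenant on the cloud management platform. The configuration information represents the tenant's monitoring and analysis requirements for a database, and the database stores business data of the tenant's business system.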
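For example, the configuration information may indicate at least one monitoring metric associated with a load anomaly event and the metric threshold corresponding to each monitoring metric. The monitoring metrics include system metrics and/or business metrics. The system metrics include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of the computing device on which a service accessing the database runs. The business metrics include at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, and number of slow SQL statements per minute.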
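Alternatively, the configuration information may indicate at least one log analysis dimension of the database, for example: the execution time of a single SQL statement, the total execution time of a single category of SQL, the share of SQL executions by business, the share of SQL connections by host, the SQL latency distribution, or SQL queries per second (QPS).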
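Alternatively, the configuration information may indicate information describing a load scenario of the database, for example at least one of the following: pressure mode name, service name, database name, number of concurrent SQL executions, or the SQL statements to be executed concurrently.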
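When performing S1310, the cloud management platform may receive the related configuration information entered or selected by the tenant through interfaces such as those shown in FIG. 2 to FIG. 12; for implementation details, see the related description above with reference to FIG. 2 to FIG. 12.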
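S1320: The analysis device analyzes, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized and obtains an analysis report, where the target SQL statement is a statement that has previously operated on the database. The analysis device is a backend device connected to the cloud management platform and may include all devices of the cloud monitoring and analysis system 100 in FIG. 1 other than the cloud management platform 170.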
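Before S1320, in an optional implementation, the analysis device may receive from the cloud management platform the target SQL statement entered or selected by the tenant on the platform. In another optional implementation, the analysis device may obtain the target SQL statement by an internal preset method: for example, when a load anomaly event occurs in the database, it may perform log analysis on the target log of the database in the at least one log analysis dimension and determine the target SQL statement to be optimized from the analysis results of the target log in those dimensions. This embodiment does not limit how the target SQL statement is obtained.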
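One possible reading of the internal preset method is sketched below: a load anomaly is declared when any monitored metric reaches its threshold, and the SQL templates with the largest total execution time in the analyzed logs become the target SQL statements. The ranking criterion, the default of three targets, and all function names are assumptions for illustration.

```python
from typing import Dict, List


def exceeded(samples: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Metrics whose latest value is greater than or equal to its threshold;
    metrics without a configured threshold are ignored."""
    return sorted(m for m, v in samples.items() if m in thresholds and v >= thresholds[m])


def pick_targets(total_ms_by_template: Dict[str, float], sql_count: int) -> List[str]:
    """The `sql_count` SQL templates with the largest total execution time."""
    ranked = sorted(total_ms_by_template.items(), key=lambda kv: kv[1], reverse=True)
    return [template for template, _ in ranked[:sql_count]]


def target_sql(samples: Dict[str, float], thresholds: Dict[str, float],
               total_ms_by_template: Dict[str, float], sql_count: int = 3) -> List[str]:
    """Select target SQL only when a load anomaly has been detected."""
    if not exceeded(samples, thresholds):
        return []
    return pick_targets(total_ms_by_template, sql_count)
```

Here `sql_count` plays the role of the configurable number of SQL statements to analyze.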
For ease of understanding, the method is described below with reference to a flowchart.
Referring to FIG. 14, the cloud monitoring and analysis method may include the following steps:
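S1401 (optional): The tenant's terminal device may send configuration information to the monitoring device through the cloud management platform, for example to configure at least one monitoring metric associated with a load anomaly event and the corresponding metric threshold. The monitoring metrics may include system metrics and/or business metrics. The system metrics include at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of the computing device on which a service accessing the database runs. The business metrics include at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, and number of slow SQL statements per minute. For configuration details, see the related description above.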
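S1402 (optional): The tenant's terminal device may send configuration information to the log analysis processing cluster through the cloud management platform, for example to configure at least one log analysis dimension of the database. The at least one log analysis dimension may include, for example: the execution time of a single SQL statement, the total execution time of a single category of SQL, the share of SQL executions by business, the share of SQL connections by host, the SQL latency distribution, or SQL queries per second (QPS). For configuration details, see the related description above.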
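S1403 (optional): The tenant's terminal device may send configuration information to the SQL analysis and testing component through the cloud management platform, for example to configure the number of SQL statements to analyze and/or the load background parameters of the database. The SQL analysis and testing component may deliver (or inject) the load background parameters to the fork database node. For configuration details, see the related description above. After the injection completes, the fork database node or the SQL analysis and testing component may also return a load background parameter injection response to the cloud management platform.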
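S1404: The monitoring device obtains metric data of at least one monitoring metric and, when the metric data meets a preset analysis trigger condition (for example, the metric data is greater than or equal to the corresponding metric threshold), determines that the load anomaly event has occurred in the database.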
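In another embodiment, S1404 may be replaced by the following: the tenant's terminal device sends indication information to the SQL analysis and testing component through the cloud management platform, indicating the target SQL statement to be optimized. That is, online SQL analysis may be triggered manually; this embodiment does not limit the trigger method.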
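S1405: The monitoring device specifies the range of log records to analyze and notifies the log analysis processing cluster to perform log analysis.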
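S1406: The log analysis processing cluster may obtain the target log of the database, analyze it in the at least one log analysis dimension, and return the log analysis result, based on the analysis results of the target log in those dimensions, to the monitoring device.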
S1407: The monitoring device instructs the fork database node to analyze the target SQL statement.
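S1408: The fork database node restores the backup data package of the database of the tenant's business system, executes the target SQL statement, and returns the SQL analysis result to the monitoring device.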
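S1409: The monitoring device generates an analysis report based on its own information and on information from the log analysis processing cluster, the SQL analysis and testing component, or the fork database node.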
S1410 (optional): The tenant's terminal device requests, through the cloud management platform, to download the analysis report.
S1411 (optional): The monitoring device returns the analysis report to the cloud management platform.
S1412 (optional): The cloud management platform may output the analysis report.
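For example, the analysis report output interface may be as shown in FIG. 15, which presents at least one report as a list. The inspection list for each analysis report may include the report's identifier (for example, a sequence number), type, inspection result, report creation time, report start and end times, report progress, trigger method, and related operations (for example, View, Email, and Modify). After selecting a specific analysis report, the tenant can delete it by clicking the "Delete" button or download it by clicking the "Export" button. The tenant can also click the report's "View", "Email", or "Modify" button to view the report, create an email for it, or modify it. When the tenant views an analysis report, it may include, for example: the analysis results of at least one log analysis dimension of the database; the metric data of at least one monitoring metric of the database; the target SQL statement; the execution plan of the target SQL statement; and a rendered diagram of that execution plan. The report content may be presented in at least one of the following forms: progress bar, percentage, pie chart, list, line chart, or dashboard. For the related output interfaces, see the description above with reference to FIG. 8 to FIG. 10.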
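The step sequence S1404 to S1409 can be summarized as a single function in which each device is represented by a callable. This is a sketch only: the callables, `AnalysisReport`, and `run_once` are hypothetical, and the rendered execution-plan diagram is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AnalysisReport:
    metric_data: Dict[str, float]       # metric data of the monitoring metrics
    abnormal_metrics: List[str]         # metrics that triggered the analysis
    log_results: dict                   # results per log analysis dimension
    target_sql: List[str]               # target SQL statements
    plans: Dict[str, List[str]] = field(default_factory=dict)   # execution plans


def run_once(samples: Dict[str, float],
             thresholds: Dict[str, float],
             analyze_logs: Callable[[List[str]], dict],
             choose_sql: Callable[[dict], List[str]],
             explain_on_fork: Callable[[str], List[str]]) -> Optional[AnalysisReport]:
    """One pass of S1404-S1409; returns None when no load anomaly is found."""
    abnormal = sorted(m for m, v in samples.items()
                      if m in thresholds and v >= thresholds[m])        # S1404
    if not abnormal:
        return None
    log_results = analyze_logs(abnormal)                                # S1405-S1406
    targets = choose_sql(log_results)
    plans = {sql: explain_on_fork(sql) for sql in targets}              # S1407-S1408
    return AnalysisReport(dict(samples), abnormal, log_results, targets, plans)  # S1409
```

Passing the devices in as callables keeps the control flow of the method separate from how each device is actually reached, which also makes the manual trigger of the alternative S1404 easy to substitute.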
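Thus, with the above cloud monitoring and analysis method, after obtaining the tenant's configuration information, the cloud monitoring and analysis system can automatically detect load anomaly events of the database and, when one occurs, automatically collect statistics on and analyze the target log of the database according to the preconfigured information. This quickly identifies the source of the concurrent load or the business scenario responsible, so that countermeasures can be taken in time, keeping the business running stably and reducing the impact on the database service.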
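This application further provides a computing device. As shown in FIG. 16, the computing device 1600 includes a bus 1602, a processor 1604, a memory 1606, and a communication interface 1608. The processor 1604, the memory 1606, and the communication interface 1608 communicate with one another through the bus 1602. The computing device 1600 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1600.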
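The bus 1602 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of representation, only one line is used in FIG. 16, but this does not mean that there is only one bus or only one type of bus. The bus 1602 may include a path for transferring information between the components of the computing device 1600 (for example, the memory 1606, the processor 1604, and the communication interface 1608).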
The processor 1604 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1606 may include volatile memory, such as random access memory (RAM). The memory 1606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
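The memory 1606 stores executable program code, which the processor 1604 executes to implement the functions of the devices included in the foregoing business system or in the foregoing cloud monitoring and analysis system, thereby implementing the cloud monitoring and analysis method of the embodiments of this application. That is, the memory 1606 stores instructions for performing the cloud monitoring and analysis method.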
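The communication interface 1608 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1600 and other devices or communication networks.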
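An embodiment of this application further provides a computing device cluster, which includes at least one computing device. The computing device may be a server, for example a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.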
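As shown in FIG. 17, the computing device cluster includes at least one computing device 1600. The memories 1606 of one or more computing devices 1600 in the cluster may store the same instructions for performing the cloud monitoring and analysis method.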
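In some possible implementations, the memories 1606 of one or more computing devices 1600 in the cluster may each store part of the instructions for performing the cloud monitoring and analysis method. In other words, a combination of one or more computing devices 1600 may jointly execute the instructions for performing the method.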
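It should be noted that the memories 1606 of different computing devices 1600 in the cluster may store different instructions, each used to perform part of the functions of the cloud management platform or the analysis device. That is, the instructions stored in the memories 1606 of different computing devices 1600 may implement the functions of one or more modules of the cloud management platform or the analysis device described above.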
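In some possible implementations, one or more computing devices in the cluster may be connected through a network, which may be a wide area network, a local area network, or the like. FIG. 18 shows one possible implementation, in which two computing devices 1600A and 1600B are connected through a network, each via its own communication interface. In this type of implementation, the memory 1606 of computing device 1600A stores instructions for performing the functions of the analysis device, and the memory 1606 of computing device 1600B stores instructions for performing the functions of the cloud management platform.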
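The connection mode of the cluster shown in FIG. 18 may take into account that the cloud monitoring and analysis method provided in this application requires considerable resources, for example for storing large amounts of data and performing analysis computations; the functions of the analysis device are therefore assigned to computing device 1600A.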
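It should be understood that the functions of computing device 1600A shown in FIG. 18 may also be performed by multiple computing devices 1600. Likewise, the functions of computing device 1600B may be performed by multiple computing devices 1600.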
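It should be noted that the memories 1606 of different computing devices 1600 in the cluster may store different instructions for performing part of the functions of the cloud monitoring and analysis system. That is, the instructions stored in the memories 1606 of different computing devices 1600 may implement the functions of one or more devices of the cloud monitoring and analysis system.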
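An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, the at least one computing device is caused to perform the cloud monitoring and analysis method.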
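An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium accessible to a computing device, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive). The computer-readable storage medium includes instructions that instruct a computing device to perform the cloud monitoring and analysis method.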
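Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.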
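These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.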
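These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.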
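Obviously, a person skilled in the art can make various modifications and variations to the embodiments of this application without departing from their scope. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies. In the embodiments of this application, unless otherwise stated or logically conflicting, the terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and technical features in different embodiments may be combined according to their inherent logical relationships to form new embodiments.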

Claims (27)

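  1. A cloud monitoring and analysis method, wherein the method comprises:
    obtaining configuration information entered or selected by a tenant on a cloud management platform, wherein the configuration information represents monitoring and analysis requirements of the tenant for a database, and the database is used to store business data of a business system of the tenant; and
    analyzing, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized to obtain an analysis report, wherein the target SQL statement is a statement that has previously operated on the database.
  2. The method according to claim 1, wherein the method further comprises:
    receiving the target SQL statement entered or selected by the tenant on the cloud management platform.
  3. The method according to claim 1, wherein the analyzing, according to the configuration information, a target SQL statement to be optimized comprises:
    when a load anomaly event occurs in the database, analyzing, according to the configuration information, the target SQL statement to be optimized.
  4. The method according to claim 3, wherein the configuration information indicates at least one monitoring metric associated with the load anomaly event and a metric threshold corresponding to the monitoring metric, and the method further comprises:
    obtaining metric data of the at least one monitoring metric; and
    when the metric data is greater than or equal to the corresponding metric threshold, determining that the load anomaly event occurs in the database.
  5. The method according to claim 4, wherein the monitoring metric comprises a system metric and/or a business metric; the system metric comprises at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of a computing device to which a service accessing the database belongs; and the business metric comprises at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, and number of slow SQL statements per minute.
  6. The method according to any one of claims 3 to 5, wherein the configuration information indicates at least one log analysis dimension of the database, and the method further comprises:
    performing log analysis on a target log of the database in the at least one log analysis dimension; and
    determining the target SQL statement to be optimized based on an analysis result of the target log in the at least one log analysis dimension.
  7. The method according to claim 6, wherein the at least one log analysis dimension comprises: an execution time dimension of a single SQL statement, a total execution time dimension of a single category of SQL, an SQL execution share-by-business dimension, an SQL connection share-by-host dimension, an SQL latency distribution dimension, or an SQL queries-per-second (QPS) dimension.
  8. The method according to any one of claims 1 to 7, wherein the analyzing, according to the configuration information, a target SQL statement to be optimized to obtain an analysis report comprises:
    restoring a backup data package of the database on a fork database node; and
    executing the target SQL statement on the fork database node to obtain the analysis report.
  9. The method according to claim 8, wherein the configuration information further indicates information describing a load scenario of the database, and the executing the target SQL statement on the fork database node to obtain the analysis report comprises:
    after constructing the load scenario on the fork database node according to the configuration information, executing the target SQL statement on the fork database node to obtain the analysis report.
  10. The method according to claim 9, wherein the information describing the load scenario of the database comprises at least one of the following: a pressure mode name, a service name, a database name, a number of concurrent SQL executions, or SQL statements executed concurrently.
  11. The method according to any one of claims 1 to 10, wherein the analysis report comprises at least one of the following:
    an analysis result of at least one log analysis dimension of the database;
    metric data of at least one monitoring metric of the database;
    the target SQL statement;
    an execution plan of the target SQL statement; and
    a rendered diagram of the execution plan of the target SQL statement.
  12. The method according to claim 11, wherein content of the analysis report is presented in at least one of the following forms: a progress bar, a percentage, a pie chart, a list, a line chart, or a dashboard.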
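  13. A cloud monitoring and analysis system, comprising:
    a cloud management platform, configured to receive configuration information entered or selected by a tenant, wherein the configuration information represents monitoring and analysis requirements of the tenant for a database, and the database is used to store business data of a business system of the tenant; and
    an analysis device, configured to analyze, according to the configuration information, a target Structured Query Language (SQL) statement to be optimized to obtain an analysis report, wherein the target SQL statement is a statement that has previously operated on the database.
  14. The system according to claim 13, wherein
    the cloud management platform is configured to receive the target SQL statement entered or selected by the tenant on the cloud management platform.
  15. The system according to claim 13, wherein the analysis device is configured to:
    when a load anomaly event occurs in the database, analyze, according to the configuration information, the target SQL statement to be optimized.
  16. The system according to claim 15, wherein the configuration information indicates at least one monitoring metric associated with the load anomaly event and a metric threshold corresponding to the monitoring metric, and the analysis device is further configured to:
    obtain metric data of the at least one monitoring metric; and
    when the metric data is greater than or equal to the corresponding metric threshold, determine that the load anomaly event occurs in the database.
  17. The system according to claim 16, wherein the monitoring metric comprises a system metric and/or a business metric; the system metric comprises at least one of the following: CPU load, memory load, disk read/write (IO) load, network packet loss rate, or network latency of a computing device to which a service accessing the database belongs; and the business metric comprises at least one of the following: total number of database connections, number of active database connections, table bloat rate, primary/standby synchronization rate, and number of slow SQL statements per minute.
  18. The system according to any one of claims 15 to 17, wherein the configuration information indicates at least one log analysis dimension of the database, and the analysis device is further configured to:
    perform log analysis on a target log of the database in the at least one log analysis dimension; and
    determine the target SQL statement to be optimized based on an analysis result of the target log in the at least one log analysis dimension.
  19. The system according to claim 18, wherein the at least one log analysis dimension comprises: an execution time dimension of a single SQL statement, a total execution time dimension of a single category of SQL, an SQL execution share-by-business dimension, an SQL connection share-by-host dimension, an SQL latency distribution dimension, or an SQL queries-per-second (QPS) dimension.
  20. The system according to any one of claims 13 to 19, wherein the analysis device is configured to:
    restore a backup data package of the database on a fork database node; and
    execute the target SQL statement on the fork database node to obtain the analysis report.
  21. The system according to claim 20, wherein the configuration information further indicates information describing a load scenario of the database, and the analysis device is configured to:
    after constructing the load scenario on the fork database node according to the configuration information, execute the target SQL statement on the fork database node to obtain the analysis report.
  22. The system according to claim 21, wherein the information describing the load scenario of the database comprises at least one of the following: a pressure mode name, a service name, a database name, a number of concurrent SQL executions, or SQL statements executed concurrently.
  23. The system according to any one of claims 13 to 22, wherein the analysis report comprises at least one of the following:
    an analysis result of at least one log analysis dimension of the database;
    metric data of at least one monitoring metric of the database;
    the target SQL statement;
    an execution plan of the target SQL statement; and
    a rendered diagram of the execution plan of the target SQL statement.
  24. The system according to claim 23, wherein content of the analysis report is presented in at least one of the following forms: a progress bar, a percentage, a pie chart, a list, a line chart, or a dashboard.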
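  25. A computing device cluster, comprising at least one computing device, wherein each computing device comprises a processor and a memory; and
    the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1 to 12.
  26. A computer program product containing instructions, wherein when the instructions are run by a computing device cluster, the computing device cluster is caused to perform the method according to any one of claims 1 to 12.
  27. A computer-readable storage medium, comprising computer program instructions, wherein when the computer program instructions are executed by a computing device cluster, the computing device cluster performs the method according to any one of claims 1 to 12.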
PCT/CN2023/104509 2022-06-30 2023-06-30 Cloud monitoring and analysis method and system WO2024002327A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210771078 2022-06-30
CN202210771078.8 2022-06-30
CN202211172568.2A CN117370128A (zh) 2022-06-30 2022-09-26 Cloud monitoring and analysis method and system
CN202211172568.2 2022-09-26

Publications (1)

Publication Number Publication Date
WO2024002327A1 (zh) 2024-01-04

Family

ID=89383345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104509 WO2024002327A1 (zh) 2022-06-30 2023-06-30 Cloud monitoring and analysis method and system

Country Status (1)

Country Link
WO (1) WO2024002327A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
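CN105208098A (zh) * 2015-08-24 2015-12-30 Yonyou Network Technology Co., Ltd. Device and method for implementing a cloud monitoring system
CN109376139A (zh) * 2018-08-15 2019-02-22 Ping An Life Insurance Company of China, Ltd. Centralized database monitoring method, computer device, and storage medium
CN110011853A (zh) * 2019-04-11 2019-07-12 China United Network Communications Group Co., Ltd. Cross-fault troubleshooting method and device for multiple platforms and clusters
CN113177060A (zh) * 2021-05-25 2021-07-27 Industrial and Commercial Bank of China Limited Method, apparatus, and device for managing SQL statements
CN113485887A (zh) * 2021-06-29 2021-10-08 Shanghai Zhongyan Network Technology Co., Ltd. Database monitoring method and apparatus, electronic device, and storage medium
CN114443435A (zh) * 2022-01-27 2022-05-06 COSCO Shipping Technology Co., Ltd. Performance monitoring and alarm method and alarm system for container microservices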

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23830491

Country of ref document: EP

Kind code of ref document: A1