CN109766238B - Session number-based operation and maintenance platform performance monitoring method and device and related equipment - Google Patents

Session number-based operation and maintenance platform performance monitoring method and device and related equipment

Info

Publication number
CN109766238B
Authority
CN
China
Prior art keywords
monitored object
monitoring period
cpu utilization
utilization rate
alarm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811537816.2A
Other languages
Chinese (zh)
Other versions
CN109766238A (en)
Inventor
陈东杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201811537816.2A priority Critical patent/CN109766238B/en
Publication of CN109766238A publication Critical patent/CN109766238A/en
Application granted granted Critical
Publication of CN109766238B publication Critical patent/CN109766238B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A session number-based operation and maintenance platform performance monitoring method comprises the following steps: acquiring the CPU utilization rate of a monitored object in a current monitoring period; when the CPU utilization rate of a monitored object is determined to meet a first alarm condition, acquiring the number of active sessions of the monitored object in the current monitoring period; judging whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition; when the second alarm condition is determined not to be met, alarming the monitored object in a preset first alarm mode; and when the second alarm condition is met, alarming the monitored object in a preset second alarm mode. The invention also provides an operation and maintenance platform performance monitoring device based on the session number, electronic equipment and a storage medium. The invention can determine whether the high utilization rate of the CPU is caused by the number of the active sessions or not through cloud monitoring and different alarm modes, is beneficial to an operation and maintenance manager to quickly locate the root of the problem and is convenient to investigate.

Description

Session number-based operation and maintenance platform performance monitoring method and device and related equipment
Technical Field
The invention relates to the technical field of computers, in particular to a session number-based operation and maintenance platform performance monitoring method, a session number-based operation and maintenance platform performance monitoring device and related equipment.
Background
In the traditional operation and maintenance management mode, a manager manually monitors the running state of the system and manually handles the daily management operations of the application system, so the cost is high, the efficiency is low, and real-time performance is lacking; this mode is not suitable for large application systems. Especially for highly clustered enterprise application management scenarios, an automated operation and maintenance management mode is indispensable.
Nowadays, a monitoring system is adopted to collect and store data of monitored objects (such as clusters, hosts, databases and the like), obtain corresponding indexes through analysis, statistics and other processing, and verify whether the indexes meet expectations in real time, and alarm if the indexes do not meet the expectations (that is, the indexes meet alarm conditions in alarm rules). For example, the alarm rule provides that an alarm is given when the CPU utilization rate of the host exceeds 95%, and if the CPU utilization rate of the host reaches 98%, the index is judged not to meet expectations or meets the alarm condition in the alarm rule, so that the alarm is triggered.
However, such a monitoring system usually monitors only fixed indexes, such as CPU utilization, network traffic and the number of database connections. It neither monitors the number of active sessions in combination with these indexes nor takes historical monitoring data into account, so the alarm rules are relatively simple and may produce false alarms that disrupt the administrator's work.
Disclosure of Invention
In view of the above, it is necessary to provide a session number-based operation and maintenance platform performance monitoring method, device and related equipment, which can determine, through cloud monitoring and different alarm modes, whether a high CPU utilization rate is caused by the number of active sessions, and which help an operation and maintenance manager quickly locate the root of the problem and facilitate troubleshooting.
The first aspect of the present invention provides a session number-based operation and maintenance platform performance monitoring method, where the method includes:
acquiring the CPU utilization rate of a monitored object in a current monitoring period;
when the CPU utilization rate of the monitored object is determined to meet a first alarm condition, acquiring the number of active sessions of the monitored object in the current monitoring period;
judging whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition or not;
when the number of the active sessions of the monitored object in the current monitoring period is determined not to meet the second alarm condition, alarming the monitored object in a preset first alarm mode;
and when the number of active sessions of the monitored object in the current monitoring period is determined to meet the second alarm condition, alarming the monitored object in a preset second alarm mode.
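The steps above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patented implementation: the function name `choose_alarm`, the 80% default threshold, and the callables `get_active_sessions` and `second_condition` are hypothetical names introduced for the example.

```python
from typing import Callable, Optional


def choose_alarm(cpu_percent: float,
                 get_active_sessions: Callable[[], int],
                 second_condition: Callable[[int], bool],
                 cpu_threshold: float = 80.0) -> Optional[str]:
    """Return the alarm mode to use, or None when no alarm is needed."""
    # First alarm condition (assumed here): CPU utilization exceeds a threshold.
    if cpu_percent <= cpu_threshold:
        return None  # nothing to report; wait for the next monitoring period
    # The session count is fetched only after the first condition is met.
    sessions = get_active_sessions()
    if second_condition(sessions):
        return "second"  # high CPU may be caused by the session count
    return "first"       # high CPU while the session count looks normal


# Hypothetical usage: the second condition flags counts above 1000.
print(choose_alarm(50.0, lambda: 2000, lambda n: n > 1000))  # None
print(choose_alarm(95.0, lambda: 200, lambda n: n > 1000))   # first
print(choose_alarm(95.0, lambda: 2000, lambda n: n > 1000))  # second
```

Passing the session lookup as a callable keeps the more expensive query out of the path taken when the CPU utilization rate is normal, which mirrors the order of the steps.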
According to a preferred embodiment of the present invention, the current monitoring period is set according to the CPU utilization in the historical monitoring period, and includes:
acquiring the CPU utilization rate corresponding to the historical monitoring period;
acquiring a target CPU utilization rate of which the CPU utilization rate is higher than a preset CPU utilization rate threshold value;
acquiring a monitoring time period or a monitoring time point corresponding to the target CPU utilization rate;
and setting the monitoring time period or the monitoring time point as the current monitoring period.
According to a preferred embodiment of the present invention, after the obtaining of the number of active sessions of the monitoring object in the current monitoring period, the method further includes:
judging whether the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session number threshold value or not;
and when the number of active sessions of the monitored object in the current monitoring period is determined to be smaller than the preset session number threshold, alarming the monitored object in a preset third alarming mode.
According to a preferred embodiment of the invention, the second alarm condition is: the number of active sessions of the monitored object in the current monitoring period is greater than a first proportional value of the number of active sessions of the monitored object in the previous monitoring period and is less than a second proportional value of the number of active sessions of the monitored object in the previous monitoring period, wherein the first proportional value is less than 1, and the second proportional value is greater than 1.
According to a preferred embodiment of the present invention, when it is determined that the CPU usage of the monitoring object does not satisfy the first alarm condition, the method further comprises:
and acquiring the CPU utilization rate of the monitored object in the next monitoring period.
According to a preferred embodiment of the present invention, the preset second alarm manner is: generating a monitoring information display interface, and displaying in it the number of active sessions of the monitored object and the corresponding active session IDs.
According to a preferred embodiment of the present invention, the obtaining the CPU utilization of the monitored object in the current monitoring period includes:
when the system is in an available state, acquiring process information of an application program currently running by a monitoring object;
calling resource information of the RocketMQ process through a TOP command according to the process identifier of the process information;
and acquiring the CPU utilization rate of the monitored object in the current monitoring period from the resource information of the RocketMQ process.
The second aspect of the present invention provides an operation and maintenance platform performance monitoring apparatus based on session numbers, the apparatus including:
the first acquisition module is used for acquiring the CPU utilization rate of the monitored object in the current monitoring period;
the first judgment module is used for judging whether the CPU utilization rate of the monitored object meets a first alarm condition;
the second obtaining module is used for obtaining the number of active sessions of the monitored object in the current monitoring period when the first judging module determines that the CPU utilization rate of the monitored object meets a first alarm condition;
the second judgment module is used for judging whether the number of the active sessions of the monitored object in the current monitoring period meets a second alarm condition;
the first alarm module is used for alarming the monitored object in a preset first alarm mode when the second judgment module determines that the number of active sessions of the monitored object in the current monitoring period does not meet the second alarm condition;
and the second alarm module is used for alarming the monitored object in a preset second alarm mode when the second judgment module determines that the number of active sessions of the monitored object in the current monitoring period meets the second alarm condition.
A third aspect of the present invention provides an electronic device, where the electronic device includes a processor and a memory, and the processor is configured to implement the session number-based operation and maintenance platform performance monitoring method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for monitoring the performance of the operation and maintenance platform based on the session number is implemented.
The embodiments of the invention provide a session number-based operation and maintenance platform performance monitoring method, device, electronic equipment and storage medium, which acquire monitoring indexes of a monitored object, the monitoring indexes including the CPU utilization rate and the number of active sessions. When the CPU utilization rate of the monitored object in the current monitoring period is determined to meet a first alarm condition, it is further judged whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition. When the number of active sessions is determined not to meet the second alarm condition, an alarm is given in a first alarm mode to remind the operation and maintenance manager that the CPU utilization rate is high. When the number of active sessions is determined to meet the second alarm condition, an alarm is given in a second alarm mode to remind the operation and maintenance manager that the high CPU utilization rate may be caused by an unreasonable number of active sessions, prompting the manager to check the active sessions directly to rule out the cause of the fault, so that the active sessions are located quickly. Traditional operation and maintenance does not consider that an unreasonable number of active sessions can cause a high CPU utilization rate, yet the number of active sessions easily becomes abnormal in a high-concurrency cluster. Detecting the number of active sessions of the monitored object directly helps the operation and maintenance manager quickly locate the root of the problem, which facilitates troubleshooting, improves the efficiency of system operation and maintenance, and reduces the cost of manual inspection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an operation and maintenance platform performance monitoring method based on session numbers according to an embodiment of the present invention.
Fig. 2 is a functional block diagram of an operation and maintenance platform performance monitoring apparatus based on session numbers according to a second embodiment of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to a third embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The session number-based operation and maintenance platform performance monitoring method is applied to electronic equipment, can also be applied to a hardware environment formed by the electronic equipment and a server connected with the electronic equipment through a network, and is executed by the server and the electronic equipment together. Networks include, but are not limited to: a wide area network, a metropolitan area network, or a local area network.
The electronic device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing electronic devices.
The electronic equipment which needs to perform the session number-based operation and maintenance platform performance monitoring method can directly integrate the session number-based operation and maintenance platform performance monitoring function provided by the method of the invention on the electronic equipment, or install a client for realizing the method of the invention. For another example, the method provided by the present invention may further run on an electronic device such as a server in the form of a Software Development Kit (SDK), and an interface of the operation and maintenance platform performance monitoring function based on the session number is provided in the form of the SDK, and the server or other electronic devices may implement the operation and maintenance platform performance monitoring function based on the session number through the provided interface.
Example one
Fig. 1 is a flowchart of an operation and maintenance platform performance monitoring method based on session numbers according to an embodiment of the present invention. The operation and maintenance platform performance monitoring method based on the session number monitors the session number index and maintains the operation and maintenance platform according to the monitored session number index. The execution sequence in the flowchart may be changed and some steps may be omitted according to different requirements.
101: Obtaining the CPU utilization rate of the monitored object in the current monitoring period.
In this embodiment, the CPU utilization rate of the monitored object in the current monitoring period may be monitored by a monitoring system. The monitoring system may be a large-scale system composed of a plurality of cloud products, a service system composed of a few machines, or a stand-alone system.
The monitoring object may be any object in the monitoring system, for example, it may be a physical entity such as a cluster, a system, a device, etc., it may be a software entity such as a computer operating system, an application process, etc., and it may also be a logical entity formed by software and hardware together such as a virtual machine, etc.
The monitoring system monitors the monitoring index of one or more monitored objects. The monitoring index is also called a monitoring item and is an index for alarm judgment.
The monitoring system can simultaneously detect a plurality of monitoring indexes of a plurality of monitored objects.
In this embodiment, the monitoring indexes include the CPU utilization rate and the number of active sessions. In other embodiments, the monitoring indexes may also include the memory utilization rate, the number of database connections, and the like.
In this embodiment, the CPU utilization rate refers to the percentage of CPU resources occupied by running programs. The higher the CPU utilization rate, the more applications are running simultaneously; the lower the CPU utilization rate, the fewer applications are running simultaneously. When the CPU utilization rate is too high, system performance suffers: the system runs slowly and response times grow.
The monitoring period refers to a preset monitoring time period or a preset monitoring time point.
A monitoring time period is a span of time, for example 10 am to 12 pm of the day, 3 pm to 5 pm of the day, or every third hour.
A monitoring time point is a specific moment, for example 10 am of the day, 10 pm of the day, or every hour on the hour (1 o'clock, 2 o'clock, 3 o'clock, and so on).
In some embodiments, a plurality of monitoring time periods, for example 10 am to 12 pm and 3 pm to 5 pm of the day, may be preset, and the CPU utilization rate of the monitored object is detected in each monitoring time period.
In some embodiments, a plurality of monitoring time points, for example 10 am and 10 pm of the day, may be preset, and the CPU utilization rate of the monitored object is detected at each monitoring time point.
In some embodiments, the monitoring period may include a plurality of monitoring time periods and a plurality of monitoring time points at the same time.
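A monitoring period that mixes time periods and time points can be represented as in the following sketch. It is only an illustration: the `TimeWindow` class, the `in_monitoring_period` function, the half-open window convention and the minute-level matching of time points are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Union


@dataclass(frozen=True)
class TimeWindow:
    """A monitoring time period; start is inclusive, end is exclusive."""
    start: time
    end: time


# A monitoring period entry is either a time window or a single time point.
Period = Union[TimeWindow, time]


def in_monitoring_period(now: time, periods: List[Period]) -> bool:
    """Return True if `now` falls in any configured window or time point."""
    for p in periods:
        if isinstance(p, TimeWindow):
            if p.start <= now < p.end:
                return True
        elif (p.hour, p.minute) == (now.hour, now.minute):
            # Time points are matched at minute precision (an assumption).
            return True
    return False


periods = [TimeWindow(time(10), time(12)),   # 10 am to 12 pm
           TimeWindow(time(15), time(17)),   # 3 pm to 5 pm
           time(22, 0)]                      # 10 pm
print(in_monitoring_period(time(10, 30), periods))     # True
print(in_monitoring_period(time(13, 0), periods))      # False
print(in_monitoring_period(time(22, 0, 30), periods))  # True
```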
In this embodiment, the current monitoring period is set according to the CPU utilization in the historical monitoring period.
Specifically, the setting of the current monitoring period according to the CPU utilization in the historical monitoring period includes:
acquiring the CPU utilization rate corresponding to a historical monitoring period;
acquiring a target CPU utilization rate of which the CPU utilization rate is higher than a preset CPU utilization rate threshold value;
acquiring a monitoring time period or a monitoring time point corresponding to the target CPU utilization rate;
and setting the monitoring time period or the monitoring time point as the current monitoring period.
A plurality of monitoring time periods and monitoring time points are set according to the time periods or time points at which the CPU utilization rate was historically too high, and the CPU utilization rate of the monitored object is detected in them. These are the time periods or time points in which users use the application most frequently, and frequent use can drive the CPU utilization rate too high. Detection is therefore only needed in the set monitoring time periods or at the set monitoring time points; the monitored object does not have to be kept under real-time monitoring, which would waste system resources.
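The selection of monitoring periods from historical data can be sketched as below. The function name `select_monitoring_periods`, the string labels for periods, the 80% threshold and the sample figures are hypothetical and serve only to illustrate the four steps.

```python
from typing import Dict, List


def select_monitoring_periods(history: Dict[str, float],
                              cpu_threshold: float = 80.0) -> List[str]:
    """Return the historical periods whose CPU utilization exceeded the threshold.

    `history` maps a period label (a time period or a time point) to the CPU
    utilization, in percent, recorded for that period.
    """
    return [label for label, cpu in history.items() if cpu > cpu_threshold]


# Made-up historical data for illustration only.
history = {
    "08:00-10:00": 35.0,
    "10:00-12:00": 91.5,
    "15:00-17:00": 88.0,
    "22:00": 42.0,
}
print(select_monitoring_periods(history))  # ['10:00-12:00', '15:00-17:00']
```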
In this embodiment, the obtaining the CPU utilization of the monitored object in the current monitoring period includes:
1) When the system is in an available state, acquiring the process information of the application program currently run by the monitored object.
In this embodiment, the system includes an available state and an unavailable state.
The available state refers to the state that the system is in a running state.
The unavailable state refers to the system being in a dead halt state or a shutdown state and the like.
When the system is in an available state, the system needs to monitor the CPU utilization of the monitored object in a monitoring period.
In this embodiment, the process information of the application program currently running in the monitoring object may be collected by calling an operating system command (e.g., a PS command).
The process information of the application may include a process identifier (PID) of the application process. The process identifier is used to distinguish processes and may be a number, for example a four-digit number.
The collected process information of the application program can also comprise the application program to which the application program process belongs. An application includes an application process.
All process information for RocketMQ may be retrieved from all running processes.
Further, the PID of the RocketMQ process may be obtained from the information of all processes in combination with the preset keywords NamesrvStartup and BrokerStartup.
For example, the system uses the command: PID=`ps -fC java | grep "$INSTANCES" | egrep "NamesrvStartup|BrokerStartup" | awk '{print $2}'`, and the following running result can be obtained: [root@SZC-L0075300 rocketmq]# ps -fC java | grep "rmq_lcloud-config-prd-ins5201" | egrep "NamesrvStartup|BrokerStartup" | awk '{print $2}' 16056, so the system can determine that the PID of the RocketMQ process is 16056.
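The same filtering can be expressed in Python, as in the sketch below. It parses text in the full-format layout printed by `ps -f`, where the PID is the second column, instead of running the command. The function name `find_pids` and the sample process listing are invented for illustration; only the keywords, the instance name and the PID 16056 come from the example above.

```python
import re
from typing import List, Sequence


def find_pids(ps_output: str, instance: str,
              keywords: Sequence[str] = ("NamesrvStartup", "BrokerStartup")) -> List[int]:
    """Return PIDs of lines that mention `instance` and any of `keywords`."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    pids = []
    for line in ps_output.splitlines():
        if instance in line and pattern.search(line):
            # In `ps -f` output the second whitespace-separated column is the PID.
            pids.append(int(line.split()[1]))
    return pids


# Invented sample listing; a real one would come from `ps -fC java`.
sample = """UID        PID  PPID  C STIME TTY          TIME CMD
app      16056     1  3 Dec01 ?        02:11:09 java -Dinstance=rmq_lcloud-config-prd-ins5201 org.apache.rocketmq.broker.BrokerStartup
app      17001     1  0 Dec01 ?        00:01:02 java -jar other-service.jar
"""
print(find_pids(sample, "rmq_lcloud-config-prd-ins5201"))  # [16056]
```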
2) Acquiring the CPU utilization rate of the monitored object in the current monitoring period according to the process information.
In this embodiment, the obtaining, according to the process information, the CPU utilization of the monitored object in the current monitoring period includes:
calling resource information of the RocketMQ process through a TOP command according to the process identifier of the process information;
and acquiring the CPU utilization rate of the monitored object in the current monitoring period from the resource information of the RocketMQ process.
The process information may include a plurality of fields, each field corresponding to a process information, each field including a field name, and the process identifier may be obtained based on the field name. For example, if the field name of the process identifier in the process information is PID, the field with the field name of PID is obtained, and the process identifier is obtained.
The resource information of the RocketMQ process may include CPU utilization, memory utilization, and the like.
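Reading the CPU utilization rate out of the resource information can be sketched as follows. The example assumes the text was produced by running top in batch mode for a single process (for instance `top -b -n 1 -p 16056` on Linux) and locates the `%CPU` column from the header row. The function name `cpu_percent_from_top` and the sample output are hypothetical.

```python
from typing import List, Optional


def cpu_percent_from_top(top_output: str, pid: int) -> Optional[float]:
    """Return the %CPU value reported for `pid`, or None if it is not listed."""
    header: Optional[List[str]] = None
    for line in top_output.splitlines():
        cols = line.split()
        if not cols:
            continue
        if cols[0] == "PID" and "%CPU" in cols:
            header = cols  # remember the column layout of the process table
            continue
        if header is not None and cols[0] == str(pid):
            return float(cols[header.index("%CPU")])
    return None


# Invented sample in the usual layout of top's batch output.
sample = """top - 10:15:01 up 30 days,  2:11,  1 user,  load average: 1.20, 0.98, 0.75
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16056 app       20   0 6123456 812345  15320 S  87.5 10.2 131:09.44 java
"""
print(cpu_percent_from_top(sample, 16056))  # 87.5
```

Looking the column up by name rather than by fixed position keeps the sketch working if the field order of top has been customized.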
102: Judging whether the CPU utilization rate of the monitored object meets a first alarm condition.
In this embodiment, the first alarm condition may include that the CPU utilization exceeds a preset CPU utilization threshold.
The determining whether the CPU usage of the monitored object satisfies the first alarm condition may be determining whether the CPU usage of the monitored object exceeds the preset CPU usage threshold.
The preset CPU utilization threshold is a value set in advance (e.g., 80%) that characterizes the optimal operating state of the monitored object. When the CPU utilization rate of the monitored object exceeds the preset CPU utilization threshold, the monitored object is currently running many application programs, its load is heavy, and its CPU utilization rate meets the first alarm condition. When the CPU utilization rate of the monitored object does not exceed the preset CPU utilization threshold, the number of application programs currently running on the monitored object is moderate, the monitored object is running well, and no alarm is needed for its CPU utilization rate.
When the CPU utilization rate of the monitored object is determined not to meet a first alarm condition, executing 103; otherwise, when the CPU utilization rate of the monitored object is determined to meet the first alarm condition, executing 104.
103: Acquiring the CPU utilization rate of the monitored object in the next monitoring period.
When the CPU utilization rate of the monitored object is determined not to meet the first alarm condition, the method waits for the next monitoring period and acquires the CPU utilization rate of the monitored object in the next monitoring period.
104: Acquiring the number of active sessions of the monitored object in the current monitoring period.
An active session refers to the time interval during which an end user communicates with an interactive system, typically the time elapsed from logging into the system to logging out of it and, if required, some additional operating space. A session is a server-side mechanism that addresses the statelessness of the HTTP protocol: it lets a series of interactions between a client and a server form a complete transaction, so that a website can behave like software in the true sense.
Standard Spring Boot products on the market usually cannot monitor the number of active sessions of RocketMQ, yet unreasonable use of active sessions can also cause a high CPU utilization rate. Traditional host-level monitoring only checks whether the CPU utilization rate is too high and does not check whether the high CPU utilization rate is caused by unreasonable use of active sessions.
The active sessions of the monitored object in the current monitoring period can be obtained by a springboot-monitor jar package session method, and the number of the active sessions is counted.
105: Judging whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition.
In this embodiment, the determining whether the number of active sessions of the monitoring object in the current monitoring period meets the second alarm condition includes:
and judging whether the number of the active sessions of the monitoring object in the current monitoring period meets a second alarm condition or not according to the number of the active sessions of the monitoring object in the previous monitoring period.
The number of active sessions of the monitored object is detected in each monitoring period, and the detected numbers are stored in a database in the chronological order of the monitoring periods. The number of active sessions of the monitored object in the current monitoring period can then be predicted from the number detected in the previous monitoring period, and the alarm condition determined accordingly.
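Storing the counts in time order and reading back the previous period's value can be sketched with Python's built-in sqlite3 module. The table name `session_counts`, the column names, the helper functions and the timestamps are assumptions made for this example; the source does not specify a particular database or schema.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory store keyed by the start of each monitoring period."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session_counts ("
        "period_start TEXT PRIMARY KEY, active_sessions INTEGER NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, period_start: str, count: int) -> None:
    """Save the active session count detected in one monitoring period."""
    conn.execute("INSERT INTO session_counts VALUES (?, ?)", (period_start, count))


def previous_count(conn: sqlite3.Connection, current_period_start: str) -> Optional[int]:
    """Return the count of the latest period before the current one, if any."""
    row = conn.execute(
        "SELECT active_sessions FROM session_counts "
        "WHERE period_start < ? ORDER BY period_start DESC LIMIT 1",
        (current_period_start,),
    ).fetchone()
    return row[0] if row else None


conn = open_store()
# ISO 8601 timestamps sort chronologically as plain text; the values are made up.
record(conn, "2018-12-14T10:00", 120)
record(conn, "2018-12-14T15:00", 150)
print(previous_count(conn, "2018-12-14T22:00"))  # 150
```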
Specifically, the second alarm condition may be: the number of active sessions of the monitored object in the current monitoring period is within a preset range of the number of active sessions of the monitored object in the previous monitoring period.
The preset range is bounded below by a first proportional value of the number of active sessions of the monitored object in the previous monitoring period and bounded above by a second proportional value of that number, wherein the first proportional value is smaller than 1, and the second proportional value is larger than 1.
When the number of active sessions of the monitored object in the current monitoring period is between a first proportional value and a second proportional value of the number of active sessions of the monitored object in the previous monitoring period, the number of active sessions of the monitored object in the current monitoring period is considered to be in line with an expectation, and a second alarm condition is not met; and when the number of active sessions of the monitored object in the current monitoring period is not between the first proportional value and the second proportional value of the number of active sessions of the monitored object in the last monitoring period, namely when the number of active sessions of the monitored object in the current monitoring period is smaller than the first proportional value of the number of active sessions of the monitored object in the last monitoring period or is larger than the second proportional value of the number of active sessions of the monitored object in the last monitoring period, considering that the number of active sessions of the monitored object in the current monitoring period does not meet the expectation, and meeting a second alarm condition.
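The range check described in the preceding paragraph can be sketched as follows, reading it so that a count outside the range meets the second alarm condition. The function name and the ratios 0.5 and 1.5 are hypothetical; the source only requires the first proportional value to be below 1 and the second to be above 1. Treating a count that lies exactly on a bound as within the range is also an assumption.

```python
def meets_second_alarm_condition(current: int, previous: int,
                                 low_ratio: float = 0.5,
                                 high_ratio: float = 1.5) -> bool:
    """Return True when `current` lies outside the expected range.

    The expected range is [previous * low_ratio, previous * high_ratio].
    """
    if not (0 < low_ratio < 1 < high_ratio):
        raise ValueError("low_ratio must be in (0, 1) and high_ratio above 1")
    lower = previous * low_ratio
    upper = previous * high_ratio
    return current < lower or current > upper


# With 100 sessions in the previous period the expected range is 50 to 150.
print(meets_second_alarm_condition(120, 100))  # False: within expectation
print(meets_second_alarm_condition(40, 100))   # True: far fewer sessions
print(meets_second_alarm_condition(200, 100))  # True: far more sessions
```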
Executing 106 when the number of the active sessions of the monitoring object in the current monitoring period is determined not to meet the second alarm condition; and executing 107 when the number of the active sessions of the monitoring object in the current monitoring period is determined to meet the second alarm condition.
106: Alarming the monitored object in a preset first alarm mode.
In this embodiment, in the current monitoring period the CPU utilization rate of the monitored object exceeds the preset CPU utilization threshold while the number of active sessions of the monitored object is within the preset range of the number of active sessions in the previous monitoring period. This indicates that the excessive CPU utilization rate of the monitored object in the current monitoring period is not caused by the number of active sessions, and the preset first alarm mode is used to alarm that the CPU utilization rate of the monitored object is too high.
The preset first alarm mode can be modes of giving out an alarm sound, displaying an alarm picture, sending an alarm message, sending an alarm mail and the like.
107: and alarming the monitored object in a preset second alarm mode.
In this embodiment, in the current monitoring period, the CPU utilization of the monitored object exceeds the preset CPU utilization threshold and the number of active sessions of the monitored object is not within the preset range of the number of active sessions in the previous monitoring period. This indicates that the excessive CPU utilization of the monitored object in the current monitoring period may be caused by the number of active sessions, and the preset second alarm mode is used to alarm that the CPU utilization of the monitored object is too high.
The preset second alarm mode may be that a monitoring information display interface is generated, and the number of active sessions and the corresponding active session ID of the monitored object are displayed in the monitoring information display interface.
The number of active sessions and the corresponding active session IDs of the monitored object can be shown in a chart. The charts may include line charts, area charts, heat maps, pie charts, tables, and the like.
In a specific embodiment, the number of active sessions of the monitoring object and the corresponding active session ID may be shown by Grafana. And transmitting the number of active sessions of the monitoring object and the corresponding active session ID to a Grafana platform for display through a Grafana acquisition agent.
Grafana is a visualization dashboard that supports a variety of charts and layouts and can use Graphite, Zabbix, InfluxDB, Prometheus and OpenTSDB as data sources. Grafana has the following characteristics: flexible and rich graphical options; a variety of styles can be mixed; support for day and night modes; multiple data sources are supported.
Or, the number of active sessions and the corresponding active session ID of the monitored object may be displayed through Highcharts.
Highcharts is a charting library written in pure JavaScript. Highcharts makes it easy to add interactive charts to a website or a web application, and supports the various chart types used by the monitoring platform.
Further, after the obtaining of the number of active sessions of the monitoring object in the current monitoring period, the method may further include:
judging whether the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session number threshold value;
and when the active session number of the monitored object in the current monitoring period is determined to be smaller than the preset session number threshold, alarming the monitored object in a preset third alarming mode.
If the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session number threshold, determining that an alarm condition is met; and if the number of the active sessions of the monitored object in the current monitoring period is greater than or equal to a preset session number threshold, determining that an alarm condition is not met.
By setting the lowest session number critical value, when the active session number of the monitored object in the monitoring period is lower than the lowest session number critical value, especially when the active session number of the monitored object in the monitoring period tends to zero, the monitored object needs to be alarmed, so that the possibility of service suspension or system abnormality caused by too low active session number is avoided.
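The lower-bound check described above can be sketched as follows. This is a minimal illustration only; the function name and the threshold value used in the example are assumptions and do not come from the embodiment.

```python
def needs_third_alarm(active_sessions: int, min_sessions: int) -> bool:
    """Return True when the active session count is below the preset
    session number threshold, i.e. the third alarm should be raised.

    A count equal to or above the threshold does not meet the alarm
    condition, matching the rule stated in the text.
    """
    return active_sessions < min_sessions


# Hypothetical threshold of 5 sessions, chosen only for illustration.
MIN_SESSIONS = 5
print(needs_third_alarm(0, MIN_SESSIONS))   # count has dropped to zero
print(needs_third_alarm(5, MIN_SESSIONS))   # exactly at the threshold
```

The comparison is strict, so a count exactly at the threshold does not raise the third alarm.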
In summary, according to the above session number-based operation and maintenance platform performance monitoring method, monitoring indexes of a monitored object are obtained, the monitoring indexes including the CPU utilization and the number of active sessions. When it is determined that the CPU utilization of the monitored object in the current monitoring period meets the first alarm condition, it is further judged whether the number of active sessions of the monitored object in the current monitoring period meets the second alarm condition. When the number of active sessions does not meet the second alarm condition, an alarm is given in the first alarm mode to remind the operation and maintenance manager that the CPU utilization is high. When the number of active sessions meets the second alarm condition, an alarm is given in the second alarm mode to remind the operation and maintenance manager that the high CPU utilization may be caused by an unreasonable number of active sessions, prompting the manager to check the active sessions directly to eliminate the cause of the fault, so that the abnormal active sessions can be located quickly. Traditional operation and maintenance does not consider that an unreasonable number of active sessions can also cause high CPU utilization, and the number of active sessions easily becomes abnormal in a high-concurrency cluster. Directly detecting the number of active sessions of the monitored object helps the operation and maintenance manager quickly locate the root of the problem, which facilitates troubleshooting, improves the efficiency of system operation and maintenance, and reduces the cost of manual inspection.
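The two-stage decision summarized above can be sketched as a single function. This is an illustrative sketch under stated assumptions, not the patented implementation: the names `Alarm` and `choose_alarm`, the 80% threshold default, and the proportional values 0.5 and 1.5 are hypothetical, and "proportional value" is read here as a ratio multiplied by the session count of the previous monitoring period.

```python
from enum import Enum
from typing import Optional


class Alarm(Enum):
    FIRST = "cpu_high"                    # CPU high, session count as expected
    SECOND = "cpu_high_check_sessions"    # CPU high, session count out of range


def choose_alarm(cpu_percent: float,
                 current_sessions: int,
                 previous_sessions: int,
                 cpu_threshold: float = 80.0,   # assumed example threshold
                 low_ratio: float = 0.5,        # assumed first proportional value (< 1)
                 high_ratio: float = 1.5        # assumed second proportional value (> 1)
                 ) -> Optional[Alarm]:
    """Pick an alarm mode for one monitoring period, or None if no alarm."""
    # First alarm condition: CPU utilization exceeds the preset threshold.
    if cpu_percent <= cpu_threshold:
        return None  # wait for the next monitoring period
    # Second alarm condition: session count falls outside the preset range
    # derived from the previous monitoring period.
    lower = low_ratio * previous_sessions
    upper = high_ratio * previous_sessions
    if lower <= current_sessions <= upper:
        return Alarm.FIRST
    return Alarm.SECOND


print(choose_alarm(90.0, 100, 100))  # CPU high, sessions within range
print(choose_alarm(90.0, 200, 100))  # CPU high, sessions above range
```

Returning `None` when the first alarm condition is not met mirrors the step of simply waiting for the next monitoring period.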
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.
The functional modules and hardware structures of the electronic device implementing the session number-based operation and maintenance platform performance monitoring method are respectively introduced below with reference to fig. 2 to 3.
Example two
FIG. 2 is a functional block diagram of the session number-based operation and maintenance platform performance monitoring apparatus in the preferred embodiment of the present invention.
In some embodiments, the session number-based operation and maintenance platform performance monitoring apparatus 20 is operated in an electronic device. The operation and maintenance platform performance monitoring device 20 based on the session number monitors the session number index and maintains the operation and maintenance platform according to the monitored session number index. The session number based operation and maintenance platform performance monitoring apparatus 20 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the session number based operation and maintenance platform performance monitoring apparatus 20 may be stored in a memory and executed by at least one processor.
In this embodiment, the operation and maintenance platform performance monitoring apparatus 20 based on the session number may be divided into a plurality of functional modules according to the functions executed by the apparatus. The functional module may include: the alarm system comprises a first obtaining module 201, a setting module 202, a first judging module 203, a second obtaining module 204, a second judging module 205, a first alarm module 206, a second alarm module 207, a third judging module 208 and a third alarm module 209. The modules referred to herein are a series of computer program segments stored in a memory that can be executed by at least one processor and that perform a fixed function. In some embodiments, the functionality of the modules will be described in greater detail in subsequent embodiments.
The first obtaining module 201 is configured to obtain a CPU utilization of a monitored object in a current monitoring period.
In this embodiment, the CPU utilization of the monitored object in the period may be monitored by the monitoring system. The monitoring system may be a large-scale system composed of a plurality of cloud products, a service system composed of a few machines, or a stand-alone system.
The monitoring object may be any object in the monitoring system, for example, it may be a physical entity such as a cluster, a system, a device, etc., it may be a software entity such as a computer operating system, an application process, etc., and it may also be a logical entity formed by software and hardware together such as a virtual machine, etc.
The monitoring system monitors monitoring indexes of one or more monitored objects. The monitoring index is also called a monitoring item and is an index for alarm judgment.
The monitoring system can simultaneously detect a plurality of monitoring indexes of a plurality of monitored objects.
In this embodiment, the monitoring index includes: number of active sessions, CPU utilization. In other embodiments, the monitoring index may also include a memory usage rate, a database connection number, and the like.
In this embodiment, the CPU utilization refers to the percentage of the CPU occupied by running programs. The higher the CPU utilization, the more applications are running simultaneously; the lower the CPU utilization, the fewer applications are running simultaneously. When the CPU utilization is too high, system performance is affected, the system runs slowly, and the response time is prolonged.
The monitoring period refers to a preset monitoring time period or a preset monitoring time point.
The monitoring time period is a span of time, for example, 10 am to 12 noon of the day, or 3 pm to 5 pm of the day, or every third hour.
The monitoring time point is a specific point in time, for example, 10 am of the day, or 10 pm of the day, or every hour on the hour, such as 1 o'clock, 2 o'clock, 3 o'clock, etc.
In some embodiments, a plurality of monitoring time periods, for example, 10 am to 12 noon and 3 pm to 5 pm of the day, may be preset, and the CPU utilization of the monitored object may be detected in each monitoring time period.
In some embodiments, a plurality of monitoring time points, for example, 10 am and 10 pm of the day, may also be preset, and the CPU utilization of the monitored object is detected at each monitoring time point.
In some embodiments, the monitoring cycle may further include a plurality of monitoring time periods and a plurality of monitoring time points at the same time.
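A monitoring period that mixes time periods and time points can be represented and checked as in the sketch below. The constants reuse the example times given above; the function name and the minute-level matching of time points are assumptions made for illustration.

```python
from datetime import time
from typing import List, Tuple

# Example monitoring time periods (start, end) and time points from the text.
PERIODS: List[Tuple[time, time]] = [(time(10, 0), time(12, 0)),
                                    (time(15, 0), time(17, 0))]
POINTS: List[time] = [time(10, 0), time(22, 0)]


def in_monitoring_cycle(now: time,
                        periods: List[Tuple[time, time]],
                        points: List[time]) -> bool:
    """Return True if `now` falls inside any monitoring time period or
    matches any monitoring time point (compared to the minute)."""
    in_period = any(start <= now <= end for start, end in periods)
    at_point = any(now.hour == p.hour and now.minute == p.minute for p in points)
    return in_period or at_point


print(in_monitoring_cycle(time(11, 30), PERIODS, POINTS))  # inside 10:00-12:00
print(in_monitoring_cycle(time(13, 0), PERIODS, POINTS))   # outside every slot
```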
And the setting module 202 is configured to set a current monitoring period according to the CPU utilization in the historical monitoring period.
Specifically, the setting module 202 sets the current monitoring period according to the CPU utilization in the historical monitoring period, including:
acquiring the CPU utilization rate corresponding to the historical monitoring period;
acquiring a target CPU utilization rate of which the CPU utilization rate is higher than a preset CPU utilization rate threshold value;
acquiring a monitoring time period or a monitoring time point corresponding to the target CPU utilization rate;
and setting the monitoring time period or the monitoring time point as the current monitoring period.
The plurality of monitoring time periods and monitoring time points used to detect the CPU utilization of the monitored object are set according to the monitoring time periods or monitoring time points at which the historical CPU utilization was too high. These are the time periods or time points at which users use the application frequently, and frequent use tends to drive the CPU utilization too high. The CPU utilization of the monitored object therefore only needs to be detected in the set monitoring time periods or at the set monitoring time points, rather than being monitored continuously in real time, which avoids wasting system resources.
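The selection of monitoring slots from historical data can be sketched as a simple filter. The data layout (a mapping from a slot label to the CPU utilization recorded for it), the sample values, and the function name are assumptions for illustration; the embodiment does not specify how the history is stored.

```python
from typing import Dict, List


def select_monitoring_slots(history: Dict[str, float],
                            cpu_threshold: float = 80.0) -> List[str]:
    """Return the historical time periods or time points whose CPU
    utilization exceeded the preset threshold; these become the
    current monitoring period."""
    return [slot for slot, cpu in history.items() if cpu > cpu_threshold]


# Hypothetical history: slot label -> CPU utilization in percent.
HISTORY = {
    "10:00-12:00": 91.0,
    "12:00-15:00": 35.0,
    "15:00-17:00": 84.5,
    "22:00": 80.0,
}
print(select_monitoring_slots(HISTORY))
```

Only slots strictly above the threshold are kept, so the slot recorded at exactly 80% is not selected in this sketch.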
In this embodiment, the acquiring the CPU utilization of the monitored object in the current monitoring period includes:
1) And when the system is in an available state, acquiring the process information of the application program currently operated by the monitoring object.
In this embodiment, the system includes an available state and an unavailable state.
The available state refers to the state that the system is in a running state.
The unavailable state refers to the state that the system is in a dead halt state or a shutdown state and the like.
When the system is in an available state, the system needs to monitor the CPU utilization of the monitored object in a monitoring period.
In this embodiment, the process information of the application program currently running on the monitored object may be acquired by calling an operating system command (e.g., the ps command).
The Process information of the application may include a Process Identifier (PID) of the application Process. The process identifier is used to distinguish the processes. The process identifier may be identified by a number, such as a four-digit number identifying the process identifier.
The collected process information of the application program can also comprise the application program to which the application program process belongs. An application includes an application process.
The process information of RocketMQ may be retrieved from the information of all running processes.
Further, the PID of the RocketMQ process may be obtained from the information of all processes in combination with the preset keywords NamesrvStartup and BrokerStartup.
For example, the system uses the command: PID=`ps -fC java | grep "$INSTANCES" | egrep "NamesrvStartup|BrokerStartup" | awk '{print $2}'`, and the following running result can be obtained: [root@SZC-L0075300 Rocklctmq]# ps -fC java | grep "rmq_lcloud-config-prd-ins5201" | egrep "NamesrvStartup|BrokerStartup" | awk '{print $2}' 16056, so the system can determine that the PID of the RocketMQ process is 16056.
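The same filtering can be sketched in Python by parsing the text printed by `ps -f`, in which the second column is the PID. This is a hedged sketch: the function name and the sample output below are hypothetical (only the instance name and PID 16056 are taken from the example above), and the real process command line may look different.

```python
import re
from typing import List, Sequence


def find_pids(ps_output: str, instance: str,
              keywords: Sequence[str] = ("NamesrvStartup", "BrokerStartup")) -> List[int]:
    """Return PIDs of lines that mention `instance` and any of `keywords`,
    mimicking the grep | egrep | awk '{print $2}' pipeline."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    pids = []
    for line in ps_output.splitlines():
        if instance in line and pattern.search(line):
            fields = line.split()
            pids.append(int(fields[1]))  # column 2 of `ps -f` output is the PID
    return pids


# Hypothetical `ps -fC java` output used only to exercise the parser.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root     16056     1  3 10:02 ?        00:12:41 java -Dinstance=rmq_lcloud-config-prd-ins5201 org.apache.rocketmq.broker.BrokerStartup
root     17001     1  0 10:05 ?        00:00:09 java -jar other-service.jar
"""
print(find_pids(SAMPLE_PS, "rmq_lcloud-config-prd-ins5201"))
```

On a real host, the input text could be obtained with `subprocess.run(["ps", "-fC", "java"], capture_output=True, text=True).stdout`, assuming a Linux system whose `ps` supports the `-C` option.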
2) And acquiring the CPU utilization rate of the monitored object in the current monitoring period according to the process information.
In this embodiment, the obtaining, according to the process information, the CPU utilization of the monitored object in the current monitoring period includes:
calling the resource information of the RocketMQ process through the top command according to the process identifier in the process information;
and acquiring the CPU utilization rate of the monitored object in the current monitoring period from the resource information of the RocketMQ process.
The process information may include a plurality of fields, each field corresponding to one of the process information, each field including a field name, and the process identifier may be obtained based on the field name. For example, if the field name of the process identifier in the process information is PID, the field with the field name of PID is obtained, and the process identifier is obtained.
The resource information of the RocketMQ process may include CPU utilization, memory utilization, and the like.
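Looking a value up by field name, as described above, can be sketched by parsing the batch-mode text output of `top`, whose process table has a header row containing `PID` and `%CPU`. The sample output and the function name are hypothetical, and the sketch assumes a locale that prints decimal points.

```python
from typing import Optional


def cpu_percent_from_top(top_output: str, pid: int) -> Optional[float]:
    """Find the process row for `pid` and return its %CPU value, located
    by the field name in the header row rather than by a fixed position."""
    header = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID" and "%CPU" in fields:
            header = fields  # remember the column layout
            continue
        if header is not None and fields[0] == str(pid):
            return float(fields[header.index("%CPU")])
    return None  # process not listed


# Hypothetical output in the style of `top -b -n 1 -p 16056`.
SAMPLE_TOP = """\
top - 10:02:01 up 3 days,  1 user,  load average: 0.50, 0.40, 0.30
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
16056 root      20   0 8123456   1.2g  12345 S  87.5  15.3  12:41.07 java
"""
print(cpu_percent_from_top(SAMPLE_TOP, 16056))
```

The memory utilization could be read the same way by looking up the `%MEM` field instead.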
The first judging module 203 is configured to judge whether the CPU utilization of the monitored object meets a first alarm condition.
In this embodiment, the first alarm condition may include that the CPU utilization exceeds a preset CPU utilization threshold.
The determining whether the CPU utilization of the monitored object meets the first alarm condition may be determining whether the CPU utilization of the monitored object exceeds the preset CPU utilization threshold.
The preset CPU utilization threshold is a preset value (e.g., 80%) that indicates the optimal operating state of the monitored object. When the CPU utilization of the monitored object exceeds the preset CPU utilization threshold, the monitored object is currently running many application programs and its load is heavy, so the CPU utilization of the monitored object meets the first alarm condition. When the CPU utilization of the monitored object does not exceed the preset CPU utilization threshold, the number of application programs currently running on the monitored object is moderate and its operating condition is good, so no alarm is needed for the CPU utilization of the monitored object.
The first obtaining module 201 is further configured to obtain the CPU utilization of the monitored object in the next monitoring period when the first determining module 203 determines that the CPU utilization of the monitored object does not satisfy the first alarm condition.
And when the CPU utilization rate of the monitored object is determined not to meet the first alarm condition, waiting for the next monitoring period, and acquiring the CPU utilization rate of the monitored object in the next monitoring period.
A second obtaining module 204, configured to obtain the number of active sessions of the monitored object in the current monitoring period when the first judging module 203 determines that the CPU utilization of the monitored object meets the first alarm condition.
An active session refers to the interval during which an end user communicates with the interactive system, typically the time elapsed from logging into the system to logging out of the system and possibly some operating space if required. A session is a server-side solution to the statelessness of the HTTP protocol; it allows a series of interactions between a client and a server to form a complete transaction, so that a website behaves like software in the true sense.
Standard Spring Boot products on the market usually cannot monitor the number of active sessions of RocketMQ, although unreasonable use of active sessions can also cause high CPU utilization. Traditional host-level monitoring only checks whether the CPU utilization is too high and does not check whether the high CPU utilization is caused by unreasonable use of active sessions.
The active sessions of the monitored object in the current monitoring period can be obtained by a springboot-monitor packet session method, and the number of the active sessions is counted.
The second determining module 205 is configured to determine whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition.
In this embodiment, the determining, by the second determining module 205, whether the number of active sessions of the monitored object in the current monitoring period meets the second alarm condition includes:
and judging whether the number of the active sessions of the monitored object in the current monitoring period meets a second alarm condition or not according to the number of the active sessions of the monitored object in the previous monitoring period.
And detecting the number of active sessions of the monitored object in each monitoring period, and storing the detected number of active sessions in a database according to the time sequence of the monitoring period, so that the active sessions of the monitored object in the current monitoring period can be predicted according to the number of active sessions of the monitored object detected in the previous monitoring period, and the alarm condition can be further determined.
Specifically, the second alarm condition may be: the number of active sessions of the monitored object in the current monitoring period is not within a preset range of the number of active sessions of the monitored object in the previous monitoring period.
The preset range of the number of active sessions of the monitored object is the range greater than a first proportional value of the number of active sessions in the previous monitoring period and less than a second proportional value of that number, wherein the first proportional value is less than 1 and the second proportional value is greater than 1.
When the number of active sessions of the monitored object in the current monitoring period is between a first proportional value and a second proportional value of the number of active sessions of the monitored object in the previous monitoring period, the number of active sessions of the monitored object in the current monitoring period is considered to be in line with an expectation, and a second alarm condition is not met; and when the number of active sessions of the monitored object in the current monitoring period is not between the first proportional value and the second proportional value of the number of active sessions of the monitored object in the last monitoring period, that is, when the number of active sessions of the monitored object in the current monitoring period is smaller than the first proportional value of the number of active sessions of the monitored object in the last monitoring period or is larger than the second proportional value of the number of active sessions of the monitored object in the last monitoring period, the number of active sessions of the monitored object in the current monitoring period is considered to be not in accordance with an expectation, and a second alarm condition is met.
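The range test can be written compactly as an inequality. The notation is introduced here for illustration only: $N_{\text{prev}}$ and $N_{\text{cur}}$ denote the number of active sessions in the previous and current monitoring periods, and $r_1$, $r_2$ denote the first and second proportional values, read as ratios applied to $N_{\text{prev}}$ (an interpretation of the text, not a definition given in it).

```latex
\[
\text{second alarm condition not met}
\iff
r_1 \, N_{\text{prev}} \;\le\; N_{\text{cur}} \;\le\; r_2 \, N_{\text{prev}},
\qquad r_1 < 1 < r_2 .
\]
```

As a worked example with assumed values $r_1 = 0.5$, $r_2 = 1.5$ and $N_{\text{prev}} = 200$, the expected range is $[100, 300]$: a current count of 180 does not meet the second alarm condition, while a current count of 350 (or of 60) does.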
A first alarm module 206, configured to alarm the monitored object in a preset first alarm manner when the second determining module 205 determines that the number of active sessions of the monitored object in the current monitoring period does not satisfy the second alarm condition.
In this embodiment, in the current monitoring period, the CPU utilization of the monitored object exceeds the preset CPU utilization threshold while the number of active sessions of the monitored object is within the preset range of the number of active sessions in the previous monitoring period. This indicates that the excessive CPU utilization of the monitored object in the current monitoring period is not caused by the number of active sessions, and the preset first alarm mode is used to alarm that the CPU utilization of the monitored object is too high.
The preset first alarm mode can be modes of sending out an alarm sound, displaying an alarm picture, sending an alarm message, sending an alarm mail and the like.
A second alarm module 207, configured to alarm the monitored object in a preset second alarm manner when the second determining module 205 determines that the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition.
In this embodiment, in the current monitoring period, the CPU utilization of the monitored object exceeds the preset CPU utilization threshold and the number of active sessions of the monitored object is not within the preset range of the number of active sessions in the previous monitoring period. This indicates that the excessive CPU utilization of the monitored object in the current monitoring period may be caused by the number of active sessions, and the preset second alarm mode is used to alarm that the CPU utilization of the monitored object is too high.
The preset second alarm mode may be that a monitoring information display interface is generated, and the number of active sessions and the corresponding active session IDs of the monitored object are displayed in the monitoring information display interface.
The number of active sessions and the corresponding active session IDs of the monitored object can be shown in a chart. The charts may include line charts, area charts, heat maps, pie charts, tables, and the like.
In a specific embodiment, the number of active sessions of the monitoring object and the corresponding active session ID may be shown by Grafana. And transmitting the number of active sessions of the monitoring object and the corresponding active session ID to a Grafana platform for display through a Grafana acquisition agent.
Grafana is a visualization dashboard that supports a variety of charts and layouts and can use Graphite, Zabbix, InfluxDB, Prometheus and OpenTSDB as data sources. Grafana has the following characteristics: flexible and rich graphical options; a variety of styles can be mixed; support for day and night modes; multiple data sources are supported.
Or the number of active sessions and the corresponding active session ID of the monitored object may be presented through Highcharts.
Highcharts is a charting library written in pure JavaScript. Highcharts makes it easy to add interactive charts to a website or a web application, and supports the various chart types used by the monitoring platform.
Further, the operation and maintenance platform performance monitoring apparatus based on session number may further include: a third determining module 208, configured to determine whether the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session threshold; and a third alarm module 209, configured to alarm the monitored object in a preset third alarm manner when the third determining module 208 determines that the number of active sessions of the monitored object in the current monitoring period is smaller than the preset session threshold.
If the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session number threshold, determining that an alarm condition is met; and if the number of the active sessions of the monitored object in the current monitoring period is greater than or equal to the preset session number threshold, determining that the alarm condition is not met.
By setting the lowest session number critical value, when the active session number of the monitored object in the monitoring period is lower than the lowest session number critical value, especially when the active session number of the monitored object in the monitoring period tends to zero, the monitored object needs to be alarmed, so that the possibility of service suspension or system abnormality caused by too low active session number is avoided.
In summary, according to the above session number-based operation and maintenance platform performance monitoring apparatus, monitoring indexes of a monitored object are obtained, the monitoring indexes including the CPU utilization and the number of active sessions. When it is determined that the CPU utilization of the monitored object in the current monitoring period meets the first alarm condition, it is further judged whether the number of active sessions of the monitored object in the current monitoring period meets the second alarm condition. When the number of active sessions does not meet the second alarm condition, an alarm is given in the first alarm mode to remind the operation and maintenance manager that the CPU utilization is high. When the number of active sessions meets the second alarm condition, an alarm is given in the second alarm mode to remind the operation and maintenance manager that the high CPU utilization may be caused by an unreasonable number of active sessions, prompting the manager to check the active sessions directly to eliminate the cause of the fault, so that the abnormal active sessions can be located quickly. Traditional operation and maintenance does not consider that an unreasonable number of active sessions can also cause high CPU utilization, and the number of active sessions easily becomes abnormal in a high-concurrency cluster. Directly detecting the number of active sessions of the monitored object helps the operation and maintenance manager quickly locate the root of the problem, which facilitates troubleshooting, improves the efficiency of system operation and maintenance, and reduces the cost of manual inspection.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer electronic device (which may be a personal computer, a dual-screen electronic device, or a network electronic device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
Example three
Fig. 3 is a schematic diagram of an electronic device according to a third embodiment of the present invention.
The electronic device 3 includes: a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
The at least one processor 32, when executing the computer program 33, implements the steps in the above-mentioned embodiment of the method for monitoring the performance of the operation and maintenance platform based on session numbers.
Illustratively, the computer program 33 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the at least one processor 32. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 33 in the electronic device 3.
The electronic device 3 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. It will be appreciated by those skilled in the art that the schematic diagram of Fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, which may include more or fewer components than those shown, or combine some components, or use different components; for example, the electronic device 3 may also include input and output devices, network access devices, a bus, etc.
The at least one processor 32 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The processor 32 may be a microprocessor or any conventional processor; the processor 32 is the control center of the electronic device 3 and connects the various parts of the whole electronic device 3 through various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 implements various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the electronic apparatus 3, and the like. In addition, the memory 31 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals according to legislation and patent practice.
In the several embodiments provided in the present invention, it should be understood that the disclosed server and method may be implemented in other ways. For example, the above-described server embodiment is only illustrative, and for example, the division of the unit is only one logical function division, and there may be other division ways in actual implementation.
In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not to denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit of the technical solutions of the present invention.

Claims (8)

1. A session number-based operation and maintenance platform performance monitoring method is characterized by comprising the following steps:
acquiring the CPU utilization rate of a monitored object in a historical monitoring period, acquiring a target CPU utilization rate of which the CPU utilization rate is higher than a preset CPU utilization rate threshold value, acquiring a monitoring time period or a monitoring time point corresponding to the target CPU utilization rate, and setting the monitoring time period or the monitoring time point as a current monitoring period;
acquiring the CPU utilization rate of the monitored object in the current monitoring period, wherein the current monitoring period is set according to the CPU utilization rate in the historical monitoring period;
when the CPU utilization rate of the monitored object is determined to meet a first alarm condition, acquiring the number of active sessions of the monitored object in the current monitoring period;
judging whether the number of active sessions of the monitored object in the current monitoring period meets a second alarm condition;
when the number of the active sessions of the monitored object in the current monitoring period is determined not to meet the second alarm condition, alarming the monitored object in a preset first alarm mode, wherein the first alarm mode comprises the steps of sending out an alarm sound and sending an alarm mail;
when the number of active sessions of the monitored object in the current monitoring period is determined to meet the second alarm condition, alarming the monitored object in a preset second alarm mode, wherein the preset second alarm mode is: generating a monitoring information display interface, and displaying the number of active sessions of the monitored object and the corresponding active session IDs.
2. The method of claim 1, wherein after said obtaining the number of active sessions of the monitored object within the current monitoring period, the method further comprises:
judging whether the number of active sessions of the monitored object in the current monitoring period is smaller than a preset session number threshold value;
and when the number of active sessions of the monitored object in the current monitoring period is determined to be smaller than the preset session number threshold, alarming the monitored object in a preset third alarming mode.
3. The method of claim 1, wherein the second alarm condition is: the number of active sessions of the monitored object in the current monitoring period is greater than a first proportional value of the number of active sessions of the monitored object in a previous monitoring period and is less than a second proportional value of the number of active sessions of the monitored object in the previous monitoring period, wherein the first proportional value is less than 1, and the second proportional value is greater than 1.
4. The method of claim 1, wherein when it is determined that the CPU usage of the monitored object does not satisfy a first alarm condition, the method further comprises:
and acquiring the CPU utilization rate of the monitored object in the next monitoring period.
5. The method according to any one of claims 1 to 4, wherein the obtaining the CPU usage rate of the monitored object in the current monitoring period comprises:
when the system is in an available state, acquiring process information of an application program currently run by the monitored object;
retrieving resource information of the RocketMQ process through the top command according to the process identifier in the process information;
and acquiring the CPU utilization rate of the monitored object in the current monitoring period from the resource information of the RocketMQ process.
6. An operation and maintenance platform performance monitoring device based on session numbers, the device comprising:
the first obtaining module is used for obtaining the CPU utilization rate of a monitored object in a current monitoring period, wherein the current monitoring period is set according to the CPU utilization rate in a historical monitoring period by: acquiring the CPU utilization rate of the monitored object in the historical monitoring period, acquiring a target CPU utilization rate of which the CPU utilization rate is higher than a preset CPU utilization rate threshold value, acquiring a monitoring time period or a monitoring time point corresponding to the target CPU utilization rate, and setting the monitoring time period or the monitoring time point as the current monitoring period;
the first judgment module is used for judging whether the CPU utilization rate of the monitored object meets a first alarm condition;
the second obtaining module is used for obtaining the number of active sessions of the monitored object in the current monitoring period when the first judging module determines that the CPU utilization rate of the monitored object meets a first alarm condition;
the second judgment module is used for judging whether the number of the active sessions of the monitored object in the current monitoring period meets a second alarm condition;
the first alarm module is used for alarming the monitored object in a preset first alarm mode when the second judgment module determines that the number of active sessions of the monitored object in the current monitoring period does not meet the second alarm condition, wherein the first alarm mode comprises the steps of sending out an alarm sound and sending an alarm mail;
a second alarm module, configured to alarm the monitored object in a preset second alarm mode when the second judgment module determines that the number of active sessions of the monitored object in the current monitoring period meets the second alarm condition, wherein the preset second alarm mode is: generating a monitoring information display interface, and displaying the number of active sessions of the monitored object and the corresponding active session IDs.
7. An electronic device, comprising a processor and a memory, wherein the processor is configured to implement the session number based operation and maintenance platform performance monitoring method according to any one of claims 1 to 5 when executing the computer program stored in the memory.
8. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the session number-based operation and maintenance platform performance monitoring method according to any one of claims 1 to 5.
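By way of illustration only, the flow recited in claims 1 to 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the first alarm condition is assumed to be "CPU utilization rate above a threshold", all numeric values are invented, the function names are hypothetical, and the third alarm of claim 2 is assumed to be raised in addition to the first or second alarm. The sample text imitates the default column layout of batch-mode `top` output rather than output captured from a real RocketMQ host.

```python
from typing import List, Optional, Sequence, Tuple

# Illustrative values only; the claims leave every threshold unspecified.
CPU_THRESHOLD = 80.0   # assumed first alarm condition: CPU usage (%) above this
LOW_RATIO = 0.8        # claim 3 "first proportional value", must be < 1
HIGH_RATIO = 1.2       # claim 3 "second proportional value", must be > 1
MIN_SESSIONS = 5       # claim 2 "preset session number threshold"

# Hypothetical batch-mode top output for one process (header row plus one row).
SAMPLE_TOP = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1234 mq        20   0 5123456 812345  20480 S  92.5   3.1  10:01.23 java
"""


def select_monitoring_periods(history: Sequence[Tuple[str, float]]) -> List[str]:
    """Claim 1, first step: keep historical periods whose CPU usage exceeded the threshold."""
    return [period for period, cpu in history if cpu > CPU_THRESHOLD]


def cpu_from_top_output(text: str, pid: int) -> Optional[float]:
    """Claim 5: read the %CPU value of one process from top output."""
    cpu_col = None
    for line in text.splitlines():
        fields = line.split()
        if "PID" in fields and "%CPU" in fields:
            cpu_col = fields.index("%CPU")  # locate the column from the header row
        elif cpu_col is not None and len(fields) > cpu_col and fields[0] == str(pid):
            return float(fields[cpu_col])
    return None  # process not listed


def second_condition_met(current: int, previous: int) -> bool:
    """Claim 3: current count lies strictly between the two proportions of the previous count."""
    return LOW_RATIO * previous < current < HIGH_RATIO * previous


def choose_alarms(cpu: float, current_sessions: int, previous_sessions: int) -> List[str]:
    """Return the alarm modes to raise for one monitoring period."""
    if cpu <= CPU_THRESHOLD:
        return []  # claim 4: no alarm; continue with the next monitoring period
    alarms = []
    if current_sessions < MIN_SESSIONS:
        alarms.append("third")   # claim 2: session count below the preset threshold
    if second_condition_met(current_sessions, previous_sessions):
        alarms.append("second")  # display session count and session IDs
    else:
        alarms.append("first")   # alarm sound and alarm mail
    return alarms


cpu_now = cpu_from_top_output(SAMPLE_TOP, 1234)
print(choose_alarms(cpu_now, current_sessions=100, previous_sessions=95))
```

In this sketch the CPU check gates everything else, so the session count is only consulted once the first alarm condition is met, mirroring the order of the steps in claim 1.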
CN201811537816.2A 2018-12-15 2018-12-15 Session number-based operation and maintenance platform performance monitoring method and device and related equipment Active CN109766238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811537816.2A CN109766238B (en) 2018-12-15 2018-12-15 Session number-based operation and maintenance platform performance monitoring method and device and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811537816.2A CN109766238B (en) 2018-12-15 2018-12-15 Session number-based operation and maintenance platform performance monitoring method and device and related equipment

Publications (2)

Publication Number Publication Date
CN109766238A CN109766238A (en) 2019-05-17
CN109766238B true CN109766238B (en) 2023-02-03

Family

ID=66451809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811537816.2A Active CN109766238B (en) 2018-12-15 2018-12-15 Session number-based operation and maintenance platform performance monitoring method and device and related equipment

Country Status (1)

Country Link
CN (1) CN109766238B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515701B (en) * 2019-08-28 2020-11-06 杭州数梦工场科技有限公司 Thermal migration method and device for virtual machine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014238637A (en) * 2013-06-06 2014-12-18 株式会社日立システムズ Session monitoring system and session monitoring method
CN107590014A (en) * 2017-09-07 2018-01-16 携程旅游网络技术(上海)有限公司 Fault detection method, device, system, electronic equipment, storage medium
CN107729205A (en) * 2017-08-22 2018-02-23 国家电网公司 Fault handling method and device for operation system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7941538B2 (en) * 2008-06-12 2011-05-10 International Business Machines Corporation Dynamic management of resource utilization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
中山e卡通移动商务系统的设计与实现;程日初;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;20120515(第05期);第6.1节 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant